531 resources related to Speech Perception
The conference program will consist of plenary lectures, symposia, workshops, and invited sessions on the latest significant findings and developments in all the major fields of biomedical engineering. Submitted full papers will be peer reviewed. Accepted high-quality papers will be presented in oral and poster sessions, will appear in the Conference Proceedings, and will be indexed in PubMed/MEDLINE.
The ICASSP meeting is the world's largest and most comprehensive technical conference focused on signal processing and its applications. The conference will feature world-class speakers, tutorials, exhibits, and over 50 lecture and poster sessions.
HRI is a highly selective annual conference that showcases the very best research and thinking in human-robot interaction. HRI is inherently interdisciplinary and multidisciplinary, reflecting work from researchers in robotics, psychology, cognitive science, HCI, human factors, artificial intelligence, organizational behavior, anthropology, and many other fields.
Circuits and Systems, Computers, Information Technology, Communication Systems, Control and Instrumentation, Electrical Power Systems, Power Electronics, Signal Processing
2018 Chinese Control And Decision Conference (CCDC)
Chinese Control and Decision Conference is an annual international conference to create a forum for scientists, engineers and practitioners throughout the world to present the latest advancement in Control, Decision, Automation, Robotics and Emerging Technologies.
Speech analysis, synthesis, coding, speech recognition, speaker recognition, language modeling, speech production and perception, speech enhancement. In audio: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. The scope for the proposed transactions includes SPEECH PROCESSING - Transmission and storage of speech signals; speech coding; speech enhancement and noise reduction; ...
Broad coverage of concepts and methods of the physical and engineering sciences applied in biology and medicine, ranging from formalized mathematical theory through experimental science and technological development to practical clinical applications.
Video A/D and D/A, display technology, image analysis and processing, video signal characterization and representation, video compression techniques and signal processing, multidimensional filters and transforms, analog video signal processing, neural networks for video applications, nonlinear video signal processing, video storage and retrieval, computer vision, packet video, high-speed real-time circuits, VLSI architecture and implementation for video technology, multiprocessor systems--hardware and software-- ...
Design and analysis of algorithms, computer systems, and digital networks; methods for specifying, measuring, and modeling the performance of computers and computer systems; design of computer components, such as arithmetic units, data storage devices, and interface devices; design of reliable and testable digital devices and systems; computer networks and distributed computer systems; new computer organizations and architectures; applications of VLSI ...
The design and manufacture of consumer electronics products, components, and related activities, particularly those used for entertainment, leisure, and educational purposes
2018 5th NAFOSTED Conference on Information and Computer Science (NICS), 2018
2018 4th International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT), 2018
Proceedings of the 2014 Biomedical Sciences and Engineering Conference, 2014
2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2012
2012 IEEE-EMBS Conference on Biomedical Engineering and Sciences, 2012
Perception-Action-Learning and Associative Skill Memories
ICASSP 2012 Plenary - Dr. Chin-Hui Lee
Robotics History: Narratives and Networks Oral Histories: Jun Ho Oh
APEC 2012 - Dr. Fred Lee Plenary
ICASSP 2012 - Opening Ceremony
APEC 2011 - GaN-Based Power Devices in Power Electronics
ICRA Keynote: Dr. Matt Mason
ECCE Plenary: Paul Hamilton, part 2
ICASSP 2011 Trends in Multimedia Signal Processing
ECCE Plenary: Pedro Ray, part 2
ICASSP 2011 Trends in Design and Implementation of Signal Processing Systems
ICRA Plenary: Raffaello D'Andrea
Keynote: Poppy Crum - TTM 2018
Robotics History: Narratives and Networks Oral Histories: Jean-Paul Laumond
Cultivating the Art of Persistence - Monique Morrow - IEEE WIE Webinar Series
ICASSP 2011 Trends in Machine Learning for Signal Processing
Robotics History: Narratives and Networks Oral Histories: Bob McGhee
IROS TV 2019 - Cutting Edge Forum: Autonomous Driving, Contributions from Intelligent Robotics, AI & ITS
ICASSP 2012 Plenary - Dr. Karlheinz Brandenburg
Although speech perception has been studied for more than sixty years and a great deal is known about how the system works, there is still more to be discovered. Previous research considered speech perception in only one aspect of sound, such as the speech recognition problem. Speech perception focuses on the process that operates to decode speech sounds no matter what words those sounds might comprise. In this paper, we introduce a new approach that treats speech perception as learning the mapping between speech and other information received from the surrounding environment via other senses. The speech perception problem thus turns into a relationship-learning problem. In this model, we propose using a convolutional neural network to map the speech signal to an image.
Speech perception refers to a listener's ability to understand speech produced by a speaker. To interpret the speech information in the signal, the human auditory system uses both envelope (ENV) and temporal fine structure (TFS) cues. While ENV is sufficient for understanding speech in quiet, TFS cues are necessary for speech segregation in noisy conditions. In general, a slowly varying ENV (known as the recovered ENV) can be recovered from the rapidly changing TFS; however, the degree of ENV recovery and its significance for speech perception are not clearly understood. To quantify the relative contribution of the recovered ENV to speech perception, this study proposes a new speech perception metric. The proposed metric employs a phenomenological model of the auditory periphery developed to simulate the responses of the auditory nerve fibers to both original and recovered ENV cues. The performance of the proposed metric was evaluated under different types of noise (both steady-state and fluctuating) at different sound presentation levels. Finally, to validate the proposed metric, the predicted scores were compared with subjective evaluation scores from behavioral studies. The proposed metric shows a statistically significant correlation for all cases and accounts for a wider dynamic range than existing metrics.
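The ENV/TFS split described in the abstract above can be illustrated with the analytic signal from the Hilbert transform. This is a minimal sketch, not the paper's auditory-periphery model; the function name and the amplitude-modulated test tone are made up for illustration.

```python
import numpy as np
from scipy.signal import hilbert

def env_tfs(band_signal):
    """Split a band-limited signal into envelope (ENV) and
    temporal fine structure (TFS) cues via the analytic signal."""
    analytic = hilbert(band_signal)
    env = np.abs(analytic)            # slowly varying envelope
    tfs = np.cos(np.angle(analytic))  # rapidly varying fine structure
    return env, tfs

# Example: a 100 Hz tone amplitude-modulated at 4 Hz
fs = 8000
t = np.arange(0, 1.0, 1.0 / fs)
x = (1 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 100 * t)
env, tfs = env_tfs(x)
print(np.allclose(env * tfs, x, atol=0.05))  # True: ENV * TFS reconstructs the band signal
```

For a single band-limited channel the product of the two cues reconstructs the original signal, which is why ENV recovery from TFS-only stimuli is possible in principle.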
The EEG mu (μ) rhythm is considered a measure of sensorimotor integration. This rhythm is commonly identified by co-occurring peaks at ~10 Hz (alpha) and ~20 Hz (beta) across the sensorimotor cortex. Suppression of the power within these peaks is thought to reflect somatosensory and motor aspects of processing, respectively. Suppression of μ power (especially in the beta peak) has been found when performing, imagining, or perceiving relevant action (e.g., while watching hand movements and oro-facial movements). μ suppression has also been found during visual speech perception, listening to speech in noise, and mentally segmenting speech for auditory discrimination, suggesting that it is a sensitive measure of audio-motor integration in speech. The two main goals of this study are to bolster understanding of the timing and function of dorsal stream activity in speech perception by examining ERS/ERD patterns in quiet and noisy discrimination conditions, and to provide initial evidence that, via the application of ICA/ERSP, the use of EEG can be extended effectively into speech production. 17 of 20 participants provided left and right μ components that were common to the perception and production tasks. The most probable source of these components was the premotor cortex (BA 6), with the primary motor cortex (BA 4) and primary somatosensory cortex (BA 2/3) providing additional possible sources. Fewer participants (8 and 7 of 20) provided components with average equivalent dipoles emanating from BA 22 and BA 7, respectively, with alpha activity suggesting entrainment within the dorsal stream.
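μ suppression of the kind discussed above is typically quantified as the change in band power during a task relative to a baseline. The sketch below uses conventional alpha band edges and a simple suppression index on synthetic data; the band limits, index definition, and signals are illustrative choices, not values from the study.

```python
import numpy as np
from scipy.signal import welch

def band_power(x, fs, lo, hi):
    """Total Welch PSD power between lo and hi Hz."""
    f, pxx = welch(x, fs=fs, nperseg=fs)  # ~1 Hz frequency resolution
    band = (f >= lo) & (f <= hi)
    return pxx[band].sum()

def mu_suppression(task, baseline, fs, band=(8, 13)):
    """Relative power change; negative values indicate suppression."""
    p_task = band_power(task, fs, *band)
    p_base = band_power(baseline, fs, *band)
    return (p_task - p_base) / p_base

# Synthetic example: a 10 Hz rhythm attenuated during the task
rng = np.random.default_rng(0)
fs = 250
t = np.arange(0, 10, 1 / fs)
alpha = np.sin(2 * np.pi * 10 * t)
baseline = alpha + 0.5 * rng.standard_normal(t.size)
task = 0.4 * alpha + 0.5 * rng.standard_normal(t.size)
print(mu_suppression(task, baseline, fs) < 0)  # True: μ power is suppressed
```

In practice this computation is applied per component and per time window (ERSP) rather than over a whole recording, but the band-power comparison is the same.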
Previous studies using functional magnetic resonance imaging (fMRI) have demonstrated that the left hemisphere is specialized for language function. On the other hand, some studies have revealed that the right hemisphere is also involved in language function. The hypotheses of this study were that (1) the regions related to language function form a bilateral functional network and (2) the level of functional connectivity depends on the hearing conditions. To test these hypotheses, participants were instructed to select a numeric word (e.g., thirty or thirteen) after hearing an auditory sentence presented to both ears (BH), only to the left ear (MH<sub>L</sub>), or only to the right ear (MH<sub>R</sub>). To identify the brain regions related to speech perception, a general linear model (GLM) to estimate neuronal activation and a functional connectivity (FC) analysis were conducted. The results showed that the right superior temporal gyrus (STG) was involved in language function. The right STG showed a significant level of FC with the left STG. Furthermore, the level of bilateral FC was significantly diminished and the accuracy of correct word selection was significantly decreased in the MH<sub>R</sub> condition compared to the BH condition. These results may indicate that a bilateral network within the STG is related to speech perception.
Cochlear implants (CI) are surgically implanted biomedical devices that can provide hearing to severely deaf people by direct electrical stimulation of the auditory nerve. Current CI speech processors do not convey phase information, which is important for perceiving music and for hearing in noisy environments. This paper discusses a new acoustic simulation model for CI that extracts the `envelope' by the continuous interleaved sampling (CIS) principle and the `temporal fine structure' cue (phase information) by the Hilbert algorithm. The dominant channels are identified from the filter bank outputs and subjected to temporal analysis. The amplitude estimates are derived through rectification and low-pass filtering. By the Hilbert transform, the temporal fine structure is extracted through formant peaks in the signal, which contribute to speech intelligibility. After band-pass filtering the speech signal, carrier signals specific to each band output are constructed by placing square pulses in the extracted fine structure where it has greater energy. The envelopes are amplitude-modulated using the carrier signals, and the synthesized speech is generated. Fifteen phonemes were analyzed for their envelope and phase information, and the simulation results indicated that this model significantly improved vowel and consonant identification. The proposed model should be helpful for developing advanced speech processing strategies and improving the speech perception of CI users.
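The CIS-style amplitude estimation described above (band-pass filter bank, rectification, low-pass smoothing) can be sketched as follows. The number of channels, band edges, filter orders, and low-pass cutoff are illustrative assumptions, not the parameters of the paper's model.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def cis_envelopes(x, fs, edges=(100, 400, 1000, 2400, 6000), cutoff=160.0):
    """Return one smoothed envelope per analysis channel:
    band-pass filter, full-wave rectify, then low-pass filter."""
    lp = butter(2, cutoff, btype="lowpass", fs=fs, output="sos")
    envs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        bp = butter(4, (lo, hi), btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(bp, x)
        env = sosfiltfilt(lp, np.abs(band))  # rectify, then smooth
        envs.append(env)
    return np.array(envs)

# Example: a low tone plus a high tone land in different channels
fs = 16000
t = np.arange(0, 0.5, 1 / fs)
x = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 3000 * t)
envs = cis_envelopes(x, fs)
print(envs.shape)  # (4, 8000): four channel envelopes
```

A full CIS simulation would then modulate per-channel carriers with these envelopes; the sketch stops at the amplitude estimates, which is the step the abstract singles out.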
We use the matching pursuit (MP) algorithm to detect induced gamma activity in human EEG during speech perception. We show that the MP algorithm is particularly useful for detecting small power changes at high gamma frequencies (>70 Hz). We also compare the performance of the MP using a stochastic versus a dyadic dictionary and show that despite the frequency bias the time-frequency power plot (averaged over 100 trials) generated by the dyadic MP is almost identical (>98.5%) to the one generated by the stochastic MP. However, the dyadic MP is computationally much faster than the stochastic MP.
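The matching pursuit algorithm referred to above greedily projects the residual onto a dictionary of atoms. The toy sketch below uses a small orthogonal dictionary of unit-norm sinusoids for clarity, rather than the stochastic or dyadic Gabor dictionaries used for EEG time-frequency analysis.

```python
import numpy as np

def matching_pursuit(x, dictionary, n_iter=10):
    """Greedily decompose x over the unit-norm columns of `dictionary`."""
    residual = x.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_iter):
        products = dictionary.T @ residual
        k = np.argmax(np.abs(products))           # best-matching atom
        coeffs[k] += products[k]                  # accumulate its weight
        residual -= products[k] * dictionary[:, k]
    return coeffs, residual

# Dictionary: 32 unit-norm sinusoids over 256 samples
n = 256
t = np.arange(n)
atoms = np.stack([np.sin(2 * np.pi * f * t / n) for f in range(1, 33)], axis=1)
atoms /= np.linalg.norm(atoms, axis=0)
x = 3.0 * atoms[:, 4] + 0.5 * atoms[:, 20]        # sparse two-atom signal
coeffs, residual = matching_pursuit(x, atoms, n_iter=5)
print(np.linalg.norm(residual) < 1e-6)            # True: signal fully recovered
```

With a redundant (non-orthogonal) dictionary the recovery is only approximate and more iterations are needed, which is where the stochastic-versus-dyadic dictionary trade-off discussed in the paper arises.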
Although previous studies have reported that functional network connectivity (FNC: functional connectivity between functional networks) is related to cognitive processing, no study had investigated FNC during speech hearing. In this study, therefore, FNC was investigated to identify which functional networks interact during speech hearing. To this end, English sentences were presented to each participant as auditory stimuli, and speech hearing (SH) accuracy was estimated during the experimental task. The FNC between the motor network (e.g., left precentral gyrus) and the auditory network (e.g., bilateral superior temporal gyrus) showed a significant correlation with SH accuracy in the task-based scan. Furthermore, the FNC in the task-based scan was more tightly linked with SH accuracy than in the resting-state scan. These results possibly indicate that the identified FNC is associated with speech hearing performance.
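Functional connectivity of the kind measured above is commonly computed as the Pearson correlation between the average time courses of two networks. The sketch below uses synthetic time courses; the network labels and the shared task-driven signal are illustrative, not data from the study.

```python
import numpy as np

def functional_connectivity(ts_a, ts_b):
    """Pearson correlation between two network time courses."""
    return np.corrcoef(ts_a, ts_b)[0, 1]

# Synthetic example: two networks driven by a common task signal plus noise
rng = np.random.default_rng(1)
shared = rng.standard_normal(200)                  # common task-driven component
motor = shared + 0.5 * rng.standard_normal(200)    # e.g., left precentral gyrus
auditory = shared + 0.5 * rng.standard_normal(200) # e.g., superior temporal gyrus
print(functional_connectivity(motor, auditory) > 0.5)  # True: strong coupling
```

In an fMRI analysis the time courses would come from network components (e.g., ICA) rather than raw signals, and the correlation would typically be Fisher z-transformed before group statistics.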
The overall goal of speech perception research is to explain how spoken language is recognized and understood. In the current research framework it is usually assumed that the key to achieving this goal is to solve the lack-of-invariance problem. But nearly half a century of sustained effort from a variety of theoretical perspectives has failed to solve this problem. It is argued that this lack of progress in explaining speech perception is not, in the first instance, due to the failure of individual theories to solve the lack-of-invariance problem, but rather to the common background assumption that doing so is in fact the key to explaining speech perception.
In order to examine factors contributing to variations in speech perception performance among cochlear implant users, single photon emission computed tomography (SPECT) was used to examine cortical activity (regional cerebral blood flow, rCBF) elicited by the electrical stimulation of multichannel cochlear implants. Normal hearing (N = 9) and cochlear implant (N = 8) subjects watched a 15-minute video-taped story under two conditions: audio presented monaurally to the left ear (the implanted ear for cochlear implant subjects), and a visual-only presentation. Left monaural stimulation in normal hearing subjects produced significant bilateral activation of Brodmann areas 41, 42, 22, 21, and 38. Cochlear implant subjects with relatively high levels of open-set speech perception demonstrated bilateral activation of cortex; however, the extent of activation was significantly less than that observed for normal hearing individuals, particularly in auditory association cortex (Brodmann areas 22, 21, 38). Individuals with minimal open-set speech perception scores demonstrated unilateral activation of the cortex in the hemisphere contralateral to the ear of implantation, with minimal auditory association cortex activation.
Background noise is a significant factor influencing speech perception performance. Previous studies showed that the temporal fine structure (TFS) plays an important role in speech perception for normal-hearing (NH) and hearing-impaired individuals. Frequency amplitude modulation encoding (FAME) is a successful approach to enhancing TFS information for cochlear implant (CI) recipients. Following the success of FAME for CI recipients, this study aims to evaluate the speech perception performance of FAME for NH listeners in noisy conditions. Experimental results from the present study confirmed that FAME provides better speech perception performance and lower listening effort for NH listeners than noisy speech. In particular, FAME improved Mandarin disyllabic word recognition by as much as 16.7 percentage points and the ease of listening by 1.6 (MOS scale). This demonstrates that the FAME strategy is promising for improving speech recognition performance for NH listeners in noisy environments.