Speech Production And Perception
70 resources related to Speech Production And Perception
- Topics related to Speech Production And Perception
- IEEE Organizations related to Speech Production And Perception
- Conferences related to Speech Production And Perception
- Periodicals related to Speech Production And Perception
- Most published Xplore authors for Speech Production And Perception
The conference program will consist of plenary lectures, symposia, workshops, and invited sessions on the latest significant findings and developments in all the major fields of biomedical engineering. Submitted full papers will be peer reviewed. Accepted high-quality papers will be presented in oral and poster sessions, will appear in the Conference Proceedings, and will be indexed in PubMed/MEDLINE.
2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
CVPR is the premier annual computer vision event comprising the main conference and several co-located workshops and short courses. With its high quality and low cost, it provides an exceptional value for students, academics and industry researchers.
The ICASSP meeting is the world's largest and most comprehensive technical conference focused on signal processing and its applications. The conference will feature world-class speakers, tutorials, exhibits, and over 50 lecture and poster sessions.
HRI is a highly selective annual conference that showcases the very best research and thinking in human-robot interaction. HRI is inherently interdisciplinary and multidisciplinary, reflecting work from researchers in robotics, psychology, cognitive science, HCI, human factors, artificial intelligence, organizational behavior, anthropology, and many other fields.
Robotics, intelligent systems, automation, mechatronics, micro/nano technologies, AI, ...
Speech analysis, synthesis, coding, speech recognition, speaker recognition, language modeling, speech production and perception, and speech enhancement. In audio: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. (IEEE Guide for Authors) The scope for the proposed transactions includes SPEECH PROCESSING - transmission and storage of speech signals; speech coding; speech enhancement and noise reduction; ...
Broad coverage of concepts and methods of the physical and engineering sciences applied in biology and medicine, ranging from formalized mathematical theory through experimental science and technological development to practical clinical applications.
IEEE Computer Graphics and Applications (CG&A) bridges the theory and practice of computer graphics. From specific algorithms to full system implementations, CG&A offers a strong combination of peer-reviewed feature articles and refereed departments, including news and product announcements. Special Applications sidebars relate research stories to commercial development. Cover stories focus on creative applications of the technology by an artist or ...
Both general and technical articles on current technologies and methods used in biomedical and clinical engineering; societal implications of medical technologies; current news items; book reviews; patent descriptions; and correspondence. Special interest departments include students, law, clinical engineering, ethics, new products, society news, historical features, and government.
Rehabilitation aspects of biomedical engineering, including functional electrical stimulation, acoustic dynamics, human performance measurement and analysis, nerve stimulation, electromyography, motor control and stimulation, and hardware and software applications for rehabilitation engineering and assistive devices.
ISSPA '99. Proceedings of the Fifth International Symposium on Signal Processing and its Applications (IEEE Cat. No.99EX359), 1999
Summary form only given, as follows. Quantitative models of human speech production and perception mechanisms provide important insights into our cognitive abilities and can lead to high-quality speech synthesis, robust automatic speech recognition and coding schemes, and better speech and hearing prostheses. Some of our research activities in these two areas are described. Our speech production work involved collecting, and ...
Proceedings of ISSE'95 - International Symposium on Signals, Systems and Electronics, 1995
The last few decades have witnessed tremendous progress in the performance, reliability, and wide-spread use of speech-processing devices. Using mathematical models of human speech production and perception has been an important factor in the improved performance of these devices. We review recent advances in speech production and perception modeling and summarize the challenges that lie ahead in developing fully parametric ...
Attention and Performance XIV: Synergies in Experimental Psychology, Artificial Intelligence, and Cognitive Neuroscience, None
This chapter contains sections titled: Experiment 1, Experiment 2, Experiment 3, Experiment 4, Summary And Conclusions, Notes, References
2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015
The mirror neuron system has been investigated using the functional magnetic resonance imaging (fMRI) technique. Activation of Broca's area and the premotor cortex (PMC), both of which are related to speech production, has been observed during speech perception, suggesting a mirroring effect. However, it is not clear how mirror neurons function between speech production and perception. This study attempts to investigate ...
10th International Conference on Information Science, Signal Processing and their Applications (ISSPA 2010), 2010
This work proposes a new approach to estimating the speech spectral envelope that is adapted for applications requiring time-varying spectral modifications, such as Voice Conversion. In particular, we represent the spectral envelope as a sum of peaks that evolve smoothly in time, within a phoneme. Our representation provides a flexible model for the spectral envelope that pertains relevantly to human ...
IMS 2011 Microapps - Improved Microwave Device Characterization and Qualification Using Affordable Microwave Microprobing Techniques for High-Yield Production of Microwave Components
Robotics History: Narratives and Networks Oral Histories: Jun Ho Oh
APEC 2012 - Dr. Fred Lee Plenary
Bayesian Perception & Decision from Theory to Real World Applications
MicroApps: Streamlining Radio Communication Link Design from Specification to Production (AWR)
ICASSP 2012 - Opening Ceremony
ICRA Keynote: Dr. Matt Mason
APEC 2011-GaN Based Power Devices in Power Electronics
ICASSP 2011 Trends in Multimedia Signal Processing
ICASSP 2011 Trends in Design and Implementation of Signal Processing Systems
ECCE Plenary: Pedro Ray, part 2
ECCE Plenary: Paul Hamilton, part 2
Robotics History: Narratives and Networks Oral Histories: Jean-Paul Laumond
Keynote: Poppy Crum - TTM 2018
John G. Webster - IEEE James H. Mulligan, Jr. Education Medal, 2019 IEEE Honors Ceremony
ICRA Plenary: Raffaello D'Andrea
IRDS: Yield Enhancement - Slava Libman at INC 2019
APEC 2012 - Thomas S. Buzak Plenary
IROS TV 2019 - Istituto Italiano di Tecnologia (IIT) - Human Centered Science and Technologies
Summary form only given, as follows. Quantitative models of human speech production and perception mechanisms provide important insights into our cognitive abilities and can lead to high-quality speech synthesis, robust automatic speech recognition and coding schemes, and better speech and hearing prostheses. Some of our research activities in these two areas are described. Our speech production work involved collecting and analyzing magnetic resonance images (MRI), acoustic recordings, and electropalatography (EPG) data from talkers of American English during speech production. The articulatory database is the largest of its kind in the world and contains the first images of liquids (such as /l/ and /r/) and fricatives (such as /s/ and /sh/) for both male and female talkers. MR images are useful for characterizing the 3D geometry of the vocal tract (VT) and for measuring lengths, area functions, and volumes. EPG is used to study inter- and intra-speaker variabilities in the articulatory dynamics, while acoustic recordings are necessary for modeling. Inter- and intra-speaker characteristics of the VT and tongue shapes will be illustrated for various speech sounds, as well as results of acoustic modeling based on the MRI and acoustic data. The implications of our findings on vocal-tract normalization schemes and speech synthesis are also discussed. In the speech perception area, aspects of auditory signal processing and speech perception are parameterized and implemented in a speech recognition system. Our models parameterize the sensitivity to spectral dynamics and local peak frequency positions in the speech signal. These cues remain robust when listening to speech in noise. Recognition evaluations using the dynamic model with a stochastic hidden Markov model (HMM) recognition system showed increased robustness to noise over other state-of-the-art representations. The applications of auditory modeling to speech coding are discussed. We developed an embedded and perceptually-based speech and audio coder. Perceptual metrics are used to ensure that encoding is optimized for the human listener and is based on calculating the signal-to-mask ratio in short-time frames of the input signal. An adaptive bit allocation scheme is employed and the subband energies are then quantized. The coder is variable-rate, noise-robust, and suitable for wireless communications.
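The coder described above drives an adaptive bit allocation from the signal-to-mask ratio (SMR) computed in short-time frames. The following minimal Python sketch is not the authors' coder; it only illustrates the general idea of greedy SMR-driven bit allocation, and the allocate_bits name, the 6 dB-per-bit assumption, and the toy subband values are illustrative assumptions.

    import numpy as np

    def allocate_bits(subband_energy_db, mask_threshold_db, total_bits):
        # Greedy bit allocation driven by the signal-to-mask ratio (SMR).
        # Each allocated bit is assumed to lower quantization noise by ~6 dB.
        smr = subband_energy_db - mask_threshold_db   # dB above the masking threshold
        bits = np.zeros(len(smr), dtype=int)
        noise_to_mask = smr.copy()                    # noise-to-mask ratio before any bits are spent
        for _ in range(total_bits):
            k = int(np.argmax(noise_to_mask))         # band with the most audible quantization noise
            bits[k] += 1
            noise_to_mask[k] -= 6.0                   # one more bit lowers the noise by ~6 dB
        return bits

    # toy frame: 8 subband energies and masking thresholds in dB
    energy = np.array([60, 55, 50, 48, 40, 35, 30, 25], float)
    mask   = np.array([30, 32, 35, 36, 34, 33, 31, 30], float)
    print(allocate_bits(energy, mask, total_bits=32))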
The last few decades have witnessed tremendous progress in the performance, reliability, and wide-spread use of speech-processing devices. Using mathematical models of human speech production and perception has been an important factor in the improved performance of these devices. We review recent advances in speech production and perception modeling and summarize the challenges that lie ahead in developing fully parametric models.
The mirror neuron system has been investigated using the functional magnetic resonance imaging (fMRI) technique. Activation of Broca's area and the premotor cortex (PMC), both of which are related to speech production, has been observed during speech perception, suggesting a mirroring effect. However, it is not clear how mirror neurons function between speech production and perception. This study attempts to investigate the function of mirror neurons by exploiting the high temporal resolution of electroencephalography (EEG). Participants watched Chinese material on a screen, then heard the same material read through an earphone, and finally judged whether the two stimuli were consistent. Source reconstruction of the high-density EEG signal revealed that Wernicke's area activated before Broca's area and the PMC during the speech perception tasks. The results are also consistent with a mirror neuron system: speech-production-related regions are active during speech perception tasks.
This work proposes a new approach to estimating the speech spectral envelope that is adapted for applications requiring time-varying spectral modifications, such as Voice Conversion. In particular, we represent the spectral envelope as a sum of peaks that evolve smoothly in time within a phoneme. Our representation provides a flexible model for the spectral envelope that relates closely to human speech production and perception. We highlight important properties of the proposed spectral envelope estimation, as applied to natural speech, and compare results with those from a more traditional frame-by-frame cepstrum-based analysis. Subjective evaluations and comparisons of synthesized speech quality, as well as implications of this work for future research, are also discussed.
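The paper does not spell out its exact peak parameterization; as a rough illustration only, the sketch below models a log-amplitude spectral envelope as a sum of Gaussian peaks whose centers drift slightly between two frames of the same phoneme. The envelope_from_peaks name and all numeric values are assumptions.

    import numpy as np

    def envelope_from_peaks(freqs_hz, peaks):
        # Log-amplitude spectral envelope built as a sum of Gaussian peaks.
        # peaks: list of (center_hz, bandwidth_hz, amplitude_db) tuples.
        env = np.zeros(len(freqs_hz), dtype=float)
        for center, bw, amp in peaks:
            env += amp * np.exp(-0.5 * ((freqs_hz - center) / bw) ** 2)
        return env

    # two frames of the same phoneme: peak centers drift smoothly between them
    freqs = np.linspace(0, 8000, 512)
    frame_a = envelope_from_peaks(freqs, [(500, 120, 40), (1500, 200, 30), (2500, 250, 20)])
    frame_b = envelope_from_peaks(freqs, [(520, 120, 39), (1480, 200, 31), (2550, 250, 20)])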
Considerable effort has been, and continues to be, devoted to improving speech quality at low and very low bit rates. Recently, new models of LPC excitation have been devised that can yield good-quality speech by exploiting our knowledge of the human speech production and perception processes. Unfortunately, these models generally require too much computational load to be easily implemented on currently available hardware. This paper describes an efficient speech coder capable of providing acceptable-quality speech within the limitations of both a low bit rate (approximately 2.4 kbit/s) and real-time implementation. The coder is based upon pattern classification and cluster analysis with perceptually meaningful error minimization criteria. Our main objective is improving the excitation representation in a linear predictive coding scheme and, hence, the subjective quality of synthesized speech signals.
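The coder above selects excitation patterns via pattern classification with a perceptually weighted error criterion. As a generic stand-in (not the paper's algorithm), the sketch below scores a frame against a random codebook using a weighted squared error; the nearest_codeword name, the codebook size, and the weighting vector are assumptions.

    import numpy as np

    def nearest_codeword(excitation, codebook, perceptual_weights):
        # Select the codebook entry minimizing a perceptually weighted squared error.
        diffs = codebook - excitation                 # (num_codewords, frame_len)
        costs = np.sum(perceptual_weights * diffs ** 2, axis=1)
        return int(np.argmin(costs))

    rng = np.random.default_rng(0)
    codebook = rng.standard_normal((64, 40))          # 64 excitation patterns, 40 samples each
    frame    = rng.standard_normal(40)
    weights  = np.linspace(1.0, 0.5, 40)              # crude stand-in for a perceptual weighting
    print(nearest_codeword(frame, codebook, weights))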
It is well known that the speech production and perception process is inherently bimodal, consisting of audio and visual components. Recently there has been increased interest in using the visual modality in combination with the acoustic modality for improved speech processing. This field of study has gained the title of audio-visual speech processing. Lip movement recognition, also known as lip reading, is a communication skill that involves interpreting lip movements in order to estimate some important parameters of the lips, including, but not limited to, size, shape, and orientation. In this paper, we present a hybrid framework for lip reading based on both audio and visual speech parameters extracted from a video stream of isolated spoken words. The proposed algorithm is self-tuned in the sense that it starts with an estimation of speech parameters based on visual lip features, and the coefficients of the algorithm are then fine-tuned based on the extracted audio parameters. In the audio speech processing part, extracted audio features are used to generate a vector containing information about the speech phonemes. This information is later used to enhance the recognition and matching process. For lip feature extraction, we use a modified version of the method used by F. Huang and T. Chen for tracking multiple faces. This method is based on statistical color modeling and a deformable template. Experiments based on the proposed framework showed interesting results in the recognition of isolated words.
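The hybrid audio-visual framework above fine-tunes its coefficients from audio parameters, but the abstract does not give the exact combination rule. The sketch below shows only a generic late fusion of hypothetical per-word scores from the audio and visual streams; the fuse_scores name, the 0.6 weight, and the score values are assumptions.

    import numpy as np

    def fuse_scores(audio_scores, visual_scores, audio_weight=0.6):
        # Late fusion of per-word log-likelihoods from the audio and visual streams.
        return audio_weight * audio_scores + (1.0 - audio_weight) * visual_scores

    words  = ["yes", "no", "stop", "go"]
    audio  = np.array([-4.1, -6.3, -5.0, -7.2])       # hypothetical audio log-likelihoods
    visual = np.array([-5.5, -4.8, -6.1, -6.9])       # hypothetical lip-feature log-likelihoods
    print(words[int(np.argmax(fuse_scores(audio, visual)))])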
The ultimate goal of text-to-speech synthesis is to convert ordinary orthographic text into an acoustic signal that is indistinguishable from human speech. Originally, synthesis systems were architected around a system of rules and models based on research into human language and speech production and perception processes. The quality of speech produced by such systems is inherently limited by the quality of the rules and the models. Given that our knowledge of human speech processes is still incomplete, the quality of text-to-speech is far from natural-sounding. Hence, today's interest in high-quality speech for applications, in combination with advances in computing resources, has caused the focus to shift from rule- and model-based methods to corpus-based methods that presumably bypass rules and models. For example, many systems now rely on large word pronunciation dictionaries instead of letter-to-phoneme rules, and on large prerecorded sound inventories instead of rules predicting the acoustic correlates of phonemes. Because of the need to analyze large amounts of data, this approach relies on automated techniques such as those used in automatic speech recognition.
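A corpus-based front end of the kind described above typically looks words up in a pronunciation dictionary and falls back to letter-to-phoneme rules for out-of-vocabulary items. The sketch below is a toy illustration of that lookup-with-fallback pattern, not any particular system; the phonemes_for name, the tiny lexicon, and the rules are assumptions.

    def phonemes_for(word, lexicon, letter_to_phoneme):
        # Dictionary lookup with a letter-to-phoneme fallback for out-of-vocabulary words.
        word = word.lower()
        if word in lexicon:
            return lexicon[word]
        return [p for ch in word for p in letter_to_phoneme.get(ch, [])]

    # toy lexicon and rules (hypothetical; real systems use dictionary-sized resources)
    lexicon = {"speech": ["S", "P", "IY", "CH"]}
    rules   = {"c": ["K"], "a": ["AE"], "t": ["T"]}
    print(phonemes_for("speech", lexicon, rules))     # dictionary hit
    print(phonemes_for("cat", lexicon, rules))        # falls back to letter-to-phoneme rules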
This paper provides a brief review of the literature on speech production, with an emphasis on auditory perturbation studies, and addresses the implications for speech production and perception in hearing-impaired individuals, particularly cochlear implant recipients. We advocate the use of auditory perturbation paradigms with populations using assistive hearing devices.
Recent advances in automatic speech recognition have used large corpora and powerful computational resources to train complex statistical models from high-dimensional features, to attempt to capture all the variability found in natural speech. Such models are difficult to interpret and may be fragile, and contradict or ignore knowledge of human speech production and perception. We report progress towards phoneme recognition using a model of speech which employs very few parameters and which is more faithful to the dynamics and model of human speech production. Using features generated from a neural network bottleneck layer, we obtain recognition accuracy on TIMIT which compares favourably with traditional models of similar power. We discuss the implications of these results for recognition using natural features such as vocal tract resonances and spectral energies.
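Bottleneck features of the kind mentioned above are simply the activations of a deliberately narrow hidden layer. The sketch below is a minimal numpy illustration with random weights, not the authors' network; the bottleneck_features name, the layer sizes, and the tanh nonlinearity are assumptions.

    import numpy as np

    def bottleneck_features(frames, w1, b1, w2, b2):
        # Forward pass up to a narrow hidden ("bottleneck") layer; its activations are the features.
        hidden = np.tanh(frames @ w1 + b1)            # wide hidden layer
        return np.tanh(hidden @ w2 + b2)              # narrow bottleneck activations

    rng = np.random.default_rng(0)
    frames = rng.standard_normal((100, 39))           # e.g. 100 frames of 39-dim acoustic features
    w1, b1 = rng.standard_normal((39, 256)) * 0.1, np.zeros(256)
    w2, b2 = rng.standard_normal((256, 8)) * 0.1, np.zeros(8)   # 8-dim bottleneck
    feats = bottleneck_features(frames, w1, b1, w2, b2)
    print(feats.shape)                                # (100, 8) low-dimensional features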
No standards are currently tagged "Speech Production And Perception"