Speech

Speech is the vocalized form of human communication. (Wikipedia.org)

Conferences related to Speech


2023 Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

The conference program will consist of plenary lectures, symposia, workshops and invited sessions of the latest significant findings and developments in all the major fields of biomedical engineering. Submitted full papers will be peer reviewed. Accepted high quality papers will be presented in oral and poster sessions, will appear in the Conference Proceedings and will be indexed in PubMed/MEDLINE.


ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The ICASSP meeting is the world's largest and most comprehensive technical conference focused on signal processing and its applications. The conference will feature world-class speakers, tutorials, exhibits, and over 50 lecture and poster sessions.


2019 IEEE International Professional Communication Conference (ProComm)

The scope of the conference includes the study, development, improvement, and promotion of effective techniques for preparing, organizing, processing, editing, collecting, conserving, teaching, and disseminating any form of technical information by and to individuals and groups by any method of communication. It also includes technical, scientific, industrial, and other activities that contribute to the techniques and products used in this field.


2018 15th IEEE Annual Consumer Communications & Networking Conference (CCNC)

IEEE CCNC 2018 will present the latest developments and technical solutions in the areas of home networking, consumer networking, enabling technologies (such as middleware), and novel applications and services. The conference will include a peer-reviewed program of technical sessions, special sessions, business application sessions, tutorials, and demonstration sessions.


2018 26th Signal Processing and Communications Applications Conference (SIU)

The general scope of the conference ranges from signal and image processing to telecommunications, including applications of signal processing methods to biomedical and communication problems.

  • 2017 25th Signal Processing and Communications Applications Conference (SIU)

    The Signal Processing and Communication Applications (SIU) conference is the most prominent scientific meeting on signal processing in Turkey, bringing together researchers working in the signal processing and communication fields. Topics include but are not limited to the areas of research listed in the keywords.

  • 2016 24th Signal Processing and Communication Application Conference (SIU)

    Signal Processing Theory, Statistical Signal Processing, Nonlinear Signal Processing, Adaptive Signal Processing, Array and Multichannel Signal Processing, Signal Processing for Sensor Networks, Time-Frequency Analysis, Speech/Voice Processing and Recognition, Computer Vision, Pattern Recognition, Machine Learning for Signal Processing, Human-Machine Interaction, Brain-Computer Interaction, Signal-Image Acquisition and Generation, Image Processing, Video Processing, Image Printing and Presentation, Image/Video/Audio Browsing and Retrieval, Image/Video/Audio Watermarking, Multimedia Signal Processing, Biomedical Signal Processing and Image Processing, Bioinformatics, Biometric Signal-Image Processing and Recognition, Signal Processing for Security and Defense, Signal and Image Processing for Remote Sensing, Signal Processing Hardware, Signal Processing Education, Radar Signal Processing, Communication Theory, Communication Networks, Wireless Communications

  • 2015 23rd Signal Processing and Communications Applications Conference (SIU)

    Signal Processing Theory, Statistical Signal Processing, Nonlinear Signal Processing, Adaptive Signal Processing, Array and Multichannel Signal Processing, Signal Processing for Sensor Networks, Time-Frequency Analysis, Speech/Voice Processing and Recognition, Computer Vision, Pattern Recognition, Machine Learning for Signal Processing, Human-Machine Interaction, Brain-Computer Interaction, Signal-Image Acquisition and Generation, Image Processing, Video Processing, Image Printing and Presentation, Image/Video/Audio Browsing and Retrieval, Image/Video/Audio Watermarking, Multimedia Signal Processing, Biomedical Signal Processing and Image Processing, Bioinformatics, Biometric Signal-Image Processing and Recognition, Signal Processing for Security and Defense, Signal and Image Processing for Remote Sensing, Signal Processing Hardware, Signal Processing Education, Radar Signal Processing, Communication Theory, Communication Networks, Wireless Communications

  • 2014 22nd Signal Processing and Communications Applications Conference (SIU)

    SIU will be held in Trabzon, Turkey at the Karadeniz Technical University Convention and Exhibition Centre on April 23, 2014. SIU is the largest and most comprehensive technical conference focused on signal processing and its applications in Turkey. Last year there were 500 participants. The conference will feature renowned speakers, tutorials, and thematic workshops. Topics include but are not limited to: Signal Processing, Image Processing, Communication, Computer Vision, Machine Learning, and Biomedical Signal Processing.

  • 2013 21st Signal Processing and Communications Applications Conference (SIU)

    The conference will discuss state-of-the-art solutions and research results on existing and future DSP and telecommunication systems, applications, and related standardization activities. The conference will also include invited lectures, tutorials, and special sessions.

  • 2012 20th Signal Processing and Communications Applications Conference (SIU)

    The conference will discuss state-of-the-art solutions and research results on existing and future DSP and telecommunication systems, applications, and related standardization activities. The conference will also include invited lectures, tutorials, and special sessions.

  • 2011 19th Signal Processing and Communications Applications Conference (SIU)

    The conference will bring together academic and industry professionals, as well as students and researchers, to present and discuss state-of-the-art solutions and research results on existing and future DSP and telecommunication systems, applications, and related standardization activities. The conference will also include invited lectures, tutorials, and special sessions.

  • 2010 IEEE 18th Signal Processing and Communications Applications Conference (SIU)

    S1. Theory of Signal Processing; S2. Statistical Signal Processing; S3. Multimedia Signal Processing; S4. Biomedical Signal Processing; S5. Sensor Networks; S6. Multirate Signal Processing; S7. Pattern Recognition; S8. Computer Vision; S9. Adaptive Filters; S10. Image/Video/Speech Browsing and Retrieval; S11. Speech/Audio Coding; S12. Speech Processing; S13. Human-Machine Interfaces; S14. Surveillance Signal Processing; S15. Bioinformatics; S16. Self-Learning; S17. Signal Processing Education; S18. Signal Processing Systems

  • 2009 IEEE 17th Signal Processing and Communications Applications Conference (SIU)

    The conference covers recent topics in the theory and applications of signal processing and communications.

  • 2008 IEEE 16th Signal Processing and Communications Applications Conference (SIU)

    Signal Processing, Image Processing, Speech Processing, Pattern Recognition, Human-Computer Interaction, Communication, Video and Speech Indexing, Computer Vision, Biomedical Signal Processing

  • 2007 IEEE 15th Signal Processing and Communications Applications (SIU)

  • 2006 IEEE 14th Signal Processing and Communications Applications (SIU)

  • 2005 IEEE 13th Signal Processing and Communications Applications (SIU)

  • 2004 IEEE 12th Signal Processing and Communications Applications (SIU)



Periodicals related to Speech


Aerospace and Electronic Systems Magazine, IEEE

The IEEE Aerospace and Electronic Systems Magazine publishes articles concerned with the various aspects of systems for space, air, ocean, or ground environments.


Audio, Speech, and Language Processing, IEEE Transactions on

Speech analysis, synthesis, coding, speech recognition, speaker recognition, language modeling, speech production and perception, and speech enhancement. In audio: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. The scope of the transactions includes SPEECH PROCESSING - transmission and storage of speech signals; speech coding; speech enhancement and noise reduction; ...


Automatic Control, IEEE Transactions on

The theory, design, and application of control systems. It shall encompass components, and the integration of these components, as are necessary for the construction of such systems. The word 'systems' as used herein shall be interpreted to include physical, biological, organizational, and other entities, and combinations thereof, which can be represented through a mathematical symbolism. The field of interest shall ...


Biomedical Engineering, IEEE Transactions on

Broad coverage of concepts and methods of the physical and engineering sciences applied in biology and medicine, ranging from formalized mathematical theory through experimental science and technological development to practical clinical applications.


Broadcasting, IEEE Transactions on

Broadcast technology, including devices, equipment, techniques, and systems related to broadcasting, covering production, distribution, transmission, and propagation aspects.




Xplore Articles related to Speech


Perceptual Information Loss due to Impaired Speech Production

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017

Phonological classes define articulatory-free and articulatory-bound phone attributes. A deep neural network is used to estimate the probability of phonological classes from the speech signal. In theory, a unique combination of phone attributes forms a phoneme identity. Probabilistic inference of phonological classes thus enables estimation of their compositional phoneme probabilities. A novel information theoretic framework is devised to quantify the information ...


Leveraging automatic speech recognition in cochlear implants for improved speech intelligibility under reverberation

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015

Despite recent advancements in digital signal processing technology for cochlear implant (CI) devices, there still remains a significant gap between speech identification performance of CI users in reverberation compared to that in anechoic quiet conditions. Alternatively, automatic speech recognition (ASR) systems have seen significant improvements in recent years resulting in robust speech recognition in a variety of adverse environments, including ...


An analysis of machine translation and speech synthesis in speech-to-speech translation system

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011

This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus ...


Building HMM based unit-selection speech synthesis system using synthetic speech naturalness evaluation score

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011

This paper proposes a unit-selection and waveform concatenation speech synthesis system based on synthetic speech naturalness evaluation. A Support Vector Machine (SVM) and Log Likelihood Ratio (LLR) based synthetic speech naturalness evaluation system was introduced in our previous work. In this paper, the evaluation system is improved in three aspects. Finally, a unit-selection and concatenation waveform speech synthesis system ...


A source generator based modeling framework for synthesis of speech under stress

1995 International Conference on Acoustics, Speech, and Signal Processing, 1995

The objective of this paper is to formulate an algorithm to generate stressed synthetic speech from neutral speech using a source generator framework previously employed for stressed speech recognition. The following goals are addressed: (i) identify the most visible indicators of stress as perceived by the listener in stressed speaking styles such as loud, Lombard effect, and angry, (ii) develop ...



Educational Resources on Speech


IEEE-USA E-Books

  • Perceptual Information Loss due to Impaired Speech Production

    Phonological classes define articulatory-free and articulatory-bound phone attributes. A deep neural network is used to estimate the probability of phonological classes from the speech signal. In theory, a unique combination of phone attributes forms a phoneme identity. Probabilistic inference of phonological classes thus enables estimation of their compositional phoneme probabilities. A novel information theoretic framework is devised to quantify the information conveyed by each phone attribute, and to assess the speech production quality for perception of phonemes. As a use case, we hypothesize that disruption in speech production leads to information loss in phone attributes, and thus confusion in phoneme identification. We quantify the amount of information loss due to dysarthric articulation recorded in the TORGO database. A novel information measure is formulated to evaluate the deviation from an ideal phone attribute production, leading us to distinguish healthy production from pathological speech.
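
    A minimal Python sketch of the compositional idea above: phoneme posteriors are assembled from independent attribute posteriors, and a simple KL-style measure quantifies deviation from an ideal attribute pattern. The attribute inventory, phoneme definitions, and example posteriors are illustrative stand-ins, not values from the paper or the TORGO database.

        import numpy as np

        # Hypothetical binary attribute patterns (1 = attribute present).
        ATTRIBUTES = ["voiced", "nasal", "fricative"]
        PHONEMES = {
            "m": np.array([1, 1, 0]),
            "z": np.array([1, 0, 1]),
            "s": np.array([0, 0, 1]),
        }

        def phoneme_posteriors(attr_post):
            """Compose P(phoneme) from independent attribute posteriors."""
            scores = {}
            for ph, pattern in PHONEMES.items():
                # product over attributes: p if present, (1 - p) if absent
                p = np.where(pattern == 1, attr_post, 1.0 - attr_post)
                scores[ph] = float(np.prod(p))
            total = sum(scores.values())
            return {ph: s / total for ph, s in scores.items()}

        def information_loss(attr_post, target_pattern, eps=1e-9):
            """Bernoulli KL divergence (bits) of produced attribute posteriors
            from the ideal deterministic pattern, summed over attributes."""
            q = np.clip(target_pattern.astype(float), eps, 1 - eps)
            p = np.clip(attr_post, eps, 1 - eps)
            kl = q * np.log2(q / p) + (1 - q) * np.log2((1 - q) / (1 - p))
            return float(kl.sum())

        # Made-up posteriors for a (hypothetical) dysarthric utterance of /z/
        attr_post = np.array([0.7, 0.2, 0.6])
        print(phoneme_posteriors(attr_post))
        print("bits lost vs. ideal /z/:", information_loss(attr_post, PHONEMES["z"]))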

  • Leveraging automatic speech recognition in cochlear implants for improved speech intelligibility under reverberation

    Despite recent advancements in digital signal processing technology for cochlear implant (CI) devices, there still remains a significant gap between speech identification performance of CI users in reverberation compared to that in anechoic quiet conditions. Alternatively, automatic speech recognition (ASR) systems have seen significant improvements in recent years, resulting in robust speech recognition in a variety of adverse environments, including reverberation. In this study, we exploit advancements in ASR technology to formulate alternative solutions that benefit CI users. Specifically, an ASR system is developed using multi-condition training on speech data with different reverberation characteristics (e.g., T60 values), resulting in low word error rates (WER) in reverberant conditions. A speech synthesizer is then utilized to generate speech waveforms from the output of the ASR system, from which the synthesized speech is presented to CI listeners. The effectiveness of this hybrid recognition-synthesis CI strategy is evaluated under moderate to highly reverberant conditions (i.e., T60 = 0.3, 0.6, 0.8, and 1.0 s) using speech material extracted from the TIMIT corpus. Experimental results confirm the effectiveness of multi-condition training on performance of the ASR system in reverberation, which consequently results in substantial speech intelligibility gains for CI users in reverberant environments.
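
    As a rough illustration of the multi-condition training idea, the Python sketch below pairs a clean signal with reverberant copies at several T60 values using a crude synthetic room impulse response. The exponential-decay RIR model and all parameter values are illustrative assumptions, not the study's actual setup.

        import numpy as np

        def synthetic_rir(t60, fs=16000, length_s=0.5, seed=0):
            """Crude RIR model: exponentially decaying white noise
            (about 60 dB of decay at t = T60)."""
            rng = np.random.default_rng(seed)
            t = np.arange(int(length_s * fs)) / fs
            return rng.standard_normal(t.size) * np.exp(-6.91 * t / t60)

        def reverberate(clean, t60, fs=16000):
            wet = np.convolve(clean, synthetic_rir(t60, fs))[: len(clean)]
            return wet / (np.max(np.abs(wet)) + 1e-9)

        # Multi-condition set: the same utterance under several T60 conditions
        clean = np.random.default_rng(1).standard_normal(16000)  # stand-in utterance
        train_set = [(reverberate(clean, t60), t60) for t60 in (0.3, 0.6, 0.8, 1.0)]
        print([f"T60={t60}s: {x.size} samples" for x, t60 in train_set])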

  • An analysis of machine translation and speech synthesis in speech-to-speech translation system

    This paper provides an analysis of the impacts of machine translation and speech synthesis on speech-to-speech translation systems. The speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques for integration of speech recognition and machine translation have been proposed. However, speech synthesis has not yet been considered. Therefore, in this paper, we focus on machine translation and speech synthesis, and report a subjective evaluation to analyze the impact of each component. The results of these analyses show that the naturalness and intelligibility of synthesized speech are strongly affected by the fluency of the translated sentences.
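
    The system under analysis is a three-stage cascade. The toy Python sketch below uses stand-in functions in place of real ASR, MT, and TTS engines, just to make the data flow and the coupling between stages explicit.

        def asr(audio: bytes) -> str:
            return "hello world"                # stand-in recognizer output

        def mt(source_text: str) -> str:
            lexicon = {"hello": "bonjour", "world": "le monde"}
            return " ".join(lexicon.get(w, w) for w in source_text.split())

        def tts(target_text: str) -> bytes:
            return target_text.encode("utf-8")  # stand-in waveform

        def speech_to_speech(audio: bytes) -> bytes:
            # Per the paper, synthesized-speech naturalness depends strongly
            # on the fluency of the MT output feeding the TTS stage.
            return tts(mt(asr(audio)))

        print(speech_to_speech(b"..."))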

  • Building HMM based unit-selection speech synthesis system using synthetic speech naturalness evaluation score

    This paper proposes a unit-selection and waveform concatenation speech synthesis system based on synthetic speech naturalness evaluation. A Support Vector Machine (SVM) and Log Likelihood Ratio (LLR) based synthetic speech naturalness evaluation system was introduced in our previous work. In this paper, the evaluation system is improved in three aspects. Finally, a unit-selection and concatenation waveform speech synthesis system is built on the basis of the synthetic speech naturalness evaluation system. The optimum unit sequence is chosen through re-scoring of the N-best paths. Subjective listening tests show the proposed synthetic speech evaluation based speech synthesis system significantly outperforms the traditional unit-selection speech synthesis system.
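
    A minimal Python sketch of the re-scoring step: each N-best unit sequence's selection cost is combined with a naturalness score before the final choice. The placeholder scorer and the combination weight are illustrative assumptions; the paper's evaluator is an SVM/LLR system.

        def naturalness_score(unit_sequence):
            # Stand-in for the SVM/LLR naturalness evaluator: mildly prefer
            # longer units, i.e. fewer concatenation points.
            return sum(len(u) for u in unit_sequence) / len(unit_sequence)

        def rescore_nbest(nbest, weight=0.5):
            """nbest: list of (unit_sequence, selection_cost); lower cost is
            better, higher naturalness is better."""
            rescored = [(seq, cost - weight * naturalness_score(seq))
                        for seq, cost in nbest]
            return min(rescored, key=lambda item: item[1])[0]

        nbest = [(["he", "llo"], 1.0),
                 (["hello"], 1.2)]   # higher base cost but fewer joins
        print(rescore_nbest(nbest))  # -> ['hello']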

  • A source generator based modeling framework for synthesis of speech under stress

    The objective of this paper is to formulate an algorithm to generate stressed synthetic speech from neutral speech using a source generator framework previously employed for stressed speech recognition. The following goals are addressed: (i) identify the most visible indicators of stress as perceived by the listener in stressed speaking styles such as loud, Lombard effect, and angry, (ii) develop a mathematical model for representing speech production under stressed conditions, and (iii) employ the above model to produce emotional/stressed synthetic speech from neutral speech. The stress modeling scheme is applied to an existing low-bit-rate CELP speech coder in order to investigate (i) the coder's ability and limitations in reproducing stressed synthetic speech, and (ii) our ability to perturb coded neutral speech parameters at the synthesis stage so that the resulting speech is perceived as being under stress. Two stress perturbation algorithms are proposed and evaluated. Results from formal listener evaluations show that 87% of perturbed neutral speech was indeed perceived as stressed.
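
    A minimal Python sketch of the perturbation idea: pitch and energy contours of coded neutral speech are scaled toward a stressed style at the synthesis stage. The scaling factors are illustrative assumptions, not the paper's measured values for the loud, Lombard, or angry styles.

        import numpy as np

        def perturb_to_stressed(f0, energy, pitch_scale=1.3, energy_scale=1.5):
            """Raise pitch on voiced frames (f0 > 0) and boost frame energy."""
            f0 = np.asarray(f0, float).copy()
            energy = np.asarray(energy, float) * energy_scale
            f0[f0 > 0] *= pitch_scale
            return f0, energy

        # Toy neutral contours: 10 frames, the last two unvoiced
        f0 = np.array([120, 122, 125, 128, 130, 127, 124, 121, 0, 0], float)
        energy = np.linspace(1.0, 0.5, 10)
        print(perturb_to_stressed(f0, energy))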

  • Automatic pronunciation prediction for text-to-speech synthesis of dialectal arabic in a speech-to-speech translation system

    Text-to-speech synthesis (TTS) is the final stage in the speech-to-speech (S2S) translation pipeline, producing an audible rendition of translated text in the target language. TTS systems typically rely on a lexicon to look up pronunciations for each word in the input text. This is problematic when the target language is dialectal Arabic, because the statistical machine translation (SMT) system usually produces undiacritized text output. Many words in the latter possess multiple pronunciations; the correct choice must be inferred from context. In this paper, we present a weakly supervised pronunciation prediction approach for undiacritized dialectal Arabic in S2S systems that leverages automatic speech recognition (ASR) to obtain parallel training data for pronunciation prediction. Additionally, we show that incorporating source language features derived from SMT-generated automatic word alignment further improves automatic pronunciation prediction accuracy.
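
    A minimal Python sketch of the weak-supervision idea: pronunciation labels harvested from ASR alignments train a context-conditioned count model, with a back-off to the word alone when a context is unseen. The words and pronunciation variants are made-up placeholders, not dialectal Arabic data.

        from collections import Counter, defaultdict

        def train(aligned_pairs):
            """aligned_pairs: ((word, next_word), pronunciation) tuples
            harvested from ASR forced alignments."""
            model = defaultdict(Counter)
            for context, pron in aligned_pairs:
                model[context][pron] += 1
            return model

        def predict(model, word, next_word):
            counts = model.get((word, next_word))
            if counts:
                return counts.most_common(1)[0][0]
            pooled = Counter()                      # back off: ignore context
            for (w, _), c in model.items():
                if w == word:
                    pooled.update(c)
            return pooled.most_common(1)[0][0] if pooled else None

        pairs = [(("ktb", "a"), "kataba"),
                 (("ktb", "a"), "kataba"),
                 (("ktb", "b"), "kutub")]
        model = train(pairs)
        print(predict(model, "ktb", "a"), predict(model, "ktb", "c"))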

  • ASR for electro-laryngeal speech

    The electro-larynx device (EL) offers the possibility to re-obtain speech when the larynx is removed after a total laryngectomy. Speech produced with an EL suffers from inadequate speech sound quality, so there is a strong need to enhance EL speech. When disordered speech is applied to Automatic Speech Recognition (ASR) systems, performance decreases significantly. ASR systems are increasingly part of daily life, and therefore the word accuracy rate for disordered speech should be reasonably high in order to make ASR technologies accessible to patients suffering from speech disorders. Moreover, ASR is a method to obtain an objective rating for the intelligibility of disordered speech. In this paper we apply disordered speech, namely speech produced by an EL, to an ASR system which was designed for normal, healthy speech and evaluate its performance with different types of adaptation. Furthermore, we show that two approaches to reduce the directly radiated EL (DREL) noise from the device itself are able to increase the word accuracy rate compared to the unprocessed EL speech.

  • Medium-duration modulation cepstral feature for robust speech recognition

    Studies have shown that the performance of state-of-the-art automatic speech recognition (ASR) systems significantly deteriorates with increased noise levels and channel degradations, when compared to human speech recognition capability. Traditionally, noise-robust acoustic features are deployed to improve speech recognition performance under varying background conditions to compensate for the performance degradations. In this paper, we present the Modulation of Medium Duration Speech Amplitude (MMeDuSA) feature, which is a composite feature capturing subband speech modulations and a summary modulation. We analyze MMeDuSA's speech recognition performance using SRI International's DECIPHER® large vocabulary continuous speech recognition (LVCSR) system on noise- and channel-degraded Levantine Arabic speech distributed through the Defense Advanced Research Projects Agency (DARPA) Robust Automatic Speech Transcription (RATS) program. We also analyzed MMeDuSA's performance against the Aurora-4 noise-and-channel-degraded English corpus. Our results from all these experiments suggest that the proposed MMeDuSA feature improved recognition performance under both noisy and channel-degraded conditions in almost all the recognition tasks.
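
    A generic modulation-spectrum sketch in Python, in the spirit of (but not identical to) MMeDuSA: subband amplitude envelopes are extracted from the utterance, then their low-frequency modulation energy is summarized per band. The band layout, window sizes, and the 2-16 Hz modulation range are illustrative assumptions, not SRI's published recipe.

        import numpy as np

        def subband_envelopes(x, n_bands=8, frame=400, hop=160):
            """Short-time magnitude-spectrum energies in linear bands;
            returns one envelope (row) per band."""
            win = np.hanning(frame)
            frames = np.array([x[i:i + frame] * win
                               for i in range(0, len(x) - frame, hop)])
            spec = np.abs(np.fft.rfft(frames, axis=1))
            edges = np.linspace(0, spec.shape[1], n_bands + 1, dtype=int)
            return np.array([spec[:, a:b].sum(axis=1)
                             for a, b in zip(edges, edges[1:])])

        def modulation_energy(env, env_fs=100.0, lo=2.0, hi=16.0):
            """Energy of each band's envelope in the lo..hi Hz modulation range;
            env_fs is the envelope frame rate (16 kHz / 160-sample hop = 100 Hz)."""
            mod = np.abs(np.fft.rfft(env - env.mean(axis=1, keepdims=True), axis=1))
            freqs = np.fft.rfftfreq(env.shape[1], d=1.0 / env_fs)
            return mod[:, (freqs >= lo) & (freqs <= hi)].sum(axis=1)

        x = np.random.default_rng(0).standard_normal(16000)  # stand-in 1 s @ 16 kHz
        print(modulation_energy(subband_envelopes(x)).round(2))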

  • Multi-channel speech processing architectures for noise robust speech recognition: 3rd CHiME challenge results

    Recognizing speech under noisy conditions is an ill-posed problem. The CHiME 3 challenge targets robust speech recognition in realistic environments such as street, bus, cafe, and pedestrian areas. We study variants of beamformers used for pre-processing multi-channel speech recordings. In particular, we investigate three variants of the generalized side-lobe canceller (GSC) beamformer, i.e. GSC with a sparse blocking matrix (BM), GSC with an adaptive BM (ABM), and GSC with minimum variance distortionless response (MVDR) and ABM. Furthermore, we apply several post-filters to further enhance the speech signal. We introduce MaxPower postfilters and deep neural postfilters (DPFs). DPFs outperformed our baseline systems significantly when measuring the overall perceptual score (OPS) and the perceptual evaluation of speech quality (PESQ). In particular, DPFs achieved an average relative improvement of 17.54% OPS points and 18.28% in PESQ, when compared to the CHiME 3 baseline. DPFs also achieved the best WER when combined with an ASR engine on simulated development and evaluation data, i.e. 8.98% and 10.82% WER. The proposed MaxPower beamformer achieved the best overall WER on CHiME 3 real development and evaluation data, i.e. 14.23% and 22.12%, respectively.
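
    Of the beamformer variants mentioned, MVDR has a compact closed form: w = R^-1 d / (d^H R^-1 d), where R is the noise covariance and d the steering vector, giving unit gain toward the source and minimum output power otherwise. A single-frequency-bin Python sketch with synthetic placeholders (not CHiME 3 estimates) follows.

        import numpy as np

        def mvdr_weights(noise_cov, steering):
            """w = R^-1 d / (d^H R^-1 d)."""
            rinv_d = np.linalg.solve(noise_cov, steering)
            return rinv_d / (steering.conj() @ rinv_d)

        # 4-mic example: synthetic Hermitian noise covariance and a
        # plane-wave steering vector (placeholders, not measured data)
        rng = np.random.default_rng(0)
        n_mics = 4
        A = (rng.standard_normal((n_mics, n_mics))
             + 1j * rng.standard_normal((n_mics, n_mics)))
        R = A @ A.conj().T + n_mics * np.eye(n_mics)
        d = np.exp(-2j * np.pi * 0.1 * np.arange(n_mics))
        w = mvdr_weights(R, d)
        print("distortionless gain:", np.round(w.conj() @ d, 6))  # ~1+0j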

  • Speech coding by the efficient transformation of the spectral envelope of subwords

    The authors have developed a signal-dependent representation which captures, with a few KL vectors and transform coefficients, the perceptually and phonetically important structure of the spectral envelope. Together with a mixed excitation strategy with some novel features, this representation has been applied to the analysis, synthesis and coding of speech with promising results in the 5-kb/s range.
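
    A minimal Python sketch of the Karhunen-Loeve (PCA) idea behind such a representation: each spectral envelope is encoded with a few KL coefficients and reconstructed from the learned basis. The envelope data and the component count are placeholders, not the authors' configuration.

        import numpy as np

        rng = np.random.default_rng(0)
        envelopes = rng.standard_normal((200, 64))  # stand-in log envelopes
        mean = envelopes.mean(axis=0)
        centered = envelopes - mean

        # KL basis = principal directions of the data, via SVD
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        k = 4                          # keep only a few KL vectors
        basis = vt[:k]                 # (k, 64)

        coeffs = centered @ basis.T    # (200, k) transform coefficients
        reconstructed = coeffs @ basis + mean
        mse = np.mean((envelopes - reconstructed) ** 2)
        print(f"{k} coefficients per frame, reconstruction MSE = {mse:.3f}")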



Standards related to Speech


No standards are currently tagged "Speech".