IEEE Organizations related to IEEE Transactions on Audio, Speech, and Language Processing


Conferences related to IEEE Transactions on Audio, Speech, and Language Processing

No conferences are currently tagged "IEEE Transactions on Audio, Speech, and Language Processing"


Periodicals related to IEEE Transactions on Audio, Speech, and Language Processing

Audio, Speech, and Language Processing, IEEE Transactions on

Speech analysis, synthesis, coding, speech recognition, speaker recognition, language modeling, speech production and perception, and speech enhancement. In audio: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. (IEEE Guide for Authors) The scope of the transactions includes SPEECH PROCESSING - transmission and storage of speech signals; speech coding; speech enhancement and noise reduction; ...


Broadcasting, IEEE Transactions on

Broadcast technology, including devices, equipment, techniques, and systems, covering the production, distribution, transmission, and propagation aspects.


Circuits and Systems II: Express Briefs, IEEE Transactions on

Part I contains regular papers focusing on all matters related to fundamental theory, applications, and analog and digital signal processing. Part II reports on the latest significant results across all of these topic areas.


Computational Biology and Bioinformatics, IEEE/ACM Transactions on

Specific topics of interest include, but are not limited to, sequence analysis, comparison and alignment methods; motif, gene and signal recognition; molecular evolution; phylogenetics and phylogenomics; determination or prediction of the structure of RNA and protein in two and three dimensions; DNA twisting and folding; gene expression and gene regulatory networks; deduction of metabolic pathways; microarray design and analysis; proteomics; ...


Consumer Electronics, IEEE Transactions on

The design and manufacture of consumer electronics products, components, and related activities, particularly those used for entertainment, leisure, and educational purposes.


Most published Xplore authors for IEEE Transactions on Audio, Speech, and Language Processing

Xplore Articles related to IEEE Transactions on Audio, Speech, and Language Processing

Hierarchical Bayesian Language Models for Conversational Speech Recognition

IEEE Transactions on Audio, Speech, and Language Processing, 2010


Fixed-Point Implementation of Cascaded Forward–Backward Adaptive Predictors

IEEE Transactions on Audio, Speech, and Language Processing, 2012


New insights into the noise reduction Wiener filter

IEEE Transactions on Audio, Speech, and Language Processing, 2006


Joint Dereverberation and Residual Echo Suppression of Speech Signals in Noisy Environments

IEEE Transactions on Audio, Speech, and Language Processing, 2008


Speech Analysis and Synthesis Based on Dynamic Modes

IEEE Transactions on Audio, Speech, and Language Processing, 2011


Educational Resources on IEEE Transactions on Audio, Speech, and Language Processing

IEEE-USA E-Books

  • Hierarchical Bayesian Language Models for Conversational Speech Recognition

    Traditional n-gram language models are widely used in state-of-the-art large vocabulary speech recognition systems. This simple model suffers from some limitations, such as overfitting of maximum-likelihood estimation and the lack of rich contextual knowledge sources. In this paper, we exploit a hierarchical Bayesian interpretation for language modeling, based on a nonparametric prior called the Pitman-Yor process. This offers a principled approach to language model smoothing, embedding the power-law distribution for natural language. Experiments on the recognition of conversational speech in multiparty meetings demonstrate that by using hierarchical Bayesian language models, we are able to achieve significant reductions in perplexity and word error rate. (A minimal smoothing sketch appears after this list.)

  • Fixed-Point Implementation of Cascaded Forward–Backward Adaptive Predictors

    Adaptive least mean square (LMS) predictors with independently low-order cascaded structures, such as the cascaded forward LMS (CFLMS) and cascaded forward-backward LMS (CFBLMS), have proven effective in combating the misadjustment and eigenvalue spread effects of linear predictors. Further developing this cascade structure, we study the fixed-point implementation of CFBLMS with applications to speech signals. Moreover, two groups of predictors with a total of six cases are compared. Group 1 employs the transversal structure for the LMS, CFLMS, and CFBLMS algorithms. Group 2 employs the lattice structure for the LMS, CFLMS, and CFBLMS algorithms. Experimental results show that, in group 1, the performance degradation of CFBLMS and CFLMS predictors becomes significant when the number of bits is reduced to 8, while that of the LMS predictor becomes significant when the number of bits is reduced to 9. On the other hand, in group 2, the performance degradation of CFBLMS and CFLMS predictors becomes significant when the number of bits is reduced to 5, while that of the LMS predictor becomes significant when the number of bits is reduced to 6. In both groups, the performances of CFBLMS and CFLMS are significantly superior to that of LMS, and CFBLMS is superior to CFLMS, in terms of the rate of convergence, misadjustment, and mean-square error (MSE). (A fixed-point LMS sketch appears after this list.)

  • New insights into the noise reduction Wiener filter

    The problem of noise reduction has attracted a considerable amount of research attention over the past several decades. Among the numerous techniques that were developed, the optimal Wiener filter can be considered one of the most fundamental noise reduction approaches, which has been delineated in different forms and adopted in various applications. Although it is not a secret that the Wiener filter may cause some detrimental effects to the speech signal (appreciable or even significant degradation in quality or intelligibility), few efforts have been reported to show the inherent relationship between noise reduction and speech distortion. By defining a speech-distortion index to measure the degree to which the speech signal is deformed and two noise-reduction factors to quantify the amount of noise being attenuated, this paper studies the quantitative performance behavior of the Wiener filter in the context of noise reduction. We show that in the single-channel case the a posteriori signal-to-noise ratio (SNR) (defined after the Wiener filter) is greater than or equal to the a priori SNR (defined before the Wiener filter), indicating that the Wiener filter is always able to achieve noise reduction. However, the amount of noise reduction is in general proportional to the amount of speech degradation. This may seem discouraging as we always expect an algorithm to have maximal noise reduction without much speech distortion. Fortunately, we show that speech distortion can be better managed in three different ways. If we have some a priori knowledge (such as the linear prediction coefficients) of the clean speech signal, this a priori knowledge can be exploited to achieve noise reduction while maintaining a low level of speech distortion. When no a priori knowledge is available, we can still achieve a better control of noise reduction and speech distortion by properly manipulating the Wiener filter, resulting in a suboptimal Wiener filter. When multiple microphone sensors are available, the multiple observations of the speech signal can be used to reduce noise with less or even no speech distortion. (A numerical illustration of this trade-off appears after this list.)

  • Joint Dereverberation and Residual Echo Suppression of Speech Signals in Noisy Environments

    Hands-free devices are often used in a noisy and reverberant environment. The received microphone signal therefore contains not only the desired near-end speech signal but also interferences such as room reverberation caused by the near-end source, background noise, and a far-end echo signal that results from the acoustic coupling between the loudspeaker and the microphone. These interferences degrade the fidelity and intelligibility of near-end speech. In the last two decades, postfilters have been developed that can be used in conjunction with a single-microphone acoustic echo canceller to enhance the near-end speech. In previous work, spectral enhancement techniques have been used to suppress residual echo and background noise for single-microphone acoustic echo cancellers. However, dereverberation of the near-end speech was not addressed in this context. Recently, practically feasible spectral enhancement techniques to suppress reverberation have emerged. In this paper, we derive a novel spectral variance estimator for the late reverberation of the near-end speech. Residual echo will be present at the output of the acoustic echo canceller when the acoustic echo path cannot be completely modeled by the adaptive filter. A spectral variance estimator for the so-called late residual echo that results from the deficient length of the adaptive filter is derived. Both estimators are based on a statistical reverberation model. The model parameters depend on the reverberation time of the room, which can be obtained using the estimated acoustic echo path. A novel postfilter is developed which suppresses late reverberation of the near-end speech, residual echo, and background noise, and maintains a constant residual background noise level. Experimental results demonstrate the beneficial use of the developed system for reducing reverberation, residual echo, and background noise. (A late-reverberation estimator sketch appears after this list.)

  • Speech Analysis and Synthesis Based on Dynamic Modes

    In this paper, the source-filter model of speech production is adapted to represent the speech signal as the superposition and convolution of a dynamic source and resonant modes. The aim is to increase the resolution of the time-instantaneous-frequency representation of each of the individual contributions of different sections of the human phonatory system. We present a framework based on dynamic mode predictors and filters, which are adapted using gradient-based techniques to track the modal dynamics of speech, yielding a representation that is free from quasi-stationary assumptions and thus allows flexible manipulation of the speech signal. Several examples are offered, including intonation modifications, to illustrate the potential of the proposed approach.

  • Fast Algorithm for Calculation of the Union-Based Probability

    The probabilistic union model is a method for robustly combining multiple features when some of them are corrupted. However, the calculation of the union-based probability is combinatorial in the number of features. This correspondence presents a fast algorithm for calculating the union-based probability, which significantly reduces the computational requirements. (A fast-computation sketch appears after this list.)

  • Time-Varying Autoregressions in Speech: Detection Theory and Applications

    This paper develops a general detection theory for speech analysis based on time-varying autoregressive models, which themselves generalize the classical linear predictive speech analysis framework. This theory leads to a computationally efficient decision-theoretic procedure that may be applied to detect the presence of vocal tract variation in speech waveform data. A corresponding generalized likelihood ratio test is derived and studied both empirically for short data records, using formant-like synthetic examples, and asymptotically, leading to constant false alarm rate hypothesis tests for changes in vocal tract configuration. Two in-depth case studies then serve to illustrate the practical efficacy of this procedure across different time scales of speech dynamics: first, the detection of formant changes on the scale of tens of milliseconds of data, and second, the identification of glottal opening and closing instants on time scales below ten milliseconds. (A TVAR detection sketch appears after this list.)

  • Convergence Analysis of Narrowband Active Noise Equalizer System Under Imperfect Secondary Path Estimation

    Active noise equalizer systems are used to adjust the noise level in an environment according to a preference for retaining noise information. Several studies have determined the maximum step-size bound of narrowband active noise control systems under perfect secondary path estimation and without considering the gain factor. However, in practical environments, secondary path estimation errors exist. In this paper, a stochastic analysis is applied to determine the maximum step size of the system under imperfect secondary path estimation. Simulations are conducted to verify the analysis. Results show that the gain factor, sampling frequency, and secondary path estimation errors are all major factors governing the maximum step size of the narrowband active noise equalizer system under imperfect secondary path estimation.

  • Joint Detection and Tracking of Time-Varying Harmonic Components: A Flexible Bayesian Approach

    This paper addresses the joint estimation and detection of time-varying harmonic components in audio signals. We follow a flexible viewpoint, where several frequency/amplitude trajectories are tracked in the spectrogram using particle filtering. The core idea is that each harmonic component (composed of a fundamental partial together with several overtone partials) is considered a target. Tracking requires defining a state-space model with state transition and measurement equations. Particle filtering algorithms rely on a so-called sequential importance distribution, and we show that it can be built on previous multipitch estimation algorithms, so as to yield an even more efficient estimation procedure with established convergence properties. Moreover, as our model captures all the harmonic model information, it actually separates the harmonic sources. Simulations on synthetic and real music data show the interest of our approach. (A tracking sketch appears after this list.)

  • On the Recognition of Cochlear Implant-Like Spectrally Reduced Speech With MFCC and HMM-Based ASR

    This correspondence investigates the recognition of cochlear implant-like spectrally reduced speech (SRS) using mel frequency cepstral coefficient (MFCC) and hidden Markov model (HMM)-based automatic speech recognition (ASR). The SRS was synthesized from subband temporal envelopes extracted from original clean test speech, whereas the acoustic models were trained on a different set of original clean speech signals from the same speech database. It was shown that changing the bandwidth of the subband temporal envelopes had no significant effect on the ASR word accuracy. In addition, increasing the number of frequency subbands of the SRS from 4 to 16 significantly improved the system performance. Furthermore, the ASR word accuracy attained with the original clean speech can be achieved by using the 16-, 24-, or 32-subband SRS. The experiments were carried out using the TI-digits speech database and the HTK speech recognition toolkit. (A vocoder-style synthesis sketch appears after this list.)
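
The sketches below illustrate several of the techniques summarized in the items above; each is reconstructed from the abstract alone, under stated assumptions, rather than taken from the corresponding paper. First, hierarchical Pitman-Yor language model smoothing. The abstract does not give the estimator, but the standard hierarchical Pitman-Yor predictive rule interpolates discounted n-gram counts with a shorter-context distribution. A minimal Python sketch, assuming a shared discount d and strength theta, the common one-table-per-word-type approximation, and a uniform distribution at the root (all simplifications, not the paper's settings):

    from collections import defaultdict

    class HPYLM:
        """Hierarchical Pitman-Yor n-gram model (simplified sketch)."""

        def __init__(self, vocab_size, d=0.75, theta=1.0, order=3):
            self.V, self.d, self.theta, self.order = vocab_size, d, theta, order
            # counts[context][word] = n-gram count; context is a word tuple
            self.counts = defaultdict(lambda: defaultdict(int))

        def add(self, context, word):
            # Simplification: add the n-gram and all of its back-off n-grams.
            for k in range(len(context) + 1):
                self.counts[context[k:]][word] += 1

        def prob(self, context, word):
            return self._prob(tuple(context)[-(self.order - 1):], word)

        def _prob(self, u, w):
            table = self.counts[u]
            c_u = sum(table.values())                      # tokens seen in context u
            t_u = sum(1 for c in table.values() if c > 0)  # "tables" (word types)
            c_uw, t_uw = table[w], (1 if table[w] > 0 else 0)
            backoff = 1.0 / self.V if not u else self._prob(u[1:], w)
            if c_u == 0:
                return backoff
            # Pitman-Yor predictive rule: discounted counts plus back-off
            # mass proportional to (theta + d * t_u).
            return ((c_uw - self.d * t_uw) / (self.theta + c_u)
                    + (self.theta + self.d * t_u) / (self.theta + c_u) * backoff)

    lm = HPYLM(vocab_size=10000)
    lm.add(("we", "are"), "here")
    print(lm.prob(("we", "are"), "here"))

With a small theta this rule closely resembles interpolated Kneser-Ney smoothing, which is consistent with the power-law behavior the abstract emphasizes.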
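
For the fixed-point predictor study, a minimal sketch of a transversal LMS one-step predictor whose weights are rounded to a B-bit fixed-point grid after every update. The cascaded CFLMS/CFBLMS structures, the lattice realization, and the paper's exact quantization scheme are not reproduced; the AR(2) test signal is a hypothetical stand-in for speech:

    import numpy as np

    def quantize(w, bits, scale=2.0):
        """Round to a symmetric fixed-point grid covering [-scale, scale)."""
        step = scale / (2 ** (bits - 1))
        return np.clip(np.round(w / step) * step, -scale, scale - step)

    def lms_predict(x, order=8, mu=0.05, bits=None):
        """One-step transversal LMS predictor with optional B-bit weights."""
        w = np.zeros(order)
        err = np.zeros(len(x))
        for n in range(order, len(x)):
            u = x[n - order:n][::-1]      # regressor: most recent sample first
            e = x[n] - w @ u              # prediction error
            w = w + mu * e * u            # LMS weight update
            if bits is not None:
                w = quantize(w, bits)     # fixed-point weight storage
            err[n] = e
        return err

    rng = np.random.default_rng(0)
    x = np.zeros(4000)                    # synthetic AR(2) "speech-like" input
    for n in range(2, len(x)):
        x[n] = 1.5 * x[n - 1] - 0.8 * x[n - 2] + 0.3 * rng.standard_normal()
    for b in (16, 9, 8):
        mse = np.mean(lms_predict(x, bits=b)[500:] ** 2)
        print(f"{b:2d}-bit weights: steady-state MSE = {mse:.4f}")
    print(f"float weights: steady-state MSE = "
          f"{np.mean(lms_predict(x)[500:] ** 2):.4f}")

Shrinking the word length enlarges the quantization step until small weight updates round to zero (the classic stalling effect in fixed-point LMS), which is one mechanism behind the degradations the abstract reports.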
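
For the Wiener filter analysis, a toy numerical illustration of the trade-off the paper quantifies, assuming the per-bin Wiener gain H = SNR/(1 + SNR) and simple energy-ratio definitions of the noise-reduction factor and speech-distortion index (the paper's exact definitions are not reproduced). The fullband output SNR never falls below the input SNR because low-SNR bins are attenuated the most, and the distortion index grows with the amount of attenuation:

    import numpy as np

    F = 256                                            # frequency bins
    k = np.arange(F)
    Ps = 1.0 / (1.0 + ((k - 40) / 12.0) ** 2) + 0.02   # toy speech PSD: one resonance
    Pn = np.full(F, 0.1)                               # white-noise PSD

    xi = Ps / Pn                                       # a priori SNR per bin
    H = xi / (1.0 + xi)                                # Wiener gain per bin

    snr_in = Ps.sum() / Pn.sum()
    snr_out = (H ** 2 * Ps).sum() / (H ** 2 * Pn).sum()   # fullband, after filtering
    noise_reduction = Pn.sum() / (H ** 2 * Pn).sum()
    speech_distortion = ((1.0 - H) ** 2 * Ps).sum() / Ps.sum()

    print(f"fullband SNR in : {10 * np.log10(snr_in):6.2f} dB")
    print(f"fullband SNR out: {10 * np.log10(snr_out):6.2f} dB")
    print(f"noise-reduction factor : {noise_reduction:.2f}")
    print(f"speech-distortion index: {speech_distortion:.3f}")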
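
For the dereverberation postfilter, the abstract states that both spectral variance estimators rest on a statistical reverberation model whose parameters depend on the room's reverberation time. A minimal sketch of one common estimator of this kind, assuming an exponential-decay (Polack-style) model; the 50-ms late-reverberation onset, the known T60, and the gain floor are placeholder assumptions, not the paper's derivation:

    import numpy as np

    def late_reverb_variance(spec_var, t60, hop_s, late_onset_s=0.05):
        """Late-reverberation spectral variance per STFT frame, assuming the
        exponential-decay model: energy arriving later than late_onset_s is a
        delayed, attenuated copy of the reverberant-speech spectral variance.

        spec_var : array (frames, bins), spectral variance of the mic signal
        t60      : reverberation time in seconds (assumed known here; the
                   paper obtains it from the estimated acoustic echo path)
        """
        delta = 3.0 * np.log(10.0) / t60                 # model decay constant
        R = max(1, int(round(late_onset_s / hop_s)))     # onset delay in frames
        decay = np.exp(-2.0 * delta * hop_s * R)         # energy decay over R frames
        late = np.zeros_like(spec_var)
        late[R:] = decay * spec_var[:-R]
        return late

    # Usage sketch: spectral gain with a floor, applied to STFT magnitudes.
    rng = np.random.default_rng(2)
    spec_var = rng.random((200, 257)) + 0.1              # stand-in spectral variance
    lam_late = late_reverb_variance(spec_var, t60=0.6, hop_s=0.016)
    gain = np.maximum(1.0 - lam_late / spec_var, 0.1)    # crude suppression rule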
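
For the union-based probability, the abstract notes only that direct evaluation is combinatorial in the number of features. One plausible fast scheme (an assumption; the correspondence's actual algorithm is not given in the abstract): an order-M union combination of per-feature likelihoods is an elementary symmetric polynomial, computable by a standard O(N*M) recurrence instead of enumerating all C(N, M) subsets:

    from itertools import combinations
    from math import prod

    def union_prob_naive(p, m):
        """Sum of products over all size-m subsets of the per-feature
        likelihoods p: combinatorial in the number of features."""
        return sum(prod(s) for s in combinations(p, m))

    def union_prob_fast(p, m):
        """Same quantity via the elementary-symmetric-polynomial recurrence
        e[j] += p_i * e[j-1], which costs O(len(p) * m)."""
        e = [1.0] + [0.0] * m
        for pi in p:
            for j in range(m, 0, -1):    # descend so each p_i is used once
                e[j] += pi * e[j - 1]
        return e[m]

    p = [0.9, 0.2, 0.7, 0.4, 0.8, 0.1]   # hypothetical per-feature likelihoods
    assert abs(union_prob_naive(p, 3) - union_prob_fast(p, 3)) < 1e-12
    print(union_prob_fast(p, 3))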
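
For time-varying autoregressive detection, a minimal sketch of the two ingredients named in the abstract: a TVAR model whose coefficients are linear combinations of a small time basis (so maximum likelihood reduces to least squares) and a generalized likelihood ratio against a time-invariant AR fit. The polynomial basis, the model orders, and the drifting-resonance example are illustrative choices, not the paper's:

    import numpy as np

    def design_matrix(x, order, n_basis):
        """Regressors for a TVAR(order) model whose coefficients are linear
        in n_basis polynomial functions of time."""
        N = len(x)
        t = np.linspace(-1.0, 1.0, N - order)
        basis = np.vander(t, n_basis, increasing=True)       # 1, t, t^2, ...
        cols = [x[order - k:N - k] * basis[:, b]             # x[n-k] * basis_b(n)
                for k in range(1, order + 1) for b in range(n_basis)]
        return np.column_stack(cols)

    def tvar_glrt(x, order=4, n_basis=3):
        """GLR statistic: H0 = time-invariant AR, H1 = time-varying AR.
        Large values indicate coefficient (vocal-tract) variation."""
        y = x[order:]
        rss = []
        for nb in (1, n_basis):                              # H0 is nested in H1
            X = design_matrix(x, order, nb)
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            r = y - X @ coef
            rss.append(r @ r)
        return len(y) * np.log(rss[0] / rss[1])

    rng = np.random.default_rng(3)
    e = rng.standard_normal(2000)
    x0 = np.zeros(2000)                  # fixed resonance (stationary AR(2))
    x1 = np.zeros(2000)                  # resonance drifting over time
    for n in range(2, 2000):
        x0[n] = 1.4 * x0[n - 1] - 0.9 * x0[n - 2] + e[n]
        a1 = 2 * np.sqrt(0.9) * np.cos(0.3 + 0.4 * n / 2000)
        x1[n] = a1 * x1[n - 1] - 0.9 * x1[n - 2] + e[n]
    print("stationary  :", round(tvar_glrt(x0), 1))
    print("time-varying:", round(tvar_glrt(x1), 1))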
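
For joint detection and tracking of harmonic components, a bootstrap particle filter tracking a single frequency trajectory through STFT magnitudes. The paper's model is richer: each target is a fundamental with its overtone partials, and the importance distribution is built from multipitch estimation. This sketch keeps only the tracking skeleton, with an assumed random-walk state model and a magnitude-proportional likelihood:

    import numpy as np

    rng = np.random.default_rng(4)

    # Synthetic rising tone observed through its STFT magnitude.
    fs, n_fft, hop = 8000, 512, 128
    t = np.arange(fs) / fs
    f_true = 300 + 150 * t                       # frequency rises 300 -> 450 Hz
    x = np.sin(2 * np.pi * np.cumsum(f_true) / fs)
    frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
    S = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    freqs = np.fft.rfftfreq(n_fft, 1 / fs)

    # Bootstrap particle filter on the frequency state (random walk).
    P = 500
    parts = rng.uniform(100, 900, P)             # initial frequency particles (Hz)
    w = np.full(P, 1.0 / P)
    est = []
    for frame in S:
        parts = parts + rng.normal(0.0, 5.0, P)  # state transition
        idx = np.clip(np.searchsorted(freqs, parts), 0, len(freqs) - 1)
        w = w * (frame[idx] + 1e-6)              # likelihood ~ spectral magnitude
        w = w / w.sum()
        est.append(parts @ w)                    # posterior-mean frequency
        if 1.0 / (w @ w) < P / 2:                # resample on low effective size
            keep = rng.choice(P, P, p=w)
            parts, w = parts[keep], np.full(P, 1.0 / P)

    print(f"tracked {est[0]:.0f} Hz -> {est[-1]:.0f} Hz")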
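
For the cochlear implant-like SRS, the synthesis described (subband temporal envelopes extracted from clean speech) matches a noise-excited envelope vocoder. A minimal sketch assuming log-spaced Butterworth analysis bands and rectify-plus-lowpass envelope extraction; the band edges, filter orders, and 50-Hz envelope cutoff are design assumptions rather than the paper's settings:

    import numpy as np
    from scipy.signal import butter, sosfilt, sosfiltfilt

    def spectrally_reduced_speech(x, fs, n_bands=8, lo=100.0, hi=3800.0,
                                  env_cut=50.0):
        """Noise-excited envelope vocoder: split the signal into n_bands,
        keep each band's temporal envelope only, and re-impose it on
        band-limited noise (a cochlear implant-like SRS)."""
        edges = np.geomspace(lo, hi, n_bands + 1)       # log-spaced band edges
        env_sos = butter(2, env_cut, fs=fs, output="sos")
        rng = np.random.default_rng(5)
        y = np.zeros_like(x)
        for b in range(n_bands):
            band_sos = butter(4, (edges[b], edges[b + 1]), "bandpass",
                              fs=fs, output="sos")
            band = sosfilt(band_sos, x)
            env = sosfiltfilt(env_sos, np.abs(band))    # subband temporal envelope
            carrier = sosfilt(band_sos, rng.standard_normal(len(x)))
            y += np.maximum(env, 0.0) * carrier         # envelope-modulated noise
        return y

    fs = 8000
    x = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)    # stand-in for clean speech
    srs = spectrally_reduced_speech(x, fs)

Varying n_bands from 4 toward 16 or more restores spectral detail, which is the variable the recognition experiments in the abstract manipulate.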



Standards related to IEEE Transactions on Audio, Speech, and Language Processing

(Replaced) IEEE Standard VHDL Language Reference Manual

This standard revises and enhances the VHDL language reference manual (LRM) by including a standard C language interface specification; specifications from previously separate, but related, standards IEEE Std 1164-1993, IEEE Std 1076.2-1996, and IEEE Std 1076.3-1997; and general language enhancements in the areas of design and verification of electronic systems.



Jobs related to IEEE Transactions on Audio, Speech, and Language Processing
