IEEE Organizations related to Voice Activity Detection

No organizations are currently tagged "Voice Activity Detection"



Conferences related to Voice Activity Detection

No conferences are currently tagged "Voice Activity Detection"


Periodicals related to Voice Activity Detection

No periodicals are currently tagged "Voice Activity Detection"


Xplore Articles related to Voice Activity Detection

Use of Pitch Continuity for Robust Speech Activity Detection

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018

Speech activity detection (SAD) is an important component for various speech processing applications and has been researched extensively recently. The pitch continuity, a significant characteristic of speech, however, has not successfully played a role in existing SAD methods. In this work, we propose a novel way to integrate the pitch continuity with pitch-related features. Practice is carried out through the ...


Application of Machine Learning Techniques for Hate Speech Detection in Mobile Applications

2018 International Conference on Information Technologies (InfoTech), 2018

The proliferation of data through various platforms and applications is constantly increasing. The versatility of data and its omnipresence make it very hard to assess the trustworthiness and intention of the source. This is very evident in dynamic environments such as mobile applications. As a result, designing mobile applications that monitor, control and block any type of malicious intent ...


An Ensemble SVM-based Approach for Voice Activity Detection

2018 10th International Conference on Electrical and Computer Engineering (ICECE), 2018

Voice activity detection (VAD), used as the front end of speech enhancement and of speech and speaker recognition algorithms, determines the overall accuracy and efficiency of those algorithms. Therefore, a VAD with low complexity and high accuracy is highly desirable for speech processing applications. In this paper, we propose a novel training method on a large dataset for a supervised learning-based VAD system using ...


Chinese speech recognition and task analysis of aldebaran Nao robot

2018 Chinese Control And Decision Conference (CCDC), 2018

It is one of the important goals of human-computer interaction to make a robot understand Chinese instructions. The paper presents a scheme for Chinese instruction processing. After the user's speech instruction is obtained, voice preprocessing and Voice Activity Detection (VAD) are carried out first, followed by speech recognition by means of Baidu Cloud ...


Characterizing Performance of Speaker Diarization Systems on Far-Field Speech Using Standard Methods

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018

To date, the bulk of research on speaker diarization has been conducted on telephone or near-field speech. As the need for technologies capable of handling conversational speech increases, it is necessary to establish the performance of state-of-the-art systems in this domain. In this work we evaluate the performance of an ivector/PLDA-based diarization system on the AMI Meeting Corpus, comparing performance ...


Educational Resources on Voice Activity Detection

IEEE-USA E-Books

  • Use of Pitch Continuity for Robust Speech Activity Detection

    Speech activity detection (SAD) is an important component for various speech processing applications and has been researched extensively recently. The pitch continuity, a significant characteristic of speech, however, has not successfully played a role in existing SAD methods. In this work, we propose a novel way to integrate the pitch continuity with pitch-related features. Practice is carried out through the Combo-SAD approach: We examine three consecutive frames and assume that they all have the same pitch as the center frame due to pitch continuity. Corresponding feature values are recomputed at the adjusted pitch location and then used in the final expression. The new combo feature is evaluated with various types of additive noise at different signal-to-noise ratios (SNR). The results show that the new feature leads to better SAD performance (with an up to 39.3% relative improvement on miss rate compared to Combo-SAD). We also introduce a novel variant of the underlying autocorrelation function and illustrate how it can improve the accuracy of pitch detection.

  • Application of Machine Learning Techniques for Hate Speech Detection in Mobile Applications

    The proliferation of data through various platforms and applications is constantly increasing. The versatility of data and its omnipresence make it very hard to assess the trustworthiness and intention of the source. This is very evident in dynamic environments such as mobile applications. As a result, designing mobile applications that monitor, control and block any type of malicious intent is important. This paper makes an attempt in this direction by implementing a lightweight machine learning classification scheme for hate speech detection in the Albanian language for mobile applications. Initial testing and evaluations indicate good classifier accuracy in mobile environments where frequent and real-time training of the algorithm is required.

  • An Ensemble SVM-based Approach for Voice Activity Detection

    Voice activity detection (VAD), used as the front end of speech enhancement and of speech and speaker recognition algorithms, determines the overall accuracy and efficiency of those algorithms. Therefore, a VAD with low complexity and high accuracy is highly desirable for speech processing applications. In this paper, we propose a novel training method on a large dataset for a supervised learning-based VAD system using a support vector machine (SVM). Despite the high classification accuracy of SVMs, a single SVM is not suitable for classifying the large data sets needed for a good VAD system because of its high training complexity. To overcome this problem, a novel ensemble-based approach using SVMs is proposed in this paper. The performance of the proposed ensemble structure is compared with that of a feedforward neural network (NN). Although the NN performs better than a single SVM-based VAD trained on a small portion of the training data, the ensemble SVM gives accuracy comparable to the NN-based VAD. The ensemble SVM and the NN give 88.74% and 86.28% accuracy respectively, whereas the stand-alone SVM shows 57.05% accuracy on average on the test dataset.

  • Chinese speech recognition and task analysis of aldebaran Nao robot

    It is one of the important goals of human-computer interaction to make a robot understand Chinese instructions. The paper presents a scheme for Chinese instruction processing. After the user's speech instruction is obtained, voice preprocessing and Voice Activity Detection (VAD) are carried out first, followed by speech recognition by means of Baidu Cloud Speech Recognition technology. For recognized sentences, Chinese word segmentation is carried out using a segmentation algorithm that combines dictionary and statistics. A professional keyword database is then used to extract the keywords, with the aim of obtaining the verbs that perform the task and the nouns related to the places and objects in the task, so that the robot receives a semantic analysis of the task. Experiments show that the method achieves high accuracy and meets the task analysis requirements of the NAO robot.

  • Characterizing Performance of Speaker Diarization Systems on Far-Field Speech Using Standard Methods

    To date, the bulk of research on speaker diarization has been conducted on telephone or near-field speech. As the need for technologies capable of handling conversational speech increases, it is necessary to establish the performance of state-of-the-art systems in this domain. In this work we evaluate the performance of an ivector/PLDA-based diarization system on the AMI Meeting Corpus, comparing performance on near-field, far-field, and signal-enhanced conditions.

  • A Modified Speech Recognition Algorithm for People with Physical Disabilities

    In this paper, we present a speech recognition algorithm that is based on double threshold voice activity detection (VAD) and Mel-frequency cepstral coefficients (MFCC). The proposed algorithm achieved a recognition rate of 98.7%. The approach was validated by comparison with a relevant approach, and its recognition rate was noticeably higher.

  • The combination of spectral entropy, zero crossing rate, short time energy and linear prediction error for voice activity detection

    In this paper, an efficient algorithm for separating voiced segments from silence and unvoiced segments is proposed that is both more accurate and easier to implement than some previous algorithms. The proposed algorithm uses spectral entropy and short-time features, namely zero crossing rate, short time energy and linear prediction error, for voice activity detection (VAD). A compound parameter, D, is calculated from these four parameters, and Dmax is calculated over all frames of the signal. The value of D/Dmax is then used to classify frames as speech, non-speech or silence. The threshold values have to be obtained empirically. Experimental results show that the method detects the end-points of a voice signal more accurately and outperforms conventional VAD algorithms. The method was evaluated on the TIMIT Acoustic-Phonetic Continuous Speech Corpus, which is mostly used for speech recognition applications and contains clean speech data, and was compared with some of the most recently proposed algorithms.

  • Two-step Judgment Algorithm for Robust Voice Activity Detection Based on Deep Neural Networks

    Voice Activity Detection (VAD) is an important front-end process for speech-based applications such as automatic speech recognition (ASR) and speaker diarization. VAD attempts to identify all the segments containing speech in an audio signal. In this paper, a robust VAD system is developed based on the fusion of a deep neural network (DNN) with Combo-SAD. The DNN model is an effective supervised approach that can achieve a 4% missed detection rate (Pmiss) at a false-alarm rate (Pfa) of 5%; Combo-SAD is an unsupervised approach designed for noise robustness, with a reported 5% Pmiss at a Pfa of 3%. Combining the advantages of both techniques, this paper designs a two-step judgment approach. Experimental results on a database containing various types of audio show that the overall error rate reaches 13.50%, which indicates that the proposed VAD system is robust and effective.

  • Performance analysis of hybrid model of robust automatic continuous speech recognition system

    In this work, we evaluate objective performance measures for noisy continuous speech input using a hybrid method that combines Voice Activity Detection (VAD) and a Speech Enhancement Algorithm (SEA). Automatic Speech Recognition (ASR), an important technology that enables natural human-machine interaction, has been studied for over five decades. The objective of this work is to develop a continuous speech recognition system. The methodology presented allows evaluation of a speech-to-text system that uses continuous word recognition with a vocabulary of ten words (digits 0 to 9). In the training period, the continuous digits are recorded using 8-bit Pulse Code Modulation (PCM) at a sampling rate of 8 kHz and saved as wave format files using sound recorder software. For each word in the vocabulary, the system builds a Hidden Markov Model (HMM) and trains it during the training phase. The training steps, from VAD and speech enhancement to HMM building, are performed using PC-based Matlab programs. An overall Recognition Accuracy (RA) of 72.45% is achieved by the proposed speech recognition system working under different environmental conditions for an uttered word.

  • An Acoustic Signal Processing Chip With 142-nW Voice Activity Detection Using Mixer-Based Sequential Frequency Scanning and Neural Network Classification

    This article presents a voice and acoustic activity detector that uses a mixer-based architecture and ultra-low-power neural network (NN)-based classifier. By sequentially scanning 4 kHz of frequency bands and down-converting to below 500 Hz, feature extraction power consumption is reduced by 4x. The NN processor employs computational sprinting, enabling 12x power reduction. The system also features inaudible acoustic signature detection for intentional remote silent wakeup of the system while re-using a subset of the same system components. The measurement results achieve 91.5%/90% speech/non-speech hit rates at 10-dB SNR with babble noise and 142-nW power consumption. Acoustic signature detection consumes 66 nW, successfully detecting a signature 10 dB below the noise level.
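
Several of the abstracts above describe frame-level VAD that combines short-time features into one score, normalizes it by its maximum over the signal, and compares it with an empirically chosen threshold. The Python sketch below illustrates that general idea only. It is not the method of any listed paper: it uses just two features (short-time energy and zero crossing rate), and the way they are combined, the function names, the frame sizes and the threshold value are all assumptions made for illustration.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a sequence of samples into overlapping frames of equal length."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_speech(samples, frame_len=200, hop=100, threshold=0.1):
    """Return one boolean per frame: True where the frame is judged active.

    The compound score (energy damped by zero crossing rate) is a
    hypothetical choice for this sketch, not a published formula.
    """
    frames = frame_signal(samples, frame_len, hop)
    scores = [short_time_energy(f) * (1.0 - zero_crossing_rate(f)) for f in frames]
    d_max = max(scores) if scores else 0.0
    if d_max == 0.0:
        # An all-silent (or empty) input has no active frames.
        return [False] * len(frames)
    # Normalize by the maximum score, then apply the empirical threshold.
    return [score / d_max > threshold for score in scores]

if __name__ == "__main__":
    # Synthetic 8 kHz input: silence, a 200 Hz tone, then silence again.
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
    signal = [0.0] * 800 + tone + [0.0] * 800
    print(detect_speech(signal))
```

Because the score is normalized by its own maximum, the threshold is relative to the loudest frame in the recording. That keeps the sketch independent of the input gain, but it also means a recording containing only noise would still have some frames flagged, which is one reason real systems add features such as spectral entropy or pitch.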



Standards related to Voice Activity Detection

No standards are currently tagged "Voice Activity Detection"

