Conferences related to Speech Recognition


2013 7th Conference on Speech Technology and Human-Computer Dialogue (SpeD 2013)

SpeD 2013 will bring together academics and industry professionals from universities, government agencies, and companies to present their achievements in speech technology and related fields. The conference is an international forum reflecting some of the latest trends in spoken language technology and human-computer dialogue research, as well as some of the most recent applications in this area.

  • 2011 6th Conference on Speech Technology and Human-Computer Dialogue (SpeD 2011)

    SpeD 2011 will bring together academics and industry professionals from universities, government agencies, and companies to present their achievements and the latest trends in spoken language technology and human-computer dialogue research, as well as some of the most recent applications in this area.

  • 2009 5th Conference on Speech Technology and Human-Computer Dialogue (SpeD 2009)

    The 5th Conference on Speech Technology and Human-Computer Dialogue (Constanta, Romania) brings together academics and industry professionals from universities, government agencies, and companies to present their achievements in speech technology and related fields. SpeD 2009 is an international forum reflecting some of the latest trends in spoken language technology and human-computer dialogue research, as well as some of the most recent applications in this area.


2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII)

The conference will address, but is not limited to, the following topics:

  • Computational and psychological models of emotion
  • Affect in arts, entertainment, and multimedia
  • Bodily manifestations of affect (facial expressions, posture, behavior, physiology)
  • Databases for emotion processing: development and issues
  • Affective interfaces and applications (games, learning, dialogue systems, ...)
  • Ecological and continuous emotion assessment
  • Affect in social interactions

  • 2009 3rd International Conference on Affective Computing and Intelligent Interaction (ACII 2009)

    The conference series on Affective Computing and Intelligent Interaction is the premier international forum for state-of-the-art research on affective and multimodal human-machine interaction and systems. Held every other year, the ACII conference plays an important role in shaping related scientific, academic, and higher-education programs. This year, we are especially soliciting papers discussing enabling behavioral and socially aware human-machine interfaces, in areas including psychology.


2013 IEEE International Conference on Multimedia and Expo (ICME)

ICME promotes the exchange of the latest advances in multimedia technologies, systems, and applications from both the research and development perspectives of the circuits and systems, communications, computer, and signal processing communities.

  • 2012 IEEE International Conference on Multimedia and Expo (ICME)

    The IEEE International Conference on Multimedia & Expo (ICME) is the flagship multimedia conference sponsored by four IEEE societies. It promotes the exchange of the latest advances in multimedia technologies, systems, and applications from both the research and development perspectives of the circuits and systems, communications, computer, and signal processing communities.

  • 2011 IEEE International Conference on Multimedia and Expo (ICME)

    Topics include: speech, audio, image, video, and text processing; signal processing for media integration; 3D visualization, animation, and virtual reality; multi-modal multimedia computing systems and human-machine interaction; multimedia communications and networking; multimedia security and privacy; multimedia databases and digital libraries; multimedia applications and services; media content analysis and search; hardware and software for multimedia systems; multimedia standards and related issues; multimedia qu ...

  • 2010 IEEE International Conference on Multimedia and Expo (ICME)

    A flagship multimedia conference sponsored by four IEEE societies, ICME serves as a forum to promote the exchange of the latest advances in multimedia technologies, systems, and applications from both the research and development perspectives of the circuits and systems, communications, computer, and signal processing communities.

  • 2009 IEEE International Conference on Multimedia and Expo (ICME)

    IEEE International Conference on Multimedia & Expo is a major annual international conference with the objective of bringing together researchers, developers, and practitioners from academia and industry working in all areas of multimedia. ICME serves as a forum for the dissemination of state-of-the-art research, development, and implementations of multimedia systems, technologies and applications.


2013 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

The ASRU workshop meets every two years and has a tradition of bringing together researchers from academia and industry in an intimate and collegial setting to discuss problems of common interest in automatic speech recognition and understanding.


2013 International Carnahan Conference on Security Technology (ICCST)

This international conference is a forum for all aspects of physical, cyber and electronic security research, development, systems engineering, testing, evaluation, operations and sustainability. The ICCST facilitates the exchange of ideas and information.

  • 2012 IEEE International Carnahan Conference on Security Technology (ICCST)

    Research, development, and user aspects of security technology, including principles of operation, applications, and user experiences.

  • 2011 International Carnahan Conference on Security Technology (ICCST)

    This annual conference is the world's longest-running international technical symposium on security technology. It is a forum for collaboration on all aspects of physical, cyber, and electronic security research, development, systems engineering, testing, evaluation, operations, and sustainment. The ICCST facilitates the exchange of ideas and the sharing of information on both new and existing technology and systems. Conference participants are encouraged to consider the impact of their work on society. The ICCST provides a foundation for support to authorities and agencies responsible for security, safety, and law enforcement in the use of available and future technology.

  • 2010 IEEE International Carnahan Conference on Security Technology (ICCST)

    The ICCST is a forum for researchers and practitioners in both new and existing security technology, providing an interchange of knowledge through paper presentations and publication of proceedings that have been selected by the ICCST organizing committee.

  • 2009 International Carnahan Conference on Security Technology (ICCST)

    The conference is directed toward the research, development, and user aspects of electronic security technology.

  • 2008 International Carnahan Conference on Security Technology (ICCST)

    The ICCST is directed toward the research and development aspects of electronic security technology, including the operational testing of the technology. It establishes a forum for the exchange of ideas and the dissemination of information on both new and existing technology. Conference participants are encouraged to consider the impact of their work on society. The conference provides an interchange of knowledge through the presentation of papers selected by the ICCST organizing committee.

  • 2007 IEEE International Carnahan Conference on Security Technology (ICCST)

  • 2006 IEEE International Carnahan Conference on Security Technology (ICCST)


More Conferences

Periodicals related to Speech Recognition


Audio, Speech, and Language Processing, IEEE Transactions on

Speech analysis, synthesis, coding, speech recognition, speaker recognition, language modeling, speech production and perception, and speech enhancement. In audio: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. The scope of the transactions includes SPEECH PROCESSING - transmission and storage of speech signals; speech coding; speech enhancement and noise reduction; ...


Pattern Analysis and Machine Intelligence, IEEE Transactions on

Statistical and structural pattern recognition; image analysis; computational models of vision; computer vision systems; enhancement, restoration, segmentation, feature extraction, shape and texture analysis; applications of pattern analysis in medicine, industry, government, and the arts and sciences; artificial intelligence, knowledge representation, logical and probabilistic inference, learning, speech recognition, character and text recognition, syntactic and semantic processing, understanding natural language, expert systems, ...


Selected Areas in Communications, IEEE Journal on

All telecommunications, including telephone, telegraphy, facsimile, and point-to-point television, by electromagnetic propagation, including radio; wire; aerial, underground, coaxial, and submarine cables; waveguides, communication satellites, and lasers; in marine, aeronautical, space, and fixed station services; repeaters, radio relaying, signal storage, and regeneration; telecommunication error detection and correction; multiplexing and carrier techniques; communication switching systems; data communications; communication theory; and wireless communications.


Systems, Man and Cybernetics, Part A, IEEE Transactions on

Systems engineering, including efforts that involve issue formulation, issue analysis and modeling, and decision making and issue interpretation at any of the life-cycle phases associated with the definition, development, and implementation of large systems. It also includes efforts that relate to systems management, systems engineering processes, and a variety of systems engineering methods such as optimization, modeling, and simulation. ...



Most published Xplore authors for Speech Recognition


Xplore Articles related to Speech Recognition


Using hidden Markov models to define linguistic units

R. Nag; S. Austin; F. Fallside. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '86), 1986

There has been much work in using hidden Markov models to model different types of linguistically defined units such as words, syllables, and phonetic-type units. Here we look at the problem from the other direction and try to use the states obtained from a Markov model to find our own linguistic units. We look at the problem at two ...
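
To make the state-sequence view concrete, here is a hedged sketch (not the paper's system; the toy model parameters are invented) of Viterbi decoding, which recovers the most likely HMM state path for an observation sequence; this is the kind of state alignment from which data-driven linguistic units could be read off.

```python
import numpy as np

def viterbi(log_A, log_B, log_pi, obs):
    """Most likely HMM state path for a discrete observation sequence.

    log_A:  (S, S) log transition probabilities
    log_B:  (S, V) log emission probabilities over a discrete alphabet
    log_pi: (S,)   log initial-state probabilities
    obs:    sequence of observation symbol indices
    """
    S = log_A.shape[0]
    T = len(obs)
    delta = np.full((T, S), -np.inf)   # best log score ending in each state
    psi = np.zeros((T, S), dtype=int)  # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A        # entry (i, j): come from i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Toy 3-state model over a 4-symbol alphabet (invented numbers).
A = np.array([[0.8, 0.2, 0.0], [0.0, 0.7, 0.3], [0.1, 0.0, 0.9]])
B = np.array([[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]])
pi = np.array([1.0, 0.0, 0.0])
print(viterbi(np.log(A + 1e-12), np.log(B + 1e-12), np.log(pi + 1e-12), [0, 0, 1, 2, 3]))
```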


Definition and evaluation of phonetic units for speech recognition by hidden Markov models

M. Cravero; R. Pieraccini; F. Raineri. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '86), 1986

This paper describes the design of a phonetic unit set for recognition of continuous speech, where each unit is represented by a hidden Markov model. Starting from a unit set definition like classical diphones, many variations were made to improve recognition performance and reduce storage requirements. The definition of this unit set is ...


Automatic assessment of Putonghua articulation and pronunciation disorder

Boyu Si; Zhaoming Huang. 2015 International Symposium on Bioelectronics and Bioinformatics (ISBB), 2015

In this paper, an automatic assessment system is developed for Putonghua articulation and pronunciation. The framework of the system consists of designing the vocabularies for the two assessment parts with standard scoring regulations, and constructing a speech recognition module that uses an HMM-based acoustic model with 39 parameters including logarithmic energy, the previous 12 ...
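
In most HMM systems of this kind, a 39-parameter front end is log energy plus 12 cepstral coefficients together with their first and second time derivatives. The sketch below assembles such a 39-dimensional feature matrix using librosa; it illustrates the common recipe, not the authors' actual pipeline.

```python
import librosa
import numpy as np

def mfcc_39(path, sr=16000):
    """12 MFCCs + log energy, with delta and delta-delta: 39 dims per frame."""
    y, sr = librosa.load(path, sr=sr)
    # 13 coefficients; replace c0 with log frame energy (one common convention).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, T)
    rms = librosa.feature.rms(y=y)                       # (1, T), same framing
    mfcc[0] = np.log(rms[0] + 1e-10)
    d1 = librosa.feature.delta(mfcc)                     # first time derivative
    d2 = librosa.feature.delta(mfcc, order=2)            # second time derivative
    return np.vstack([mfcc, d1, d2])                     # (39, T)
```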


Static and Dynamic Variance Compensation for Recognition of Reverberant Speech With Dereverberation Preprocessing

Marc Delcroix; Tomohiro Nakatani; Shinji Watanabe. IEEE Transactions on Audio, Speech, and Language Processing, 2009

The performance of automatic speech recognition is severely degraded in the presence of noise or reverberation. Much research has been undertaken on noise robustness. In contrast, the problem of the recognition of reverberant speech has received far less attention and remains very challenging. In this paper, we use a dereverberation method to reduce reverberation prior to recognition. Such a preprocessor ...
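
The paper's preprocessor is a statistical dereverberation method; as a much simpler classical point of comparison, the hedged sketch below suppresses late reverberation by spectral subtraction, modeling the late-reverb power in each band as a delayed, exponentially decayed copy of the observed power (all constants are illustrative).

```python
import numpy as np

def suppress_late_reverb(stft, frame_shift_s, t60=0.5, delay_frames=4, floor=0.1):
    """Crude late-reverberation suppression by spectral subtraction.

    stft: (freq, frames) complex spectrogram. The decay factor follows from
    the T60 time: power falls by 60 dB over t60 seconds.
    """
    power = np.abs(stft) ** 2
    decay = 10 ** (-6 * frame_shift_s * delay_frames / t60)
    late = np.zeros_like(power)
    late[:, delay_frames:] = decay * power[:, :-delay_frames]  # delayed, decayed copy
    gain = np.maximum(1.0 - late / (power + 1e-12), floor)     # spectral floor
    return np.sqrt(gain) * stft                                # keep observed phase
```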


On the determination of speech boundaries: A tool for providing anchor time points in speech recognition

J. Pardo. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '86), 1986

The objective of the paper is to describe an algorithm for finding robust speech boundaries which can be used as a first step in an acoustic-phonetic-based speech recognition scheme. This algorithm uses information about the speech processing performed by the peripheral auditory system. The algorithm has been tested successfully on a set of monosyllabic CVC words in English ...
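
Pardo's algorithm draws on a peripheral auditory model; the sketch below is a far simpler short-time-energy endpoint detector, included only to illustrate the boundary-detection task itself (frame sizes and the margin are conventional guesses, not values from the paper).

```python
import numpy as np

def find_speech_boundaries(x, sr, frame_ms=25, hop_ms=10, margin_db=10.0):
    """Return (start, end) sample indices of the speech region in x, or None.

    A minimal short-time-energy detector: frames whose energy rises more
    than margin_db above the quietest frames are taken as speech.
    """
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n = 1 + max(0, (len(x) - frame) // hop)
    energy_db = np.array([
        10 * np.log10(np.sum(x[i * hop:i * hop + frame] ** 2) + 1e-12)
        for i in range(n)
    ])
    threshold = np.percentile(energy_db, 10) + margin_db  # noise floor + margin
    voiced = np.where(energy_db > threshold)[0]
    if voiced.size == 0:
        return None
    return voiced[0] * hop, voiced[-1] * hop + frame
```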


More Xplore Articles

Educational Resources on Speech Recognition


eLearning


More eLearning Resources

IEEE-USA E-Books

  • Contributors

    How can we engineer systems capable of "cocktail party" listening? Human listeners are able to perceptually segregate one sound source from an acoustic mixture, such as a single voice from a mixture of other voices and music at a busy cocktail party. How can we engineer "machine listening" systems that achieve this perceptual feat? Albert Bregman's book Auditory Scene Analysis, published in 1990, drew an analogy between the perception of auditory scenes and visual scenes, and described a coherent framework for understanding the perceptual organization of sound. His account has stimulated much interest in computational studies of hearing. Such studies are motivated in part by the demand for practical sound separation systems, which have many applications including noise-robust automatic speech recognition, hearing prostheses, and automatic music transcription. This emerging field has become known as computational auditory scene analysis (CASA). Computational Auditory Scene Analysis: Principles, Algorithms, and Applications provides a comprehensive and coherent account of the state of the art in CASA, in terms of the underlying principles, the algorithms and system architectures that are employed, and the potential applications of this exciting new technology. With a foreword by Bregman, its chapters are written by leading researchers and cover a wide range of topics including: estimation of multiple fundamental frequencies; feature-based and model-based approaches to CASA; sound separation based on spatial location; processing for reverberant environments; segregation of speech and musical signals; automatic speech recognition in noisy environments; and neural and perceptual modeling of auditory organization. The text is written at a level that will be accessible to graduate students and researchers from related science and engineering disciplines. The extensive bibliography accompanying each chapter will also make this book a valuable reference source. A web site accompanying the text (www.casabook.org) features software tools and sound demonstrations. (A minimal time-frequency masking sketch appears after this list.)

  • Acronyms

    Front matter of Computational Auditory Scene Analysis: Principles, Algorithms, and Applications; see the full description under "Contributors" above.

  • Language Modeling

    This chapter contains sections titled: Introduction; Formal Tools for Linguistic Processing; HMMs, Finite State Automata, and Regular Grammars; A "Bottom-Up" Parsing Example; Principles of "Top-Down" Recognizers; Other Language Models; IWR as "CSR"; Standard Databases for Speech Recognition Research; A Survey of Language-Model-Based Systems. (A minimal statistical language-model sketch appears after this list.)

  • No title

    This book introduces the theory, algorithms, and implementation techniques for efficient decoding in speech recognition, mainly focusing on the Weighted Finite-State Transducer (WFST) approach. The decoding process for speech recognition is viewed as a search problem whose goal is to find a sequence of words that best matches an input speech signal. Since this process becomes computationally more expensive as the system vocabulary size increases, research has long been devoted to reducing the computational cost. Recently, the WFST approach has become an important state-of-the-art speech recognition technology, because it offers improved decoding speed with fewer recognition errors compared with conventional methods. However, it is not easy to understand all the algorithms used in this framework, and they remain a black box for many people. In this book, we review the WFST approach and aim to provide comprehensive interpretations of WFST operations and decoding algorithms to help anyone who wants to understand, develop, and study WFST-based speech recognizers. We also mention recent advances in this framework and its applications to spoken language processing. Table of Contents: Introduction / Brief Overview of Speech Recognition / Introduction to Weighted Finite-State Transducers / Speech Recognition by Weighted Finite-State Transducers / Dynamic Decoders with On-the-fly WFST Operations / Summary and Perspective. (A toy best-path sketch in the WFST setting appears after this list.)

  • Intelligent Speech/Audio Processing for Multimedia Applications

    Intelligent speech and audio processing can provide efficient and smart interfaces for various multimedia applications. Generally, speech is the most natural form of human communication. Audio and music can heighten emotional impact and promote interest in multimedia applications. A successful interactive multimedia system must have the capabilities of speech and audio compression, text-to-speech conversion, speech understanding, and music synthesis. The main purpose of speech and audio compression is to provide cost-effective storage or to minimize transmission costs. Text-to-speech converts linguistic information stored as data or text into speech for applications such as talking terminals, alarm systems, and audiotext services. Speech understanding systems make it possible for people to interact with computers using human speech. Their success relies on the integration of a wide variety of speech technologies, including acoustic, lexical, syntactic, semantic, and pragmatic analyses. Applications of music processing for multimedia are mostly realized by combining music with graphics, video, and other media. Since musical sounds and compositions can be precisely specified and controlled by a computer, we can easily create artificial orchestras, performers, and composers. Nowadays, multimedia systems have become more sophisticated with the advances made in computer and microelectronic technologies. Many applications require efficient processing of speech and audio for interactive presentations and integration with other types of media. Application-specific hardware is proposed to meet high-speed, low-cost, lightweight, and low-power requirements. A design example of a speech recognition processor and system for voice-control applications is introduced. The industrial standards and commercial products of speech and audio processing are also summarized in this chapter.

  • Index

    Back matter of Computational Auditory Scene Analysis: Principles, Algorithms, and Applications; see the full description under "Contributors" above.

  • A Novel Gaussian Sum Smoother for Approximate Inference in Switching Linear Dynamical Systems

    We introduce a method for approximate smoothed inference in a class of switching linear dynamical systems, based on a novel form of Gaussian sum smoother. This class includes the switching Kalman filter and the more general case of switch transitions dependent on the continuous latent state. The method improves on the standard Kim smoothing approach by dispensing with one of the key approximations, thus making fuller use of the available future information. Whilst the only central assumption required is projection to a mixture of Gaussians, we show that an additional conditional independence assumption results in a simpler but stable and accurate alternative. Unlike the alternative unstable Expectation Propagation procedure, our method consists only of a single forward and backward pass and is reminiscent of the standard smoothing 'correction' recursions in the simpler linear dynamical system. The algorithm performs well on both toy experiments and in a large-scale application to noise-robust speech recognition. (A sketch of those simpler linear-dynamical-system smoothing recursions appears after this list.)

  • The Artificial Neural Network

    This chapter contains sections titled: Introduction; The Artificial Neuron; Network Principles and Paradigms; Applications of ANNs in Speech Recognition; Conclusions; Problems. (A minimal neural-network sketch appears after this list.)

  • Deterministic Annealing for Clustering, Compression, Classification, Regression, and Speech Recognition

    The deterministic annealing approach to clustering and its extensions have demonstrated substantial performance improvement over standard supervised and unsupervised learning methods in a variety of important applications including compression, estimation, pattern recognition and classification, and statistical regression. The method offers three important features: the ability to avoid many poor local optima; applicability to many different structures/architectures; and the ability to minimize the right cost function even when its gradients vanish almost everywhere, as in the case of the empirical classification error. It is derived within a probabilistic framework from basic information theoretic principles (e.g., maximum entropy and random coding). The application-specific cost is minimized subject to a constraint on the randomness (Shannon entropy) of the solution, which is gradually lowered. We emphasize intuition gained from analogy to statistical physics, where this is an annealing process that avoids many shallow local minima of the specified cost and, at the limit of zero "temperature", produces a non-random (hard) solution. Alternatively, the method is derived within rate-distortion theory, where the annealing process is equivalent to computation of Shannon's rate-distortion function, and the annealing temperature is inversely proportional to the slope of the curve. This provides new insights into the method and its performance, as well as new insights into rate-distortion theory itself. The basic algorithm is extended by incorporating structural constraints to allow optimization of numerous popular structures including vector quantizers, decision trees, multilayer perceptrons, radial basis functions, mixtures of experts and hidden Markov models. Experimental results show considerable performance gains over standard structure-specific and application-specific training methods. (A compact deterministic-annealing clustering sketch appears after this list.)

  • No title

    Approximately 10% of North Americans have some communication disorder. These can be physical, as in cerebral palsy and Parkinson's disease; cognitive, as in Alzheimer's disease and dementia generally; or both physical and cognitive, as in stroke. In fact, deteriorations in language are often the early hallmarks of broader diseases associated with older age, which is especially relevant since aging populations across many nations will result in a drastic increase in the prevalence of these types of disorders. A significant change to how healthcare is administered, brought on by these aging populations, will increase the workload of speech-language pathologists, therapists, and caregivers who are often already overloaded. Fortunately, modern speech technology, such as automatic speech recognition, has matured to the point where it can now have a profound positive impact on the lives of millions of people living with various types of disorders. This book serves as a common ground for two communities: clinical linguists (e.g., speech-language pathologists) and technologists (e.g., computer scientists). The book examines the neurological and physical causes of several speech disorders and their clinical effects, and demonstrates how modern technology can be used in practice to manage those effects and improve one's quality of life. It is intended for a broad audience, from undergraduates to more senior researchers, as well as users of these technologies and their therapists.
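
A central device in CASA systems is the time-frequency mask referenced in the Computational Auditory Scene Analysis entries above. The sketch below builds an ideal binary mask from known target and interferer signals; this is an oracle construction commonly used to benchmark separation systems, not an algorithm from the book, and all parameter values are illustrative.

```python
import numpy as np

def ideal_binary_mask(target, interferer, n_fft=512, hop=128, lc_db=0.0):
    """Oracle time-frequency mask: 1 where the target dominates, else 0.

    Built from known target/interferer signals, so it serves as an upper
    bound for evaluating separation systems, not a practical algorithm.
    """
    def stft(x):
        frames = [x[i:i + n_fft] * np.hanning(n_fft)
                  for i in range(0, len(x) - n_fft, hop)]
        return np.fft.rfft(np.array(frames), axis=1).T     # (freq, time)
    T, I = stft(target), stft(interferer)
    snr_db = 20 * np.log10(np.abs(T) + 1e-12) - 20 * np.log10(np.abs(I) + 1e-12)
    return (snr_db > lc_db).astype(float)                   # local SNR criterion
```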
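
The Language Modeling chapter above surveys formal grammars and statistical models. As one concrete, hedged example of the statistical side, here is a minimal add-one-smoothed bigram model (the toy training sentences are invented) that scores a word sequence by its log probability.

```python
import math
from collections import Counter

class BigramLM:
    """Add-one-smoothed bigram language model over whitespace-tokenized text."""

    def __init__(self, sentences):
        self.unigrams, self.bigrams = Counter(), Counter()
        for s in sentences:
            words = ["<s>"] + s.split() + ["</s>"]
            self.unigrams.update(words)
            self.bigrams.update(zip(words, words[1:]))
        self.vocab = len(self.unigrams)

    def log_prob(self, sentence):
        words = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((self.bigrams[(w1, w2)] + 1) /
                     (self.unigrams[w1] + self.vocab))   # add-one smoothing
            for w1, w2 in zip(words, words[1:])
        )

lm = BigramLM(["show me the weather", "show me the time"])
print(lm.log_prob("show me the weather"))
```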
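
For the WFST decoding book above, the sketch below is a toy best-path search over a weighted transducer in the tropical semiring, where arc weights behave like negative log probabilities and path scores add. The tiny phone-to-word lexicon is invented; a real decoder composes much larger machines.

```python
import heapq

def shortest_path(arcs, start, finals):
    """Dijkstra over a WFST in the tropical semiring (min, +).

    arcs: {state: [(next_state, input_label, output_label, weight), ...]}
    Returns (total_weight, output label sequence) of the best path.
    """
    heap = [(0.0, start, ())]
    settled = {}
    while heap:
        w, state, out = heapq.heappop(heap)
        if state in settled:
            continue
        settled[state] = w
        if state in finals:
            return w, list(out)
        for nxt, _ilabel, olabel, aw in arcs.get(state, []):
            if nxt not in settled:
                new_out = out + ((olabel,) if olabel else ())
                heapq.heappush(heap, (w + aw, nxt, new_out))
    return None

# Toy lexicon transducer: phone inputs, word outputs (invented weights).
arcs = {
    0: [(1, "s", None, 0.1), (2, "t", None, 0.3)],
    1: [(3, "iy", "see", 0.2)],
    2: [(3, "iy", "tea", 0.1)],
}
print(shortest_path(arcs, start=0, finals={3}))  # (0.3, ['see'])
```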
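
The Gaussian sum smoother abstract above builds on the standard smoothing 'correction' recursions of the plain linear dynamical system. As a hedged baseline illustration only (the paper's contribution is the switching-state mixture layered on top of such recursions), here is a Kalman filter followed by a Rauch-Tung-Striebel backward pass.

```python
import numpy as np

def rts_smoother(y, A, C, Q, R, mu0, P0):
    """Kalman filter + RTS smoother for x_t = A x_{t-1} + noise, y_t = C x_t + noise.

    y: (T, m) observations. Returns smoothed means and covariances. The
    switching case manages a Gaussian mixture over recursions like these.
    """
    T, d = len(y), len(mu0)
    mu_f, mu_p = np.zeros((T, d)), np.zeros((T, d))
    P_f, P_p = np.zeros((T, d, d)), np.zeros((T, d, d))
    mu, P = mu0, P0
    for t in range(T):
        if t > 0:
            mu, P = A @ mu, A @ P @ A.T + Q           # predict
        mu_p[t], P_p[t] = mu, P
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)                # Kalman gain
        mu = mu + K @ (y[t] - C @ mu)                 # measurement update
        P = P - K @ C @ P
        mu_f[t], P_f[t] = mu, P
    mu_s, P_s = mu_f.copy(), P_f.copy()
    for t in range(T - 2, -1, -1):                    # backward correction pass
        J = P_f[t] @ A.T @ np.linalg.inv(P_p[t + 1])
        mu_s[t] = mu_f[t] + J @ (mu_s[t + 1] - mu_p[t + 1])
        P_s[t] = P_f[t] + J @ (P_s[t + 1] - P_p[t + 1]) @ J.T
    return mu_s, P_s
```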
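
For the artificial neural network chapter above, this is a minimal sketch of the kind of network such chapters introduce: a one-hidden-layer softmax classifier trained by gradient descent, usable as, say, a per-frame phone classifier. Shapes, the learning rate, and the data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_mlp(X, labels, hidden=64, classes=10, lr=0.1, epochs=200):
    """One-hidden-layer softmax classifier, e.g., for phone labels per frame."""
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, classes)); b2 = np.zeros(classes)
    Y = np.eye(classes)[labels]                      # one-hot targets
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                     # hidden activations
        logits = H @ W2 + b2
        P = np.exp(logits - logits.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)            # softmax probabilities
        G = (P - Y) / n                              # dLoss/dlogits (cross-entropy)
        dW2, db2 = H.T @ G, G.sum(axis=0)
        dH = G @ W2.T * (1 - H ** 2)                 # backprop through tanh
        dW1, db1 = X.T @ dH, dH.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2
```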
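
The deterministic annealing abstract above describes minimizing a cost subject to a gradually lowered entropy constraint. The sketch below shows the clustering core of that idea: Gibbs (soft) assignments at temperature T, centroid re-estimation, and cooling. It omits the phase-transition and cluster-splitting machinery of the full method, and all parameters are illustrative.

```python
import numpy as np

def da_cluster(X, k=4, T0=10.0, Tmin=1e-3, cool=0.9, iters=20):
    """Deterministic annealing clustering: soft k-means whose association
    temperature is gradually lowered, hardening the assignments."""
    rng = np.random.default_rng(0)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    T = T0
    while T > Tmin:
        for _ in range(iters):
            d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)     # (n, k)
            logits = -d2 / T
            logits -= logits.max(axis=1, keepdims=True)
            p = np.exp(logits); p /= p.sum(axis=1, keepdims=True)   # Gibbs weights
            centers = (p.T @ X) / (p.sum(axis=0)[:, None] + 1e-12)  # re-estimate
        T *= cool                                                   # anneal
    return centers, p.argmax(axis=1)
```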



Standards related to Speech Recognition


No standards are currently tagged "Speech Recognition"


Jobs related to Speech Recognition
