Conferences related to Speech Recognition

2013 7th Conference on Speech Technology and Human-Computer Dialogue (SpeD 2013)

“SpeD 2013” will bring together academics and industry professionals from universities, government agencies and companies to present their achievements in speech technology and related fields. “SpeD 2013” is a conference and international forum which will reflect some of the latest tendencies in spoken language technology and human-computer dialogue research as well as some of the most recent applications in this area.

  • 2011 6th Conference on Speech Technology and Human-Computer Dialogue (SpeD 2011)

    SpeD 2011 will bring together academics and industry professionals from universities, government agencies and companies to present their achievements and the latest tendencies in spoken language technology and human-computer dialogue research as well as some of the most recent applications in this area.

  • 2009 5th Conference on Speech Technology and Human-Computer Dialogue (SpeD 2009)

    The 5th Conference on Speech Technology and Human-Computer Dialogue (Constanta, Romania) brings together academics and industry professionals from universities, government agencies, and companies to present their achievements in speech technology and related fields. SpeD 2009 is a conference and international forum reflecting some of the latest tendencies in spoken language technology and human-computer dialogue research, as well as some of the most recent applications in this area.


2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII)

The conference will address, but is not limited to, the following topics:

  • Computational and psychological models of emotion
  • Affect in arts, entertainment, and multimedia
  • Bodily manifestations of affect (facial expressions, posture, behavior, physiology)
  • Databases for emotion processing, development and issues
  • Affective interfaces and applications (games, learning, dialogue systems…)
  • Ecological and continuous emotion assessment
  • Affect in social interactions

  • 2009 3rd International Conference on Affective Computing and Intelligent Interaction (ACII 2009)

    The conference series on Affective Computing and Intelligent Interaction is the premier international forum for state-of-the-art research on affective and multimodal human-machine interaction and systems. Every other year, the ACII conference plays an important role in shaping related scientific, academic, and higher-education programs. This year, we are especially soliciting papers discussing Enabling Behavioral and Socially-Aware Human-Machine Interfaces in areas including psychology.


2013 IEEE International Conference on Multimedia and Expo (ICME)

ICME promotes the exchange of the latest advances in multimedia technologies, systems, and applications from both the research and development perspectives of the circuits and systems, communications, computer, and signal processing communities.

  • 2012 IEEE International Conference on Multimedia and Expo (ICME)

    The IEEE International Conference on Multimedia & Expo (ICME) has been the flagship multimedia conference sponsored by four IEEE Societies. It promotes the exchange of the latest advances in multimedia technologies, systems, and applications from both the research and development perspectives of the circuits and systems, communications, computer, and signal processing communities.

  • 2011 IEEE International Conference on Multimedia and Expo (ICME)

    Speech, audio, image, video, and text processing; signal processing for media integration; 3D visualization, animation, and virtual reality; multi-modal multimedia computing systems and human-machine interaction; multimedia communications and networking; multimedia security and privacy; multimedia databases and digital libraries; multimedia applications and services; media content analysis and search; hardware and software for multimedia systems; multimedia standards and related issues; multimedia qu…

  • 2010 IEEE International Conference on Multimedia and Expo (ICME)

    A flagship multimedia conference sponsored by four IEEE societies, ICME serves as a forum to promote the exchange of the latest advances in multimedia technologies, systems, and applications from both the research and development perspectives of the circuits and systems, communications, computer, and signal processing communities.

  • 2009 IEEE International Conference on Multimedia and Expo (ICME)

    IEEE International Conference on Multimedia & Expo is a major annual international conference with the objective of bringing together researchers, developers, and practitioners from academia and industry working in all areas of multimedia. ICME serves as a forum for the dissemination of state-of-the-art research, development, and implementations of multimedia systems, technologies and applications.


2013 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

The ASRU workshop meets every two years and has a tradition of bringing together researchers from academia and industry in an intimate and collegial setting to discuss problems of common interest in automatic speech recognition and understanding.


2013 International Carnahan Conference on Security Technology (ICCST)

This international conference is a forum for all aspects of physical, cyber and electronic security research, development, systems engineering, testing, evaluation, operations and sustainability. The ICCST facilitates the exchange of ideas and information.

  • 2012 IEEE International Carnahan Conference on Security Technology (ICCST)

    Research, development, and user aspects of security technology, including principles of operation, applications, and user experiences.

  • 2011 International Carnahan Conference on Security Technology (ICCST)

    This annual conference is the world's longest-running international technical symposium on security technology. It is a forum for collaboration on all aspects of physical, cyber and electronic security research, development, systems engineering, testing, evaluation, operations and sustainment. The ICCST facilitates the exchange of ideas and sharing of information on both new and existing technology and systems. Conference participants are encouraged to consider the impact of their work on society. The ICCST provides a foundation for support to authorities and agencies responsible for security, safety and law enforcement in the use of available and future technology.

  • 2010 IEEE International Carnahan Conference on Security Technology (ICCST)

    The ICCST is a forum for researchers and practitioners in both new and existing security technology, providing an interchange of knowledge through paper presentations and publication of proceedings that have been selected by the ICCST organizing committee.

  • 2009 International Carnahan Conference on Security Technology (ICCST)

    The conference is directed toward the research, development, and user aspects of electronic security technology.

  • 2008 International Carnahan Conference on Security Technology (ICCST)

    The ICCST is directed toward the research and development aspects of electronic security technology, including the operational testing of the technology. It establishes a forum for the exchange of ideas and dissemination of information on both new and existing technology. Conference participants are encouraged to consider the impact of their work on society. The conference is an interchange of knowledge through the presentation of papers selected by the ICCST organizing committee.

  • 2007 IEEE International Carnahan Conference on Security Technology (ICCST)

  • 2006 IEEE International Carnahan Conference on Security Technology (ICCST)


More Conferences

Periodicals related to Speech Recognition

Audio, Speech, and Language Processing, IEEE Transactions on

Speech analysis, synthesis, coding, speech recognition, speaker recognition, language modeling, speech production and perception, speech enhancement. In audio: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. The scope for the proposed transactions includes SPEECH PROCESSING: transmission and storage of speech signals; speech coding; speech enhancement and noise reduction; ...


Pattern Analysis and Machine Intelligence, IEEE Transactions on

Statistical and structural pattern recognition; image analysis; computational models of vision; computer vision systems; enhancement, restoration, segmentation, feature extraction, shape and texture analysis; applications of pattern analysis in medicine, industry, government, and the arts and sciences; artificial intelligence, knowledge representation, logical and probabilistic inference, learning, speech recognition, character and text recognition, syntactic and semantic processing, understanding natural language, expert systems, ...


Selected Areas in Communications, IEEE Journal on

All telecommunications, including telephone, telegraphy, facsimile, and point-to-point television, by electromagnetic propagation, including radio; wire; aerial, underground, coaxial, and submarine cables; waveguides, communication satellites, and lasers; in marine, aeronautical, space, and fixed station services; repeaters, radio relaying, signal storage, and regeneration; telecommunication error detection and correction; multiplexing and carrier techniques; communication switching systems; data communications; communication theory; and wireless communications.


Systems, Man and Cybernetics, Part A, IEEE Transactions on

Systems engineering, including efforts that involve issue formulation, issue analysis and modeling, and decision making and issue interpretation at any of the life-cycle phases associated with the definition, development, and implementation of large systems. It also includes efforts that relate to systems management, systems engineering processes, and a variety of systems engineering methods such as optimization, modeling, and simulation. ...



Xplore Articles related to Speech Recognition

Factor analysed hidden Markov models

A-V. I. Rosti; M. J. F. Gales 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002

This paper presents a general form of acoustic model for speech recognition. The model is based on an extension to factor analysis where the low dimensional subspace is modelled with a mixture of Gaussians hidden Markov model (HMM) and the observation noise by a Gaussian mixture model. Here the HMM output vectors are the latent variables of a general factor ...
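
The generative structure described in this abstract can be illustrated with a toy sketch: a Markov chain over HMM states, a per-state Gaussian over a low-dimensional latent vector, and a linear projection into observation space plus Gaussian noise. All parameters below (transition probabilities, latent means, loading matrix, noise level) are invented for illustration and are not taken from the paper.

```python
import random

# Toy generative sketch of a factor-analysed HMM (illustrative parameters only):
# a latent low-dimensional vector is drawn from the current HMM state's
# Gaussian, then projected into observation space with added Gaussian noise.
random.seed(0)

trans = {0: 0.9, 1: 0.8}                        # P(stay in same state)
state_means = {0: [-1.0, 0.5], 1: [1.0, -0.5]}  # per-state latent means (2-D)
C = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]        # 3x2 loading matrix
noise_std = 0.1                                 # observation noise std

def sample(n_frames, state=0):
    obs = []
    for _ in range(n_frames):
        # latent vector from the current state's Gaussian (unit variance here)
        x = [random.gauss(m, 1.0) for m in state_means[state]]
        # observation y = C x + e
        y = [sum(c * xi for c, xi in zip(row, x)) + random.gauss(0.0, noise_std)
             for row in C]
        obs.append(y)
        # advance the two-state Markov chain
        state = state if random.random() < trans[state] else 1 - state
    return obs

frames = sample(5)
print(len(frames), len(frames[0]))  # → 5 3
```

The paper's model additionally uses Gaussian mixtures for both the latent space and the noise; this sketch keeps a single Gaussian per state to show only the factor-analysis structure.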


Emotion Elicitation in a Computerized Gambling Game

Vered Aharonson; Noam Amir 2006 International Conference on Information Technology: Research and Education, 2006

We designed a novel computer-controlled environment that could elicit emotions in subjects while they were uttering short identical phrases. The paradigm was based on Damasio's experiment for eliciting apprehension and was implemented by a voice-activated computer game. Recordings of dozens of identical sentences were collected per subject, which were coupled to events in the game - gain or ...


"BIRON, let me show you something": evaluating the interaction with a robot companion

Shuyin Li; M. Kleinehagenbrock; J. Fritsch; B. Wrede; G. Sagerer 2004 IEEE International Conference on Systems, Man and Cybernetics (IEEE Cat. No.04CH37583), 2004

Current research on the interaction with a robot is driven by the desire to build intuitive and natural interaction schemes. In order for our robot BIRON to behave naturally we integrated an attention system that enables the robot to search for and eventually focus on human communication partners by detecting and tracking persons. Via a natural language interface the user ...


Speaker-independent isolated word recognition using word-based vector quantization and hidden Markov models

Y. Cheung; S. Leung IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '87), 1987

In this paper, we investigate the possibility of using word-based vector quantization with hidden Markov models for speaker-independent isolated word recognition. Two word-based algorithms were proposed and studied. Experiments were carried out on Chinese (Cantonese) digits spoken by 110 speakers (55 males and 55 females) in two databases. An improvement of about 3% in recognition rate was obtained in one ...
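
The vector-quantization step described in this abstract can be sketched as follows: train a small codebook with Lloyd (k-means) iterations, then map each feature frame to the index of its nearest codeword. In a word-based scheme, one such codebook would be trained per vocabulary word. The 2-D toy data and all settings below are invented stand-ins for real cepstral frames.

```python
import math
import random

# Sketch of a VQ front end: Lloyd iterations build the codebook, then each
# frame is replaced by its nearest codeword index (the symbol fed to an HMM).
random.seed(1)

def train_codebook(frames, k, iters=10):
    # deterministic seeding for the sketch: first k frames as initial codewords
    centroids = [list(f) for f in frames[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in frames:
            nearest = min(range(k), key=lambda i: math.dist(f, centroids[i]))
            clusters[nearest].append(f)
        for i, c in enumerate(clusters):
            if c:  # move each codeword to the mean of its assigned frames
                centroids[i] = [sum(dim) / len(c) for dim in zip(*c)]
    return centroids

def quantize(frames, centroids):
    return [min(range(len(centroids)), key=lambda i: math.dist(f, centroids[i]))
            for f in frames]

# two well-separated toy clusters, interleaved so frames[:2] span both
frames = []
for _ in range(20):
    for mx, my in [(0.0, 0.0), (3.0, 3.0)]:
        frames.append([random.gauss(mx, 0.3), random.gauss(my, 0.3)])

codebook = train_codebook(frames, 2)
codes = quantize(frames, codebook)
print(sorted(set(codes)))  # → [0, 1]
```

Real systems use far larger codebooks and higher-dimensional features; the point here is only the train-then-quantize mechanics.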


High resolution formant extraction from linear-prediction phase spectra

N. Reddy; M. Swamy IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984

A new algorithm is proposed to extract formant information from linear-prediction (LP) spectra. The problems of resolving two closely spaced formants and accurately estimating the bandwidth of a formant are particularly addressed. The proposed method detects resonant poles from the peaks of the second derivative of the log-magnitude spectrum or from the second derivative of the group delay of ...
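
The peak-picking idea in this abstract can be sketched numerically: take the (negated) discrete second difference of a log-magnitude spectrum and locate its local maxima. The two-bump "spectrum" below is synthetic and uses well-separated resonances just to show the mechanics; the paper targets closely spaced formants in real LP spectra.

```python
import math

# Sketch: resonances appear as peaks of the negated second difference of a
# synthetic log-magnitude spectrum. All frequencies and widths are invented.
N = 200
freqs = [i * 4000.0 / N for i in range(N)]  # 0..4 kHz grid, 20 Hz step

def bump(f, center, width):
    return math.exp(-((f - center) / width) ** 2)

# synthetic log-magnitude spectrum with "formants" at 800 and 2400 Hz
spec = [math.log(1e-3 + bump(f, 800, 200) + bump(f, 2400, 200)) for f in freqs]

# negated discrete second difference (padded with zeros at the edges)
d2 = [0.0] + [-(spec[i - 1] - 2 * spec[i] + spec[i + 1])
              for i in range(1, N - 1)] + [0.0]

# local maxima of the second-difference curve above a small threshold
peaks = [freqs[i] for i in range(1, N - 1)
         if d2[i] > 0.01 and d2[i] > d2[i - 1] and d2[i] >= d2[i + 1]]
print(peaks)  # → [800.0, 2400.0]
```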


More Xplore Articles

Educational Resources on Speech Recognition

eLearning


More eLearning Resources

IEEE.tv Videos

No IEEE.tv Videos are currently tagged "Speech Recognition"

IEEE-USA E-Books

  • What a Beautiful Voice

    This chapter contains sections titled: That's an Ugly Voice, It Must Be a Computer Talking; A Multimedia Communication; Hey, What are You Talking About?; (Computerized) Talk Is Not Cheap; Speech Synthesis; Speech Recognition; What Did You Say?; Neural and Fuzzy Approaches; It's All a Matter of Semantics

  • Conclusions

    As the power of computing has grown over the past few decades, the field of machine learning has advanced rapidly in both theory and practice. Machine learning methods are usually based on the assumption that the data generation mechanism does not change over time. Yet real-world applications of machine learning, including image recognition, natural language processing, speech recognition, robot control, and bioinformatics, often violate this common assumption. Dealing with non-stationarity is one of modern machine learning's greatest challenges. This book focuses on a specific non-stationary environment known as covariate shift, in which the distributions of inputs (queries) change but the conditional distribution of outputs (answers) is unchanged, and presents machine learning theory, algorithms, and applications to overcome this variety of non-stationarity. After reviewing the state-of-the-art research in the field, the authors discuss topics that include learning under covariate shift, model selection, importance estimation, and active learning. They describe such real-world applications of covariate shift adaptation as brain-computer interface, speaker identification, and age prediction from facial images. With this book, they aim to encourage future research in machine learning, statistics, and engineering that strives to create truly autonomous learning machines able to learn under non-stationarity.

  • Speech Recognition by Composition of Weighted Finite Automata

    This chapter contains sections titled: Introduction, Theory, Speech Recognition, Implementation, Applications, Further Work, Appendix A: Correctness of ε-free composition, Appendix B: General composition construction, Acknowledgments, References

  • Epilogue: Siri ... What's the Meaning of Life?

    Stanley Kubrick's 1968 film 2001: A Space Odyssey famously featured HAL, a computer with the ability to hold lengthy conversations with his fellow space travelers. More than forty years later, we have advanced computer technology that Kubrick never imagined, but we do not have computers that talk and understand speech as HAL did. Is it a failure of our technology that we have not gotten much further than an automated voice that tells us to "say or press 1"? Or is there something fundamental in human language and speech that we do not yet understand deeply enough to be able to replicate in a computer? In The Voice in the Machine, Roberto Pieraccini examines six decades of work in science and technology to develop computers that can interact with humans using speech and the industry that has arisen around the quest for these technologies. He shows that although the computers today that understand speech may not have HAL's capacity for conversation, they have capabilities that make them usable in many applications today and are on a fast track of improvement and innovation. Pieraccini describes the evolution of speech recognition and speech understanding processes from waveform methods to artificial intelligence approaches to statistical learning and modeling of human speech based on a rigorous mathematical model--specifically, Hidden Markov Models (HMM). He details the development of dialog systems, the ability to produce speech, and the process of bringing talking machines to the market. Finally, he asks a question that only the future can answer: will we end up with HAL-like computers or something completely unexpected?

  • Contributors

    Finite-state devices, which include finite-state automata, graphs, and finite-state transducers, are in wide use in many areas of computer science. Recently, there has been a resurgence of the use of finite-state devices in all aspects of computational linguistics, including dictionary encoding, text processing, and speech processing. This book describes the fundamental properties of finite-state devices and illustrates their uses. Many of the contributors pioneered the use of finite automata for different aspects of natural language processing. The topics, which range from the theoretical to the applied, include finite-state morphology, approximation of phrase-structure grammars, deterministic part-of-speech tagging, application of a finite-state intersection grammar, a finite-state transducer for extracting information from text, and speech recognition using weighted finite automata. The introduction presents the basic theoretical results in finite-state automata and transducers. These results and algorithms are described and illustrated with simple formal language examples as well as natural language examples. Contributors: Douglas Appelt, John Bear, David Clemenceau, Maurice Gross, Jerry R. Hobbs, David Israel, Megumi Kameyama, Lauri Karttunen, Kimmo Koskenniemi, Mehryar Mohri, Eric Laporte, Fernando C. N. Pereira, Michael D. Riley, Emmanuel Roche, Yves Schabes, Max D. Silberztein, Mark Stickel, Pasi Tapanainen, Mabry Tyson, Atro Voutilainen, Rebecca N. Wright. Language, Speech, and Communication series

  • No title

    This book introduces the theory, algorithms, and implementation techniques for efficient decoding in speech recognition, mainly focusing on the Weighted Finite-State Transducer (WFST) approach. The decoding process for speech recognition is viewed as a search problem whose goal is to find a sequence of words that best matches an input speech signal. Since this process becomes computationally more expensive as the system vocabulary size increases, research has long been devoted to reducing the computational cost. Recently, the WFST approach has become an important state-of-the-art speech recognition technology, because it offers improved decoding speed with fewer recognition errors compared with conventional methods. However, it is not easy to understand all the algorithms used in this framework, and they are still in a black box for many people. In this book, we review the WFST approach and aim to provide comprehensive interpretations of WFST operations and decoding algorithms to help anyone who wants to understand, develop, and study WFST-based speech recognizers. We also mention recent advances in this framework and its applications to spoken language processing. Table of Contents: Introduction / Brief Overview of Speech Recognition / Introduction to Weighted Finite-State Transducers / Speech Recognition by Weighted Finite-State Transducers / Dynamic Decoders with On-the-fly WFST Operations / Summary and Perspective
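
The search formulation described in this blurb can be sketched with a toy shortest-path decoder: arc weights act like negative log probabilities in the tropical semiring (weights add along a path, and the best hypothesis is the cheapest path to a final state). The tiny word graph and its weights below are invented; real decoders search large composed transducer networks with beam pruning.

```python
import heapq

# Toy illustration of decoding as shortest-path search over a weighted graph
# (tropical semiring: path weight = sum of arc weights, best = minimum).

# arcs: state -> list of (next_state, output_word, weight); all values invented
arcs = {
    0: [(1, "hello", 1.0), (1, "yellow", 2.5)],
    1: [(2, "world", 0.5), (2, "word", 2.0)],
}
FINAL = 2

def best_path(start=0):
    # Dijkstra-style search for the cheapest word sequence to the final state
    heap = [(0.0, start, [])]
    settled = {}
    while heap:
        cost, state, words = heapq.heappop(heap)
        if state == FINAL:
            return cost, words
        if settled.get(state, float("inf")) <= cost:
            continue
        settled[state] = cost
        for nxt, word, w in arcs.get(state, []):
            heapq.heappush(heap, (cost + w, nxt, words + [word]))
    return None

print(best_path())  # → (1.5, ['hello', 'world'])
```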

  • Contributors

    How can we engineer systems capable of "cocktail party" listening? Human listeners are able to perceptually segregate one sound source from an acoustic mixture, such as a single voice from a mixture of other voices and music at a busy cocktail party. How can we engineer "machine listening" systems that achieve this perceptual feat? Albert Bregman's book Auditory Scene Analysis, published in 1990, drew an analogy between the perception of auditory scenes and visual scenes, and described a coherent framework for understanding the perceptual organization of sound. His account has stimulated much interest in computational studies of hearing. Such studies are motivated in part by the demand for practical sound separation systems, which have many applications including noise-robust automatic speech recognition, hearing prostheses, and automatic music transcription. This emerging field has become known as computational auditory scene analysis (CASA). Computational Auditory Scene Analysis: Principles, Algorithms, and Applications provides a comprehensive and coherent account of the state of the art in CASA, in terms of the underlying principles, the algorithms and system architectures that are employed, and the potential applications of this exciting new technology. With a Foreword by Bregman, its chapters are written by leading researchers and cover a wide range of topics including: estimation of multiple fundamental frequencies; feature-based and model-based approaches to CASA; sound separation based on spatial location; processing for reverberant environments; segregation of speech and musical signals; automatic speech recognition in noisy environments; and neural and perceptual modeling of auditory organization. The text is written at a level that will be accessible to graduate students and researchers from related science and engineering disciplines. The extensive bibliography accompanying each chapter will also make this book a valuable reference source. A web site accompanying the text (www.casabook.org) features software tools and sound demonstrations.

  • Notes

    Stanley Kubrick's 1968 film 2001: A Space Odyssey famously featured HAL, a computer with the ability to hold lengthy conversations with his fellow space travelers. More than forty years later, we have advanced computer technology that Kubrick never imagined, but we do not have computers that talk and understand speech as HAL did. Is it a failure of our technology that we have not gotten much further than an automated voice that tells us to "say or press 1"? Or is there something fundamental in human language and speech that we do not yet understand deeply enough to be able to replicate in a computer? In The Voice in the Machine, Roberto Pieraccini examines six decades of work in science and technology to develop computers that can interact with humans using speech and the industry that has arisen around the quest for these technologies. He shows that although the computers today that understand speech may not have HAL's capacity for conversation, they have capabilities that make them usable in many applications today and are on a fast track of improvement and innovation. Pieraccini describes the evolution of speech recognition and speech understanding processes from waveform methods to artificial intelligence approaches to statistical learning and modeling of human speech based on a rigorous mathematical model--specifically, Hidden Markov Models (HMM). He details the development of dialog systems, the ability to produce speech, and the process of bringing talking machines to the market. Finally, he asks a question that only the future can answer: will we end up with HAL-like computers or something completely unexpected?

  • Appendix: List of Symbols and Abbreviations

    As the power of computing has grown over the past few decades, the field of machine learning has advanced rapidly in both theory and practice. Machine learning methods are usually based on the assumption that the data generation mechanism does not change over time. Yet real-world applications of machine learning, including image recognition, natural language processing, speech recognition, robot control, and bioinformatics, often violate this common assumption. Dealing with non-stationarity is one of modern machine learning's greatest challenges. This book focuses on a specific non-stationary environment known as covariate shift, in which the distributions of inputs (queries) change but the conditional distribution of outputs (answers) is unchanged, and presents machine learning theory, algorithms, and applications to overcome this variety of non-stationarity. After reviewing the state-of-the-art research in the field, the authors discuss topics that include learning under covariate shift, model selection, importance estimation, and active learning. They describe such real-world applications of covariate shift adaptation as brain-computer interface, speaker identification, and age prediction from facial images. With this book, they aim to encourage future research in machine learning, statistics, and engineering that strives to create truly autonomous learning machines able to learn under non-stationarity.

  • Index

    Finite-state devices, which include finite-state automata, graphs, and finite-state transducers, are in wide use in many areas of computer science. Recently, there has been a resurgence of the use of finite-state devices in all aspects of computational linguistics, including dictionary encoding, text processing, and speech processing. This book describes the fundamental properties of finite-state devices and illustrates their uses. Many of the contributors pioneered the use of finite automata for different aspects of natural language processing. The topics, which range from the theoretical to the applied, include finite-state morphology, approximation of phrase-structure grammars, deterministic part-of-speech tagging, application of a finite-state intersection grammar, a finite-state transducer for extracting information from text, and speech recognition using weighted finite automata. The introduction presents the basic theoretical results in finite-state automata and transducers. These results and algorithms are described and illustrated with simple formal language examples as well as natural language examples. Contributors: Douglas Appelt, John Bear, David Clemenceau, Maurice Gross, Jerry R. Hobbs, David Israel, Megumi Kameyama, Lauri Karttunen, Kimmo Koskenniemi, Mehryar Mohri, Eric Laporte, Fernando C. N. Pereira, Michael D. Riley, Emmanuel Roche, Yves Schabes, Max D. Silberztein, Mark Stickel, Pasi Tapanainen, Mabry Tyson, Atro Voutilainen, Rebecca N. Wright. Language, Speech, and Communication series



Standards related to Speech Recognition

No standards are currently tagged "Speech Recognition"