Conferences related to Speech Recognition


2013 7th Conference on Speech Technology and Human-Computer Dialogue (SpeD 2013)

“SpeD 2013” will bring together academics and industry professionals from universities, government agencies, and companies to present their achievements in speech technology and related fields. The conference is an international forum reflecting some of the latest trends in spoken language technology and human-computer dialogue research, as well as some of the most recent applications in this area.

  • 2011 6th Conference on Speech Technology and Human-Computer Dialogue (SpeD 2011)

    SpeD 2011 will bring together academics and industry professionals from universities, government agencies, and companies to present their achievements and the latest trends in spoken language technology and human-computer dialogue research, as well as some of the most recent applications in this area.

  • 2009 5th Conference on Speech Technology and Human-Computer Dialogue (SpeD 2009)

    The 5th Conference on Speech Technology and Human-Computer Dialogue (Constanta, Romania) brings together academics and industry professionals from universities, government agencies, and companies to present their achievements in speech technology and related fields. SpeD 2009 is an international forum reflecting some of the latest trends in spoken language technology and human-computer dialogue research, as well as some of the most recent applications in this area.


2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII)

The conference will address, but is not limited to, the following topics:

  • Computational and psychological models of emotion
  • Affect in arts, entertainment, and multimedia
  • Bodily manifestations of affect (facial expressions, posture, behavior, physiology)
  • Databases for emotion processing: development and issues
  • Affective interfaces and applications (games, learning, dialogue systems, ...)
  • Ecological and continuous emotion assessment
  • Affect in social interactions

  • 2009 3rd International Conference on Affective Computing and Intelligent Interaction (ACII 2009)

    The conference series on Affective Computing and Intelligent Interaction is the premier international forum for state-of-the-art research on affective and multimodal human-machine interaction and systems. Held every other year, the ACII conference plays an important role in shaping related scientific, academic, and higher-education programs. This year, we are especially soliciting papers discussing enabling behavioral and socially-aware human-machine interfaces, in areas including psychology.


2013 IEEE International Conference on Multimedia and Expo (ICME)

ICME promotes the exchange of the latest advances in multimedia technologies, systems, and applications from both the research and development perspectives of the circuits and systems, communications, computer, and signal processing communities.

  • 2012 IEEE International Conference on Multimedia and Expo (ICME)

    The IEEE International Conference on Multimedia & Expo (ICME) is the flagship multimedia conference sponsored by four IEEE societies. It promotes the exchange of the latest advances in multimedia technologies, systems, and applications from both the research and development perspectives of the circuits and systems, communications, computer, and signal processing communities.

  • 2011 IEEE International Conference on Multimedia and Expo (ICME)

    Topics include: speech, audio, image, video, and text processing; signal processing for media integration; 3D visualization, animation, and virtual reality; multi-modal multimedia computing systems and human-machine interaction; multimedia communications and networking; multimedia security and privacy; multimedia databases and digital libraries; multimedia applications and services; media content analysis and search; hardware and software for multimedia systems; multimedia standards and related issues; multimedia qu ...

  • 2010 IEEE International Conference on Multimedia and Expo (ICME)

    A flagship multimedia conference sponsored by four IEEE societies, ICME serves as a forum to promote the exchange of the latest advances in multimedia technologies, systems, and applications from both the research and development perspectives of the circuits and systems, communications, computer, and signal processing communities.

  • 2009 IEEE International Conference on Multimedia and Expo (ICME)

    IEEE International Conference on Multimedia & Expo is a major annual international conference with the objective of bringing together researchers, developers, and practitioners from academia and industry working in all areas of multimedia. ICME serves as a forum for the dissemination of state-of-the-art research, development, and implementations of multimedia systems, technologies and applications.


2013 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

The ASRU workshop meets every two years and has a tradition of bringing together researchers from academia and industry in an intimate and collegial setting to discuss problems of common interest in automatic speech recognition and understanding.


2013 International Carnahan Conference on Security Technology (ICCST)

This international conference is a forum for all aspects of physical, cyber and electronic security research, development, systems engineering, testing, evaluation, operations and sustainability. The ICCST facilitates the exchange of ideas and information.

  • 2012 IEEE International Carnahan Conference on Security Technology (ICCST)

    Research, development, and user aspects of security technology, including principles of operation, applications, and user experiences.

  • 2011 International Carnahan Conference on Security Technology (ICCST)

    This annual conference is the world's longest-running international technical symposium on security technology. It is a forum for collaboration on all aspects of physical, cyber, and electronic security research, development, systems engineering, testing, evaluation, operations, and sustainment. The ICCST facilitates the exchange of ideas and sharing of information on both new and existing technology and systems. Conference participants are encouraged to consider the impact of their work on society. The ICCST provides a foundation for support to authorities and agencies responsible for security, safety, and law enforcement in the use of available and future technology.

  • 2010 IEEE International Carnahan Conference on Security Technology (ICCST)

    The ICCST is a forum for researchers and practitioners in both new and existing security technology, providing an interchange of knowledge through paper presentations and publication of proceedings that have been selected by the ICCST organizing committee.

  • 2009 International Carnahan Conference on Security Technology (ICCST)

    The conference is directed toward the research, development, and user aspects of electronic security technology.

  • 2008 International Carnahan Conference on Security Technology (ICCST)

    The ICCST is directed toward the research and development aspects of electronic security technology, including the operational testing of the technology. It establishes a forum for the exchange of ideas and dissemination of information on both new and existing technology. Conference participants are stimulated to consider the impact of their work on society. The Conference is an interchange of knowledge through the presentation of learned papers that have been selected by the ICCST organizing committee.

  • 2007 IEEE International Carnahan Conference on Security Technology (ICCST)

  • 2006 IEEE International Carnahan Conference on Security Technology (ICCST)


More Conferences

Periodicals related to Speech Recognition


Audio, Speech, and Language Processing, IEEE Transactions on

Speech analysis, synthesis, and coding; speech recognition; speaker recognition; language modeling; speech production and perception; speech enhancement. In audio: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. The scope of the transactions includes SPEECH PROCESSING: transmission and storage of speech signals; speech coding; speech enhancement and noise reduction; ...


Pattern Analysis and Machine Intelligence, IEEE Transactions on

Statistical and structural pattern recognition; image analysis; computational models of vision; computer vision systems; enhancement, restoration, segmentation, feature extraction, shape and texture analysis; applications of pattern analysis in medicine, industry, government, and the arts and sciences; artificial intelligence, knowledge representation, logical and probabilistic inference, learning, speech recognition, character and text recognition, syntactic and semantic processing, understanding natural language, expert systems, ...


Selected Areas in Communications, IEEE Journal on

All telecommunications, including telephone, telegraphy, facsimile, and point-to-point television, by electromagnetic propagation, including radio; wire; aerial, underground, coaxial, and submarine cables; waveguides, communication satellites, and lasers; in marine, aeronautical, space, and fixed station services; repeaters, radio relaying, signal storage, and regeneration; telecommunication error detection and correction; multiplexing and carrier techniques; communication switching systems; data communications; communication theory; and wireless communications.


Systems, Man and Cybernetics, Part A, IEEE Transactions on

Systems engineering, including efforts that involve issue formulations, issue analysis and modeling, and decision making and issue interpretation at any of the life-cycle phases associated with the definition, development, and implementation of large systems. It will also include efforts that relate to systems management, systems engineering processes, and a variety of systems engineering methods such as optimization, modeling, and simulation. ...



Most published Xplore authors for Speech Recognition


Xplore Articles related to Speech Recognition


Robust spoken language identification using large vocabulary speech recognition

J. L. Hieronymus; S. Kadambe. 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-97), 1997

A robust, task independent spoken language identification (LID) system which uses a large vocabulary continuous speech recognition (LVCSR) module for each language to choose the most likely language spoken is described. The acoustic analysis uses mean cepstral removal on mel scale cepstral coefficients to compensate for different input channels. The system has been trained on 5 languages: English, German, Japanese, ...
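The channel-compensation step this abstract mentions, mean cepstral removal, can be sketched in a few lines. The function name and the toy feature values below are illustrative assumptions, not taken from the paper; real systems would compute mel-scale cepstral coefficients from audio first.

```python
def cepstral_mean_normalization(cepstra):
    """Subtract each coefficient's per-utterance mean over time.

    `cepstra` is a list of frames, each a list of mel-scale cepstral
    coefficients. A stationary channel acts as a convolution on the
    signal, which becomes an additive offset in the cepstral domain,
    so removing the per-utterance mean largely cancels it.
    """
    n = len(cepstra)
    dim = len(cepstra[0])
    means = [sum(frame[k] for frame in cepstra) / n for k in range(dim)]
    return [[frame[k] - means[k] for k in range(dim)] for frame in cepstra]

# The same utterance observed over two channels differs by a roughly
# constant cepstral offset; after normalization the features coincide.
utterance = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 frames, 2 coefficients
offset = 0.7                                        # hypothetical channel effect
shifted = [[c + offset for c in frame] for frame in utterance]
a = cepstral_mean_normalization(utterance)
b = cepstral_mean_normalization(shifted)
assert all(abs(x - y) < 1e-9 for fa, fb in zip(a, b) for x, y in zip(fa, fb))
```

This per-utterance form is the simplest variant; deployed recognizers often use running or windowed means instead so normalization can be applied online.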


Speech Dereverberation Based on Maximum-Likelihood Estimation With Time-Varying Gaussian Source Model

Tomohiro Nakatani; Biing-Hwang Juang; Takuya Yoshioka; Keisuke Kinoshita; Marc Delcroix; Masato Miyoshi IEEE Transactions on Audio, Speech, and Language Processing, 2008

Distant acquisition of acoustic signals in an enclosed space often produces reverberant components due to acoustic reflections in the room. Speech dereverberation is in general desirable when the signal is acquired through distant microphones in such applications as hands-free speech recognition, teleconferencing, and meeting recording. This paper proposes a new speech dereverberation approach based on a statistical speech model. A ...


Providing single and multi-channel acoustical robustness to speaker identification systems

J. Ortega-Garcia; J. Gonzalez-Rodriguez. 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-97), 1997

Acoustical mismatch between the training and testing phases degrades the performance of automatic speaker recognition systems. Providing robustness to speaker recognizers is therefore a priority. Robustness in the acoustical stage can be accomplished through speech enhancement techniques applied as a stage prior to the recognizer. These techniques are oriented to reducing the impact that acoustical noise ...


Integrating confidence scores for utterance verification

Binfeng Yan; Xiaoyan Zhu. Fifth World Congress on Intelligent Control and Automation (WCICA 2004), 2004

In this paper we present an approach to integrating confidence scores for utterance verification based on neural networks. In addition to the confidence scores computed at the phonetic level, we use various novel confidence scores, including segmental confidence scores, likelihood ratio, and recognition results of LPC features. We describe a method to combine different confidence scores via a neural network ...
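The combination step the abstract describes can be illustrated with a minimal trainable combiner. The sketch below uses a single logistic unit rather than the paper's network, and the score names, synthetic data, and training loop are illustrative assumptions, not details from the paper.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_combiner(scores, labels, lr=0.5, epochs=300):
    """Fit a logistic combiner: one weight per confidence score plus a bias."""
    dim = len(scores[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(scores, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y          # gradient of the log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def combined_confidence(w, b, x):
    """Map several per-utterance confidence scores to one accept/reject score."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Synthetic example: three confidence scores per utterance (stand-ins for
# phonetic, segmental, and likelihood-ratio scores); an utterance is labeled
# "correctly recognized" when the scores jointly point the right way.
random.seed(0)
scores = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
labels = [1.0 if sum(x) > 0 else 0.0 for x in scores]
w, b = train_combiner(scores, labels)
accuracy = sum(
    (combined_confidence(w, b, x) > 0.5) == (y == 1.0)
    for x, y in zip(scores, labels)
) / len(labels)
assert accuracy > 0.9
```

In a real utterance-verification system the threshold on the combined score would be tuned on held-out data to trade off false acceptances against false rejections.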


Application of the neural networks for text-to-phoneme mapping

Eniko Beatrice Bilcu; Petri Salmela; Janne Suontausta; Jukka Saarinen. 11th European Signal Processing Conference, 2002

In this paper we present the results on the use of neural networks for text-to-phoneme mapping. For this mapping, we have compared the performance of the Context Dependent Multilayer Perceptron network with the Recurrent Neural Network. The results (number of parameters vs. neural network model vs. phoneme accuracy) are given for American English. Also some guidelines for selecting the ...


More Xplore Articles

Educational Resources on Speech Recognition


eLearning


More eLearning Resources

IEEE-USA E-Books

  • Conclusion

    Neural Networks for Pattern Recognition takes the pioneering work in artificial neural networks by Stephen Grossberg and his colleagues to a new level. In a simple and accessible way it extends embedding field theory into areas of machine intelligence that have not been clearly dealt with before. Following a tutorial of existing neural networks for pattern classification, Nigrin expands on these networks to present fundamentally new architectures that perform real-time pattern classification of embedded and synonymous patterns and that will aid in tasks such as vision, speech recognition, sensor fusion, and constraint satisfaction. Nigrin presents the new architectures in two stages. First he presents a network called Sonnet 1 that already achieves important properties such as the ability to learn and segment continuously varied input patterns in real time, to process patterns in a context-sensitive fashion, and to learn new patterns without degrading existing categories. He then removes simplifications inherent in Sonnet 1 and introduces radically new architectures. These architectures have the power to classify patterns that may have similar meanings but that have different external appearances (synonyms). They have also been designed to represent patterns in a distributed fashion, both in short-term and long-term memory. Albert Nigrin is Assistant Professor in the Department of Computer Science and Information Systems at American University.

  • Automatic Speech Recognition and Understanding

    This chapter contains sections titled: Understanding the Task and the Signal, Understanding the Signal, Current Status, Selected References

  • What a Beautiful Voice

    This chapter contains sections titled: That's an Ugly Voice, It Must Be a Computer Talking; A Multimedia Communication; Hey, What are You Talking About?; (Computerized) Talk Is Not Cheap; Speech Synthesis; Speech Recognition; What Did You Say?; Neural and Fuzzy Approaches; It's All a Matter of Semantics

  • Automatic Speech Recognition

    This chapter contains sections titled: Introduction; Basic Pattern Recognition Approach; Preprocessing; Parametric Representation; Accommodating Both Spectral and Temporal Variability; Networks for Speech Recognition; Adapting to Variability in Speech; Language Models (LMs); Search Design; Artificial Neural Networks; Expert-System Approach to ASR; Commercial Systems; Summary of Current ASR Design; Conclusion; Problems

  • Speech Recognition in Multimedia Human-Machine Interfaces Using Neural Networks

    The past decade has been highlighted by the emerging technology of multimedia interface design. Intelligent multimedia interfaces can be developed that require very little computer sophistication on the part of the user. This chapter focuses on speech recognition systems applied to multimedia human-machine interfaces. There are a number of speech recognition systems on the market today, and some of them can be integrated into task-specific applications. However, speech recognition research still faces a few challenges in the area of multimedia human-machine interfaces. This chapter presents some approaches based on neural networks for Mandarin speech recognition. In practical applications, a robust Mandarin speech recognition system (VenusDictate) applied to multimedia interfaces is described.

  • Notes

    Stanley Kubrick's 1968 film 2001: A Space Odyssey famously featured HAL, a computer with the ability to hold lengthy conversations with his fellow space travelers. More than forty years later, we have advanced computer technology that Kubrick never imagined, but we do not have computers that talk and understand speech as HAL did. Is it a failure of our technology that we have not gotten much further than an automated voice that tells us to "say or press 1"? Or is there something fundamental in human language and speech that we do not yet understand deeply enough to be able to replicate in a computer? In The Voice in the Machine, Roberto Pieraccini examines six decades of work in science and technology to develop computers that can interact with humans using speech and the industry that has arisen around the quest for these technologies. He shows that although the computers today that understand speech may not have HAL's capacity for conversation, they have capabilities that make them usable in many applications today and are on a fast track of improvement and innovation. Pieraccini describes the evolution of speech recognition and speech understanding processes from waveform methods to artificial intelligence approaches to statistical learning and modeling of human speech based on a rigorous mathematical model--specifically, Hidden Markov Models (HMM). He details the development of dialog systems, the ability to produce speech, and the process of bringing talking machines to the market. Finally, he asks a question that only the future can answer: will we end up with HAL-like computers or something completely unexpected?

  • Glossary


  • Logic Programming for Processing Natural Language

    We uncover a natural alliance between natural language and logic programming, which was apparent in the beginnings of the latter and is becoming again apparent in a more mature state-of-the-art way. We first present a short historic overview, from the origins of Prolog as a "man-machine" system for communicating in natural language, to the present promise, implicit in recent developments, that natural language for controlling various AI applications may not be too far away after all. We then briefly describe three important families of linguistically principled approaches to natural language processing: unification-based (also known as constraint-based), logico-mathematical, and principles-and-parameters. We stress common points between the areas of natural language processing through logic programming and linguistic theory, such as the ideas of inferencing, unification, and constraints. We also note cross-fertilizations among these fields, such as memoization, which was born from David H. D. Warren's observation that the Earley algorithm ideas about parsing should be transferred into logic programming proper; as well as with other fields (e.g. connections with parametric L-systems, which were developed for visual models of plant development and constitute an interesting parallel grammar paradigm; Datalog grammars, inspired by database theory, and with interesting termination properties). Among the latest developments, we describe in some more detail Assumption Grammars, which we believe to be the best compromise to date between expressive and linguistic power. These are basically Definite Clause Grammars, augmented with linear and intuitionistic implication, and which handle multiple streams (as in Peter Van Roig's Extended Definite Clause Grammars, but without the need of a preprocessing technique).
We then show how Assumption Grammars are useful for typical hard problems in natural language processing: anaphora, coordination, co-specification, free word order. We also show two recent results which were surprising to us, namely: a) Assumption grammars allow a direct and efficient implementation of link grammars -a context-free like formalism developed independently from logic grammars; and b) they offer the flexibility of switching between data-driven or goal-driven reasoning, at no overhead in terms of either syntax or implementation. Next, we discuss several interesting applications, most of them briefly but the last one in some detail: concept-based retrieval through natural language, driving robots through Natural Language, error diagnosis and repair, machine translation, language front ends to knowledge based systems, and controlling virtual worlds through natural language. We argue that we can exploit the characteristics of virtual worlds and of Assumption Grammars to develop untraditional but extremely expressive natural language analysers (with complete non-determinism, partial formula evaluation through consulting world knowledge gleaned from the net, easy extensibility both in terms of language coverage and of transfer into other natural languages), and that these new techniques might soon lead to the direct use of some form of controlled language input, together with speech recognition, as a language front end to many AI applications, with Prolog as an invisible mediator.

  • Index

    Many important problems involve decision making under uncertainty -- that is, choosing actions based on often imperfect observations, with unknown outcomes. Designers of automated decision support systems must take into account the various sources of uncertainty while balancing the multiple objectives of the system. This book provides an introduction to the challenges of decision making under uncertainty from a computational perspective. It presents both the theory behind decision making models and algorithms and a collection of example applications that range from speech recognition to aircraft collision avoidance. Focusing on two methods for designing decision agents, planning and reinforcement learning, the book covers probabilistic models, introducing Bayesian networks as a graphical model that captures probabilistic relationships between variables; utility theory as a framework for understanding optimal decision making under uncertainty; Markov decision processes as a method for modeling sequential problems; model uncertainty; state uncertainty; and cooperative decision making involving multiple interacting agents. A series of applications shows how the theoretical concepts can be applied to systems for attribute-based person search, speech applications, collision avoidance, and unmanned aircraft persistent surveillance. Decision Making Under Uncertainty unifies research from different communities using consistent notation, and is accessible to students and researchers across engineering disciplines who have some prior exposure to probability theory and calculus. It can be used as a text for advanced undergraduate and graduate students in fields including computer science, aerospace and electrical engineering, and management science. It will also be a valuable professional reference for researchers in a variety of disciplines.

  • Back Matter

    This collection of essays by 12 members of the MIT staff provides an inside report on the scope and expectations of current research in one of the world's major AI centers. The chapters on artificial intelligence, expert systems, vision, robotics, and natural language provide both a broad overview of current areas of activity and an assessment of the field at a time of great public interest and rapid technological progress. Contents: Artificial Intelligence (Patrick H. Winston and Karen Prendergast). Knowledge-Based Systems (Randall Davis). Expert-System Tools and Techniques (Peter Szolovits). Medical Diagnosis: Evolution of Systems Building Expertise (Ramesh S. Patil). Artificial Intelligence and Software Engineering (Charles Rich and Richard C. Waters). Intelligent Natural Language Processing (Robert C. Berwick). Automatic Speech Recognition and Understanding (Victor W. Zue). Robot Programming and Artificial Intelligence (Tomas Lozano-Perez). Robot Hands and Tactile Sensing (John M. Hollerbach). Intelligent Vision (Michael Brady). Making Robots See (W. Eric L. Grimson). Autonomous Mobile Robots (Rodney A. Brooks). W. Eric L. Grimson, author of From Images to Surfaces: A Computational Study of the Human Early Vision System (MIT Press, 1981), and Ramesh S. Patil are both Assistant Professors in the Department of Electrical Engineering and Computer Science at MIT. AI in the 1980s and Beyond is included in the Artificial Intelligence Series, edited by Patrick H. Winston and Michael Brady.



Standards related to Speech Recognition


No standards are currently tagged "Speech Recognition"


Jobs related to Speech Recognition
