Conferences related to Speech Recognition

2013 7th Conference on Speech Technology and Human - Computer Dialogue (SpeD 2013)

  • 2011 6th Conference on Speech Technology and Human - Computer Dialogue (SpeD 2011)

    SpeD 2011 brings together academics and industry professionals from universities, government agencies, and companies to present their achievements and the latest trends in spoken language technology and human-computer dialogue research, as well as some of the most recent applications in this area.

  • 2009 5th Conference on Speech Technology and Human - Computer Dialogue (SpeD 2009)

    The 5th Conference on Speech Technology and Human-Computer Dialogue (Constanta, Romania) brings together academics and industry professionals from universities, government agencies, and companies to present their achievements in speech technology and related fields. SpeD 2009 is an international forum reflecting the latest trends in spoken language technology and human-computer dialogue research, as well as some of the most recent applications in this area.


2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII)

The conference addresses a broad range of topics in affective computing and intelligent interaction.

  • 2009 3rd International Conference on Affective Computing and Intelligent Interaction (ACII 2009)

    The conference series on Affective Computing and Intelligent Interaction is the premier international forum for the state of the art in research on affective and multimodal human-machine interaction and systems. Held every other year, the ACII conference plays an important role in shaping related scientific, academic, and higher education programs. This year, we are especially soliciting papers discussing enabling behavioral and socially-aware human-machine interfaces in areas including psychology.


2013 IEEE International Conference on Multimedia and Expo (ICME)

ICME promotes the exchange of the latest advances in multimedia technologies, systems, and applications from both the research and development perspectives of the circuits and systems, communications, computer, and signal processing communities.

  • 2012 IEEE International Conference on Multimedia and Expo (ICME)

    The IEEE International Conference on Multimedia & Expo (ICME) has been the flagship multimedia conference sponsored by four IEEE Societies. It promotes the exchange of the latest advances in multimedia technologies, systems, and applications from both the research and development perspectives of the circuits and systems, communications, computer, and signal processing communities.

  • 2011 IEEE International Conference on Multimedia and Expo (ICME)

    Topics include: speech, audio, image, video, and text processing; signal processing for media integration; 3D visualization, animation, and virtual reality; multi-modal multimedia computing systems and human-machine interaction; multimedia communications and networking; multimedia security and privacy; multimedia databases and digital libraries; multimedia applications and services; media content analysis and search; hardware and software for multimedia systems; multimedia standards and related issues; multimedia qu ...

  • 2010 IEEE International Conference on Multimedia and Expo (ICME)

    A flagship multimedia conference sponsored by four IEEE societies, ICME serves as a forum to promote the exchange of the latest advances in multimedia technologies, systems, and applications from both the research and development perspectives of the circuits and systems, communications, computer, and signal processing communities.

  • 2009 IEEE International Conference on Multimedia and Expo (ICME)

    IEEE International Conference on Multimedia & Expo is a major annual international conference with the objective of bringing together researchers, developers, and practitioners from academia and industry working in all areas of multimedia. ICME serves as a forum for the dissemination of state-of-the-art research, development, and implementations of multimedia systems, technologies and applications.


2013 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)

The ASRU workshop meets every two years and has a tradition of bringing together researchers from academia and industry in an intimate and collegial setting to discuss problems of common interest in automatic speech recognition and understanding.


2013 International Carnahan Conference on Security Technology (ICCST)

This international conference is a forum for all aspects of physical, cyber and electronic security research, development, systems engineering, testing, evaluation, operations and sustainability. The ICCST facilitates the exchange of ideas and information.

  • 2012 IEEE International Carnahan Conference on Security Technology (ICCST)

    Research, development, and user aspects of security technology, including principles of operation, applications, and user experiences.

  • 2011 International Carnahan Conference on Security Technology (ICCST)

    This annual conference is the world's longest-running international technical symposium on security technology. This conference is a forum for collaboration on all aspects of physical, cyber and electronic security research, development, systems engineering, testing, evaluation, operations and sustainment. The ICCST facilitates the exchange of ideas and sharing of information on both new and existing technology and systems. Conference participants are encouraged to consider the impact of their work on society. The ICCST provides a foundation for support to authorities and agencies responsible for security, safety and law enforcement in the use of available and future technology.

  • 2010 IEEE International Carnahan Conference on Security Technology (ICCST)

    The ICCST is a forum for researchers and practitioners in both new and existing security technology, providing an interchange of knowledge through paper presentations and publication of proceedings that have been selected by the ICCST organizing committee.

  • 2009 International Carnahan Conference on Security Technology (ICCST)

    The conference is directed toward research, development, and user aspects of electronic security technology.

  • 2008 International Carnahan Conference on Security Technology (ICCST)

    The ICCST is directed toward the research and development aspects of electronic security technology, including the operational testing of the technology. It establishes a forum for the exchange of ideas and dissemination of information on both new and existing technology. Conference participants are encouraged to consider the impact of their work on society. The conference is an interchange of knowledge through the presentation of learned papers that have been selected by the ICCST organizing committee.

  • 2007 IEEE International Carnahan Conference on Security Technology (ICCST)

  • 2006 IEEE International Carnahan Conference on Security Technology (ICCST)


More Conferences

Periodicals related to Speech Recognition

Audio, Speech, and Language Processing, IEEE Transactions on

Speech analysis, synthesis, coding, speech recognition, speaker recognition, language modeling, speech production and perception, speech enhancement. In audio: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. (IEEE Guide for Authors) The scope for the proposed transactions includes SPEECH PROCESSING - transmission and storage of speech signals; speech coding; speech enhancement and noise reduction; ...


Biomedical Engineering, IEEE Transactions on

Broad coverage of concepts and methods of the physical and engineering sciences applied in biology and medicine, ranging from formalized mathematical theory through experimental science and technological development to practical clinical applications.


Pattern Analysis and Machine Intelligence, IEEE Transactions on

Statistical and structural pattern recognition; image analysis; computational models of vision; computer vision systems; enhancement, restoration, segmentation, feature extraction, shape and texture analysis; applications of pattern analysis in medicine, industry, government, and the arts and sciences; artificial intelligence, knowledge representation, logical and probabilistic inference, learning, speech recognition, character and text recognition, syntactic and semantic processing, understanding natural language, expert systems, ...


Selected Areas in Communications, IEEE Journal on

All telecommunications, including telephone, telegraphy, facsimile, and point-to-point television, by electromagnetic propagation, including radio; wire; aerial, underground, coaxial, and submarine cables; waveguides, communication satellites, and lasers; in marine, aeronautical, space, and fixed station services; repeaters, radio relaying, signal storage, and regeneration; telecommunication error detection and correction; multiplexing and carrier techniques; communication switching systems; data communications; communication theory; and wireless communications.


Systems, Man and Cybernetics, Part A, IEEE Transactions on

Systems engineering, including efforts that involve issue formulations, issue analysis and modeling, and decision making and issue interpretation at any of the life-cycle phases associated with the definition, development, and implementation of large systems. It will also include efforts that relate to systems management, systems engineering processes and a variety of systems engineering methods such as optimization, modeling and simulation. ...



Most published Xplore authors for Speech Recognition

Xplore Articles related to Speech Recognition

Dynamic control of a production model [speech]

L. Candille; H. Meloni. Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96), 1996

A number of experiments have shown that it is possible to use production models for speech recognition tasks. We present the results of an adaptation of S. Maeda's (1979) statistical model. We have also demonstrated the importance of taking into account the static and dynamic characteristics of the speaker. Some preliminary results for the identification of V1-V2 (i.e. vowel diphone) ...


Speaker adaptation of neural network acoustic models using i-vectors

George Saon; Hagen Soltau; David Nahamoo; Michael Picheny 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

We propose to adapt deep neural network (DNN) acoustic models to a target speaker by supplying speaker identity vectors (i-vectors) as input features to the network in parallel with the regular acoustic features for ASR. For both training and test, the i-vector for a given speaker is concatenated to every frame belonging to that speaker and changes across different speakers. ...
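The input construction this abstract describes is simply feature concatenation; a minimal sketch follows. The feature and i-vector dimensions below are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def append_ivector(frames, ivector):
    """Concatenate a speaker i-vector to every acoustic frame.

    frames:  (T, D) array of per-frame acoustic features
    ivector: (K,)   speaker identity vector, constant within a speaker
    returns: (T, D + K) array fed to the DNN input layer
    """
    T = frames.shape[0]
    tiled = np.tile(ivector, (T, 1))   # repeat the same i-vector for all T frames
    return np.concatenate([frames, tiled], axis=1)

frames = np.random.randn(300, 40)      # e.g. 3 s of 40-dim features (assumed dims)
ivec = np.random.randn(100)            # hypothetical 100-dim i-vector
x = append_ivector(frames, ivec)
print(x.shape)                         # (300, 140)
```

Because the i-vector is identical across a speaker's frames, the network can learn a speaker-dependent bias while the per-frame features carry the phonetic content.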


Likelihood normalization using an ergodic HMM for continuous speech recognition

K. Ozeki. Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96), 1996

In recent speech recognition technology, the score of a hypothesis is often defined on the basis of HMM likelihood. As is well known, however, direct use of the likelihood as a scoring function causes difficult problems, especially when the length of a speech segment varies depending on the hypothesis, as in word-spotting, and some kind of normalization is indispensable. In ...
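One common normalization scheme of the kind the abstract alludes to subtracts a background-model score from the hypothesis score over the same segment, making scores comparable across segments of different lengths. The sketch below is a toy illustration only: the single-Gaussian "models" stand in for HMM forward scores, and all values are invented:

```python
import math

def log_likelihood(obs, mean, var):
    """Log-likelihood of a 1-D observation sequence under a single Gaussian
    (a stand-in for an HMM forward score in this toy sketch)."""
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
               for x in obs)

def normalized_score(obs, model, background):
    """Length-robust score: hypothesis log-likelihood minus the score of a
    background (e.g. ergodic) model computed over the same segment."""
    return log_likelihood(obs, *model) - log_likelihood(obs, *background)

obs = [0.1, 0.2, -0.1, 0.05]
word_model = (0.0, 1.0)   # hypothetical word model (mean, variance)
bg_model = (0.5, 2.0)     # hypothetical background model
print(normalized_score(obs, word_model, bg_model))
```

Because both terms accumulate one contribution per frame, the segment-length dependence largely cancels, which is the point of the normalization.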


Audio Segmentation via Tri-Model Bayesian Information Criterion

Yunfeng Du; Wei Hu; Yonghong Yan; Tao Wang; Yimin Zhang 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07, 2007

This paper addresses the problem of audio segmentation in practical media (e.g., TV series and movies), which usually consist of segments of various lengths, with quite a portion of short ones. An unsupervised audio segmentation approach is presented, including a segmentation stage to detect potential acoustic changes, and a refinement stage to refine these candidate changes by a tri-model Bayesian information ...
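The standard two-model ΔBIC criterion underlying this line of work (the paper's tri-model variant refines it) compares one Gaussian fitted to a whole window against two Gaussians fitted to the halves around a candidate change point; a positive ΔBIC favors a split. A sketch under those standard assumptions:

```python
import numpy as np

def delta_bic(x, t, lam=1.0):
    """Two-model ΔBIC for a candidate change point t in a (N, d) feature
    sequence x. Positive values favor splitting at t (an acoustic change)."""
    n, d = x.shape
    x1, x2 = x[:t], x[t:]

    def logdet_cov(seg):
        # Small diagonal term keeps the covariance well conditioned.
        return np.linalg.slogdet(np.cov(seg, rowvar=False) + 1e-6 * np.eye(d))[1]

    # Penalty for the extra parameters of the two-model hypothesis.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * logdet_cov(x)
            - 0.5 * t * logdet_cov(x1)
            - 0.5 * (n - t) * logdet_cov(x2)
            - lam * penalty)

rng = np.random.default_rng(0)
a = rng.normal(0, 1, (200, 13))   # segment from one acoustic condition
b = rng.normal(3, 1, (200, 13))   # clearly different condition
x = np.concatenate([a, b])
print(delta_bic(x, 200) > 0)      # a change at t=200 should be favored
```

In a segmentation pass, ΔBIC is evaluated over sliding candidate points and local maxima above zero are kept as change points.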


Automatic identification of positive or negative language

Erez Posner; Omer David; Vered Aharonson; Gabi Shafat 2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel, 2012

Personal coaching, performed by professionals such as psychologists, usually includes training for business as well as social situations such as job interviews, business meetings, interaction with a customer service provider, and more. This requires careful preparation in which, among other traits, the trainees need to pay attention to the words they choose in the interaction, in order to make a ...


More Xplore Articles

Educational Resources on Speech Recognition

eLearning



More eLearning Resources

IEEE-USA E-Books

  • No title

    Immediately following the Second World War, between 1947 and 1955, several classic papers quantified the fundamentals of human speech information processing and recognition. In 1947 French and Steinberg published their classic study on the articulation index. In 1948 Claude Shannon published his famous work on the theory of information. In 1950 Fletcher and Galt published their theory of the articulation index, a theory that Fletcher had worked on for 30 years, which integrated his classic works on loudness and speech perception with models of speech intelligibility. In 1951 George Miller wrote the first book, Language and Communication, analyzing human speech communication with Claude Shannon's just-published theory of information. Finally, in 1955 George Miller published the first extensive analysis of phone decoding, in the form of confusion matrices, as a function of the speech-to-noise ratio. This work extended the Bell Labs speech articulation studies with ideas from Shannon's information theory. Both Miller and Fletcher showed that speech, as a code, is incredibly robust to mangling distortions of filtering and noise. Regrettably, much of this early work was forgotten. While the key science of information theory blossomed, other than the work of George Miller it was rarely applied to aural speech research. The robustness of speech, which is the most amazing thing about the speech code, has rarely been studied. It is my belief (i.e., assumption) that we can analyze speech intelligibility with the scientific method. The quantitative analysis of speech intelligibility requires both science and art. The scientific component requires an error analysis of spoken communication, which depends critically on the use of statistics, information theory, and psychophysical methods. The artistic component depends on knowing how to restrict the problem in such a way that progress may be made. It is critical to tease out the relevant from the irrelevant and dig for the key issues. This will focus us on the decoding of nonsense phonemes with no visual component, which have been mangled by filtering and noise. This monograph is a summary and theory of human speech recognition. It builds on and integrates the work of Fletcher, Miller, and Shannon. The long-term goal is to develop a quantitative theory for predicting the recognition of speech sounds. In Chapter 2 the theory is developed for maximum-entropy (MaxEnt) speech sounds, also called nonsense speech. In Chapter 3, context is factored in. The book is largely reflective and quantitative, with a secondary goal of providing an historical context, along with the many deep insights found in these early works.

  • Glossary

    Neural Networks for Pattern Recognition takes the pioneering work in artificial neural networks by Stephen Grossberg and his colleagues to a new level. In a simple and accessible way it extends embedding field theory into areas of machine intelligence that have not been clearly dealt with before. Following a tutorial of existing neural networks for pattern classification, Nigrin expands on these networks to present fundamentally new architectures that perform real-time pattern classification of embedded and synonymous patterns and that will aid in tasks such as vision, speech recognition, sensor fusion, and constraint satisfaction. Nigrin presents the new architectures in two stages. First he presents a network called Sonnet 1 that already achieves important properties such as the ability to learn and segment continuously varied input patterns in real time, to process patterns in a context-sensitive fashion, and to learn new patterns without degrading existing categories. He then removes simplifications inherent in Sonnet 1 and introduces radically new architectures. These architectures have the power to classify patterns that may have similar meanings but different external appearances (synonyms). They have also been designed to represent patterns in a distributed fashion, both in short-term and long-term memory. Albert Nigrin is Assistant Professor in the Department of Computer Science and Information Systems at American University.

  • No title

    Speech dynamics refers to the temporal characteristics in all stages of the human speech communication process. This speech "chain" starts with the formation of a linguistic message in a speaker's brain and ends with the arrival of the message in a listener's brain. Given the intricacy of the dynamic speech process and its fundamental importance in human communication, this monograph is intended to provide comprehensive material on mathematical models of speech dynamics and to address the following issues: How do we make sense of the complex speech process in terms of its functional role in speech communication? How do we quantify the special role of speech timing? How do the dynamics relate to the variability of speech that has often been said to seriously hamper automatic speech recognition? How do we put the dynamic process of speech into a quantitative form to enable detailed analyses? And finally, how can we incorporate the knowledge of speech dynamics into computerized speech analysis and recognition algorithms? The answers to all these questions require building and applying computational models for the dynamic speech process. What are the compelling reasons for carrying out dynamic speech modeling? We provide the answer in two related aspects. First, scientific inquiry into the human speech code has been relentlessly pursued for several decades. As an essential carrier of human intelligence and knowledge, speech is the most natural form of human communication. Embedded in the speech code are linguistic (as well as paralinguistic) messages, which are conveyed through four levels of the speech chain. Underlying the robust encoding and transmission of the linguistic messages are the speech dynamics at all four levels. Mathematical modeling of speech dynamics provides an effective tool in the scientific methods of studying the speech chain. Such scientific studies help understand why humans speak as they do and how humans exploit redundancy and variability by way of multitiered dynamic processes to enhance the efficiency and effectiveness of human speech communication. Second, advancement of human language technology, especially in automatic recognition of natural-style human speech, is also expected to benefit from comprehensive computational modeling of speech dynamics. The limitations of current speech recognition technology are serious and well known. A commonly acknowledged and frequently discussed weakness of the statistical model underlying current speech recognition technology is the lack of adequate dynamic modeling schemes to provide correlation structure across the temporal speech observation sequence. Unfortunately, due to a variety of reasons, the majority of current research activities in this area favor only incremental modifications and improvements to the existing HMM-based state of the art. For example, while dynamic and correlation modeling is known to be an important topic, most systems nevertheless employ only an ultra-weak form of speech dynamics, e.g., differential or delta parameters. Strong-form dynamic speech modeling, which is the focus of this monograph, may serve as an ultimate solution to this problem. After the introduction chapter, the main body of this monograph consists of four chapters. They cover various aspects of theory, algorithms, and applications of dynamic speech models, and provide a comprehensive survey of the research work in this area spanning the past 20 years. This monograph is intended as advanced material on speech and signal processing for graduate-level teaching, for professionals and engineering practitioners, as well as for seasoned researchers and engineers specialized in speech processing.

  • Index

    This collection of essays by 12 members of the MIT staff provides an inside report on the scope and expectations of current research in one of the world's major AI centers. The chapters on artificial intelligence, expert systems, vision, robotics, and natural language provide both a broad overview of current areas of activity and an assessment of the field at a time of great public interest and rapid technological progress. Contents: Artificial Intelligence (Patrick H. Winston and Karen Prendergast). Knowledge-Based Systems (Randall Davis). Expert-System Tools and Techniques (Peter Szolovits). Medical Diagnosis: Evolution of Systems Building Expertise (Ramesh S. Patil). Artificial Intelligence and Software Engineering (Charles Rich and Richard C. Waters). Intelligent Natural Language Processing (Robert C. Berwick). Automatic Speech Recognition and Understanding (Victor W. Zue). Robot Programming and Artificial Intelligence (Tomas Lozano-Perez). Robot Hands and Tactile Sensing (John M. Hollerbach). Intelligent Vision (Michael Brady). Making Robots See (W. Eric L. Grimson). Autonomous Mobile Robots (Rodney A. Brooks). W. Eric L. Grimson, author of From Images to Surfaces: A Computational Study of the Human Early Vision System (MIT Press, 1981), and Ramesh S. Patil are both Assistant Professors in the Department of Electrical Engineering and Computer Science at MIT. AI in the 1980s and Beyond is included in the Artificial Intelligence Series, edited by Patrick H. Winston and Michael Brady.

  • Bibliography


  • The Teradata SQL3 Multimedia Database Server

    Multimedia applications - such as fingerprint matching, signature verification, face recognition, and speech recognition or translation - require complex abstract data-type support within database management systems. However, conventional databases are not designed to support multimedia. In this chapter, we describe several multimedia database challenges and explain how Teradata solves these problems with its SQL3 Multimedia Database system. A key component of this system is the Multimedia Object Manager, a general-purpose content-based multimedia server designed for the symmetric multiprocessor (SMP) and massively parallel processor (MPP) environments. The Teradata SQL3 Multimedia Database system allows users to define and manipulate user-defined functions (UDFs), which are invoked in parallel in the Multimedia Object Manager to analyze/manipulate the contents of multimedia objects. The two key characteristics of this subsystem are its support for content-based retrieval and multimodal integration. We provide an in-depth analysis of retrieval techniques using feature extraction and spatial indices. We also illustrate the power of multimodal integration by walking through the development of a complex application involving the generation of a "talking agent," which uses speech, image, and video data types within the database system.

  • No title

    This book introduces the theory, algorithms, and implementation techniques for efficient decoding in speech recognition, mainly focusing on the Weighted Finite-State Transducer (WFST) approach. The decoding process for speech recognition is viewed as a search problem whose goal is to find a sequence of words that best matches an input speech signal. Since this process becomes computationally more expensive as the system vocabulary size increases, research has long been devoted to reducing the computational cost. Recently, the WFST approach has become an important state-of-the-art speech recognition technology, because it offers improved decoding speed with fewer recognition errors compared with conventional methods. However, it is not easy to understand all the algorithms used in this framework, and they are still in a black box for many people. In this book, we review the WFST approach and aim to provide comprehensive interpretations of WFST operations and decoding algorithms to help anyone who wants to understand, develop, and study WFST-based speech recognizers. We also mention recent advances in this framework and its applications to spoken language processing. Table of Contents: Introduction / Brief Overview of Speech Recognition / Introduction to Weighted Finite-State Transducers / Speech Recognition by Weighted Finite-State Transducers / Dynamic Decoders with On-the-fly WFST Operations / Summary and Perspective

  • Appendix: List of Symbols and Abbreviations

    As the power of computing has grown over the past few decades, the field of machine learning has advanced rapidly in both theory and practice. Machine learning methods are usually based on the assumption that the data generation mechanism does not change over time. Yet real-world applications of machine learning, including image recognition, natural language processing, speech recognition, robot control, and bioinformatics, often violate this common assumption. Dealing with non-stationarity is one of modern machine learning's greatest challenges. This book focuses on a specific non-stationary environment known as covariate shift, in which the distributions of inputs (queries) change but the conditional distribution of outputs (answers) is unchanged, and presents machine learning theory, algorithms, and applications to overcome this variety of non-stationarity. After reviewing the state-of-the-art research in the field, the authors discuss topics that include learning under covariate shift, model selection, importance estimation, and active learning. They describe such real-world applications of covariate shift adaptation as brain-computer interfaces, speaker identification, and age prediction from facial images. With this book, they aim to encourage future research in machine learning, statistics, and engineering that strives to create truly autonomous learning machines able to learn under non-stationarity.

  • Back Matter


  • Index

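At its core, the WFST-based decoding described in the monograph above reduces to a shortest-path search over a weighted graph in the tropical (min, +) semiring. The sketch below illustrates only that core idea; the tiny graph, labels, and weights are invented for illustration and bear no relation to a real decoding graph:

```python
import heapq

def shortest_path(arcs, start, final):
    """Tropical-semiring (min, +) shortest path over a tiny WFST:
    arcs[state] -> list of (next_state, output_label, weight).
    Returns (total_weight, output_labels) of the best path."""
    heap = [(0.0, start, [])]
    settled = set()
    while heap:
        w, s, out = heapq.heappop(heap)
        if s in settled:
            continue
        settled.add(s)
        if s == final:
            return w, out
        for nxt, label, aw in arcs.get(s, []):
            if nxt not in settled:
                heapq.heappush(heap, (w + aw, nxt, out + [label]))
    return float("inf"), []

# Hypothetical decoding graph with two competing word sequences.
arcs = {
    0: [(1, "hello", 1.2), (2, "yellow", 0.9)],
    1: [(3, "world", 0.5)],
    2: [(3, "word", 1.5)],
}
w, words = shortest_path(arcs, 0, 3)
print(w, words)
```

In a real recognizer the graph is the composed H∘C∘L∘G transducer and the search is time-synchronous with pruning, but the best-path objective is the same.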



Standards related to Speech Recognition

No standards are currently tagged "Speech Recognition"


Jobs related to Speech Recognition
