Acoustic signal processing
What Is Acoustic Signal Processing?
Acoustic signal processing is a field of signal processing concerned with the capture, analysis, transformation, and synthesis of signals that represent sound. It applies mathematical methods from digital signal processing, linear systems theory, and statistical estimation to problems that arise when sound is captured by a microphone, passes through an acoustic medium or a digital system, and must be rendered useful for communication, recognition, or reproduction. The field covers the nominal range of human hearing, roughly 20 Hz to 20 kHz, and extends beyond it for specialized applications such as medical ultrasound imaging and sonar.
The discipline draws on a long line of work in telephony, radio broadcasting, and hearing science, and today intersects with machine learning, communications engineering, and psychoacoustics. The IEEE/ACM Transactions on Audio, Speech, and Language Processing is a leading journal for archiving peer-reviewed advances across the full scope of the field.
Speech and Audio Processing
Speech and audio processing covers the techniques used to encode, enhance, classify, and reconstruct spoken language and general audio signals. Speech compression algorithms such as Code-Excited Linear Prediction (CELP), which underlies codecs including G.729 and AMR, reduce the bit rate of telephone-quality speech from the 64 kbit/s of standard PCM telephony to 8 kbit/s or less by modeling the vocal tract as a time-varying linear filter driven by an excitation source. Audio coding standards such as Advanced Audio Coding (AAC) exploit psychoacoustic masking: the encoder discards or coarsely quantizes spectral components that louder neighboring components would render inaudible to the human auditory system. Speech enhancement algorithms, including Wiener filtering, spectral subtraction, and minimum mean-square error (MMSE) spectral amplitude estimation, suppress background noise while preserving the intelligibility and naturalness of the target speech signal.
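The linear-filter model of the vocal tract can be illustrated with a small sketch of linear prediction, the analysis step that CELP-style coders build on. The example below estimates predictor coefficients from a block of samples using the autocorrelation method and the Levinson-Durbin recursion. It is not a CELP codec: there is no excitation codebook, quantization, or perceptual weighting, and the function names, the predictor order, and the synthetic second-order test signal are all assumptions made for illustration. It uses only the Python standard library so that it stays self-contained.

```python
import random


def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], the (unnormalized) autocorrelation of the frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order].

    The predictor is x_hat[n] = sum_k a[k] * x[n - k].
    Returns (coefficients, prediction_error_energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused padding
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        reflection = acc / error
        new_a = a[:]
        new_a[i] = reflection
        for k in range(1, i):
            new_a[k] = a[k] - reflection * a[i - k]
        a = new_a
        error *= 1.0 - reflection * reflection
    return a[1:], error


# Hypothetical test signal: white noise shaped by a known, stable
# all-pole filter, standing in for an excitation driving a "vocal tract".
random.seed(0)
true_coeffs = [1.3, -0.6]
x = [0.0, 0.0]
for _ in range(4000):
    x.append(true_coeffs[0] * x[-1] + true_coeffs[1] * x[-2]
             + random.gauss(0.0, 1.0))

r = autocorrelation(x, 2)
coeffs, pred_error = levinson_durbin(r, 2)
print("estimated predictor coefficients:", coeffs)
```

On this synthetic signal the estimated coefficients should come out close to the ones used to generate it. A real coder would repeat the analysis on short frames (typically tens of milliseconds) so that the filter can track the changing vocal tract.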
Echo Cancellation and Active Noise Reduction
When a loudspeaker and a microphone share the same enclosure, such as in a speakerphone or a voice assistant device, the microphone picks up the loudspeaker's output and feeds it back into the audio processing chain as an acoustic echo; in a telephone call, the far-end talker would otherwise hear a delayed copy of their own voice. Acoustic echo cancellation (AEC) uses an adaptive filter to estimate the impulse response of the acoustic path from loudspeaker to microphone and subtracts the estimated echo from the microphone signal in real time. The ICASSP Acoustic Echo Cancellation Challenges organized by the IEEE Signal Processing Society have driven benchmark progress in this sub-area since 2021, with successive editions adding tracks for personalized echo cancellation and multichannel microphone arrays. Active noise reduction (ANR), also known as active noise control, takes a related but distinct approach: it synthesizes an anti-noise signal that destructively interferes with an unwanted sound field at the listener's ear, and is used in consumer headphones and aircraft cabin noise reduction systems.
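The adaptive-filter idea behind AEC can be sketched with the normalized least-mean-squares (NLMS) algorithm, one common choice of adaptation rule; the text above does not say which algorithm any particular product or challenge entry uses. In this illustration the echo path, the tap count, the step size, and the function name `nlms_echo_canceller` are all assumptions, the far-end signal is white noise, and there is no near-end speech (no double-talk), which is far easier than a real room.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Run NLMS over the signals and return (residual, weights).

    far_end: loudspeaker reference samples.
    mic: microphone samples containing the echo of far_end.
    residual: microphone signal minus the estimated echo.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent far-end sample first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        # Normalizing by the input energy keeps the update stable
        # regardless of the loudspeaker level.
        energy = sum(h * h for h in history) + eps
        weights = [w + step * e * h / energy
                   for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


# Hypothetical short echo path (loudspeaker-to-microphone impulse response).
echo_path = [0.6, 0.3, -0.2, 0.1]

random.seed(1)
far_end = [random.gauss(0.0, 1.0) for _ in range(3000)]
mic = [sum(echo_path[k] * far_end[n - k]
           for k in range(len(echo_path)) if n - k >= 0)
       for n in range(len(far_end))]

residual, weights = nlms_echo_canceller(far_end, mic)
print("learned taps:", [round(w, 3) for w in weights])
```

After adaptation the first four learned taps should approximate the assumed echo path and the remaining taps should stay near zero. Practical cancellers use hundreds or thousands of taps, often in the frequency domain, and add double-talk detection so that near-end speech does not corrupt the filter estimate.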
Speech Synthesis and Recognition
Speech synthesis, or text-to-speech (TTS), converts written text into an intelligible spoken waveform. Earlier systems concatenated recorded speech units such as phonemes or diphones; current neural systems pair sequence-to-sequence models such as Tacotron, which map character or phoneme sequences to acoustic features, with neural waveform generators such as WaveNet to produce smooth, natural-sounding speech. Speech recognition, the inverse problem, maps an acoustic waveform onto a sequence of words. Modern automatic speech recognition (ASR) systems are built on deep neural networks trained on thousands of hours of transcribed audio, and can reach word error rates below 5% on clean broadcast speech. Spoken language processing extends recognition into understanding: parsing the recognized words into structured representations that downstream applications such as virtual assistants and medical transcription systems can act upon.
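Word error rate is conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that calculation; the function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions for illustration, and real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion (second "the")
# against a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.3f}")  # prints "WER = 0.333"
```

Because insertions count as errors but do not add reference words, the rate can exceed 100% when the hypothesis is much longer than the reference.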
Applications
Acoustic signal processing has applications in a wide range of disciplines, including:
- Voice communications and conferencing systems, including noise suppression and echo control
- Virtual assistants and smart speakers relying on speech recognition and synthesis
- Hearing aids and cochlear implant signal processing
- Sonar systems for underwater object detection and ranging
- Medical ultrasound imaging and diagnostic acoustics