Encoding

What Is Encoding?

Encoding is the process of transforming information from one representation into another to achieve a specific purpose: reducing the number of bits required for storage or transmission, protecting data against errors introduced by noisy channels, or adapting a signal to the physical characteristics of a communication medium. In information theory, the foundations of which were established by Claude Shannon in 1948, encoding is the bridge between a source and a destination, and its design determines how close a system can come to the theoretical limits of compression and error-free communication.

Modern encoding divides naturally into two complementary branches. Source coding (also called compression) reduces redundancy in the original data, representing it with fewer bits without losing essential information. Channel coding adds controlled redundancy back into the compressed bit stream to detect and correct errors introduced by the transmission medium. These two operations work in sequence in nearly every digital communication and storage system.

Source Coding: Entropy and Huffman Coding

Source coding exploits statistical structure in the data. A source that produces symbols with unequal probabilities can be encoded with fewer average bits per symbol than a uniform source, and the theoretical lower bound on this average is the Shannon entropy of the source distribution.
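As a rough illustration of that bound, the following sketch computes the Shannon entropy H = -Σ p·log2(p) for a uniform and a skewed four-symbol source. The function names and the example distributions are chosen here for illustration and do not come from any particular library.

```python
import math
from collections import Counter

def shannon_entropy(probabilities):
    """Return the entropy, in bits per symbol, of a discrete distribution."""
    # Zero-probability symbols contribute nothing and are skipped.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def empirical_entropy(message):
    """Estimate entropy from the symbol frequencies observed in a message."""
    counts = Counter(message)
    total = len(message)
    return shannon_entropy(c / total for c in counts.values())

# Hypothetical four-symbol sources.
uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])   # 2.0 bits per symbol
skewed = shannon_entropy([0.5, 0.25, 0.125, 0.125])   # 1.75 bits per symbol
```

The skewed source needs fewer bits per symbol on average than the uniform one, which is the gap a source coder tries to exploit.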

Huffman coding produces an optimal prefix code for a known symbol distribution when each symbol is coded separately, assigning shorter binary codewords to more probable symbols and longer codewords to less probable ones. Its average codeword length is within one bit per symbol of the entropy. The algorithm constructs a binary tree greedily from the bottom up by repeatedly merging the two least probable nodes, and the resulting code is prefix-free, meaning no codeword is a prefix of another, which makes decoding unambiguous. IEEE-published work on parallel Huffman decoders shows how hardware implementations can parse variable-length codes at rates compatible with real-time video decompression, a requirement in standards such as JPEG, MPEG, and H.264, where Huffman or closely related variable-length codes appear in the entropy-coding layer.
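The greedy construction can be sketched in a few lines of Python. This is a minimal software illustration, not the parallel hardware decoder mentioned above; the symbol weights and the helper names (huffman_code, encode, decode) are assumptions made for the example.

```python
import heapq
from itertools import count

def huffman_code(frequencies):
    """Build a prefix-free code from a {symbol: weight} mapping."""
    if not frequencies:
        return {}
    if len(frequencies) == 1:
        # A lone symbol still needs a one-bit codeword.
        return {next(iter(frequencies)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codewords with 0 and 1.
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]

def encode(message, code):
    return "".join(code[s] for s in message)

def decode(bits, code):
    """Read bits left to right; prefix-freeness makes the first match the right one."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

# Hypothetical symbol weights: "a" is the most frequent symbol.
freqs = {"a": 8, "b": 4, "c": 2, "d": 2}
code = huffman_code(freqs)
bits = encode("abacad", code)
restored = decode(bits, code)
```

With these weights the most frequent symbol receives a one-bit codeword and the two rarest receive three-bit codewords, so the weighted average length is 1.75 bits per symbol.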

Arithmetic coding and asymmetric numeral systems extend the entropy-coding idea beyond symbol-by-symbol codeword assignment. Because they can effectively spend a fractional number of bits on each symbol, they achieve average code lengths closer to the entropy bound, and they now appear in modern image and video standards.
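The interval-narrowing idea behind arithmetic coding can be sketched with exact fractions. This is a teaching sketch under simplifying assumptions: practical coders use fixed-precision integer arithmetic with renormalization, and asymmetric numeral systems work differently. The function names and the two-symbol model are hypothetical.

```python
from fractions import Fraction

def build_intervals(probabilities):
    """Map each symbol to its [low, high) slice of the unit interval."""
    intervals, low = {}, Fraction(0)
    for symbol, p in probabilities.items():
        intervals[symbol] = (low, low + p)
        low += p
    return intervals

def arithmetic_encode(message, probabilities):
    """Narrow [0, 1) once per symbol; any number in the result identifies the message."""
    intervals = build_intervals(probabilities)
    low, width = Fraction(0), Fraction(1)
    for symbol in message:
        s_low, s_high = intervals[symbol]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low, low + width

def arithmetic_decode(value, length, probabilities):
    """Recover `length` symbols from a number inside the encoded interval."""
    intervals = build_intervals(probabilities)
    out = []
    for _ in range(length):
        for symbol, (s_low, s_high) in intervals.items():
            if s_low <= value < s_high:
                out.append(symbol)
                # Rescale so the chosen slice becomes the whole unit interval again.
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)

# Hypothetical two-symbol model with a strongly skewed distribution.
probs = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
low, high = arithmetic_encode("aaab", probs)
tag = (low + high) / 2
decoded = arithmetic_decode(tag, 4, probs)
```

The final interval has width 27/256, so about 3.25 bits are enough to identify the four-symbol message, whereas any code that assigns a whole codeword to each symbol needs at least four bits.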

Channel Coding: Block Codes and Turbo Codes

Channel coding transforms a block of information bits into a longer codeword by adding parity or check bits according to a mathematical structure. Block codes such as Reed-Solomon codes add redundancy in algebraic patterns that allow a decoder to locate and correct a bounded number of symbol errors. They are the basis for error protection in compact discs, QR codes, and deep-space communications.
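Reed-Solomon codes operate on multi-bit symbols over finite fields and are too involved for a short listing, so the sketch below substitutes the much simpler binary Hamming(7,4) code. It shows the same general pattern: parity bits are computed from the data, and the decoder uses the failed parity checks to locate and correct an error (here, a single flipped bit). The function names are chosen for this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into 7 bits laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(received):
    """Correct at most one flipped bit and return the 4 data bits."""
    r = list(received)
    s1 = r[0] ^ r[2] ^ r[4] ^ r[6]
    s2 = r[1] ^ r[2] ^ r[5] ^ r[6]
    s3 = r[3] ^ r[4] ^ r[5] ^ r[6]
    # The syndrome, read as a binary number, is the 1-based error position.
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        r[error_position - 1] ^= 1
    return [r[2], r[4], r[5], r[6]]

data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
corrupted = list(codeword)
corrupted[4] ^= 1                      # simulate one bit error in transit
recovered = hamming74_decode(corrupted)
```

A code of this size corrects only one bit error per seven-bit block; Reed-Solomon codes apply the same locate-and-correct principle to several symbol errors per block.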

Turbo codes, introduced in 1993, achieve near-Shannon-limit performance by concatenating two or more convolutional codes separated by an interleaver and decoding them iteratively, with the component decoders exchanging soft (probabilistic) information about each bit on every pass. IEEE-published research on joint source-channel turbo decoding of entropy-coded sources demonstrates that soft information shared between the source and channel decoders can recover more data than treating the two stages independently. Turbo codes are specified in third- and fourth-generation cellular standards and in deep-space link protocols.

Low-density parity-check (LDPC) codes achieve similar performance through a sparse parity-check matrix that supports efficient iterative message-passing decoding. They appear in Wi-Fi (802.11n and later), DVB-S2 satellite broadcasting, and 5G New Radio.
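The role of the parity-check matrix can be illustrated with a toy hard-decision bit-flipping decoder. This is a deliberately simplified stand-in for the soft message-passing decoders used in practice, and the 4-by-6 matrix below is a hypothetical example far smaller and denser than any real LDPC matrix.

```python
# Hypothetical toy parity-check matrix: 4 checks (rows) over 6 bits (columns).
# Every bit takes part in exactly two checks, and no two bits share both checks.
H = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]

def syndrome(H, word):
    """Return one bit per check: 1 where the parity check fails."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]

def bit_flip_decode(H, received, max_iterations=10):
    """Repeatedly flip the bits involved in the most failed checks.

    Returns the current word when all checks pass or the iteration
    budget runs out, so callers should re-check the syndrome.
    """
    word = list(received)
    for _ in range(max_iterations):
        s = syndrome(H, word)
        if not any(s):
            return word
        # For each bit, count how many of its checks are currently failing.
        votes = [sum(s[i] for i, row in enumerate(H) if row[j])
                 for j in range(len(word))]
        worst = max(votes)
        for j, v in enumerate(votes):
            if v == worst:
                word[j] ^= 1
    return word

codeword = [1, 1, 0, 1, 0, 0]          # satisfies every check in H
noisy = list(codeword)
noisy[3] ^= 1                          # single bit error
decoded = bit_flip_decode(H, noisy)
```

With this matrix a single flipped bit fails exactly the two checks it belongs to, and no other bit shares both of them, so one round of flipping restores the codeword. Real LDPC decoders pass soft reliabilities along the same bit-to-check connections instead of hard votes.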

Speech and Audio Coding

Speech and audio coding apply source-coding principles specifically to audio signals, exploiting perceptual models to discard information the human auditory system cannot distinguish. IEEE-published work on joint source-channel soft decoding of Huffman codes with turbo codes illustrates how audio coding and channel coding interact in systems where compressed audio must survive noisy wireless channels. Codecs such as AAC, Opus, and Adaptive Multi-Rate (AMR) operate at bit rates ranging from roughly 5 kbps for narrowband speech to around 256 kbps for transparent music reproduction.

Applications

  • Lossless image compression in medical imaging archives using entropy coding
  • Error correction in flash memory and solid-state drives using LDPC codes
  • Speech compression in cellular telephony and voice over IP using AMR and Opus codecs
  • Deep-space telemetry from planetary probes protected by concatenated Reed-Solomon and convolutional codes
  • Video streaming over variable-quality internet links using adaptive bitrate encoding
  • QR code encoding for reliable scanning under partial damage or obscuration