Codes

What Are Codes?

Codes are structured mappings of information symbols onto sequences of bits or other symbols, designed to enable reliable transmission, efficient storage, or secure handling of data. In information theory and communications engineering, a code defines a set of rules for representing source data and, in the error-control setting, for adding redundancy so that a receiver can detect or correct errors introduced by a noisy channel. The discipline draws on linear algebra, finite field arithmetic, probability theory, and combinatorics. Its theoretical foundations were established by Claude Shannon's 1948 noisy-channel coding theorem, which proved that for any rate below the channel capacity there exist codes that transmit data with arbitrarily low error probability.

Codes divide naturally into two broad families: source codes, which compress data to remove redundancy, and channel codes, which add controlled redundancy to protect data against noise. The articles in the IEEE Technology Navigator focus primarily on channel coding, where the design of codeword structure directly governs the reliability of a communications link.

Linear and Binary Codes

Linear codes are channel codes whose codewords form a vector subspace over a finite field, most commonly the binary field GF(2). Linearity means that the sum of any two codewords is also a codeword, which greatly simplifies encoding and syndrome-based decoding. Binary codes map data blocks to binary strings, and their performance is characterized by the minimum Hamming distance between distinct codewords; for a linear code this equals the smallest Hamming weight of any nonzero codeword. A code with minimum distance d can detect up to d-1 errors and correct up to floor((d-1)/2) errors. Shannon's foundational paper in the Bell System Technical Journal established the capacity limits within which all codes, linear or otherwise, operate. Cyclic codes are an important subclass of linear codes whose set of codewords is closed under cyclic shifts; this structure enables efficient implementation using shift registers.
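The relationship between minimum distance and error-handling capability can be checked by brute force on a small code. The sketch below is illustrative only: the generator matrix G is one common systematic form of the [7,4] Hamming code, chosen here as an assumption, and the function names are hypothetical.

```python
from itertools import product

def codewords(generator):
    """Enumerate every codeword of the binary linear code spanned by the rows."""
    n = len(generator[0])
    words = []
    for message in product((0, 1), repeat=len(generator)):
        word = [0] * n
        for bit, row in zip(message, generator):
            if bit:
                # Addition over GF(2) is bitwise XOR.
                word = [w ^ r for w, r in zip(word, row)]
        words.append(tuple(word))
    return words

def minimum_distance(generator):
    """For a linear code, the minimum distance equals the smallest
    Hamming weight of any nonzero codeword."""
    return min(sum(w) for w in codewords(generator) if any(w))

# Assumed example: a systematic generator matrix of the [7,4] Hamming code.
G = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]

d = minimum_distance(G)
print("minimum distance:", d)
print("detectable errors:", d - 1)
print("correctable errors:", (d - 1) // 2)
```

Exhaustive enumeration visits all 2^k codewords, so it is only practical for small k; it is meant to make the definition concrete rather than to serve as a general-purpose tool.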

Error-Correcting Codes

Error-correcting codes (ECCs) add redundant bits to a message so that a decoder can reconstruct the original data even after some bits are corrupted. Hamming codes, introduced by Richard Hamming in 1950, are a family of single-error-correcting linear codes. They are perfect in the sense that they meet the Hamming (sphere-packing) bound with equality: with r parity bits the code has length 2^r - 1, and no single-error-correcting binary code of that length can carry more data bits. Reed-Solomon codes operate over non-binary fields and correct errors at the symbol level, so a burst that corrupts several adjacent bits counts as only one or a few symbol errors. This makes them widely used in optical media such as CDs and DVDs, in two-dimensional barcodes such as QR codes, and in deep-space telemetry. Turbo codes, introduced in 1993, combine two recursive systematic convolutional codes with an interleaver and use iterative soft-decision decoding, in which the two component decoders exchange probabilistic estimates of each bit, to approach Shannon capacity within a fraction of a decibel; the 3GPP specification for turbo coding in LTE standardized the specific parallel concatenated structure used in 4G systems. Convolutional codes, which encode a continuous data stream through a sliding window of shift registers rather than in fixed blocks, remain fundamental in satellite and mobile communications.
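Single-error correction with a Hamming code can be illustrated with the [7,4] code. The following is a minimal sketch, assuming the classic bit layout in which parity bits occupy positions 1, 2, and 4 of the codeword; the function names are hypothetical and not taken from any library.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(received):
    """Correct at most one flipped bit and return (data_bits, error_position).

    error_position is 1-based; 0 means the syndrome was zero (no error seen).
    """
    r = list(received)
    s1 = r[0] ^ r[2] ^ r[4] ^ r[6]
    s2 = r[1] ^ r[2] ^ r[5] ^ r[6]
    s3 = r[3] ^ r[4] ^ r[5] ^ r[6]
    # With this layout the syndrome, read as a binary number,
    # is the position of the flipped bit.
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        r[error_position - 1] ^= 1
    return [r[2], r[4], r[5], r[6]], error_position

message = [1, 0, 1, 1]
sent = hamming74_encode(message)
corrupted = list(sent)
corrupted[4] ^= 1  # flip the bit at position 5
recovered, position = hamming74_decode(corrupted)
print(recovered == message, position)
```

If two bits are flipped, the syndrome still points at a single position and the decoder silently miscorrects, which is the practical meaning of a minimum distance of 3.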

Advanced Channel Codes

Fountain codes are rateless codes that generate a potentially unlimited supply of encoded symbols from a source block, allowing a receiver to reconstruct the original data from any sufficiently large subset of received symbols regardless of which specific packets are lost. This property makes fountain codes particularly well-suited to broadcast and multicast scenarios over erasure channels, where packet loss is more common than bit corruption. Space-time codes address the multiple-antenna setting: by encoding information across both spatial (antenna) and temporal dimensions, they exploit transmit diversity to improve reliability over fading wireless channels without increasing bandwidth. Cyclic redundancy check (CRC) codes are a specialized class of cyclic codes used primarily for error detection rather than correction; their polynomial structure allows hardware-efficient computation and is mandated by many networking standards, including Ethernet and USB. The NIST Computer Security Resource Center documents the use of CRC and hash-based integrity mechanisms across federal information processing standards.
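To make the polynomial structure of CRC codes concrete, the sketch below computes the 32-bit CRC used by Ethernet one bit at a time, mimicking a shift register with feedback. The function name crc32_bitwise is hypothetical; the constant 0xEDB88320 is the bit-reflected form of the standard CRC-32 generator polynomial, and the result is compared against Python's built-in zlib.crc32 as a sanity check.

```python
import zlib

def crc32_bitwise(data: bytes) -> int:
    """Bit-by-bit CRC-32 (reflected form, as used by Ethernet and zlib)."""
    crc = 0xFFFFFFFF  # initial register value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; if the bit shifted out was 1,
            # XOR in the reflected generator polynomial.
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final inversion

frame = b"123456789"
checksum = crc32_bitwise(frame)
print(hex(checksum), checksum == zlib.crc32(frame))

# A receiver recomputes the CRC and compares; a single flipped bit is detected.
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]
print(crc32_bitwise(damaged) != checksum)
```

Production implementations usually process a byte or more per step using lookup tables or dedicated hardware; the bitwise loop is shown only because it maps most directly onto the shift-register description.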

Applications

Codes have applications in a wide range of disciplines, including:

  • Wireless communications, where turbo codes are specified for 4G LTE data channels and LDPC codes for 5G NR data channels
  • Data storage, where Reed-Solomon codes have been widely used to protect hard drives, optical discs, and flash memory
  • Deep-space telemetry, where convolutional and turbo codes enable reliable communication at extreme distances
  • Network protocols, where CRC codes detect transmission errors in Ethernet, USB, and storage interfaces
  • Broadcast and streaming, where fountain codes support reliable delivery over lossy networks