Information Theory
What Is Information Theory?
Information theory is the mathematical study of the fundamental limits of communication, compression, and storage of information. Founded by Claude Shannon in his landmark 1948 paper "A Mathematical Theory of Communication," the field provides rigorous answers to questions that had previously been approached only intuitively: How much information does a source produce? How compactly can data be encoded? How fast can reliable communication occur over a noisy channel? These results underpin virtually every aspect of modern digital communications and data storage.
Entropy and Source Coding
Shannon defined entropy as a measure of the uncertainty, or average information content, of a random source. A source that always produces the same symbol has zero entropy; for a given alphabet size, a source whose symbols are all equally likely has maximum entropy. Shannon entropy is measured in bits when the logarithm base is two, and it gives the minimum average number of bits per symbol that any lossless code for the source can achieve.
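As a minimal sketch of the definition, the entropy of a discrete source with symbol probabilities p(x) is H = -sum over x of p(x) log2 p(x). The function name shannon_entropy and the three example distributions below are illustrative choices, not part of the original text.

```python
import math


def shannon_entropy(probs):
    """Return the Shannon entropy, in bits, of a discrete distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Zero-probability symbols contribute nothing, since p * log2(p) -> 0 as p -> 0.
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Illustrative four-symbol sources (assumed values, chosen for easy arithmetic).
deterministic = shannon_entropy([1.0, 0.0, 0.0, 0.0])  # always the same symbol
uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])    # all symbols equally likely
skewed = shannon_entropy([0.5, 0.25, 0.125, 0.125])    # somewhere in between
```

Under these assumptions the deterministic source has entropy 0 bits, the uniform source reaches the four-symbol maximum of 2 bits, and the skewed source falls between them at 1.75 bits.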
Source coding, or data compression, is the practice of representing information using as few bits as possible. Shannon's source coding theorem proves that no lossless code can use fewer bits per symbol, on average, than the entropy rate of the source, establishing a fundamental bound. Huffman coding comes within one bit per symbol of this bound, and arithmetic coding approaches it more closely still. Modern lossless compressors build on these foundations: DEFLATE combines LZ77 dictionary matching with Huffman coding, and Zstandard combines dictionary matching with Huffman and finite-state entropy coding. How close such compressors come to the theoretical optimum depends on how well their models fit the data being compressed.
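The relationship between Huffman coding and the entropy bound can be sketched as follows. This is a simplified illustration that computes only code lengths, not the codewords themselves; the function name huffman_code_lengths and the example source are assumptions made for the example.

```python
import heapq
import math


def huffman_code_lengths(probs):
    """Return {symbol: code length in bits} for a Huffman code over probs."""
    if len(probs) == 1:
        return {next(iter(probs)): 1}
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probs}
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol inside them
        # moves one level deeper, so its codeword grows by one bit.
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths


# Illustrative source (assumed): probabilities are powers of 1/2.
source = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_code_lengths(source)
average_length = sum(source[s] * lengths[s] for s in source)
entropy = sum(-p * math.log2(p) for p in source.values())
```

For this source the code lengths are 1, 2, 3, and 3 bits, and the average length equals the entropy of 1.75 bits exactly, because every probability is a power of 1/2. For other distributions a Huffman code's average length lies above the entropy by less than one bit per symbol.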
Lossy compression sacrifices some fidelity for greater compression ratios, and rate-distortion theory, an extension of Shannon's work, characterizes the trade-off between the number of bits used and the resulting reconstruction quality. JPEG image compression and MP3 audio encoding are practical applications of rate-distortion principles.
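One case where the rate-distortion trade-off has a simple closed form is a memoryless Gaussian source under mean-squared-error distortion, for which R(D) = max(0, 0.5 log2(variance / D)) bits per sample. The sketch below assumes that model, which the text above does not specify; real codecs such as JPEG and MP3 operate on far more complicated sources.

```python
import math


def gaussian_rate_distortion(variance, distortion):
    """Minimum bits per sample to describe a Gaussian source of the given
    variance with mean squared error no greater than `distortion`."""
    if variance <= 0 or distortion <= 0:
        raise ValueError("variance and distortion must be positive")
    if distortion >= variance:
        # Reconstructing every sample as the mean already meets the target.
        return 0.0
    return 0.5 * math.log2(variance / distortion)


# Illustrative values (assumed): unit-variance source at two distortion targets.
rate_coarse = gaussian_rate_distortion(1.0, 0.25)    # 1.0 bit per sample
rate_fine = gaussian_rate_distortion(1.0, 0.0625)    # 2.0 bits per sample
```

In this model each additional bit per sample reduces the achievable distortion by a factor of four, about 6 dB.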
Channel Capacity and Shannon Theory
A communication channel is characterized by the noise and interference it introduces between transmitter and receiver. Shannon's noisy-channel coding theorem establishes the maximum rate at which information can be transmitted through a noisy channel with arbitrarily small error probability; at any rate above this limit, the error probability cannot be made arbitrarily small. This rate, called the channel capacity, is measured in bits per channel use, or in bits per second once the signaling rate is fixed. It depends only on the statistical properties of the channel and not on the encoding scheme used.
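A simple concrete case is the binary symmetric channel, which flips each transmitted bit independently with probability p and has capacity C = 1 - H2(p) bits per channel use, where H2 is the binary entropy function. The sketch below assumes this channel model, which is introduced here for illustration only.

```python
import math


def binary_entropy(p):
    """Entropy, in bits, of a binary variable that equals 1 with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(crossover):
    """Capacity, in bits per channel use, of a binary symmetric channel."""
    return 1.0 - binary_entropy(crossover)


noiseless = bsc_capacity(0.0)   # 1.0: every transmitted bit is received intact
useless = bsc_capacity(0.5)     # 0.0: the output is independent of the input
noisy = bsc_capacity(0.11)      # roughly 0.5 bits per channel use
```

The capacity is a property of the crossover probability alone: no choice of code changes it, although a good code is needed to approach it.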
The Shannon-Hartley theorem gives a closed-form expression for the capacity of an additive white Gaussian noise channel as a function of bandwidth and signal-to-noise ratio. Modern wireless standards such as 5G NR push spectral efficiency ever closer to Shannon's theoretical ceiling through techniques including MIMO antenna arrays, adaptive modulation, and sophisticated channel coding.
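The Shannon-Hartley expression is C = B log2(1 + S/N), where B is the bandwidth in hertz, S/N is the signal-to-noise ratio as a linear power ratio, and C is the capacity in bits per second. A minimal sketch, with an assumed 20 MHz channel at 20 dB SNR chosen purely for illustration:

```python
import math


def db_to_linear(snr_db):
    """Convert a signal-to-noise ratio from decibels to a linear power ratio."""
    return 10 ** (snr_db / 10)


def shannon_hartley_capacity(bandwidth_hz, snr_linear):
    """Capacity, in bits per second, of an AWGN channel."""
    if bandwidth_hz < 0 or snr_linear < 0:
        raise ValueError("bandwidth and SNR must be non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)


# Illustrative link (assumed values): 20 MHz of bandwidth at 20 dB SNR.
capacity_bps = shannon_hartley_capacity(20e6, db_to_linear(20.0))
spectral_efficiency = capacity_bps / 20e6   # bits per second per hertz
```

With these assumed values the capacity is roughly 133 Mbit/s, or about 6.66 bit/s/Hz. Capacity grows linearly with bandwidth but only logarithmically with signal-to-noise ratio.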
Maximum Likelihood Decoding and Error Correction
Practical channels introduce errors that must be corrected. Error-correcting codes add structured redundancy to transmitted data so that receivers can detect and correct errors. Maximum likelihood decoding selects the codeword under which the received signal is most probable, given the assumed channel model. When all codewords are equally likely to be sent, this choice minimizes the probability of decoding error.
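Maximum likelihood decoding can be sketched by brute force for a very small code. The example assumes a binary symmetric channel with bit-flip probability p and a hypothetical four-codeword codebook of length 5, both chosen for illustration. Practical decoders for turbo, LDPC, and polar codes use far more efficient, usually approximate, algorithms.

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def likelihood(received, codeword, p):
    """P(received | codeword) on a binary symmetric channel with flip probability p."""
    d = hamming_distance(received, codeword)
    n = len(codeword)
    return (p ** d) * ((1 - p) ** (n - d))


def ml_decode(received, codebook, p):
    """Return the codeword under which the received word is most probable."""
    return max(codebook, key=lambda c: likelihood(received, c, p))


# Hypothetical codebook with minimum distance 3, so any single bit error is correctable.
CODEBOOK = ["00000", "01011", "10101", "11110"]
decoded = ml_decode("01111", CODEBOOK, p=0.1)   # "01011" with its third bit flipped
```

For p below 0.5 the likelihood falls as the Hamming distance grows, so maximum likelihood decoding on this channel reduces to picking the nearest codeword. Here the received word 01111 is decoded as 01011.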
Modern error-correcting codes such as turbo codes, low-density parity-check (LDPC) codes, and polar codes achieve performance within a small fraction of a decibel of channel capacity, a remarkable practical realization of Shannon's theoretical limits that took decades of research to achieve. These codes are deployed in 4G/5G cellular networks, deep-space communication links, and solid-state storage devices.
Mutual Information and Applications
Mutual information quantifies the statistical dependence between two random variables, measuring how much knowing one reduces uncertainty about the other. Unlike the linear correlation coefficient, it captures non-linear dependence and is zero only when the two variables are independent. It plays a central role in channel capacity calculations, feature selection in machine learning, and the analysis of neural information processing.
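For discrete variables, mutual information can be computed directly from the joint distribution as I(X;Y) = sum over x, y of p(x,y) log2( p(x,y) / (p(x) p(y)) ). The sketch below uses three small joint tables made up for illustration. The last one has Y equal to the square of X, a case where the covariance is zero but the dependence is complete.

```python
import math


def mutual_information(joint):
    """Mutual information in bits, where joint[i][j] = P(X = i, Y = j)."""
    px = [sum(row) for row in joint]            # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]      # marginal distribution of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


# Illustrative joint distributions (assumed values).
independent = mutual_information([[0.25, 0.25], [0.25, 0.25]])
identical = mutual_information([[0.5, 0.0], [0.0, 0.5]])
# X uniform on {-1, 0, 1} (rows) and Y = X**2 in {0, 1} (columns).
squared = mutual_information([[0.0, 1 / 3], [1 / 3, 0.0], [0.0, 1 / 3]])
```

The independent pair carries 0 bits of mutual information and the identical pair carries 1 bit. In the squared case the result is about 0.918 bits, the full entropy of Y, even though X and Y are uncorrelated.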
Applications
- Wireless communications: Channel coding and modulation schemes in 5G and Wi-Fi are designed to approach Shannon capacity over fading radio channels.
- Data compression: Lossless compressors for files, databases, and network protocols apply source coding theory to minimize storage and bandwidth use.
- Cryptography: Entropy measures quantify key strength and randomness quality in cryptographic systems and random number generators.
- Machine learning: Cross-entropy loss functions, derived from information theory, guide model training, and mutual information guides feature selection.
- Deep-space communications: NASA's Deep Space Network uses capacity-approaching turbo and LDPC codes to maintain reliable links with spacecraft billions of kilometers away.
- DNA data storage: Information theory provides the framework for encoding digital data in synthetic DNA sequences with error correction for molecular noise.