Hamming Distance

What Is Hamming Distance?

Hamming distance is a metric defined between two strings of equal length that counts the number of positions at which the corresponding symbols differ. Named after mathematician and computer scientist Richard W. Hamming, who introduced the concept in his foundational 1950 paper on error-detecting and error-correcting codes, the measure quantifies the minimum number of single-character substitutions required to transform one string into the other. For binary strings, a Hamming distance of 1 means the strings differ in exactly one bit; a distance of 3 means three bit positions carry different values. The metric is fundamental to information theory, coding theory, and several areas of computer science and electrical engineering.
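The definition translates almost directly into code. The following is a minimal Python sketch; the function name `hamming_distance` and the example strings are chosen here for illustration only.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    # Each mismatching pair contributes True (== 1) to the sum.
    return sum(ca != cb for ca, cb in zip(a, b))


print(hamming_distance("karolin", "kathrin"))  # 3
print(hamming_distance("1011101", "1001001"))  # 2
```

The length check matters: without it, `zip` would silently truncate the longer string and return a number that is not a Hamming distance.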

Hamming's work arose directly from practical experience with relay-based computers at Bell Labs that read their input from punched media, where read errors were common and the machines could flag an error but had no automatic way to locate or correct it. The solution he devised attached redundant parity bits to data words so that single-bit errors could be located and corrected, a technique whose effectiveness depends entirely on the distance properties of the code.

Mathematical Definition

For two strings $x$ and $y$ of length $n$ over an alphabet, the Hamming distance $d_H(x, y)$ equals the cardinality of the set of positions where $x$ and $y$ differ. Over binary alphabets, this is equivalent to the number of 1-bits in the bitwise exclusive-OR of the two strings. The metric satisfies the standard axioms of a distance function: non-negativity, identity of indiscernibles, symmetry, and the triangle inequality. These properties make Hamming distance a proper metric on the space of fixed-length strings and give it a clean geometric interpretation in terms of a Hamming space or binary hypercube. The maximum possible Hamming distance between two binary strings of length $n$ is $n$, occurring when the strings are bitwise complements of each other. IBM's technical overview of Hamming distance in coding and classification tasks describes both the formal definition and practical calculation procedures.
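Written out symbolically, the definition and its binary special case look as follows. The notation is an assumption made for illustration: $x_i$ denotes the $i$-th symbol of $x$, $w_H$ the Hamming weight (number of nonzero symbols), and $\oplus$ the bitwise exclusive-OR.

$$d_H(x, y) = \left|\{\, i \in \{1, \dots, n\} : x_i \neq y_i \,\}\right|$$

$$d_H(x, y) = w_H(x \oplus y) \quad \text{for } x, y \in \{0, 1\}^n$$

As a small worked case, take $x = 10110$ and $y = 11100$. Then $x \oplus y = 01010$, which has two 1-bits, so $d_H(x, y) = 2$. The triangle inequality reads $d_H(x, z) \le d_H(x, y) + d_H(y, z)$ for all strings $x, y, z$ of length $n$: any position where $x$ and $z$ differ must be a position where $y$ differs from at least one of them.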

Role in Error-Correcting Codes

The practical importance of Hamming distance lies in its direct relationship to the error-correction capability of a code. A block code with minimum Hamming distance $d_{min}$ between any two distinct codewords can detect up to $d_{min} - 1$ bit errors per codeword and correct up to $\lfloor(d_{min} - 1)/2\rfloor$ bit errors. Hamming's original [7,4] code, described in his 1950 paper published in the Bell System Technical Journal, achieves a minimum distance of 3, allowing correction of any single-bit error in a 7-bit codeword. Extended Hamming codes, Reed-Solomon codes, BCH codes, and modern turbo and LDPC codes all build on the principle that increasing minimum distance between codewords improves the ability to distinguish a received (possibly corrupted) word from its intended codeword. In communications systems governed by NIST guidelines on error control coding, the selection of a code with appropriate minimum Hamming distance is a central design decision.
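The relationship between minimum distance and error-handling capability can be checked by brute force on a small code. The sketch below builds a [7,4] Hamming code from one common systematic parity layout (the matrix `P` is an assumption for illustration; equivalent layouts exist and this is not necessarily the arrangement in Hamming's paper), computes $d_{min}$ over all codeword pairs, and corrects a single flipped bit by nearest-codeword decoding.

```python
from itertools import combinations, product

# Parity part of a systematic generator matrix G = [I | P].
# This particular layout is an illustrative assumption.
P = [
    (1, 1, 0),
    (1, 0, 1),
    (0, 1, 1),
    (1, 1, 1),
]


def encode(data):
    """Append three parity bits (computed mod 2) to a 4-bit data tuple."""
    parity = tuple(sum(d * row[j] for d, row in zip(data, P)) % 2 for j in range(3))
    return tuple(data) + parity


def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))


# All 16 codewords of the code.
codewords = [encode(d) for d in product((0, 1), repeat=4)]

d_min = min(hamming_distance(a, b) for a, b in combinations(codewords, 2))
detectable = d_min - 1          # errors guaranteed to be detected
correctable = (d_min - 1) // 2  # errors guaranteed to be corrected


def decode_nearest(received):
    """Return the codeword closest to `received` in Hamming distance."""
    return min(codewords, key=lambda c: hamming_distance(c, received))


sent = encode((1, 0, 1, 1))
received = list(sent)
received[2] ^= 1  # flip one bit to simulate a channel error
recovered = decode_nearest(tuple(received))

print(d_min, detectable, correctable)  # 3 2 1
print(recovered == sent)               # True
```

Exhaustive nearest-codeword search is only practical for tiny codes like this one; real decoders typically use syndrome computation or other structured algorithms to reach the same result.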

Computational Methods

Computing the Hamming distance between two binary strings requires only a bitwise XOR followed by a population count, both operations natively supported in modern processor instruction sets. The x86 POPCNT instruction counts the set bits in a word in a few clock cycles, and GCC's __builtin_popcount built-in compiles to it when the target architecture supports the instruction, so distance computation adds negligible overhead in practice. For large-scale applications such as nearest-neighbor search over high-dimensional binary codes in image retrieval or biometric matching, specialized algorithms exploit SIMD parallelism and lookup tables to compare millions of code pairs per second.
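The XOR-then-popcount approach and the lookup-table variant can both be sketched in a few lines of Python. The names `hamming_int`, `hamming_bytes`, and `POPCOUNT_TABLE` are hypothetical helpers introduced here; this illustrates the idea rather than the performance of a hardware instruction.

```python
# 256-entry lookup table: number of set bits in each possible byte value.
POPCOUNT_TABLE = bytes(bin(i).count("1") for i in range(256))


def hamming_int(x: int, y: int) -> int:
    """Hamming distance between two non-negative integers viewed as bit strings."""
    # XOR leaves a 1 exactly where the inputs differ; counting 1s gives the distance.
    return bin(x ^ y).count("1")


def hamming_bytes(a: bytes, b: bytes) -> int:
    """Hamming distance between equal-length byte strings, one table lookup per byte."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return sum(POPCOUNT_TABLE[p ^ q] for p, q in zip(a, b))


print(hamming_int(0b1011101, 0b1001001))        # 2
print(hamming_bytes(b"\xff\x00", b"\x0f\x01"))  # 5
```

On Python 3.10 and later, `(x ^ y).bit_count()` is a built-in alternative to `bin(x ^ y).count("1")`; the string-based form is used here only so the sketch runs on older interpreters.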

Applications

Hamming distance has applications across a range of fields, including:

  • Telecommunications: forward error correction in digital communication systems
  • Storage systems: error detection and correction in DRAM and flash memory
  • Biometrics: comparing binary iris codes and fingerprint hashes
  • Computer vision: descriptor matching in binary feature descriptors such as BRIEF and ORB
  • Cryptography: measuring key differences and analyzing cipher diffusion properties
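
Several of these applications, notably iris-code comparison and binary descriptor matching, reduce to finding the stored code nearest to a query in Hamming distance. A minimal linear-scan sketch follows, assuming each code is held as a non-negative Python integer; the function `nearest` and the 8-bit sample values are made up for illustration, and real systems use much longer codes and indexed or SIMD-accelerated search.

```python
def hamming_int(x: int, y: int) -> int:
    """Hamming distance between two non-negative integers viewed as bit strings."""
    return bin(x ^ y).count("1")


def nearest(query: int, database: list) -> tuple:
    """Return (index, distance) of the database code closest to `query`."""
    best = min(range(len(database)), key=lambda i: hamming_int(query, database[i]))
    return best, hamming_int(query, database[best])


# Made-up 8-bit codes standing in for binary descriptors.
database = [0b10100110, 0b01101100, 0b11110000, 0b00001111]
query = 0b11110010

print(nearest(query, database))  # (2, 1)
```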