Sequences
What Are Sequences?
Sequences are ordered collections of symbols, numbers, or bits in which position carries meaning. Unlike sets, sequences allow repetition and their order is significant: the sequence (1, 0, 1) differs from (0, 1, 1) even though both contain the same elements. Across engineering and computer science, sequences arise in data transmission, cryptography, genomics, and signal processing, each domain placing different demands on the statistical and algebraic properties of the sequences it uses.
The mathematical study of sequences connects combinatorics, number theory, and algebra. Applied work, particularly in communications and biological computing, draws on this theory to design sequences with specific correlation, randomness, or alignment properties.
Binary and Pseudorandom Sequences
Binary sequences assign each position one of two symbols, typically 0 and 1. Their simplicity makes them foundational in digital communications, where bit streams must be transmitted reliably over noisy channels. A key design objective is low autocorrelation outside the zero-lag peak, which allows a receiver to synchronize to the stream without ambiguity.
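The autocorrelation objective can be made concrete with a short sketch. It assumes the common convention of mapping bit 0 to +1 and bit 1 to -1 and computes the periodic (cyclic) autocorrelation at every lag. The function name and the 7-bit example sequence are illustrative choices, not taken from any standard.

```python
def periodic_autocorrelation(bits):
    """Return the periodic autocorrelation of a 0/1 sequence at every lag."""
    n = len(bits)
    # Map 0 -> +1 and 1 -> -1 so agreement contributes +1 and disagreement -1.
    chips = [1 - 2 * b for b in bits]
    return [
        sum(chips[i] * chips[(i + lag) % n] for i in range(n))
        for lag in range(n)
    ]


# Illustrative 7-bit sequence: a single peak at lag 0, small values elsewhere.
example = [1, 1, 1, 0, 0, 1, 0]
print(periodic_autocorrelation(example))  # [7, -1, -1, -1, -1, -1, -1]
```

A receiver that correlates against this pattern sees one sharp peak per period, which is what removes the ambiguity about where the sequence starts.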
Pseudorandom sequences, often called pseudonoise (PN) sequences, are generated deterministically but statistically resemble truly random bit streams. Linear feedback shift registers (LFSRs) produce a widely used family: maximal-length sequences (m-sequences), which have period 2^n - 1 for an n-stage register whose feedback polynomial is primitive. These sequences underpin spread-spectrum communications. Statistical tests for evaluating how closely a generator's output resembles a random bit stream are specified in NIST Special Publication 800-22.
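As a minimal sketch of the shift-register construction, the following generates one period of an m-sequence from a 4-stage register. The tap positions (3, 4) correspond to the recurrence a[k+4] = a[k+1] XOR a[k], whose polynomial x^4 + x + 1 is primitive. The function name, argument layout, and seed are assumptions made for illustration.

```python
def lfsr_sequence(taps, state, length):
    """Generate `length` output bits from a Fibonacci-style LFSR.

    taps:  1-based stage numbers whose contents are XORed into the feedback bit
    state: initial register contents, stage 1 first; must not be all zeros
    """
    reg = list(state)
    out = []
    for _ in range(length):
        out.append(reg[-1])          # output is taken from the last stage
        feedback = 0
        for t in taps:
            feedback ^= reg[t - 1]
        reg = [feedback] + reg[:-1]  # shift one stage, insert feedback at stage 1
    return out


n = 4
period = 2 ** n - 1                  # 15 for a 4-stage register
seq = lfsr_sequence((3, 4), [1, 0, 0, 0], 2 * period)
print(seq[:period])
print("repeats after one period:", seq[:period] == seq[period:])
print("ones in one period:", sum(seq[:period]))  # 2^(n-1) = 8 for an m-sequence
```

Any nonzero seed yields the same sequence up to a cyclic shift. The all-zero state is excluded because the register would never leave it.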
Gold sequences and Kasami sequences extend the m-sequence framework to provide sets of sequences with bounded cross-correlation, a property critical when multiple users share a channel simultaneously, as in code-division multiple access (CDMA) systems.
Random Sequences and Statistical Properties
Truly random sequences are produced by physical processes such as thermal noise or radioactive decay. Their value lies in unpredictability: cryptographic key generation and secure communication depend on sequences that no adversary can predict or reproduce, whereas Monte Carlo simulation depends mainly on good statistical behavior. NIST's randomness testing suite provides a standard battery of tests that check a sequence for detectable deviations from randomness. Passing these tests is necessary, but not sufficient, for cryptographic use.
Statistical properties of interest include the frequency of each symbol, run-length distribution, serial correlation across multiple lags, and spectral flatness. A sequence that passes all standard tests is said to be statistically random, even if it was generated by a deterministic algorithm.
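Two of these properties are simple to compute. The sketch below implements a frequency (monobit) test in the style of NIST SP 800-22, which turns the imbalance between ones and zeros into a p-value using the complementary error function, plus a helper that lists run lengths. The function names are illustrative, and the 0.01 significance level is an assumed threshold. The 10-bit input is only for demonstration; a meaningful test needs a much longer sequence.

```python
import math
from itertools import groupby


def monobit_p_value(bits):
    """Frequency (monobit) test: p-value for the balance of ones and zeros."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)  # +1 per one, -1 per zero
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def run_lengths(bits):
    """Lengths of consecutive runs of identical symbols, in order."""
    return [len(list(group)) for _, group in groupby(bits)]


bits = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
p = monobit_p_value(bits)
print(f"p-value = {p:.6f}")
print("consistent with randomness at the assumed 0.01 level:", p >= 0.01)
print("run lengths:", run_lengths(bits))
```

A small p-value signals that the symbol counts are too unbalanced to be plausible for a random source. A large one only means this particular test found nothing wrong.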
Coding Sequences
In molecular biology and bioinformatics, coding sequences are segments of DNA or RNA that encode the amino-acid sequence of a protein. A coding sequence begins with a start codon and ends with a stop codon, with the intervening triplets specifying amino acids. Identifying coding sequences within a genome is a fundamental step in annotation and requires computational tools that recognize characteristic sequence patterns, splice sites, and reading-frame conservation across related species.
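The start-codon and stop-codon structure can be illustrated with a deliberately simplified open-reading-frame scan. It assumes the standard start codon ATG and stop codons TAA, TAG, and TGA, looks only at the forward strand, and ignores splicing entirely, so it is a sketch of the idea rather than a gene finder. The function name and the min_codons parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of forward-strand open reading frames.

    min_codons counts the codons before the stop codon, including the start.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # first start codon opens a frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end index includes the stop codon
                start = None
    return orfs


dna = "AAATGGCCTAAGG"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

Practical annotation pipelines also scan the reverse complement and weigh the additional evidence described above.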
Research published in Nucleic Acids Research covers algorithms for coding sequence prediction, database resources for annotated genomes, and methods for detecting novel genes in metagenomic data.
Sequence Alignment
Sequence alignment finds the best correspondence between two or more sequences by inserting gaps and scoring matches, mismatches, and indels. Global alignment (Needleman-Wunsch) aligns sequences end-to-end, while local alignment (Smith-Waterman) identifies the highest-scoring pair of sub-regions. Both use dynamic programming and return an optimal alignment under the chosen scoring scheme. Heuristic tools such as BLAST give up guaranteed optimality in exchange for speed, which makes searching databases of millions of sequences practical.
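The difference between global and local alignment shows up clearly in the dynamic programming recurrences. The sketch below computes only the optimal scores, not the alignments themselves, and assumes a simple linear gap penalty with illustrative values (match +1, mismatch -1, gap -1). Real tools typically use substitution matrices and affine gap penalties.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap        # a prefix aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal local alignment score: best-scoring pair of sub-regions."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets an alignment restart anywhere.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(needleman_wunsch_score("ACGT", "AGT"))          # 2: three matches, one gap
print(smith_waterman_score("XXACGTXX", "YYACGTYY"))   # 4: the shared ACGT region
```

The only structural differences are the boundary conditions and the clamp at zero. Both run in time proportional to the product of the sequence lengths, which is the cost that heuristic search tools avoid.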
In engineering, sequence alignment methods transfer to time-series comparison, speech recognition, and network packet analysis, where dynamic programming kernels identify repeated or shifted patterns efficiently.
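One well-known dynamic programming kernel for comparing time series is dynamic time warping (DTW). The text above does not name a specific method, so DTW is offered here only as a representative example. The sketch uses the absolute difference as the local cost and no window constraint. Both are simplifying assumptions.

```python
def dtw_distance(x, y):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(x), len(y)
    # cost[i][j] is the best cumulative cost of aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # x advances, y repeats its sample
                cost[i][j - 1],      # y advances, x repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: same shape, one sample stretched
print(dtw_distance([1, 2, 3], [2, 3, 4]))     # 2.0
```

The table has the same shape as an alignment score matrix, with stretching of one series playing the role that gaps play in symbol alignment.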
Applications
Sequences are central to a broad range of technical fields:
- Wireless communications: spread-spectrum and CDMA systems use pseudorandom sequences to separate users and resist interference.
- Cryptography: unpredictable random sequences supply keys, and one-time-pad encryption requires a truly random key as long as the message.
- Genomics: sequence alignment and coding-region identification underpin genome assembly, variant calling, and drug target discovery.
- Testing and verification: pseudorandom test vectors exercise digital circuits and expose design faults.
- Signal synchronization: correlation properties of binary sequences enable precise timing recovery in radar and GPS receivers.
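The synchronization use case can be sketched as a sliding correlator: the receiver slides a known reference sequence along the received bits and picks the offset with the highest correlation. The function name, the 7-bit reference, and the single injected bit error are all illustrative assumptions. Real receivers work on sampled analog signals and apply a detection threshold.

```python
def find_sync_offset(received, code):
    """Return (offset, score) where `code` correlates best with `received`."""
    ref = [1 - 2 * b for b in code]      # map 0 -> +1, 1 -> -1
    rx = [1 - 2 * b for b in received]
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(rx) - len(ref) + 1):
        # zip stops after len(ref) terms, so only one window is correlated.
        score = sum(r * c for r, c in zip(rx[offset:], ref))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


code = [1, 1, 1, 0, 0, 1, 0]
received = [0, 0, 0, 0] + code + [0, 0, 0, 0]
print(find_sync_offset(received, code))   # (4, 7): exact match at offset 4

noisy = list(received)
noisy[7] ^= 1                             # flip one bit inside the embedded code
print(find_sync_offset(noisy, code))      # (4, 5): the peak survives one error
```

The low off-peak autocorrelation of the reference is what keeps the correct offset distinguishable when some received bits are wrong.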