Decorrelation
What Is Decorrelation?
Decorrelation is the transformation of a set of random variables or signals into a new set in which pairwise statistical dependencies, specifically linear correlations, have been removed. Two random variables are uncorrelated when their covariance is zero, meaning that knowing the value of one provides no linear information about the other. Decorrelation is a preprocessing and data-transformation operation that appears widely in signal processing, communications, statistical analysis, and machine learning whenever correlated inputs complicate downstream estimation, compression, or classification.
The concept rests on the covariance matrix of a multivariate data set. When that matrix is diagonal, the variables are pairwise uncorrelated. A linear transformation can always bring the covariance matrix into this diagonal form, and a standard choice of transformation is built from the eigenvectors of the covariance matrix, which is the same operation that underlies Principal Component Analysis (PCA). The field draws from multivariate statistics, linear algebra, and information theory, with connections to Shannon's source coding theory: for jointly Gaussian sources, where uncorrelated components are also independent, decorrelated signal components can be encoded independently without loss of compression efficiency.
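As a short worked form of the eigenvector construction, the relation can be written out as follows. The symbols are notation chosen here for illustration, not taken from a particular reference: x is the random vector, μ its mean, Σ its covariance matrix, U the orthogonal matrix whose columns are the eigenvectors of Σ, and Λ the diagonal matrix of eigenvalues.

```latex
\Sigma = \mathbb{E}\!\left[(x-\mu)(x-\mu)^{\top}\right] = U \Lambda U^{\top},
\qquad
y = U^{\top}(x-\mu),
\qquad
\operatorname{Cov}(y) = U^{\top} \Sigma\, U = \Lambda .
```

Because Λ is diagonal, the components of y are pairwise uncorrelated, and their variances are the eigenvalues of Σ.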
Statistical Decorrelation and Whitening
Statistical decorrelation produces outputs that are pairwise uncorrelated but does not guarantee statistical independence, which is a stronger property. Whitening, a closely related operation, additionally normalizes each decorrelated component to unit variance, so the resulting covariance matrix is the identity. The tutorial on Independent Component Analysis from Google Research explains the distinction carefully: after whitening, all second-order structure has been removed and the residual structure in the data is captured entirely by higher-order statistics such as skewness and kurtosis, so that only an orthogonal rotation of the whitened components remains to be determined by methods like Independent Component Analysis (ICA). This makes whitening a standard preprocessing step for ICA algorithms, which otherwise must jointly optimize both decorrelation and independence.
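The whitening step can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `whiten`, the small `eps` guard, and the synthetic mixing matrix `A` are all hypothetical choices made for this example. It uses PCA whitening (eigenvectors scaled by the inverse square root of the eigenvalues), which is one of several valid whitening transforms.

```python
import numpy as np

def whiten(X, eps=1e-12):
    """PCA-whiten the rows of X (shape: samples x features)."""
    Xc = X - X.mean(axis=0)                   # center each feature
    cov = np.cov(Xc, rowvar=False)            # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigendecomposition of a symmetric matrix
    W = eigvecs / np.sqrt(eigvals + eps)      # scale eigenvector column j by 1/sqrt(lambda_j)
    return Xc @ W

# Synthetic correlated data: independent Gaussians mixed by an assumed matrix A.
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.0], [1.5, 0.5]])
X = rng.normal(size=(5000, 2)) @ A.T

Z = whiten(X)
cov_Z = np.cov(Z, rowvar=False)               # approximately the identity matrix

# Any rotation of whitened data is still white, which is why whitening
# alone cannot pin down the final rotation that ICA looks for.
theta = 0.6
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
cov_rotated = np.cov(Z @ rot, rowvar=False)   # also approximately the identity
```

The last lines illustrate the point made above: second-order statistics are unchanged by rotating whitened data, so the remaining rotation has to be chosen using higher-order statistics.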
Principal Component Analysis decorrelates by projecting the data onto the eigenvectors of its covariance matrix, ordering the resulting components by their variance. The first principal component captures the direction of greatest variance, subsequent components capture orthogonal directions of decreasing variance, and all components are by construction uncorrelated. Retaining only the top k components then provides both decorrelation and dimensionality reduction simultaneously.
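A minimal sketch of this projection, assuming NumPy and a synthetic data set, might look as follows. The function name `pca_project` and the test data are hypothetical and serve only to illustrate the ordering by variance and the retention of the top k components.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)                     # center the data
    cov = np.cov(Xc, rowvar=False)              # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return Xc @ eigvecs[:, :k], eigvals

# Synthetic data: three independent directions with very different variances,
# rotated by a random orthogonal matrix so the observed features are correlated.
rng = np.random.default_rng(1)
latent = rng.normal(size=(2000, 3)) * np.array([3.0, 1.0, 0.1])
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
X = latent @ Q.T

Y, variances = pca_project(X, k=2)
cov_Y = np.cov(Y, rowvar=False)   # diagonal, with the two largest eigenvalues on the diagonal
```

Here `Y` has two columns instead of three, and its covariance matrix is diagonal up to floating-point error, so the same step delivers both the decorrelation and the reduction in dimension.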
Decorrelation in Signal Processing and Coding
In signal processing, decorrelation appears as an explicit design objective in source coding and transform coding. Audio and image codecs exploit the correlation structure of natural signals to achieve compression: a transform that concentrates signal energy into a few nearly uncorrelated coefficients allows the remaining low-energy coefficients to be quantized coarsely or discarded while maintaining perceptual quality. The discrete cosine transform (DCT), used in JPEG image compression and, in its modified form (MDCT), in the AAC audio codec, closely approximates the optimal decorrelating transform (the Karhunen-Loève transform) for first-order Markov processes with high correlation between adjacent samples, which helps explain its empirical effectiveness on natural images and speech. The NIST Digital Library of Mathematical Functions provides the formal spectral decomposition results that underpin these transform coding analyses.
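The decorrelating effect of the DCT on a first-order Markov process can be checked numerically. The sketch below is illustrative only: the block length `n = 8`, the correlation coefficient `rho = 0.95`, and the helper names `dct2_matrix` and `offdiag_fraction` are assumptions made for this example. It builds the orthonormal DCT-II matrix directly in NumPy and applies it to the theoretical covariance matrix of the process.

```python
import numpy as np

def dct2_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    C[0, :] = np.sqrt(1.0 / n)                 # the constant (DC) row has its own scale factor
    return C

def offdiag_fraction(R):
    """Share of squared-entry energy that lies off the main diagonal."""
    total = np.sum(R ** 2)
    return (total - np.sum(np.diag(R) ** 2)) / total

n, rho = 8, 0.95                               # assumed block length and correlation
idx = np.arange(n)
R = rho ** np.abs(idx[:, None] - idx[None, :]) # covariance of a first-order Markov process

C = dct2_matrix(n)
R_dct = C @ R @ C.T                            # covariance of the DCT coefficients

frac_before = offdiag_fraction(R)              # most of the energy is off-diagonal
frac_after = offdiag_fraction(R_dct)           # only a small residual remains off-diagonal
```

The transformed covariance is not exactly diagonal, which is consistent with the DCT being an approximation to the optimal transform rather than the optimal transform itself.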
In antenna arrays and beamforming, spatial decorrelation refers to reducing correlation between signals received at different array elements or in different beams. Highly correlated or fully coherent source signals can degrade the spatial resolution of subspace-based array processing algorithms such as MUSIC and ESPRIT, because they reduce the rank of the signal covariance matrix. Spatial smoothing techniques, described in the fundamentals of PCA and ICA from the University of Edinburgh UDRC, restore effective rank to the signal covariance matrix when coherent multipath causes correlated copies of the same source to arrive from slightly different directions.
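The rank-restoring effect of spatial smoothing can be illustrated with a small noise-free simulation. This is a sketch under several assumptions that do not come from the text above: a uniform linear array of 8 elements with half-wavelength spacing, two fully coherent arrivals at 10 and 30 degrees, forward-only smoothing over subarrays of 6 elements, and the hypothetical helper names `steering` and `forward_spatial_smoothing`.

```python
import numpy as np

def steering(theta_deg, n_elements):
    """Steering vector of a half-wavelength-spaced uniform linear array."""
    m = np.arange(n_elements)
    return np.exp(-1j * np.pi * m * np.sin(np.deg2rad(theta_deg)))

def forward_spatial_smoothing(R, subarray_len):
    """Average the covariance blocks of all overlapping forward subarrays."""
    n_sub = R.shape[0] - subarray_len + 1
    R_ss = np.zeros((subarray_len, subarray_len), dtype=complex)
    for k in range(n_sub):
        R_ss += R[k:k + subarray_len, k:k + subarray_len]
    return R_ss / n_sub

M = 8                                          # assumed number of array elements
A = np.column_stack([steering(10.0, M), steering(30.0, M)])

# Coherent multipath: the second arrival is a scaled, phase-shifted copy
# of the first, so the source covariance matrix has rank one.
g = np.array([1.0, 0.8 * np.exp(1j * 0.7)])
S = np.outer(g, g.conj())
R = A @ S @ A.conj().T                         # noise-free array covariance

R_ss = forward_spatial_smoothing(R, subarray_len=6)

rank_before = np.linalg.matrix_rank(R, tol=1e-8)     # coherent sources collapse the rank
rank_after = np.linalg.matrix_rank(R_ss, tol=1e-8)   # smoothing restores one dimension per source
```

The cost of the technique is visible in the shapes: the smoothed covariance is 6 by 6 rather than 8 by 8, so the effective array aperture shrinks in exchange for the restored rank.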
Applications
Decorrelation has applications in a wide range of fields, including:
- Image and audio compression using transform coding (JPEG, AAC, H.264)
- Multivariate statistical analysis and dimensionality reduction
- Array signal processing for direction-of-arrival estimation
- Preprocessing for independent component analysis and blind source separation
- Financial portfolio diversification and risk factor modeling