Neural Networks
What Are Neural Networks?
Neural networks are computational models composed of interconnected processing units, called neurons or nodes, that are organized in layers and trained to learn patterns from data. Each connection carries a numerical weight that is adjusted during training so that the network produces desired outputs from given inputs. The field draws from neuroscience, statistics, and optimization theory, and it forms a central pillar of modern machine learning alongside kernel methods and probabilistic graphical models.
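Here is a minimal sketch in Python that makes the idea of adjustable connection weights concrete. It assumes a single sigmoid neuron with two inputs, trained on the logical AND function by gradient descent on cross-entropy loss. The function names, learning rate, and epoch count are illustrative choices, not part of any standard library.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation squashing any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(data, lr=0.5, epochs=2000):
    """Fit one sigmoid neuron by per-example gradient descent on cross-entropy loss."""
    w = [0.0, 0.0]  # one weight per input connection
    b = 0.0         # bias
    for _ in range(epochs):
        for (x1, x2), target in data:
            y = sigmoid(w[0] * x1 + w[1] * x2 + b)
            # For a sigmoid output with cross-entropy loss, dLoss/dz reduces to (y - target).
            err = y - target
            w[0] -= lr * err * x1
            w[1] -= lr * err * x2
            b -= lr * err
    return w, b


def predict(w, b, x1, x2):
    """Output of the trained neuron for one input pair."""
    return sigmoid(w[0] * x1 + w[1] * x2 + b)


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w, b = train_neuron(AND_DATA)
for (x1, x2), target in AND_DATA:
    print((x1, x2), round(predict(w, b, x1, x2), 3), "target:", target)
```

AND is linearly separable, so one neuron suffices. Each update nudges the weights in the direction that reduces the error on the current example.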
The concept originates in the 1943 work of McCulloch and Pitts, who proposed a mathematical model of biological neurons. It was extended by Rosenblatt's perceptron in 1958 and by the backpropagation algorithm, which was popularized for training multilayer networks in the mid-1980s. Since the mid-2000s, increases in available training data and computational resources have enabled networks with many hidden layers, a regime known as deep learning, which has produced substantial advances in image recognition, natural language processing, and scientific simulation.
Artificial and Biological Neural Networks
Artificial neural networks are loosely inspired by the structure of biological neural tissue, where neurons communicate through synaptic connections whose strength changes with activity. Biological neural networks, the actual networks of neurons in animal brains and peripheral nervous systems, exhibit properties that artificial models approximate but do not fully replicate: sparse connectivity, precise spike timing, and structural plasticity over the lifetime of an organism. Understanding biological networks informs the design of artificial ones: convolutional architectures, for instance, borrow the principle of local receptive fields observed in the mammalian visual cortex. Research on spiking neural networks attempts a closer correspondence with biological computation by representing information in discrete spike events rather than continuous activation values.
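Spiking models are often introduced through the leaky integrate-and-fire (LIF) neuron, which is one common simplification among several. In the sketch below, the neuron integrates input current into a membrane potential that leaks toward rest. Whenever the potential crosses a threshold, the neuron emits a discrete spike and resets. The parameter values (time constant, threshold, reset level) are illustrative assumptions in arbitrary units, not biological measurements.

```python
def simulate_lif(input_current, dt=1.0, tau=20.0, v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Simulate a leaky integrate-and-fire neuron and return the time steps at which it spikes.

    input_current: sequence of input values, one per time step (arbitrary units).
    """
    v = v_rest
    spike_times = []
    for t, i_in in enumerate(input_current):
        # Euler step of tau * dv/dt = -(v - v_rest) + i_in: the potential leaks toward
        # rest while integrating the input.
        v += dt * (-(v - v_rest) + i_in) / tau
        if v >= v_thresh:
            spike_times.append(t)  # emit a discrete spike event
            v = v_reset            # reset the membrane potential after firing
    return spike_times


for level in (0.5, 1.5, 3.0):
    spikes = simulate_lif([level] * 200)
    print(f"input {level}: {len(spikes)} spikes")
```

A stronger constant input produces more frequent spikes, while a weak one never reaches threshold. In this toy setting, input intensity is therefore carried by spike timing and rate rather than by a continuous activation value.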
Feedforward and Recurrent Architectures
Feedforward networks pass information in one direction, from the input layer through hidden layers to the output layer, without cycles. The multilayer feedforward network trained with backpropagation is among the most widely deployed architectures and underlies many image classification and regression systems. Recurrent neural networks (RNNs) add connections that feed a layer's activations from one time step back into the network at the next time step, which gives the network a form of memory over sequential inputs. Long short-term memory (LSTM) units were introduced by Hochreiter and Schmidhuber in 1997. They use gated memory cells to address the vanishing gradient problem that makes plain RNNs difficult to train on long sequences. Transformer architectures, which use self-attention instead of recurrence, have largely supplanted LSTMs for language tasks, but recurrent models remain in use for real-time signal processing and control.
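The multilayer feedforward case can be sketched in plain Python as a network with one hidden layer of sigmoid units, trained by backpropagation on the XOR problem. A single-layer network cannot solve XOR. The class name, layer sizes, squared-error loss, learning rate, and random initialization are all assumptions chosen for illustration. Practical systems use array libraries and automatic differentiation rather than hand-written loops.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """Feedforward network: n_in inputs -> n_hidden sigmoid units -> 1 sigmoid output."""

    def __init__(self, n_in=2, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Propagate input -> hidden -> output with no cycles."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def backward(self, x, h, y, target):
        """Backpropagation: gradients of the loss 0.5 * (y - target)**2."""
        delta_out = (y - target) * y * (1 - y)    # error at the output pre-activation
        g_w2 = [delta_out * hi for hi in h]
        g_b2 = delta_out
        delta_h = [delta_out * w * hi * (1 - hi)  # error sent back through each output weight
                   for w, hi in zip(self.w2, h)]
        g_w1 = [[d * xi for xi in x] for d in delta_h]
        return g_w1, delta_h, g_w2, g_b2          # delta_h is also the hidden-bias gradient

    def step(self, x, target, lr):
        """One gradient-descent update on a single example."""
        h, y = self.forward(x)
        g_w1, g_b1, g_w2, g_b2 = self.backward(x, h, y, target)
        for j in range(len(self.w2)):
            self.w2[j] -= lr * g_w2[j]
            self.b1[j] -= lr * g_b1[j]
            for i in range(len(x)):
                self.w1[j][i] -= lr * g_w1[j][i]
        self.b2 -= lr * g_b2


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def mean_loss(net, data):
    return sum(0.5 * (net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


net = TinyMLP(seed=0)
before = mean_loss(net, XOR)
for _ in range(5000):
    for x, t in XOR:
        net.step(x, t, lr=0.5)
after = mean_loss(net, XOR)
print(f"mean loss before: {before:.4f}  after: {after:.4f}")
```

Writing the backward pass by hand makes the chain rule explicit. The output error is scaled by the sigmoid derivative y(1 - y), then propagated to each hidden unit through that unit's outgoing weight. A finite-difference gradient check is a common way to confirm hand-derived gradients like these.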
Graph Neural Networks and Cellular Networks
Graph neural networks (GNNs) apply neural networks to data organized as graphs, where nodes carry features and edges encode relationships. A GNN aggregates information from each node's neighbors, updates the node's representation, and repeats this process over multiple rounds. After k rounds, a node's representation reflects its k-hop neighborhood, which lets the network capture structural patterns at multiple scales. Applications range from molecular property prediction to traffic forecasting and social network analysis. Cellular neural networks, introduced by Chua and Yang in 1988, are arrays of locally connected analog cells. They are used primarily in image processing and pattern recognition, where their local connectivity suits massively parallel hardware implementation.
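One round of neighborhood aggregation can be sketched as follows. The sketch assumes mean aggregation over neighbors, with separate weight matrices for a node's own features and for the aggregated neighbor message. This is only one common design; real GNN variants differ in how they aggregate, normalize, and update. The function names are hypothetical.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def message_passing_round(features, adjacency, w_self, w_neigh):
    """One layer: h_v' = ReLU(W_self @ h_v + W_neigh @ mean of neighbor features)."""
    updated = {}
    for v, h_v in features.items():
        neighbors = adjacency.get(v, [])
        dim = len(h_v)
        if neighbors:
            # Aggregate: average each feature dimension over the node's neighbors.
            agg = [sum(features[u][k] for u in neighbors) / len(neighbors) for k in range(dim)]
        else:
            agg = [0.0] * dim
        # Update: combine own features with the aggregated message, then apply ReLU.
        combined = [a + b for a, b in zip(matvec(w_self, h_v), matvec(w_neigh, agg))]
        updated[v] = [max(0.0, c) for c in combined]
    return updated


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
features = {"a": [1, 0, 0], "b": [0, 1, 0], "c": [0, 0, 1]}
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}  # path graph a - b - c

round1 = message_passing_round(features, adjacency, IDENTITY, IDENTITY)
round2 = message_passing_round(round1, adjacency, IDENTITY, IDENTITY)
print(round1["a"], round2["a"])
```

With identity weights on the three-node path a-b-c, the feature that starts on c reaches a only after the second round. This shows how stacking rounds lets each node's representation reflect progressively larger neighborhoods.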
Neural Network Compression and Hardware
Deploying trained neural networks on resource-constrained devices requires reducing model size and computational cost without an unacceptable loss of accuracy. Compression techniques include the following:

- Weight pruning removes connections, commonly those whose weights fall below a magnitude threshold.
- Quantization reduces the numerical precision of weights and activations, for example from 32-bit floating point to 8-bit integers.
- Knowledge distillation trains a smaller student network to replicate the outputs of a larger teacher.

Dedicated neural network hardware accelerators, such as Google's Tensor Processing Unit and various FPGA-based designs, exploit the data-parallel structure of matrix multiplication to deliver high throughput at lower power than general-purpose CPUs and GPUs. The radial basis function network is a specialized feedforward architecture whose hidden units each apply a radial function, typically a Gaussian, to the distance between the input and a learned prototype point. Because the output layer is linear in these hidden activations, the output weights can be fit by linear least squares once the prototypes are chosen. For certain classification problems, this can make training faster and the model's geometry easier to interpret than in a standard multilayer perceptron.
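The three compression techniques above can be sketched on a plain list of weights. Several details are illustrative assumptions rather than the only options: the pruning threshold, the symmetric 8-bit scheme with a single scale per tensor, and the distillation loss, written here as a temperature-softened KL divergence scaled by the squared temperature (one common formulation). Production toolkits add per-channel scales, calibration data, and fine-tuning after pruning.

```python
import math


def magnitude_prune(weights, threshold):
    """Weight pruning: zero every weight whose magnitude is below threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def quantize_int8(weights):
    """Symmetric linear quantization to integers in [-127, 127] with a single scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integers back to approximate real-valued weights."""
    return [qi * scale for qi in q]


def softmax(logits, temperature=1.0):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions, scaled by T**2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2


weights = [0.8, -0.05, 0.3, 0.01, -0.6]
pruned = magnitude_prune(weights, threshold=0.1)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
print(pruned, q, [round(r, 4) for r in restored])
print(distillation_loss([2.0, 0.5, -1.0], [3.0, 0.2, -2.0]))
```

Pruning saves memory or compute only when the runtime exploits the zeros, for example through sparse storage or structured pruning patterns. Quantization shrinks storage directly, replacing floating-point values with 8-bit integers plus one shared scale. Its rounding error is at most half a quantization step per weight.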
Applications
Neural networks have applications in a wide range of fields, including:
- Computer vision: image classification, object detection, and medical image analysis
- Natural language processing: machine translation, document summarization, and conversational systems
- Scientific computing: protein structure prediction, climate modeling, and materials property estimation
- Control systems: model predictive control, robotic motion planning, and adaptive signal processing
- Audio processing: speech recognition, speaker verification, and music generation