High Performance Computing

TOPIC AREA

What Is High Performance Computing?

High performance computing (HPC) refers to the use of parallel processing systems, specialized hardware, and optimized software to solve computational problems that exceed the practical capacity of conventional workstations or servers. The field spans cluster architectures, distributed memory programming models, accelerator hardware, and the software stacks that bind them together. HPC systems are used wherever scientific, engineering, or data-intensive workloads demand more compute, memory bandwidth, or floating-point throughput than a single processor can deliver.

The field has evolved through several architectural generations, from vector supercomputers of the 1970s and 1980s through massively parallel processors in the 1990s to today's heterogeneous clusters combining multicore CPUs with GPU accelerators. Performance is measured in floating-point operations per second (FLOPS), and the leading systems are tracked biannually by the TOP500 project, which has maintained this ranking since 1993 and serves as a standard reference for the state of high-end computing worldwide.

Parallel Processing and MPI

The foundation of HPC programming is parallelism: distributing work across many processing units to reduce wall-clock time. Distributed-memory systems, in which nodes communicate over a network rather than sharing main memory, are programmed primarily through the Message Passing Interface (MPI). MPI provides a portable API for point-to-point and collective communication, allowing algorithms to exchange data explicitly across node boundaries. The MPI Forum, a broad consortium of vendors and researchers, maintains the MPI standard, now at version 4.1, which adds features for large-scale fault tolerance and improved one-sided communication.

Shared-memory parallelism within a node is addressed by OpenMP, which uses compiler directives to parallelize loops and regions across CPU cores. Real workloads combine both: MPI for inter-node communication and OpenMP or threading libraries for intra-node parallelism.

GPU Computing and Accelerated Architectures

Graphics processing units entered scientific computing in the mid-2000s when researchers recognized that their highly parallel architectures, originally designed for rasterization, could accelerate dense linear algebra and Monte Carlo simulations by factors of ten to one hundred relative to CPUs of the era. Modern GPU accelerators contain thousands of cores and deliver multi-teraflop performance on single-precision and double-precision arithmetic.

NVIDIA's CUDA programming model and the open-standard OpenCL framework give programmers access to GPU parallelism. Deep learning workloads, which dominate AI research, are almost exclusively GPU-accelerated, driving continuous architectural refinement in tensor throughput and memory bandwidth. The DOE Exascale Computing Project has developed a portfolio of applications and libraries targeting leadership-class GPU-accelerated systems, including Frontier at Oak Ridge National Laboratory, which became the first confirmed exascale machine in 2022.

Exascale Computing

Exascale computing denotes systems capable of at least one exaflop (10^18 floating-point operations per second) of sustained performance on scientific workloads. Achieving this milestone required advances across the full system stack: energy-efficient processor designs, high-bandwidth memory, low-latency interconnects, and programming models that expose sufficient parallelism without overwhelming developers. Power consumption is a central constraint, with leading exascale systems drawing tens of megawatts.

Scientific applications enabled by exascale include whole-heart cardiac electrophysiology simulations at cellular resolution, global climate models at sub-kilometer grid spacing, and ab initio nuclear structure calculations that were previously infeasible. Application readiness for exascale is described in the International Journal of High Performance Computing Applications, which publishes performance results from major pre-exascale and exascale campaigns.

HPC Clusters and Interconnects

HPC clusters aggregate commodity or semi-custom nodes into systems that behave as unified computing environments. High-performance interconnects, including InfiniBand and proprietary fabrics such as HPE's Slingshot and Cray's Aries, provide the low-latency, high-bandwidth communication that distributed-memory applications require.

Applications

HPC supports computationally intensive work across many disciplines:

  • Weather forecasting and numerical climate modeling at national meteorological centers
  • Drug discovery through molecular dynamics simulation and virtual screening
  • Structural analysis in aerospace and automotive engineering
  • Seismic imaging for oil and gas exploration
  • Fusion energy research using magnetohydrodynamic simulation
  • Training large language models and other deep neural networks at scale

Topics in this Area