Computer architecture

What Is Computer Architecture?

Computer architecture is the science and art of designing the organization, structure, and behavior of computer systems, including the interfaces between hardware components and the instruction sets that software uses to communicate with hardware. It spans decisions at multiple levels of abstraction, from the logic gates of individual functional units to the system-level organization of processors, memory, and interconnects. The field draws on digital circuit design, operating systems research, compiler theory, and applied mathematics to balance competing performance, power, and cost constraints.

The discipline is often divided into instruction set architecture (ISA), which defines the programmer-visible interface between hardware and software, and microarchitecture, which defines how a particular ISA is implemented in hardware. Two processor families can share the same ISA while differing substantially in their internal organization and performance characteristics.

Instruction Sets and Pipeline Architectures

An instruction set architecture specifies the operations a processor can execute, the data types those operations accept, the registers available to programmers, and the memory addressing modes. The two dominant ISA design philosophies are complex instruction set computing (CISC), exemplified by the x86 architecture used in most personal computers and servers, and reduced instruction set computing (RISC), which underlies the ARM processors found in smartphones and embedded systems.

To execute instructions efficiently, modern processors use pipelining: the execution of an instruction is broken into discrete stages (in the classic five-stage design: fetch, decode, execute, memory access, and write-back), and multiple instructions occupy different stages simultaneously, much as an assembly line works on multiple products at once. In the ideal case one instruction completes every cycle once the pipeline is full; in practice, data dependencies and branches cause stalls. Decades of pipeline research indexed in IEEE Xplore address these limits through branch prediction, out-of-order execution, and speculative execution.
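
The overlap between instructions in a pipeline can be made concrete with a small timing model. The sketch below assumes an idealized five-stage pipeline with no stalls, hazards, or branch mispredictions; the stage abbreviations and the function names (pipeline_schedule, pipelined_cycles, unpipelined_cycles, render) are illustrative choices, not taken from any particular processor or library.

```python
# Idealized five-stage pipeline: fetch, decode, execute, memory access, write-back.
# Assumption: every stage takes one cycle and nothing ever stalls.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one dict per instruction mapping cycle number -> stage name."""
    schedule = []
    for i in range(num_instructions):
        # Instruction i enters the first stage in cycle i,
        # then advances one stage per cycle.
        schedule.append({i + s: name for s, name in enumerate(stages)})
    return schedule


def pipelined_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles to finish all instructions when stages overlap."""
    if num_instructions <= 0:
        return 0
    # The first instruction needs num_stages cycles; each later
    # instruction completes one cycle after its predecessor.
    return num_stages + num_instructions - 1


def unpipelined_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles when each instruction must finish before the next starts."""
    return num_stages * max(num_instructions, 0)


def render(schedule):
    """Format the schedule as a text diagram, one row per instruction."""
    if not schedule:
        return ""
    total = max(max(row) for row in schedule) + 1
    lines = []
    for i, row in enumerate(schedule):
        cells = [row.get(c, ".").ljust(4) for c in range(total)]
        lines.append(f"i{i}: " + "".join(cells).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(pipeline_schedule(4)))
    print(pipelined_cycles(4))    # 8
    print(unpipelined_cycles(4))  # 20
```

Under these assumptions, four instructions finish in 8 cycles instead of 20. Real pipelines fall short of this ideal whenever a dependency or a mispredicted branch forces a stall.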

Cache Memory and Memory Hierarchy

The speed gap between processors and main memory has widened steadily over decades. Cache memory, small and fast on-chip storage organized in multiple levels (L1, L2, and L3), bridges this gap by keeping recently and frequently accessed data close to the execution units. The memory hierarchy extends from registers through cache levels to DRAM main memory and secondary storage, with each successive level offering more capacity at higher access latency. Cache coherence protocols, such as MESI, coordinate the state of cached data across multiple processor cores so that all cores observe a consistent view of memory. IEEE Micro publishes applied research on cache and memory system design covering topics from replacement policies to non-volatile memory integration. Effective cache design is central to approaching the theoretical peak performance of modern processors, because a core that must wait on main memory sits idle for many cycles.
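
How a cache decides whether an access hits or misses can be illustrated with a minimal simulator. The sketch below models a direct-mapped cache, the simplest organization, in which each memory block can live in exactly one cache line. The class name DirectMappedCache and the default sizes (4 lines of 16 bytes) are hypothetical values chosen for readability; real caches are far larger and are usually set-associative.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks tags only, not the data itself."""

    def __init__(self, num_lines=4, block_size=16):
        self.num_lines = num_lines      # number of cache lines
        self.block_size = block_size    # bytes per block
        self.tags = [None] * num_lines  # tag stored in each line (None = empty)
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Simulate one byte access; return True on a hit, False on a miss."""
        block = address // self.block_size  # which memory block holds the byte
        index = block % self.num_lines      # the only line that block may use
        tag = block // self.num_lines       # distinguishes blocks sharing a line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: load the block, evicting whatever occupied the line.
        self.tags[index] = tag
        self.misses += 1
        return False


if __name__ == "__main__":
    cache = DirectMappedCache()
    for addr in [0, 4, 8, 64, 0]:
        print(addr, "hit" if cache.access(addr) else "miss")
    print(cache.hits, cache.misses)  # 2 3
```

Addresses 0, 4, and 8 fall in the same block, so the second and third accesses hit. Address 64 maps to the same line with a different tag and evicts that block, which is why the final access to address 0 misses again. Conflict misses of this kind are one reason practical designs add associativity.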

Multicore Processors

A multicore processor places two or more independent processing cores on a single chip, allowing the processor to execute multiple instruction streams in parallel. The transition from single-core to multicore architectures accelerated in the mid-2000s, when power constraints made further single-core frequency scaling impractical, and it required software developers to adopt parallel programming models. Symmetric multiprocessing systems treat all cores as peers sharing a common memory address space, while heterogeneous (asymmetric) designs such as ARM's big.LITTLE architecture pair high-performance cores with energy-efficient cores to optimize for different workload phases. NIST guidance on multiprocessing and parallel computation informs standards for the security and verification of multicore platforms.
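
The shift to parallel programming models usually means restructuring a computation as partition, independent work, and combine. The sketch below shows that structure using Python's standard concurrent.futures module. The function names chunk and parallel_sum_of_squares are illustrative. One caveat is an assumption worth stating: in standard CPython, threads do not run CPU-bound Python code on several cores at once because of the global interpreter lock, so this example demonstrates the shape of the programming model and is not a speedup benchmark.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(data, parts):
    """Split data into `parts` contiguous slices of near-equal length."""
    size, extra = divmod(len(data), parts)
    slices, start = [], 0
    for i in range(parts):
        # The first `extra` slices take one additional element each.
        end = start + size + (1 if i < extra else 0)
        slices.append(data[start:end])
        start = end
    return slices


def sum_of_squares(part):
    """Work done independently by each worker on its own slice."""
    return sum(x * x for x in part)


def parallel_sum_of_squares(data, workers=4):
    """Partition the input, process slices concurrently, then combine."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum_of_squares, chunk(data, workers))
        # Combining the partial results is the only sequential step.
        return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10)), workers=3))  # 285
```

Because each slice is processed without touching shared state, the workers need no locks. The same decomposition carries over to process pools or native threads, where the slices can run on separate cores.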

Accelerator and Array Processor Architectures

General-purpose processors are optimized for sequential code with irregular control flow, but many computationally intensive workloads are better served by specialized accelerators. Graphics processing units (GPUs), which organize thousands of simpler cores into arrays suited to data-parallel computation, have become the dominant accelerators for machine learning training. Field-programmable gate arrays (FPGAs) allow hardware logic to be reconfigured after fabrication, making them valuable for signal processing and low-latency inference. Array processors, more broadly, are architectures that apply a single instruction to multiple data elements simultaneously, a model classified in Flynn's taxonomy as single instruction, multiple data (SIMD). Purpose-built accelerators for specific neural network operations, such as Google's Tensor Processing Unit, carry this architectural specialization furthest.
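
The SIMD idea, one instruction applied to several data elements at once, can be modeled in a few lines. The sketch below is a software model only: Python executes the lane additions one after another, and the lane width of 4 and the function names scalar_add and simd_add are hypothetical choices for illustration. What the model captures is the instruction count, since each loop iteration in simd_add stands for a single vector instruction.

```python
LANES = 4  # hypothetical vector width: elements processed per instruction


def scalar_add(a, b):
    """Element-wise add where each addition is its own instruction."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    out = []
    instructions = 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1
    return out, instructions


def simd_add(a, b, lanes=LANES):
    """Element-wise add where one modeled instruction covers `lanes` elements."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    out = []
    instructions = 0
    for i in range(0, len(a), lanes):
        # One modeled vector instruction: the same operation on every lane.
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return out, instructions


if __name__ == "__main__":
    a = list(range(8))
    b = [10] * 8
    print(scalar_add(a, b)[1])  # 8
    print(simd_add(a, b)[1])    # 2
```

Both functions produce the same result vector, but the modeled SIMD version issues a quarter as many instructions for this input. Hardware realizes the saving by performing the lane operations in parallel.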

Applications

Computer architecture has applications in a wide range of disciplines, including:

High-performance computing for scientific simulation, including climate modeling and molecular dynamics
Mobile and embedded systems design, where power efficiency constrains every architectural choice
Data center infrastructure, through server processor design and memory subsystem optimization
Artificial intelligence and machine learning, via GPU and TPU accelerator architectures
Real-time control systems in automotive and aerospace applications, requiring deterministic execution guarantees