Cache storage
What Is Cache Storage?
Cache storage is a small, high-speed memory that sits between a processor and slower main memory, holding copies of recently or frequently accessed data so that the processor can retrieve them without paying the full latency of a main-memory read. The term derives from the French cacher, "to hide"; in English a cache is a hidden stock of supplies. The analogy holds: a cache keeps a working set of information close at hand so the processor rarely has to go looking farther. Hardware caches are implemented almost universally in static random-access memory (SRAM), which is faster than the dynamic RAM (DRAM) used for main memory but less dense and more expensive per bit. Caches appear in virtually every modern general-purpose processor and graphics unit, and in many embedded controllers.
The fundamental justification for cache storage is the principle of locality. Programs tend to access the same memory locations repeatedly over short intervals (temporal locality) and to access addresses that are numerically close to ones they accessed recently (spatial locality). A cache exploits spatial locality by fetching not just the requested byte but an entire cache line, typically 64 bytes on contemporary processors, and it exploits temporal locality by retaining recently accessed lines in the expectation that they will be needed again soon.
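The effect of spatial locality can be illustrated with a toy model. The sketch below is not a model of any real processor: it assumes a hypothetical cache that holds only one 64-byte line at a time, and the array dimensions and the function name count_line_fetches are chosen purely for illustration. It counts how many line fetches a row-by-row walk and a column-by-column walk over the same row-major array would trigger.

```python
LINE_SIZE = 64  # bytes per cache line, the typical figure quoted above


def count_line_fetches(addresses, line_size=LINE_SIZE):
    """Count fetches for a toy cache that holds a single line at a time.

    A fetch is counted whenever an access falls in a different line
    than the immediately preceding access.
    """
    fetches = 0
    current_line = None
    for addr in addresses:
        line = addr // line_size  # line number that contains this byte
        if line != current_line:
            fetches += 1
            current_line = line
    return fetches


# Assumed layout: a 64 x 64 array of 8-byte elements stored row-major.
ROWS, COLS, ELEM = 64, 64, 8

# Walk the array row by row (consecutive addresses).
row_major = [(r * COLS + c) * ELEM for r in range(ROWS) for c in range(COLS)]
# Walk the same array column by column (stride of COLS * ELEM bytes).
col_major = [(r * COLS + c) * ELEM for c in range(COLS) for r in range(ROWS)]

sequential_fetches = count_line_fetches(row_major)
strided_fetches = count_line_fetches(col_major)
print(sequential_fetches, strided_fetches)
```

In this toy model the sequential walk touches each line once and reuses it for eight consecutive elements, while the strided walk lands in a new line on every access. Real caches hold many lines, so the gap is usually smaller in practice, but the direction of the effect is the same.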
Memory Hierarchy
Cache storage occupies a specific position in the memory hierarchy, a layered organization of storage technologies ranked by speed, cost per bit, and capacity. At the top sit processor registers, which are the fastest and smallest. Below registers come the on-chip cache levels. Main memory (DRAM) sits below that, and secondary storage such as solid-state drives or hard disks occupies the bottom. Each tier is slower, larger, and cheaper per bit than the tier above it, with latency gaps that range from a small multiple between adjacent cache levels to several orders of magnitude between DRAM and disk. The Cornell CS 3410 notes on caches describe the canonical tradeoff: on-chip memory is fast, small, and expensive, while off-chip main memory is slow, large, and cheap. Cache storage bridges this gap, making a system behave as though it had nearly the speed of SRAM and nearly the capacity of DRAM.
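The claim that a cache gives nearly SRAM speed can be made concrete with the standard average memory access time (AMAT) relation for a single cache level: AMAT = hit time + miss rate x miss penalty. The sketch below uses round, made-up latencies (1 ns for a hit, 100 ns for a miss) that are assumptions for illustration, not measurements of any particular part.

```python
def average_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """AMAT for one cache level: hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Illustrative assumptions only, not figures for any real processor.
SRAM_HIT_NS = 1.0        # assumed cache hit latency
DRAM_PENALTY_NS = 100.0  # assumed extra cost of going to main memory

for miss_rate in (0.50, 0.10, 0.02):
    t = average_access_time(SRAM_HIT_NS, miss_rate, DRAM_PENALTY_NS)
    print(f"miss rate {miss_rate:.0%}: about {t:.1f} ns per access")
```

Under these assumed numbers, a 2% miss rate brings the average access to roughly 3 ns, far closer to the cache latency than to the main-memory latency, which is why hit rate dominates cache design.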
Cache Levels and Organization
Contemporary processors implement multiple cache levels, designated L1, L2, and L3. L1 cache is the smallest and fastest, integrated directly into each processor core and often split into separate instruction (L1I) and data (L1D) caches. L2 cache is larger and somewhat slower, and is typically private to a core or shared by a small cluster of cores. L3 cache is the largest on-chip tier and is usually shared across the cores on a chip. A 2020s-era server processor may carry hundreds of megabytes of L3 cache.
Each cache line is accompanied by bookkeeping state held in hardware: a tag that records which main-memory address is currently resident in the line, a valid bit that indicates whether the data is usable, and a dirty bit that indicates whether the line has been modified since it was loaded. Lines are grouped into sets, and the number of lines per set (the number of ways) is the cache's associativity. In a direct-mapped cache each address can occupy exactly one line, which is the simplest organization; in a fully associative cache any address can occupy any line, which is the most flexible organization but the most expensive to search. Most production caches use set associativity as a practical middle ground, as detailed in the OpenStax Introduction to Computer Science chapter on memory hierarchy.
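How an address selects a set can be sketched in a few lines. The example below assumes power-of-two sizes and a hypothetical 32 KiB, 8-way cache with 64-byte lines; the names CacheGeometry and split_address are illustrative and do not come from any library. It splits an address into the tag, set index, and byte offset that a set-associative lookup would use.

```python
from typing import NamedTuple


class CacheGeometry(NamedTuple):
    """Hypothetical cache parameters; all sizes assumed to be powers of two."""
    size_bytes: int
    line_bytes: int
    ways: int

    @property
    def num_sets(self):
        return self.size_bytes // (self.line_bytes * self.ways)


def split_address(addr, geom):
    """Return (tag, set_index, byte_offset) for addr under geom."""
    offset_bits = geom.line_bytes.bit_length() - 1  # log2(line size)
    index_bits = geom.num_sets.bit_length() - 1     # log2(number of sets)
    offset = addr & (geom.line_bytes - 1)           # low bits: byte within the line
    index = (addr >> offset_bits) & (geom.num_sets - 1)  # middle bits: which set
    tag = addr >> (offset_bits + index_bits)        # remaining high bits
    return tag, index, offset


# Assumed example: 32 KiB, 8-way, 64-byte lines gives 64 sets.
l1d = CacheGeometry(size_bytes=32 * 1024, line_bytes=64, ways=8)
tag, index, offset = split_address(0x12345678, l1d)
print(hex(tag), index, offset)
```

Setting ways to 1 in this sketch gives a direct-mapped cache (one line per set), and setting ways to the total number of lines gives a fully associative cache (a single set, so the index is always 0 and the whole line address becomes the tag).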
Cache Replacement and Coherence
When a set is full and a new line must be brought in, a replacement policy determines which existing line is evicted. Least recently used (LRU) is the classic policy; because tracking exact recency becomes costly as associativity grows, many designs use pseudo-LRU approximations that reduce hardware cost. In multicore and multiprocessor systems, multiple caches may independently hold copies of the same memory location. Cache coherence protocols, such as MESI (Modified, Exclusive, Shared, Invalid), ensure that all copies remain consistent. The complexity of coherence grows with core count, which is a major design consideration in modern multi-socket server architectures.
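LRU replacement within a single set can be sketched in software. The class below is a minimal illustration, not a description of how hardware implements the policy: it models one set of a set-associative cache with true LRU ordering, and the class name LRUSet and the access trace are made up for the example.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with a fixed number of ways and true LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # keys are tags, least recently used first

    def access(self, tag):
        """Access a tag; return (hit, evicted_tag_or_None)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            return True, None
        evicted = None
        if len(self.lines) >= self.ways:
            # Set is full: evict the least recently used line.
            evicted, _ = self.lines.popitem(last=False)
        self.lines[tag] = None
        return False, evicted


two_way = LRUSet(ways=2)
trace = ["A", "B", "A", "C", "B"]
results = [two_way.access(t) for t in trace]
for t, (hit, evicted) in zip(trace, results):
    print(t, "hit" if hit else "miss", "evicts", evicted)
```

In this trace the second access to A is a hit and makes B the least recently used line, so bringing in C evicts B, and the later access to B misses and evicts A. A pseudo-LRU scheme would approximate this ordering with fewer state bits per set and might pick a different victim in some cases.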
Applications
Cache storage has applications in a wide range of computing contexts, including:
- General-purpose CPUs, where L1 through L3 caches can reduce average memory access time by an order of magnitude or more when hit rates are high
- Graphics processing units (GPUs), which use texture caches and shared memory to accelerate parallel workloads
- Embedded microcontrollers, where small tightly coupled memories (TCMs) provide deterministic, software-managed fast storage alongside or in place of hardware caches
- Database and in-memory computing systems, which employ software-level caches for frequently queried data sets
- Network switches and routers, which cache routing table lookups to sustain line-rate packet forwarding
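The software-level caches mentioned in the list above follow the same idea as hardware caches: keep recent results close so that repeated requests avoid the slow path. A minimal sketch using Python's standard functools.lru_cache decorator is shown below; the function lookup and its string-uppercasing body are hypothetical stand-ins for an expensive query, not part of any real database API.

```python
from functools import lru_cache

slow_path_calls = 0  # counts how often the "expensive" work actually runs


@lru_cache(maxsize=128)  # keep up to 128 recent results, evicting by LRU
def lookup(key):
    """Hypothetical stand-in for a slow query against a backing store."""
    global slow_path_calls
    slow_path_calls += 1
    return key.upper()


for key in ["a", "b", "a", "a", "c"]:
    lookup(key)

info = lookup.cache_info()
print(info.hits, info.misses, slow_path_calls)
```

Repeated keys are served from the cache, so the slow path runs only once per distinct key. As with a hardware cache, the benefit depends on the access pattern showing enough temporal locality to fit within the chosen capacity.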