Out of order
What Is Out of Order?
Out-of-order execution is a processor design technique in which instructions are issued and executed in a sequence determined by data availability rather than by their original program order. The method allows a processor to keep its execution units active by bypassing instructions that are waiting on data and issuing independent instructions ahead of them. In-order processors must stall whenever a needed operand is not yet ready; out-of-order processors detect which instructions have all operands available and schedule those first, increasing effective throughput without changing the visible result of computation.
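The difference between the two issue policies can be illustrated with a toy single-issue model. The following Python sketch is illustrative only: the `Instr` record, the latencies, and the four-instruction program are assumptions chosen for the example. The model assumes each register is written at most once, and it ignores issue width, functional-unit limits, and memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str        # register written (assumed written only once in the program)
    srcs: tuple      # registers read
    latency: int     # cycles from issue until the result is available


def initial_ready(program):
    # Registers that no instruction writes are treated as live-in values, ready at cycle 0.
    written = {ins.dest for ins in program}
    return {r: 0 for ins in program for r in ins.srcs if r not in written}


def in_order_cycles(program):
    ready_at = initial_ready(program)
    cycle = finish = 0
    for ins in program:
        # Stall until every source operand is available.
        start = max([cycle] + [ready_at[r] for r in ins.srcs])
        ready_at[ins.dest] = start + ins.latency
        finish = max(finish, start + ins.latency)
        cycle = start + 1  # at most one issue per cycle
    return finish


def out_of_order_cycles(program):
    ready_at = initial_ready(program)
    pending = list(program)
    cycle = finish = 0
    while pending:
        # Issue the oldest pending instruction whose operands are all ready.
        for ins in pending:
            if all(ready_at.get(r, float("inf")) <= cycle for r in ins.srcs):
                ready_at[ins.dest] = cycle + ins.latency
                finish = max(finish, cycle + ins.latency)
                pending.remove(ins)
                break
        cycle += 1
    return finish


# Hypothetical program: a slow load, one dependent add, two independent operations.
PROGRAM = [
    Instr("load", "r1", (), 4),
    Instr("add1", "r2", ("r1", "r1"), 1),
    Instr("mul", "r3", ("r4", "r5"), 1),
    Instr("add2", "r6", ("r3", "r4"), 1),
]

if __name__ == "__main__":
    print(in_order_cycles(PROGRAM), out_of_order_cycles(PROGRAM))  # 7 5
```

In this example the multiply and the second add issue while the load is still outstanding, so the out-of-order model finishes in 5 cycles against 7 for the in-order model.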
The technique emerged from work at IBM in the 1960s, most notably Robert Tomasulo's 1967 algorithm for floating-point instruction scheduling on the IBM System/360 Model 91. Tomasulo's scheme introduced the concepts of reservation stations and register renaming that underpin modern out-of-order implementations. Subsequent decades brought the reorder buffer and speculative execution, enabling the technique to be combined with branch prediction in microprocessors from the Intel Pentium Pro onward.
Instruction-Level Parallelism
Out-of-order execution is one of the primary mechanisms for extracting instruction-level parallelism from a sequential instruction stream. A superscalar processor can issue multiple instructions per clock cycle, but only if it can identify independent operations. Data dependencies constrain this freedom: a read-after-write (RAW) hazard is a true dependency that requires an instruction to wait for the result of a prior instruction, while write-after-read (WAR) and write-after-write (WAW) hazards are false dependencies that arise only from reuse of architectural register names, not from any flow of data. An out-of-order processor addresses RAW hazards by holding dependent instructions in an issue queue until their operands arrive, and removes false dependencies through register renaming. Measured on real workloads, out-of-order processors rarely sustain more than about two to three instructions per cycle, even though theoretical analyses under ideal conditions, such as perfect branch prediction and unlimited registers, suggest much higher values.
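The three hazard types, and the effect of renaming on them, can be shown with a small sketch. The code below is a simplified illustration rather than a description of any particular processor: instructions are modeled as hypothetical `(dest, srcs)` tuples, physical registers are named `p0`, `p1`, and so on, and the free list is assumed to be unlimited.

```python
import itertools


def hazards(program):
    """Return a set of (kind, earlier_index, later_index) register hazards."""
    last_writer = {}   # register -> index of the most recent writer
    readers = {}       # register -> indices that read it since its last write
    found = set()
    for j, (dest, srcs) in enumerate(program):
        for r in srcs:
            if r in last_writer:
                found.add(("RAW", last_writer[r], j))    # true dependency
        for i in readers.get(dest, []):
            found.add(("WAR", i, j))                     # false dependency
        if dest in last_writer:
            found.add(("WAW", last_writer[dest], j))     # false dependency
        for r in srcs:
            readers.setdefault(r, []).append(j)
        last_writer[dest] = j
        readers[dest] = []
    return found


def rename(program):
    """Give every destination a fresh physical register (unlimited pool assumed)."""
    mapping = {}                 # architectural -> current physical register
    fresh = itertools.count()
    renamed = []
    for dest, srcs in program:
        # Sources read the current mapping; unwritten registers keep their names.
        new_srcs = tuple(mapping.get(r, r) for r in srcs)
        phys = f"p{next(fresh)}"
        mapping[dest] = phys
        renamed.append((phys, new_srcs))
    return renamed


# Hypothetical program in which r1 is reused as a destination.
PROGRAM = [
    ("r1", ("r2", "r3")),   # 0: r1 = r2 + r3
    ("r4", ("r1", "r5")),   # 1: r4 = r1 * r5
    ("r1", ("r6", "r7")),   # 2: r1 = r6 + r7
    ("r8", ("r1", "r4")),   # 3: r8 = r1 - r4
]

if __name__ == "__main__":
    print(sorted(hazards(PROGRAM)))
    print(sorted(hazards(rename(PROGRAM))))
```

Before renaming, instruction 2 has a WAW hazard with instruction 0 and a WAR hazard with instruction 1. After renaming only the three RAW dependencies remain, so instruction 2 is free to execute ahead of instructions 0 and 1.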
Reorder Buffer and Register Renaming
The reorder buffer is the hardware structure that reconciles out-of-order execution with the sequential semantics that software expects. Instructions are allocated reorder buffer entries in program order when they are dispatched, after decode and renaming. They may execute out of order internally, but their results are committed to architectural state (registers and memory) only when they reach the head of the reorder buffer and have completed without exception. This in-order commit preserves the illusion of sequential execution and enables precise interrupts, which are required for correct exception handling. Register renaming maps the limited set of architectural registers onto a larger pool of physical registers, allowing the hardware to hold multiple speculative values for the same logical register simultaneously. Together these structures form the core of implementations found in microarchitectures such as the Intel P6, AMD K7, and Apple's Firestorm cores, as detailed in computer architecture courses at institutions such as MIT CSAIL.
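In-order commit can be sketched in a few lines. The following is a minimal model under stated assumptions, not a description of any shipping design: the `ReorderBuffer` class and its `dispatch`, `complete`, and `commit` methods are hypothetical names, each instruction writes one register, and exceptions, memory operations, and capacity limits are left out.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer: out-of-order completion, in-order commit."""

    def __init__(self):
        self.entries = deque()   # oldest instruction at the left (head)
        self.arch_regs = {}      # committed architectural register state

    def dispatch(self, tag, dest):
        # Entries are allocated in program order.
        self.entries.append({"tag": tag, "dest": dest, "value": None, "done": False})

    def complete(self, tag, value):
        # Execution may finish in any order; the result waits in the entry.
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"] = value
                entry["done"] = True
                return
        raise KeyError(tag)

    def commit(self):
        # Retire only from the head, and only while the head has completed.
        committed = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.arch_regs[entry["dest"]] = entry["value"]
            committed.append(entry["tag"])
        return committed


if __name__ == "__main__":
    rob = ReorderBuffer()
    for tag, dest in [("A", "r1"), ("B", "r2"), ("C", "r3")]:
        rob.dispatch(tag, dest)
    rob.complete("C", 30)
    print(rob.commit())   # [] because A, the head, has not completed
    rob.complete("A", 10)
    print(rob.commit())   # ['A']
    rob.complete("B", 20)
    print(rob.commit())   # ['B', 'C']
```

Although C finishes first, its result stays invisible to architectural state until A and B have committed, which is the property that makes precise interrupts possible.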
Speculative Execution and Branch Prediction
Out-of-order execution interacts closely with branch prediction. When a conditional branch is encountered, the processor speculates on the outcome and continues fetching and executing instructions along the predicted path. If the prediction is correct, the speculative work is committed and execution proceeds without a stall. If the prediction is incorrect, the branch and all older instructions remain in the reorder buffer, every younger instruction on the wrong path is squashed, and fetch restarts at the correct target. This speculative window can include tens to hundreds of instructions in a deep pipeline. The 2018 Spectre and Meltdown vulnerability disclosures demonstrated that speculative execution creates observable microarchitectural side channels, prompting extensive redesign of both hardware and operating system memory isolation mechanisms.
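Recovery from a misprediction can be sketched as follows. This example assumes one particular recovery scheme, a snapshot of the rename map taken at each branch; real designs differ, and some instead rebuild the map by walking the reorder buffer. The `SpeculativeCore` class and its method names are hypothetical.

```python
class SpeculativeCore:
    """Toy model of misprediction recovery using rename-map checkpoints."""

    def __init__(self):
        self.rob = []            # instruction tags in program order, oldest first
        self.rename_map = {}     # architectural register -> physical register
        self.checkpoints = {}    # branch tag -> rename map snapshot at the branch

    def dispatch(self, tag, dest=None, phys=None, is_branch=False):
        if is_branch:
            # Assumption: every branch saves a copy of the rename map.
            self.checkpoints[tag] = dict(self.rename_map)
        elif dest is not None:
            self.rename_map[dest] = phys
        self.rob.append(tag)

    def resolve_branch(self, tag, predicted_correctly):
        snapshot = self.checkpoints.pop(tag)
        if predicted_correctly:
            return []            # nothing to undo; speculation was useful work
        # Squash only the instructions younger than the branch.
        idx = self.rob.index(tag)
        squashed = self.rob[idx + 1:]
        del self.rob[idx + 1:]
        self.rename_map = snapshot
        for younger in squashed:
            self.checkpoints.pop(younger, None)   # drop squashed branches' snapshots
        return squashed


if __name__ == "__main__":
    core = SpeculativeCore()
    core.dispatch("i0", dest="r1", phys="p0")
    core.dispatch("br", is_branch=True)
    core.dispatch("i1", dest="r1", phys="p1")   # wrong-path instruction
    core.dispatch("i2", dest="r2", phys="p2")   # wrong-path instruction
    print(core.resolve_branch("br", predicted_correctly=False))  # ['i1', 'i2']
    print(core.rob, core.rename_map)  # ['i0', 'br'] {'r1': 'p0'}
```

The older instruction `i0` and its mapping survive, while the wrong-path mappings for `r1` and `r2` disappear along with the squashed entries. The sketch does not model caches or predictors, which are not rolled back in real hardware and are the basis of the side channels mentioned above.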
Applications
Out-of-order execution techniques are applied across a range of computing contexts, including:
- High-performance CPUs in desktop, server, and mobile processors
- GPU shader pipelines for graphics and general-purpose computation, where scoreboarding and scheduling across thread groups serve a similar latency-hiding role
- Hardware simulation tools modeling processor microarchitecture
- Compiler optimization passes that schedule instructions for pipelined execution
- Security research analyzing microarchitectural timing side channels