Latency
What Is Latency?
Latency is the elapsed time between a stimulus and the corresponding response in a system, whether that system is a network link, a processor, a storage device, or a sensor. In engineering contexts, the term captures delay rather than rate: two systems may transfer data at the same throughput yet differ substantially in how long they take to deliver the first byte of a response. This distinction between bandwidth and latency is critical in real-time applications, interactive services, and control systems, where a late response can be as harmful as no response at all.
Latency arises from multiple physical and operational sources. Propagation delay reflects the finite speed at which signals travel through a medium: roughly two-thirds the speed of light in a vacuum for typical copper cable and optical fiber, or about 5 microseconds per kilometer. Transmission delay is the time needed to place a packet on the link, equal to the packet size divided by the link data rate. Queuing delay accumulates when packets wait behind other traffic in router or switch buffers. Processing delay covers the time consumed by header inspection, routing table lookup, encryption, and similar operations. Any end-to-end latency figure is the sum of these components across every link and device in the path.
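The sum described above can be sketched in a few lines of Python. This is an illustrative model only: the `Hop` class, its field names, and the two-thirds velocity factor are assumptions made for the example, and real queuing and processing delays vary with load rather than being fixed constants.

```python
from dataclasses import dataclass

SPEED_OF_LIGHT_M_PER_S = 299_792_458
VELOCITY_FACTOR = 2 / 3  # assumed signal speed relative to light in a vacuum


@dataclass
class Hop:
    """One link plus the device at its far end (hypothetical model)."""
    length_m: float            # physical length of the link
    link_rate_bps: float       # link data rate in bits per second
    queuing_s: float = 0.0     # time spent waiting in buffers
    processing_s: float = 0.0  # lookup, inspection, encryption, etc.


def hop_delay_s(hop: Hop, packet_bytes: int) -> float:
    """Sum the four delay components for a single hop, in seconds."""
    propagation = hop.length_m / (SPEED_OF_LIGHT_M_PER_S * VELOCITY_FACTOR)
    transmission = packet_bytes * 8 / hop.link_rate_bps
    return propagation + transmission + hop.queuing_s + hop.processing_s


def path_delay_s(hops: list[Hop], packet_bytes: int) -> float:
    """One-way delay across a path is the sum over all of its hops."""
    return sum(hop_delay_s(h, packet_bytes) for h in hops)
```

For a 1,500-byte packet crossing 1,000 km of fiber at 1 Gbit/s with empty queues, this model gives about 5.0 ms of propagation delay and 12 microseconds of transmission delay, which shows why distance dominates on long, fast links.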
Network Latency
In packet-switched networks, round-trip time (RTT) is the most widely used latency metric: the interval from when a host sends a packet until it receives the corresponding reply from the destination. RFC 2681, the IETF document defining a round-trip delay metric for IP performance measurement, specifies the measurement methodology used by network operators and researchers to characterize path delay. The ping utility, which relies on ICMP echo messages, provides a field-expedient RTT estimate. Traceroute extends this by reporting the RTT to each intermediate hop, which helps identify where along a path latency concentrates.
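Sending ICMP echo messages usually requires raw-socket privileges, so a common unprivileged approximation is to time a TCP connection setup, which takes roughly one round trip. The sketch below takes that approach; the function name `tcp_connect_rtt_ms` is hypothetical, and the result includes local connection overhead, so it is an estimate rather than an RFC 2681 measurement.

```python
import socket
import time


def tcp_connect_rtt_ms(host: str, port: int, samples: int = 5,
                       timeout: float = 2.0) -> list[float]:
    """Estimate RTT by timing TCP connection setup, in milliseconds.

    Pass an IP address rather than a hostname, otherwise the first
    sample also includes DNS resolution time.
    """
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the TCP handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        rtts.append(elapsed * 1000.0)
    return rtts
```

Taking several samples and reporting the minimum alongside a high percentile separates the path's baseline delay from transient queuing.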
Bufferbloat, the condition in which excessively large router queues introduce hundreds of milliseconds of queuing delay even at moderate load, drove significant protocol work in the 2010s. RFC 9330, which describes the Low Latency, Low Loss, and Scalable Throughput (L4S) architecture, addresses this directly: network nodes signal congestion early by marking packets with Explicit Congestion Notification (ECN) while queues are still shallow, and senders respond with scalable congestion control, keeping queues short and RTTs predictable without sacrificing throughput.
Computational and Storage Latency
Latency in computing systems describes the delay between a request and the delivery of its result. Access latency spans several orders of magnitude across DRAM (tens of nanoseconds), NAND flash (tens of microseconds), and spinning disk (several milliseconds), and these differences propagate up through cache hierarchy design, database indexing strategies, and operating system scheduler choices. Instruction latency in a CPU pipeline, measured in clock cycles, limits how quickly dependent instructions can execute and therefore how much instruction-level parallelism a processor must find to stay busy.
In storage networks, SCSI command completion time and NVMe queue depth interact with latency: deeper queues generally raise throughput but lengthen the time each command waits, a trade-off that benchmark tools such as fio expose. Cloud infrastructure providers often express service-level objectives for storage latency as a 99th-percentile (p99) figure rather than the mean, because tail latency governs user-perceived performance under heavy load. Research from Google on tail latency in large-scale distributed systems popularized hedged requests, in which a client that has not received a reply after a short delay sends a duplicate request to another replica and uses whichever response arrives first, reducing exposure to slow nodes.
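Both ideas can be illustrated with a short Python sketch: a nearest-rank percentile for reporting tail latency, and an asyncio-based hedged request. The names `nearest_rank_percentile` and `hedged_request` are hypothetical, and `make_request` stands in for whatever coroutine actually contacts a replica. The Google work suggests deferring the duplicate until the first request has been outstanding longer than roughly the 95th-percentile expected latency; the choice of hedge delay is left to the caller here.

```python
import asyncio
import math


def nearest_rank_percentile(samples, p):
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank method."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


async def hedged_request(make_request, hedge_after_s):
    """Send one request; if it is still pending after hedge_after_s,
    send a duplicate and return whichever result arrives first.

    make_request is a caller-supplied zero-argument coroutine function
    (an assumption of this sketch). Error handling is omitted: if the
    first task to finish raised, that exception propagates.
    """
    primary = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({primary}, timeout=hedge_after_s)
    if done:
        return primary.result()

    # The primary is slow: issue the hedge and race the two.
    backup = asyncio.create_task(make_request())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # stop waiting on the slower copy
    return done.pop().result()
```

A caller might compute `nearest_rank_percentile(recent_latencies, 95)` over recent measurements and pass it as `hedge_after_s`, so that only the slowest few percent of requests generate extra load.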
Measurement and Reduction
Latency is typically reported in milliseconds for networks and in microseconds or nanoseconds for on-chip and memory subsystems. Hardware timestamping, available in network interface cards that support the IEEE 1588 Precision Time Protocol (PTP), enables sub-microsecond RTT measurement by bypassing software stack delays. Software-based profiling tools, including kernel tracing tools built on eBPF on Linux, attribute per-operation latency to specific system calls, interrupt handlers, and scheduling events.
Reduction techniques include geographic co-location of services near users (content delivery networks), protocol optimizations that eliminate round trips (a full TLS 1.3 handshake completes in one round trip where TLS 1.2 needs two), hardware offload for cryptographic and parsing operations, and queue management algorithms such as CoDel and FQ-CoDel, which drop or mark packets when queuing delay stays above a configurable target.
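The effect of removing a round trip is easy to quantify. The sketch below counts only the round trips needed before application data can flow on a new connection, assuming a one-round-trip TCP handshake followed by a full TLS handshake; it ignores processing time, session resumption, and 0-RTT modes, and the function name `setup_delay_ms` is hypothetical.

```python
TCP_HANDSHAKE_RTTS = 1
# Round trips for a full (non-resumed) handshake.
TLS_FULL_HANDSHAKE_RTTS = {"TLS 1.2": 2, "TLS 1.3": 1}


def setup_delay_ms(rtt_ms: float, tls_version: str) -> float:
    """Connection setup delay before the first application byte is sent."""
    round_trips = TCP_HANDSHAKE_RTTS + TLS_FULL_HANDSHAKE_RTTS[tls_version]
    return round_trips * rtt_ms
```

On a path with an 80 ms RTT, this gives 240 ms of setup delay for TLS 1.2 and 160 ms for TLS 1.3, which is why the saving matters most for distant or mobile clients.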
Applications
Latency analysis and optimization apply across engineering disciplines, including:
- Real-time control systems in robotics and industrial automation, where control-loop delay affects stability
- Financial trading infrastructure, where microsecond advantages drive hardware investment
- Video conferencing and voice over IP, where one-way delay above 150 ms degrades conversation quality
- Autonomous vehicles, where sensor fusion and decision pipelines operate under hard timing deadlines
- Online gaming, where network RTT determines the responsiveness of player interactions