Null value

What Is Null Value?

A null value is a special marker used in computing and database systems to indicate that a data item has no assigned value, that the value is unknown, or that the concept of a value does not apply in a given context. It is distinct from zero, the empty string, or any other default: null represents the absence of information rather than a specific datum. The concept appears across relational databases, programming languages, and type systems, and its precise semantics vary enough between these contexts that understanding null in one domain does not automatically transfer to another.

The idea was introduced to relational databases by Edgar F. Codd as part of the relational model in the 1970s to handle the real-world situation where some attributes of a record are unknown or inapplicable. In programming, C.A.R. Hoare introduced null references in 1965, a decision he later described as his "billion-dollar mistake" given the cumulative cost of null dereference errors across the industry.

Null in Relational Databases

In SQL and relational database systems, null is governed by three-valued logic: every logical predicate evaluates to true, false, or unknown. When a comparison involves a null value, the result is unknown rather than false, because the actual value is not known. As the Modern SQL reference on three-valued logic explains, this behavior has been embedded in the SQL standard from its earliest versions and creates important practical implications. A WHERE clause filters out rows whose predicate evaluates to unknown, so a query written as WHERE column = NULL will return no rows: the IS NULL operator must be used instead. As Microsoft's SQL Server documentation on nullability and three-value logic comparisons notes, NULL comparison semantics apply uniformly across SQL predicates, including in CLR integration contexts. Aggregate functions such as COUNT, SUM, and AVG also ignore null values by default, which can silently skew statistical results if null is not explicitly handled. The three-valued logic system has been a point of ongoing debate among database theorists, with critics arguing it complicates query semantics and proponents arguing it correctly reflects the reality that absence of data is not the same as a known value.

Null in Programming Languages

In imperative and object-oriented programming, null (or nil, or None in Python) typically represents a pointer or reference that does not point to a valid object in memory. Dereferencing a null pointer is a runtime error that terminates execution, and null pointer exceptions have historically been among the most common categories of software defect. In Java, any object reference can be null by default, which means callers must defensively check every value they receive. C and C++ allow null pointers in both safe and undefined-behavior contexts, with dereferencing a null pointer in C producing undefined behavior.

Null Safety

Modern type systems address the null problem by encoding nullability explicitly. Kotlin distinguishes between nullable types (String?) and non-nullable types (String) at compile time, so the compiler rejects any operation on a nullable value that is not first checked for null. Rust eliminates nullable references entirely in safe code: where optional values are needed, the Option enum wraps them as Some(T) or None, and the compiler enforces exhaustive handling of both cases. As described in Meta Engineering's analysis of retrofitting null safety onto Java at scale, adding flow-sensitive null tracking to Instagram's Java codebase reduced production null dereference crashes by 27 percent over 18 months. The JSpecify project is working to standardize nullness annotations across Java tooling to enable this kind of analysis at industry scale.

Applications

Null value handling has applications in a range of fields, including:

  • Relational database schema design and query correctness
  • Data integration pipelines where missing values must be imputed or flagged
  • Distributed systems where network timeouts produce absent rather than erroneous responses
  • Statically typed programming languages with compile-time null safety guarantees
  • Data science and machine learning preprocessing for missing data treatment

Related Topics

Loading…