Ontologies
What Are Ontologies?
Ontologies are formal, machine-readable representations of concepts and their relationships within a domain of knowledge. In computer science and information engineering, an ontology defines the classes, properties, and constraints that describe a subject area in a way that software can interpret and reason about. The term was borrowed from philosophy, where it refers to the study of what exists, and adapted to capture structured knowledge in a form computers can process.
The foundations of ontological engineering draw on mathematical logic, particularly description logics, most of which are decidable fragments of first-order predicate logic. An ontology typically specifies a set of named entities (individuals), the categories they belong to (classes), and the binary relations that hold between them (properties). This formal grounding distinguishes ontologies from informal glossaries or taxonomies and enables automated inference: a reasoning engine can derive facts that are not explicitly stated, detect logical inconsistencies, and answer structured queries over a knowledge base.
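The inference step described above can be illustrated with a minimal sketch in plain Python. It is not a description-logic reasoner: it assumes a toy knowledge base whose class and individual names (`Dog`, `Mammal`, `Animal`, `Plant`, `rex`) are hypothetical, derives class memberships only by following subclass links, and flags an inconsistency only when an individual falls into two classes declared disjoint.

```python
# Toy knowledge base; every name here is a hypothetical illustration.
SUBCLASS_OF = {
    "Dog": {"Mammal"},
    "Mammal": {"Animal"},
}
ASSERTED_TYPES = {"rex": {"Dog"}}
DISJOINT = {frozenset({"Animal", "Plant"})}


def superclasses(cls):
    """Return every class reachable from cls via subclass links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(individual):
    """Asserted types plus all of their superclasses."""
    types = set(ASSERTED_TYPES.get(individual, ()))
    for cls in list(types):
        types |= superclasses(cls)
    return types


def is_consistent(individual):
    """False if the individual belongs to two classes declared disjoint."""
    types = inferred_types(individual)
    return not any(pair <= types for pair in DISJOINT)
```

Here `inferred_types("rex")` includes `Mammal` and `Animal` even though only `Dog` was asserted, which is the sense in which a reasoner derives facts that are not explicitly stated. Production reasoners handle far richer constructs (property restrictions, cardinality, equivalence) than this sketch attempts.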
The Semantic Web
The Semantic Web, a vision articulated by Tim Berners-Lee and colleagues in the early 2000s, provides the principal deployment context for modern ontologies. Ontologies serve as the interpretive layer that gives raw web data shared meaning. The W3C's OWL (Web Ontology Language) is the primary standard for publishing ontologies on the web; it extends the Resource Description Framework (RDF) and RDF Schema with richer vocabulary and formal logic so that software can infer new facts and check consistency. OWL 2 defines three profiles, each trading expressiveness for computational tractability in a different way: OWL 2 EL, OWL 2 QL, and OWL 2 RL.
Alongside OWL, the RDF data model and the SPARQL query language form the technical stack through which Semantic Web applications locate, retrieve, and integrate knowledge across heterogeneous sources. IEEE research on knowledge representation with ontologies reports sustained growth in ontological engineering tied directly to this infrastructure, as organizations seek to make their data interoperable across organizational and disciplinary boundaries.
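To make the idea of querying RDF triples concrete, the following sketch imitates the basic pattern-matching core of a SPARQL query in plain Python. It is an illustration only, not SPARQL itself: the `ex:` identifiers, the `match` and `query` helpers, and the data are all hypothetical, and terms beginning with `?` are treated as variables in the way SPARQL variables are.

```python
# Triples are (subject, predicate, object); all identifiers are hypothetical.
TRIPLES = {
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:carol", "ex:worksFor", "ex:globex"),
    ("ex:acme", "ex:locatedIn", "ex:berlin"),
    ("ex:globex", "ex:locatedIn", "ex:paris"),
}


def match(triples, pattern):
    """Yield variable bindings for a single triple pattern."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def query(triples, patterns):
    """Join several triple patterns, sharing variables between them."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            # Substitute variables that earlier patterns already bound.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(triples, bound):
                extended.append({**solution, **binding})
        solutions = extended
    return solutions


# "Who works for an organization located in Berlin?"
results = query(
    TRIPLES,
    [("?person", "ex:worksFor", "?org"), ("?org", "ex:locatedIn", "ex:berlin")],
)
people = sorted(solution["?person"] for solution in results)
```

The shared variable `?org` is what joins the two patterns, which mirrors how a SPARQL basic graph pattern connects facts that may originate from different sources.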
Linked Data
Linked Data is the practice of publishing structured data on the web using URIs as identifiers and RDF links to connect related resources across datasets. Ontologies provide the shared vocabulary that makes those links coherent: two datasets that use the same class and property definitions can be merged, queried, and traversed as if they were a single dataset. Large-scale linked-data deployments such as DBpedia, the Linked Open Data cloud, and enterprise knowledge graphs at companies like Google and Amazon rely on ontologies to maintain consistency as data volumes grow.
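A small sketch shows why shared identifiers make merging straightforward. It assumes two hypothetical datasets under the placeholder namespace `http://example.org/`, with invented class and property names (`City`, `Museum`, `locatedIn`); only the `rdf:type` URI is a real W3C identifier. Because both datasets refer to the same city URI and the same vocabulary, merging reduces to a set union of triples.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for this sketch

# Shared vocabulary terms (hypothetical).
CITY = EX + "vocab/City"
MUSEUM = EX + "vocab/Museum"
LOCATED_IN = EX + "vocab/locatedIn"

# Dataset A knows about cities; dataset B knows about museums.
DATASET_A = {
    (EX + "id/city1", RDF_TYPE, CITY),
}
DATASET_B = {
    (EX + "id/museum7", RDF_TYPE, MUSEUM),
    (EX + "id/museum7", LOCATED_IN, EX + "id/city1"),
}


def museums_in_cities(graph):
    """Return (museum, city) pairs where both ends are typed in the graph."""
    return sorted(
        (s, o)
        for (s, p, o) in graph
        if p == LOCATED_IN
        and (s, RDF_TYPE, MUSEUM) in graph
        and (o, RDF_TYPE, CITY) in graph
    )


# Identical URIs mean the union needs no record matching.
merged = DATASET_A | DATASET_B
```

Run against `DATASET_B` alone, `museums_in_cities` finds nothing, because that dataset never states that `city1` is a city; run against `merged`, it returns the museum and city pair. Real deployments also have to reconcile datasets that use different URIs for the same thing, which this sketch does not address.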
Ontologies in linked-data contexts also support provenance and trust metadata, recording where information came from and when it was last verified. This traceability is increasingly significant as downstream applications such as recommendation systems and search engines use ranking signals derived from graph-structured knowledge.
Thesauri and Controlled Vocabularies
Thesauri and controlled vocabularies are the historical predecessors of formal ontologies. Library and information science developed thesauri to standardize subject headings and term relationships long before computational ontologies existed. The SKOS (Simple Knowledge Organization System) standard, also a W3C specification, provides a bridge by encoding thesauri, classification schemes, and taxonomies in RDF, making it possible to align legacy controlled vocabularies with modern ontologies. This alignment work is common in digital library projects, biomedical databases, and government information systems.
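The kind of structure SKOS encodes can be sketched with a plain Python dictionary. The field names below mirror the SKOS properties `skos:prefLabel`, `skos:altLabel`, and `skos:broader`; the concept identifiers, labels, and helper functions are hypothetical, and a real project would store these statements as RDF rather than as nested dictionaries.

```python
# Hypothetical thesaurus entries keyed by concept identifier.
CONCEPTS = {
    "c1": {"prefLabel": "Animals", "altLabels": [], "broader": []},
    "c2": {"prefLabel": "Mammals", "altLabels": [], "broader": ["c1"]},
    "c3": {
        "prefLabel": "Dogs",
        "altLabels": ["Canines", "Domestic dogs"],
        "broader": ["c2"],
    },
}


def find_concept(label):
    """Map a preferred or alternative label to its concept identifier."""
    wanted = label.casefold()
    for concept_id, entry in CONCEPTS.items():
        labels = [entry["prefLabel"], *entry["altLabels"]]
        if any(wanted == candidate.casefold() for candidate in labels):
            return concept_id
    return None


def broader_transitive(concept_id):
    """Return all ancestor concepts, nearest first (breadth-first)."""
    result = []
    queue = list(CONCEPTS[concept_id]["broader"])
    while queue:
        current = queue.pop(0)
        if current not in result:
            result.append(current)
            queue.extend(CONCEPTS[current]["broader"])
    return result
```

For example, `find_concept("canines")` resolves a legacy synonym to concept `c3`, and `broader_transitive("c3")` walks up the hierarchy to `c2` and then `c1`. Aligning a vocabulary with an ontology typically starts from this kind of label lookup and hierarchy traversal.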
Biomedical ontologies such as the Gene Ontology and the National Cancer Institute Thesaurus have become the benchmark examples of large, community-maintained formal ontologies, illustrating how ontological infrastructure scales to hundreds of thousands of terms in specialized domains. The Gene Ontology Consortium's foundational work demonstrated that shared ontologies could unify data from distinct experimental databases in a reproducible way.
Applications
Ontologies have applications in a range of fields, including:
- Semantic search, improving retrieval accuracy by resolving query intent to structured concepts
- Open data portals, enabling cross-agency dataset discovery and reuse
- Biomedical research, standardizing gene and protein annotations across databases
- Enterprise knowledge management, linking product catalogs, policies, and documentation
- Natural language processing, providing background knowledge for entity recognition and disambiguation