Open Data
What Are Open Data?
Open data are datasets or data collections that any person may freely access, use, redistribute, and share, subject at most to requirements of attribution and share-alike licensing. The term applies to information released by governments, research institutions, intergovernmental organizations, and private entities that voluntarily publish records without the access barriers, proprietary licenses, or fee structures that restrict conventional data products. Open data practice draws on legal, technical, and organizational principles, and its effective implementation depends on standardized formats, clear licenses, and stable infrastructure for hosting and discovery.
The concept gained institutional traction in the early 2000s alongside the broader open-source and open-access movements. Governments recognized that public-sector information, collected at taxpayer expense, held significant reuse value for researchers, businesses, and civil society. International frameworks including OECD guidelines on public-sector information and the G8 Open Data Charter of 2013 established principles for how governments should make data available, emphasizing machine readability, timeliness, and non-discrimination.
Openness Standards and Licensing
The Open Knowledge Foundation's Open Definition specifies the conditions under which a dataset qualifies as open: anyone must be free to access, redistribute, and build derivative works from the data, without discrimination against persons, groups, or fields of use, and any license conditions must not exceed attribution and share-alike requirements. Creative Commons CC0 and CC BY are among the licenses most commonly applied to open datasets; CC0 waives the publisher's rights and dedicates the content to the public domain with no obligations, while CC BY requires acknowledgment of the source.
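The licensing condition above can be read as a subset test: a license qualifies if its obligations stay within attribution and share-alike. The following is a minimal sketch of that reading, not a legal determination. The obligation labels and the `LICENSE_OBLIGATIONS` table are a simplified, hypothetical encoding introduced for illustration.

```python
# Obligations the Open Definition tolerates, per the paragraph above.
ALLOWED_OBLIGATIONS = frozenset({"attribution", "share-alike"})

# Hypothetical, simplified encoding of what each license asks of a reuser.
# The labels are illustrative and do not capture the full license texts.
LICENSE_OBLIGATIONS = {
    "CC0": frozenset(),
    "CC BY": frozenset({"attribution"}),
    "CC BY-SA": frozenset({"attribution", "share-alike"}),
    "CC BY-NC": frozenset({"attribution", "non-commercial"}),
    "CC BY-ND": frozenset({"attribution", "no-derivatives"}),
}


def is_open(license_name: str) -> bool:
    """Return True if the license asks for nothing beyond attribution and share-alike."""
    obligations = LICENSE_OBLIGATIONS[license_name]
    return obligations <= ALLOWED_OBLIGATIONS


def open_licenses() -> list[str]:
    """List the licenses in the table that pass the subset test, sorted by name."""
    return sorted(name for name in LICENSE_OBLIGATIONS if is_open(name))
```

Under this encoding, `is_open("CC BY")` holds while `is_open("CC BY-NC")` does not, because a non-commercial restriction goes beyond the two permitted obligations.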
Tim Berners-Lee developed the 5-star open data deployment scheme to provide a progressive quality framework. At one star, data are on the web under an open license regardless of format; two stars require structured, machine-readable data; three stars require a non-proprietary format such as CSV; four stars require URIs as identifiers for the things described, typically using RDF; and at five stars the data are additionally linked to other datasets, enabling traversal across the web of data. Most government open data portals in current operation reach three stars, publishing data in open, machine-readable formats such as CSV or JSON but stopping short of the URI-based identification and cross-dataset linking that the top two levels require.
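Because each star presupposes the ones below it, a rating can be computed by counting criteria in order until the first one fails. The sketch below assumes the usual five-criterion summary of the scheme; the `Dataset` class and its field names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (field names are illustrative)."""
    open_license: bool   # 1 star: on the web under an open license
    structured: bool     # 2 stars: structured, machine-readable data
    open_format: bool    # 3 stars: non-proprietary format such as CSV
    uses_uris: bool      # 4 stars: URIs identify the things described
    links_out: bool      # 5 stars: linked to other datasets


def star_rating(ds: Dataset) -> int:
    """Count consecutive criteria met, starting from the lowest level."""
    criteria = [
        ds.open_license,
        ds.structured,
        ds.open_format,
        ds.uses_uris,
        ds.links_out,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # a higher level does not count if a lower one is missing
        stars += 1
    return stars


# A CSV file under an open license, without URIs or outbound links.
csv_release = Dataset(True, True, True, False, False)
```

Here `star_rating(csv_release)` evaluates to 3, which matches the level at which a plain CSV release sits in the scheme.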
Linked Data and Technical Infrastructure
Linked data is the technical realization of the upper end of the 5-star scheme. By assigning HTTP URIs to individual data entities and expressing relationships using the Resource Description Framework (RDF), publishers enable datasets from different sources to be queried together as if they formed a single knowledge graph. The W3C's Best Practices for Publishing Linked Data note was developed by the Government Linked Data Working Group specifically to support public-sector adoption of these techniques.
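The idea of querying separate sources as one graph can be illustrated with plain Python sets of subject, predicate, object triples. This is a simplified sketch rather than an RDF implementation: it treats every term as a string and does not distinguish URIs from literals. The `http://example.org/` namespace, the two publishers, and all values are made up for the example.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

EX = "http://example.org/"  # placeholder namespace, not a real vocabulary

# Made-up triples from two hypothetical publishers describing the same entity.
mapping_agency: set[Triple] = {
    (EX + "place/1", EX + "prop/name", "Exampleville"),
    (EX + "place/1", EX + "prop/areaKm2", "10"),
}
census_bureau: set[Triple] = {
    (EX + "place/1", EX + "prop/population", "500"),
}


def match(
    graph: set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> list[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def describe(graph: set[Triple], subject: str) -> dict[str, str]:
    """Collect every predicate and object recorded for one subject URI."""
    return {p: o for _, p, o in match(graph, s=subject)}


# Because both sources use the same URI for the place, merging is a set union.
merged = mapping_agency | census_bureau
```

Calling `describe(merged, EX + "place/1")` returns the name, area, and population together, even though no single publisher supplied all three. The shared URI does the work that a bespoke join key would otherwise do.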
Open government datasets published as linked data have enabled applications that cross institutional boundaries: geographic data from national mapping agencies combined with statistical data from census bureaus, or health outcome data from public health agencies linked to demographic data from welfare registries. The interoperability gains depend on shared ontologies that define what the data fields mean and how entities relate, which is why ontology development and open data are closely connected areas in semantic web practice.
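The role of a shared ontology can be sketched as a mapping from each publisher's local field names to common property identifiers, after which records from different sources become directly comparable. Everything named below is an assumption made for illustration: the `http://example.org/ontology/` namespace, the source names, the field names, and the sample records.

```python
SHARED = "http://example.org/ontology/"  # placeholder ontology namespace

# Hypothetical mappings from local column names to shared property URIs.
FIELD_MAPPINGS = {
    "mapping_agency": {
        "region_code": SHARED + "regionCode",
        "area_sq_km": SHARED + "areaKm2",
    },
    "census_bureau": {
        "geo_id": SHARED + "regionCode",
        "pop_total": SHARED + "population",
    },
}


def normalize(source: str, record: dict) -> dict:
    """Rename a record's fields to shared property URIs, dropping unmapped fields."""
    mapping = FIELD_MAPPINGS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def join_on(key: str, left: list[dict], right: list[dict]) -> list[dict]:
    """Inner-join two lists of normalized records on a shared property."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


# Made-up sample records using each publisher's own column names.
geo = [normalize("mapping_agency", {"region_code": "R1", "area_sq_km": 10.0})]
stats = [
    normalize("census_bureau", {"geo_id": "R1", "pop_total": 500}),
    normalize("census_bureau", {"geo_id": "R2", "pop_total": 700}),
]
joined = join_on(SHARED + "regionCode", geo, stats)
```

Without the mapping, `region_code` and `geo_id` would look like unrelated columns. Once both are expressed as the same shared property, the join reduces to a dictionary lookup.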
Applications
Open data are applied in a range of fields, including:
- Government transparency, enabling citizens and journalists to audit public spending and policy outcomes
- Scientific research, providing free access to observational datasets in climate science, genomics, and economics
- Urban planning, powering transit apps, property analytics, and infrastructure monitoring
- Public health surveillance, aggregating disease incidence data across national reporting systems
- Electronic publishing, allowing publishers to link bibliographic records to open authority files and research data repositories