Big Data Taxonomies
What Are Big Data Taxonomies?
Big data taxonomies are structured classification systems that organize the concepts, components, technologies, and use cases associated with large-scale data management and analytics into a coherent hierarchy. They serve researchers and practitioners by providing consistent terminology for comparing systems, scoping literature reviews, and communicating across disciplinary boundaries where the same technical concepts may carry different names. A taxonomy of big data differs from a definition in that it does not simply characterize what big data is; it maps the relationships among the types of data, the systems that handle it, the analytical methods applied, and the domains where it appears. Standards bodies, industry analysts, and academic research groups have each produced taxonomies suited to their specific audiences.
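Because a taxonomy is a hierarchy of categories, it can be represented as a tree. Each node names a category, and each path from the root spells out one classification. The minimal Python sketch below assumes that representation. The class `TaxonomyNode`, its `add` and `path_to` methods, and the category labels (borrowed from the axes this article covers) are illustrative choices. They are not part of any published taxonomy or standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """One category in a hierarchical taxonomy (illustrative structure only)."""
    name: str
    children: list[TaxonomyNode] = field(default_factory=list)

    def add(self, name: str) -> TaxonomyNode:
        # Create a child category and return it so sub-branches can be built.
        child = TaxonomyNode(name)
        self.children.append(child)
        return child

    def path_to(self, target: str) -> list[str] | None:
        # Depth-first search returning the chain of category names from this
        # node down to the target, or None if the target is not in the tree.
        if self.name == target:
            return [self.name]
        for child in self.children:
            sub = child.path_to(target)
            if sub is not None:
                return [self.name] + sub
        return None


# Hypothetical top-level branches mirroring the axes discussed in this article.
root = TaxonomyNode("Big data")
data = root.add("Data")
for kind in ("Structured", "Semi-structured", "Unstructured"):
    data.add(kind)
methods = root.add("Analytical methods")
for tier in ("Descriptive", "Diagnostic", "Predictive", "Prescriptive"):
    methods.add(tier)
systems = root.add("Systems and platforms")
for part in ("Storage", "Processing", "Deployment"):
    systems.add(part)

print(" > ".join(root.path_to("Predictive")))  # Big data > Analytical methods > Predictive
```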
Taxonomies of Data Types and Sources
One major axis of big data classification concerns the type and origin of the data itself. Structured data, with a fixed schema and tabular organization, sits at one end of the spectrum; unstructured data, such as natural-language text, images, audio, and video, sits at the other. Semi-structured data, including JSON documents, XML records, log files, and sensor telemetry, occupies an intermediate position. A second classification dimension concerns the source: transaction systems, social media platforms, scientific instruments, network infrastructure, and IoT devices each produce data with different velocity, volume, and quality characteristics. The NIST Big Data Interoperability Framework combines two perspectives, a hierarchy of the functional roles that system components play in its reference architecture and a classification of data characteristics, into a taxonomy intended to support both technical design and procurement decisions.
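The structural dimension of this classification can be roughly approximated with simple format checks. The Python sketch below is a hypothetical heuristic and is not prescribed by NIST or any other taxonomy. The function `classify_structure` labels parseable JSON objects or arrays and well-formed XML as semi-structured. It labels delimited text whose rows all share one column count as structured, and everything else as unstructured. A practical classifier would also weigh schema metadata and the source dimension, both of which this sketch ignores.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def classify_structure(payload: str) -> str:
    """Hypothetical heuristic placing a text payload on the
    structured / semi-structured / unstructured spectrum."""
    text = payload.strip()
    # Self-describing formats carry their own field names without a fixed
    # external schema, so JSON objects/arrays count as semi-structured.
    try:
        if isinstance(json.loads(text), (dict, list)):
            return "semi-structured"
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        pass
    # Well-formed XML is treated the same way.
    try:
        ET.fromstring(text)
        return "semi-structured"
    except ET.ParseError:
        pass
    # Several delimited rows sharing one column count suggest a fixed,
    # tabular schema, so treat them as structured.
    rows = list(csv.reader(io.StringIO(text)))
    width = len(rows[0]) if rows else 0
    if len(rows) > 1 and width > 1 and all(len(r) == width for r in rows):
        return "structured"
    # Anything else, such as free text, falls through to unstructured.
    return "unstructured"


print(classify_structure('{"id": 7, "tags": ["a"]}'))        # semi-structured
print(classify_structure("id,name\n1,Ada\n2,Grace"))         # structured
print(classify_structure("The sensor reported a spike."))    # unstructured
```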
Taxonomies of Analytical Methods
Another major classification axis covers the analytical methods applied to big data. Descriptive analytics summarizes historical data to characterize what has happened. Diagnostic analytics examines correlations and causal structures to explain why those patterns occurred. Predictive analytics applies statistical and machine-learning models to estimate future states or outcomes. Prescriptive analytics extends prediction to recommendation, identifying actions likely to lead to desired outcomes. This four-tier taxonomy, common in enterprise data strategy literature, orders analytical approaches by their relationship to time and to decision-making. Within each tier, further sub-classifications exist. Machine-learning methods divide into supervised, unsupervised, and reinforcement learning. Statistical methods divide by their distributional assumptions (parametric versus nonparametric) and by their estimation framework (for example, frequentist versus Bayesian). Survey articles on big data analytics indexed in IEEE Xplore document how this analytical taxonomy has evolved alongside platform capabilities.
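To make the four tiers concrete, the Python sketch below runs each tier over one small, invented dataset. The data, the `MARGIN_PER_UNIT` objective, and the helper names are hypothetical. Each tier uses a deliberately simple stand-in for the much richer techniques it covers in practice: a mean, a Pearson correlation, an ordinary least-squares line, and a search over candidate actions.

```python
from statistics import mean

# Hypothetical toy history: weekly promotion spend and units sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [12.0, 15.0, 21.0, 24.0, 28.0]


def least_squares(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Descriptive: what happened? Summarize the history.
avg_units = mean(units)

# Diagnostic: why? Only a correlation here, which hints at but does not
# by itself establish a causal link between spend and units.
r = pearson(spend, units)

# Predictive: what will happen? Extrapolate the fitted line.
a, b = least_squares(spend, units)


def predict(s):
    return a + b * s


# Prescriptive: what should we do? Choose the candidate spend level that
# maximizes a hypothetical profit objective under the predictive model.
MARGIN_PER_UNIT = 0.3  # hypothetical margin, in the same money units as spend
candidates = [3.0, 4.0, 5.0, 6.0]
best = max(candidates, key=lambda s: MARGIN_PER_UNIT * predict(s) - s)

print(f"mean units: {avg_units:.1f}")             # mean units: 20.0
print(f"correlation: {r:.3f}")                    # correlation: 0.994
print(f"forecast at spend 6: {predict(6.0):.1f}")  # forecast at spend 6: 32.3
print(f"recommended spend: {best}")               # recommended spend: 6.0
```

Because the fitted model is linear, the prescriptive step here can only favor an endpoint of the candidate range. Practical prescriptive analytics usually adds constraints, nonlinear response models, or dedicated optimization solvers.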
Taxonomies of Big Data Systems and Platforms
System-level taxonomies classify the architectural components and platform types used to build big data infrastructure. Storage systems are classified by their data model (relational, document, graph, key-value, columnar) and their consistency and availability guarantees. Processing frameworks are classified by their execution model (batch, micro-batch, streaming) and their programming paradigm (dataflow, SQL, iterative). Deployment models are classified along dimensions of ownership (public cloud, private cloud, on-premises, hybrid) and management style (self-managed, managed service, serverless). These distinctions matter for procurement, capacity planning, and regulatory compliance. The IEEE Big Data initiative's standards work addresses taxonomy harmonization across these system dimensions. It recognizes that incompatible classification systems maintained by different standards bodies complicate interoperability at the policy and procurement levels as well as at the technical interface level.
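Because these dimensions are largely independent, a system taxonomy of this kind can be applied as a faceted classification. A candidate architecture is described by one value per facet and can then be filtered by any combination of facets. The Python sketch below assumes that faceted reading. It models only four of the dimensions named above: storage data model, execution model, ownership, and management style. The enum and class names, the `select` helper, and the `stack-a` through `stack-c` entries are hypothetical placeholders, not real products.

```python
from dataclasses import dataclass
from enum import Enum


class DataModel(Enum):
    RELATIONAL = "relational"
    DOCUMENT = "document"
    GRAPH = "graph"
    KEY_VALUE = "key-value"
    COLUMNAR = "columnar"


class ExecutionModel(Enum):
    BATCH = "batch"
    MICRO_BATCH = "micro-batch"
    STREAMING = "streaming"


class Ownership(Enum):
    PUBLIC_CLOUD = "public cloud"
    PRIVATE_CLOUD = "private cloud"
    ON_PREMISES = "on-premises"
    HYBRID = "hybrid"


class Management(Enum):
    SELF_MANAGED = "self-managed"
    MANAGED_SERVICE = "managed service"
    SERVERLESS = "serverless"


@dataclass(frozen=True)
class StackProfile:
    """A hypothetical candidate stack: one storage layer, one processing
    layer, and one deployment choice, each described by a facet value."""
    name: str
    data_model: DataModel
    execution: ExecutionModel
    ownership: Ownership
    management: Management


def select(profiles, **facets):
    """Return the profiles whose attributes equal every requested facet."""
    return [p for p in profiles
            if all(getattr(p, key) == value for key, value in facets.items())]


# Hypothetical catalogue; the names are placeholders, not products.
catalogue = [
    StackProfile("stack-a", DataModel.COLUMNAR, ExecutionModel.BATCH,
                 Ownership.PUBLIC_CLOUD, Management.SERVERLESS),
    StackProfile("stack-b", DataModel.KEY_VALUE, ExecutionModel.STREAMING,
                 Ownership.ON_PREMISES, Management.SELF_MANAGED),
    StackProfile("stack-c", DataModel.DOCUMENT, ExecutionModel.STREAMING,
                 Ownership.PUBLIC_CLOUD, Management.MANAGED_SERVICE),
]

matches = select(catalogue, execution=ExecutionModel.STREAMING,
                 ownership=Ownership.PUBLIC_CLOUD)
print([p.name for p in matches])  # ['stack-c']
```

Filtering by facets rather than walking a single tree lets the same catalogue answer procurement questions (ownership, management style) and design questions (data model, execution model) without being reorganized.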
Applications
Big data taxonomies are used in a range of contexts, including:
- Academic literature classification and systematic review methodology
- Enterprise architecture decision frameworks and vendor selection guides
- Government and regulatory framework development for data governance
- Educational curriculum design in data science and information systems programs
- Interoperability standards work, where shared terminology enables cross-vendor compatibility