Replicability

What Is Replicability?

Replicability is the property of a scientific study or experiment whereby independent researchers, collecting their own data and following the methods described in the original publication, obtain results consistent with those of the original. The National Academies of Sciences, Engineering, and Medicine define replicability as obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data. This distinguishes replicability from reproducibility, which concerns whether the same results emerge when the original dataset and code are reused without new data collection.
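The definition leaves open what "consistent" means in practice. The sketch below computes two simple criteria that are often discussed. It assumes that each study reports an effect estimate with an approximately normal standard error and that the two studies are independent. The first criterion asks whether the replication estimate falls inside the original study's approximate 95% confidence interval. The second is a z-test on the difference between the two estimates. These are illustrative choices, not the National Academies' procedure, and the function name `consistency_checks` and the input values are hypothetical.

```python
import math


def consistency_checks(orig_est, orig_se, rep_est, rep_se, z_crit=1.96):
    """Illustrative consistency criteria for an original and a replication
    effect estimate, assuming both are independent and approximately normal."""
    # Criterion (a): does the replication estimate fall inside the
    # original study's approximate 95% confidence interval?
    lower = orig_est - z_crit * orig_se
    upper = orig_est + z_crit * orig_se
    inside_original_ci = lower <= rep_est <= upper

    # Criterion (b): z-test on the difference between the two estimates.
    se_diff = math.sqrt(orig_se ** 2 + rep_se ** 2)
    z = (rep_est - orig_est) / se_diff
    p_difference = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

    return {
        "inside_original_ci": inside_original_ci,
        "z_difference": z,
        "p_difference": p_difference,
    }


if __name__ == "__main__":
    # Hypothetical numbers: a smaller replication estimate with a tighter SE.
    print(consistency_checks(orig_est=0.50, orig_se=0.20, rep_est=0.15, rep_se=0.10))
```

With these hypothetical inputs, both criteria report consistency. Yet the replication estimate is less than a third of the original, and its own interval includes zero. This illustrates why a single binary criterion can be a coarse summary of consistency.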

Replicability is foundational to the scientific method. A finding that cannot be replicated by independent parties is suspect, regardless of the statistical significance or apparent rigor of the original study. Concerns about replicability have grown across disciplines including psychology, biomedical research, machine learning, and electrical engineering, prompting journals, professional societies, and funding agencies to develop formal policies and tools to promote it.

Replicability Versus Reproducibility

The two terms are closely related but not synonymous, and different communities have codified them differently. The ACM Publications Board revised its terminology, swapping its earlier definitions of the two terms to bring them into line with usage in the broader scientific community, including the National Academies framework. Under ACM's revised definitions, reproducibility means that an independent team obtains the same result using the original authors' artifacts, while replicability means that an independent team obtains it using artifacts it has developed independently. In experimental sciences, replicability typically means repeating a physical experiment with new samples, new instruments, and new analysts and obtaining similar outcomes. In computational sciences, it means independently implementing the same algorithm from the published description and observing comparable performance on independently gathered datasets. Conflating the two terms has historically made it difficult to diagnose which aspect of a study has failed, a confusion that ACM's updated artifact badging program addresses by certifying reproducibility and replicability separately.

Factors Affecting Replicability

Many factors reduce the probability that a study will replicate. Insufficient statistical power, where sample sizes are too small to detect an effect reliably, is among the most common. When an underpowered study does reach significance, its estimated effect tends to be larger than the true effect. Publication bias compounds this: positive results are more likely to be submitted and accepted than null results, so the published literature overstates both how often an effect is found and how large it is. Incomplete methodological reporting, including omission of preprocessing choices, hyperparameter settings, or data exclusion criteria, prevents independent groups from faithfully reconstructing experimental conditions. In engineering and applied research, variability in materials, test environments, and calibration can produce differences that look like a failure to replicate even when the underlying phenomenon is real. Recognition of these factors has driven the adoption of pre-registration, open data sharing, and detailed supplementary methods sections across many publication venues, including major IEEE and ACM venues.
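The interaction between low power and publication bias can be illustrated with a small simulation. This is a minimal sketch under several stated assumptions. Each study compares two groups with unit-variance normal outcomes using a known-variance z-test. A stylized publication rule lets only positive, significant results appear. Each "published" study is then replicated once with the same design. The parameter values (a true effect of 0.3, 20 subjects per group) and all function names are hypothetical.

```python
import math
import random
import statistics


def two_sided_p(z):
    # Two-sided p-value for a standard normal test statistic.
    return math.erfc(abs(z) / math.sqrt(2))


def run_study(true_effect, n, rng):
    # One two-group study with unit-variance normal outcomes (an assumption).
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(2.0 / n)  # standard error of the difference, known variance
    return diff, two_sided_p(diff / se)


def simulate(true_effect=0.3, n=20, studies=2000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    # Stylized publication bias: only positive, significant results are "published".
    published = []
    for _ in range(studies):
        diff, p = run_study(true_effect, n, rng)
        if p < alpha and diff > 0:
            published.append(diff)
    # Each published study is replicated once with the same design.
    replicated = 0
    for _ in published:
        diff, p = run_study(true_effect, n, rng)
        if p < alpha and diff > 0:
            replicated += 1
    return {
        "publication_rate": len(published) / studies,
        "mean_published_effect": statistics.fmean(published),
        "replication_rate": replicated / len(published),
    }


if __name__ == "__main__":
    print(simulate())       # small samples: inflated published effects, few replications
    print(simulate(n=200))  # larger samples: estimates nearer 0.3, most replicate
```

With 20 subjects per group, a result can only be significant and positive if the observed difference exceeds roughly 0.62. Every published estimate therefore overstates the assumed true effect of 0.3, and the replication rate roughly equals the low power of the design. Raising the sample size shrinks this distortion, which is the point the paragraph above makes qualitatively.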

Replicability in Computational Research

Computational experiments present replicability challenges that differ from those of physical experiments. Software dependencies, random seeds, floating-point nondeterminism, and hardware-specific behavior can all cause results to differ across platforms or over time. A model trained on a machine learning benchmark may perform differently when trained on a different GPU, or after a framework library is updated, even when the training code is nominally the same. Research on reproducibility and replicability in web measurement studies, published through the ACM Digital Library, illustrates how hard it is to control for external factors, such as changes to the websites being measured, that vary independently of the research methodology. IEEE conferences in machine learning and computer vision have introduced reproducibility checklists and code-submission policies that document the full computational environment, making it easier for subsequent researchers to replicate results under comparable conditions.
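Two of the sources of variation named above can be demonstrated in a few lines of Python. The sketch shows that floating-point addition is not associative. A reduction that accumulates the same values in a different order, as parallel hardware may do, can therefore return a different answer. It also shows two common mitigations: drawing random numbers from a dedicated seeded generator, and recording basic environment metadata next to results. The function names and the fields of the environment record are a hypothetical minimal schema, not any conference's checklist format.

```python
import math
import platform
import random
import sys


def ordered_sum(values):
    # Naive left-to-right accumulation; each += rounds to the nearest double.
    total = 0.0
    for v in values:
        total += v
    return total


def seeded_sample(seed, k=5):
    # A dedicated generator, so results do not depend on global RNG state
    # that other code might also consume.
    rng = random.Random(seed)
    return [rng.random() for _ in range(k)]


def environment_record(seed):
    # Hypothetical minimal schema for logging alongside results.
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "byteorder": sys.byteorder,
        "seed": seed,
    }


if __name__ == "__main__":
    values = [1e16, 1.0, -1e16]
    print(ordered_sum(values))                     # 0.0: the 1.0 is absorbed by rounding
    print(ordered_sum([1e16, -1e16, 1.0]))         # 1.0: same values, different order
    print(math.fsum(values))                       # 1.0: accurately rounded summation
    print(seeded_sample(42) == seeded_sample(42))  # True
    print(environment_record(42))
```

Using `math.fsum`, or fixing the reduction order, removes this particular source of order dependence. GPU frameworks have their own determinism settings, which this sketch does not cover.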

Applications

Replicability is a central concern in a range of fields, including:

  • Experimental physics and materials science, where fabrication variability must be distinguished from genuine effects
  • Clinical trials and biomedical device evaluation, where regulatory approval depends on independent confirmation of safety and efficacy
  • Machine learning benchmarking, where model comparisons require consistent evaluation protocols
  • Software engineering research, where tool evaluations must be repeatable across different codebases and environments
  • Metrology and measurement standards, where calibration procedures must yield consistent values internationally