"Reproducible build" definition in OpenSSF glossary

Thu May 8 16:40:01 UTC 2025

> On May 8, 2025, at 2:21 AM, Pol Dellaiera <pol.dellaiera at gmail.com> wrote:
> 
> Hello,
> Great initiative, it is great to see efforts being made to improve this area. 
> Last year, I completed a Master's thesis on reproducibility in software 
> engineering (https://doi.org/10.5281/zenodo.12666898), and I have also 
> navigated this path myself.

Very interesting, thanks for sharing this! I'll need to take a deeper look at it.

I'm also interested in the broader issues of reproducibility, including the
very serious threat to scientific progress known as the "reproducibility crisis"
that's been going on for some time.

However, for this specific page, I think we need to focus on reproducible builds
as most people understand the term. The ability of any system to consistently produce
the same results at run-time is interesting, but I think that's a much more general
topic and too far afield for this page. That deserves its own page(s), and
probably should be part of a general set of scientific criteria.

However, your point does suggest we should add some definitions to clarify things. How about:

A **repeatable build** is a build where given the same build inputs,
repeating the build process by the same party in the same
build environment produces the same bit-by-bit artifacts. Implementing repeatable builds
is often a valuable step towards achieving a reproducible build, but a reproducible build
also requires that other parties be able to recreate the artifacts.

(This is borrowed from:
https://www.bestpractices.dev/en/criteria#1.build_repeatable
Adding this definition shows a plausible intermediate step, and makes it clear
why being able to repeat a build on your OWN system isn't the same thing as
a reproducible build.)
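
To make "the same bit-by-bit artifacts" concrete, here is a minimal sketch of how someone might check it, assuming each build writes its artifacts into its own directory. The names `sha256_of` and `builds_match` are hypothetical and not taken from any existing tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(dir_a, dir_b):
    """True if both output directories hold the same relative file paths
    with byte-identical contents (hypothetical helper for illustration)."""
    def digests(root):
        root = Path(root)
        return {
            p.relative_to(root).as_posix(): sha256_of(p)
            for p in root.rglob("*")
            if p.is_file()
        }
    return digests(dir_a) == digests(dir_b)
```

If the same party runs this on two builds made in the same environment, a match is evidence of a repeatable build. It only becomes evidence of a reproducible build when the second directory comes from another party recreating the artifacts.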

We don't use the term **deterministic build** because it's not always clear
if "repeatable build" or "reproducible build" is meant by that phrase.
We recommend asking for clarification when "deterministic build" is used.

A **semantically equivalent build** is either a reproducible build or a build where the
differences between the built artifact(s) are not expected
to produce functional differences in normal cases.
For example, the rebuilt artifact might have different date/time stamps,
or one might include files like .gitignore that are not in the other.
The intent is to reduce risk. The challenge with this approach is that "not expected" is a loose
criterion, while bit-by-bit equivalence is easy to determine.
Tools like [OSSGadget OSS-Reproducible](https://github.com/microsoft/OSSGadget/wiki/OSS-Reproducible)
attempt to measure this.
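
As a rough illustration of the looser comparison (and not a description of how OSS-Reproducible works), here is a sketch that treats two zip archives as equivalent when their file contents match, ignoring date/time stamps and an ignore list of file names. The ignore list and the function names are assumptions made up for this example.

```python
import hashlib
import zipfile

# Hypothetical ignore list; a real policy would need careful review.
IGNORED_NAMES = {".gitignore"}


def content_digests(archive_path):
    """Map each member name to the SHA-256 of its contents, skipping
    directories and ignored names. Timestamps are never consulted."""
    result = {}
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            base_name = info.filename.rsplit("/", 1)[-1]
            if base_name in IGNORED_NAMES:
                continue
            result[info.filename] = hashlib.sha256(zf.read(info)).hexdigest()
    return result


def loosely_equivalent(archive_a, archive_b):
    """True if the archives agree on every non-ignored file's contents."""
    return content_digests(archive_a) == content_digests(archive_b)
```

The weakness is visible in the code: everything depends on what goes into the ignore list, which is exactly the loose "not expected" judgment, whereas the bit-by-bit check needs no such policy.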

--- David A. Wheeler