"Reproducible build" definition in OpenSSF glossary
Pol Dellaiera
pol.dellaiera at gmail.com
Thu May 8 06:21:07 UTC 2025
Hello,
Great initiative; it is good to see efforts being made to improve this area.
Last year, I completed a Master's thesis on reproducibility in software
engineering (https://doi.org/10.5281/zenodo.12666898), and I have also
navigated this path myself.
In the thesis, I compiled a list of definitions that may be helpful to you. I
aimed not only to identify key terms but also to formalise them as clearly as
possible.
Building on your work, it might be valuable to explicitly include notions of
reproducibility at both *build time* and *runtime*, don't you think?
Additionally, your definitions could benefit from incorporating the concepts
of "space" and "time" as introduced in this paper
(https://arxiv.org/abs/2402.00424). They offer a useful framework for thinking
about reproducibility in distributed and temporal contexts.
Thanks again for pushing this forward!
On Thursday, 8 May 2025 at 00:36:23 Central European Summer Time, David A.
Wheeler via rb-general wrote:
> > On May 7, 2025, at 5:25 PM, Simon Josefsson <simon at josefsson.org> wrote:
> >
> > "David A. Wheeler via rb-general"
> > <rb-general at lists.reproducible-builds.org> writes:
> >> My thanks to the many who commented on the need to update the
> >> definition of "reproducible builds".
> >>
> >> I created a merge request that *attempted* to address all the comments:
> >> https://salsa.debian.org/reproducible-builds/reproducible-website/-/merge_requests/178/diffs
> > I read it and I'm happy with everything except this part:
> >   A build is **reproducible** if given the same build inputs, any party
> >                                            ^^^^^^^^^^^^^^^^^
> >   can recreate bit-by-bit identical copies of all specified build
> >   artifacts by generating them from the build inputs.
> >
> > First, the term "build inputs" is not defined (as far as I can tell), so
> > I'm not sure exactly what you want it to mean?
>
> Fair enough.
> Originally this was "source code" but we're trying to deal with the case
> where people are building whole container images / ISO images / etc.
> Our effort to generalize things unintentionally created some confusion.
>
> So - how about adding this?:
>
> Build inputs: Data used and processed by the build environment (including
> the tools in the build environment) to produce the output build artifacts.
> The build inputs are often the source code being built.
>
> Build environment: The set of hardware and software used to perform a build
> that accepts the build inputs and generates the artifacts.
> The build environment often includes a compiler, run-time library, operating
> system, and hardware used to execute them.
>
> > Second, I don't think we want to give the impression that the exact same
> > build inputs are required for a reproducible build. What I believe
> > matters are the outputs: if I compile a binary using GCC version X and
> > get the same bit-by-bit identical output as someone with GCC version Y,
> > then I would count that as a success.
>
> If X & Y aren't exceedingly close, I would also count that as a miracle :-).
> I *DO* agree with you that compiling with slightly different tool versions
> and getting the same result is fine. It often doesn't happen, but it's fine!
>
> However, I would consider the tools used during a build as
> part of the build *environment*, and NOT as the build inputs.
> The definition of "relevant attributes of the build environment" was already
> there and I believe hinted at that. But we can do better than *hinting* at
> it.
>
> See my definitions above. Hopefully they clarify all that.
> I *think* the problem disappears with these terms more clearly defined.
>
> > Therefore I suggest changing the above into:
> >
> >   A build is **reproducible** if any party can recreate bit-by-bit
> >   identical copies of all specified build artifacts.
>
> I'm wary of that change because the word "recreate" is doing a lot of hidden
> heavy lifting. In addition, this version downplays the importance of
> generating results from the inputs.
>
> I think we'd agree that "recreating" artifacts by copying the artifacts from
> some other website doesn't count :-). I'd like to make it
> clear that you have to rebuild from the *inputs* - it doesn't count if
> you sneakily copy artifact results from somewhere else.
> In *practice* you have to know a lot about the inputs, so I think it's
> valuable to make it clear that they're important.
>
> --- David A. Wheeler
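
To make the "bit-by-bit identical" criterion from the quoted definition concrete, here is a minimal Python sketch that compares the artifacts of two independent builds by SHA-256 digest. The function names and the assumption that each build writes its artifacts into its own output directory are hypothetical and are not part of the proposed definition. Note that such a check only establishes that the outputs match; it cannot show that they were actually rebuilt from the build inputs rather than copied from elsewhere, which is the point David raises above.

```python
import hashlib
from pathlib import Path


def artifact_digests(output_dir: Path) -> dict[str, str]:
    """Map each file path (relative to output_dir) to its SHA-256 digest."""
    digests = {}
    for path in sorted(output_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(output_dir).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def is_reproducible(build_a: Path, build_b: Path) -> bool:
    """True if both builds produced the same files with identical bytes."""
    return artifact_digests(build_a) == artifact_digests(build_b)
```

For example, two parties could each run the build in their own environment and then call is_reproducible() on the two output directories. A single differing byte, or a file present in only one directory, makes the check fail.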