"Reproducible build" definition in OpenSSF glossary

Thu Apr 24 00:56:23 UTC 2025

fosslinux via rb-general <rb-general at lists.reproducible-builds.org> wrote:
> Absolutely agreed! So let's have a definition that clearly defines top-down work as progress toward reproducibility. :D

I am happy to see all sorts of progress toward reproducible
distributions, whether it involves compiling from source code, or
otherwise.

Perhaps the definition of a "pure function" from mathematical computer
science can help:

  https://en.wikipedia.org/wiki/Pure_function

I think that we want most or all of the tools used to build a release to
be pure functions, in the sense that when run with the same inputs, they
produce the same outputs.  This property is independent of whether the
tool inputs are "source code" or not.  So Roland Clobus's improved tool
for building a bootable ISO from a collection of files could be "pure",
even if its input is not source code.
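
One rough way to probe this property is to run a tool several times on the same input and compare digests of the output. The following is only a sketch: the helper names (`output_digest`, `looks_pure`) and the two stand-in "tools" are hypothetical, made up for illustration. Identical output across runs does not prove purity; differing output does disprove it.

```python
import hashlib
import subprocess
import sys

def output_digest(cmd, input_bytes):
    """Run cmd with input_bytes on stdin; return the SHA-256 of its stdout."""
    result = subprocess.run(cmd, input=input_bytes,
                            stdout=subprocess.PIPE, check=True)
    return hashlib.sha256(result.stdout).hexdigest()

def looks_pure(cmd, input_bytes, runs=3):
    """True if every run produced byte-identical output for the same input."""
    digests = {output_digest(cmd, input_bytes) for _ in range(runs)}
    return len(digests) == 1

# Hypothetical stand-in tools, run through the current Python interpreter.
# SORT_TOOL depends only on its input; STAMP_TOOL also reads the clock.
SORT_TOOL = [sys.executable, "-c",
             "import sys; sys.stdout.write(''.join(sorted(sys.stdin.readlines())))"]
STAMP_TOOL = [sys.executable, "-c",
              "import sys, time; sys.stdout.write(sys.stdin.read() + str(time.time_ns()))"]

if __name__ == "__main__":
    print("sort tool looks pure: ", looks_pure(SORT_TOOL, b"b\na\n"))
    print("stamp tool looks pure:", looks_pure(STAMP_TOOL, b"x"))
```

Note that the check says nothing about what kind of input the tool consumes, which matches the point above: source code or not, only "same inputs, same outputs" is being tested.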

(If the tool pulls its collection of files from an uncontrolled source
out on the network, then it can't be pure, since somebody else could
change those files elsewhere, causing the tool's output to change.)
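
One common way to bring a network fetch back under control is to pin a digest of the expected content and refuse anything else. This is a minimal sketch, assuming the bytes have already been downloaded by some other means; `verify_pinned` is a hypothetical helper name, not part of any existing tool.

```python
import hashlib

def verify_pinned(data: bytes, expected_sha256: str) -> bytes:
    """Return data unchanged if it matches the pinned digest, else raise."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(
            f"input digest mismatch: expected {expected_sha256}, got {actual}")
    return data

if __name__ == "__main__":
    # Illustrative payload; in a real build the pin would be recorded
    # ahead of time, alongside the build recipe.
    payload = b"example package contents"
    pinned = hashlib.sha256(payload).hexdigest()
    verify_pinned(payload, pinned)              # accepted
    try:
        verify_pinned(payload + b"!", pinned)   # changed upstream: rejected
    except ValueError as err:
        print("rejected:", err)
```

With the pin in place, the digest effectively becomes part of the tool's input, so a change made elsewhere on the network causes a failure rather than a silently different output.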

A stricter property might be that a program is "portably pure" when it
produces the same outputs from the same inputs, despite being run on a
variety of processor types, operating systems, etc.  Many of the
standard tools on GNU and UNIX systems are designed to be portably pure,
and some achieve that.

Note that reproducible distributions can use tools that aren't portably
pure, since we only require the tool to behave purely within the
environment of the distribution itself.  E.g. the output could depend on
the word-size of the system it's running on (e.g. 32-bit or 64-bit), and
still the distribution could be reproducible.  You may not be able to
cross-build the distribution from some disparate host system, but you
could reproducibly rebuild it on itself.
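
As a small illustration of the difference, here is a sketch using Python's `struct` module. The function names are hypothetical. `encode_native` is pure on any one host but not portably pure, because the size and byte order of a C `long` depend on the platform (typically 4 bytes on 32-bit Linux and 8 bytes on 64-bit Linux); `encode_portable` fixes both.

```python
import struct

def encode_native(n: int) -> bytes:
    # "@l" is the host's native C long: size and byte order vary by platform.
    return struct.pack("@l", n)

def encode_portable(n: int) -> bytes:
    # "<q" is always 8 bytes, little-endian, on every host.
    return struct.pack("<q", n)

if __name__ == "__main__":
    print("native width on this host:", len(encode_native(1)), "bytes")
    print("portable width everywhere:", len(encode_portable(1)), "bytes")
```

A distribution that always rebuilds itself on its own word size could use either function reproducibly; only a cross-build from a different host would notice the difference.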

(The Wikipedia definition uses the C- or C++-language definition of
"function", but we can generalize that to the properties of an entire
executable program that one might run from a shell.  One issue with
executable programs that read files in ordinary UNIX file systems is
that they change the time-last-accessed of the files as a side effect.
Their execution also leaves log records in accounting files and such.
In order to be considered a pure function, within the context of a
process such as the build of a distribution, those side effects "must
not matter" to the overall process that we are contemplating.  E.g. if
those files are later copied by "tar" or "genisoimage" to make a
release, then the build scripts must take care that the access-time is
not copied into the output file.)
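
To make the archiving step concrete, here is a sketch using Python's `tarfile` module rather than "tar" or "genisoimage" themselves. It assumes that pinning timestamps to 0 and clearing ownership is acceptable for the release; `normalize` and `deterministic_tar` are hypothetical names. The plain ustar format has no access-time field at all, and the filter overwrites the modification time that it does record.

```python
import io
import tarfile

def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Strip per-run metadata so it cannot leak into the archive."""
    info.mtime = 0                  # fixed timestamp (an assumption of this sketch)
    info.uid = info.gid = 0         # do not record who ran the build
    info.uname = info.gname = ""
    return info

def deterministic_tar(src_dir: str) -> bytes:
    """Archive src_dir into an uncompressed ustar stream, as bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        # Since Python 3.7, add() recurses in sorted order, so the member
        # order does not depend on directory listing order.
        tar.add(src_dir, arcname=".", filter=normalize)
    return buf.getvalue()
```

Reading the input files or touching them between two calls changes their access and modification times on disk, but should not change the bytes returned. File permission bits are still copied from the file system here, so a real build script would need to decide how to handle those as well.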

	John