"Reproducible build" definition in OpenSSF glossary

Wed Jul 2 08:36:16 UTC 2025

James Addison <jay at jp-hosting.net> writes:

>> > I think it's pretty much impossible to avoid people 'maliciously'
>> > misrepresenting their product as 'reproducible' when it really isn't -
>> > that doesn't seem like something tweaking the definition can fix. I do
>> > think it's still useful to have precise terms so we can make the
>> > distinctions clear, though.
>> >
>> > I agree "Reproducible Builds" as a whole means "from source to binary".
>>
>> That would make Debian installer CDs impossible to call reproducible,
>> since they are built from binaries for which we do not have source code.
>>
>> [ ... snip ... ]
>
>
> Does this refer to binary firmware specifically?

I don't know.

There are other corner cases: some existing binary Debian packages were
built using earlier versions of Debian packages, and this recurses back
to old versions of Debian packages, some of which may never have been
in any official Debian release.  Some necessary packages may be ancient
and removed even from archive.debian.org, and only exist on
snapshot.debian.org.  Both systems have policies in place to remove
packages when various issues are identified (see
https://snapshot.debian.org/removal/ for a list), so neither is known
to be a complete historical record of what was ever published.  Some of
those no-longer-public packages may be necessary to rebuild some old
package that, in turn, through some chain of build dependencies, is
needed to rebuild what we use today.

I don't know if anyone has done a transitive trace of which packages are
necessary to rebuild all of modern Debian; does anyone know?  I know of
https://rebuilder-snapshot.debian.net/, which was an effort to publish
all packages necessary to rebuild a 'stable' release, but it didn't
include the transitive closure of that set of packages.
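To make the "transitive trace" concrete, here is a minimal sketch of the
closure computation.  It assumes the build-dependency graph is already
available as a mapping from PACKAGE:VERSION to the packages installed
when that package was built (in practice this would have to be assembled
from something like .buildinfo files).  The package names, versions and
the BUILD_DEPS table are invented for illustration, and build_closure is
a hypothetical helper, not an existing tool.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Invented example graph: PACKAGE:VERSION -> packages installed when it
# was built.  Real data would come from build records, one graph per
# architecture.
BUILD_DEPS: Dict[str, List[str]] = {
    "app:1.0-1": ["cc:3.0-1", "make:4.3-1"],
    "cc:3.0-1": ["cc:2.0-1", "make:4.3-1"],
    "cc:2.0-1": ["cc:1.0-1"],  # cc:1.0-1 has no build record below
    "make:4.3-1": ["cc:2.0-1"],
}


def build_closure(root: str, deps: Dict[str, List[str]]) -> Tuple[Set[str], Set[str]]:
    """Breadth-first walk over build dependencies.

    Returns (every package reached, including root; packages reached
    for which no build record exists, i.e. where the trace dead-ends).
    """
    seen: Set[str] = {root}
    unknown: Set[str] = set()
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        if pkg not in deps:
            # No record of how this was built, e.g. removed from the archives.
            unknown.add(pkg)
            continue
        for dep in deps[pkg]:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen, unknown
```

With this toy data, build_closure("app:1.0-1", BUILD_DEPS) reaches five
packages and reports cc:1.0-1 as having no build record, which models a
package that is needed somewhere down the chain but was removed from
both archives.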

This analysis needs to be done for each architecture too.  I fear even
this analysis will be insufficient: when bootstrapping a new
architecture, I think people have historically created fake packages
used to bootstrap things.  So you can't fully rely on the PACKAGE:VERSION
to refer to the package that was actually used.

Another problem is that the PACKAGE:VERSION identifiers used by Debian
packages do not easily or uniquely map to a strong cryptographic hash
checksum of the original package binary.  You quickly need to rely on
weak SHA1 identifiers, and I recall there have been multiple valid
versions of the same binary (due to security.debian.org rebuilds or
something like that).
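One way to sidestep the PACKAGE:VERSION ambiguity is to key everything
on a strong digest of the actual .deb file instead.  A minimal sketch,
assuming (PACKAGE:VERSION, file contents) pairs have been collected from
several sources; the byte strings below are stand-ins for real files,
and ambiguous_ids is a hypothetical helper:

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Invented observations of the "same" package fetched from different
# places.  The byte strings stand in for real .deb contents.
OBSERVED: List[Tuple[str, bytes]] = [
    ("foo:1.2-3", b"first build of foo"),
    ("foo:1.2-3", b"second, different build of foo"),
    ("bar:0.9-1", b"only build of bar"),
    ("bar:0.9-1", b"only build of bar"),
]


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def ambiguous_ids(observed: List[Tuple[str, bytes]]) -> Dict[str, Set[str]]:
    """Return each PACKAGE:VERSION that maps to more than one SHA-256 digest."""
    digests: Dict[str, Set[str]] = defaultdict(set)
    for pkg_id, blob in observed:
        digests[pkg_id].add(sha256_hex(blob))
    return {pkg_id: d for pkg_id, d in digests.items() if len(d) > 1}
```

Here only foo:1.2-3 is flagged: the same name and version refers to two
different binaries, so any record that pins it by PACKAGE:VERSION alone
cannot say which one was used.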

> I would hope that we could agree that building an artifact composed partly
> or entirely from 100% DFSG binary packages that are themselves reproducible
> would produce a transitively reproducible build.

That depends on how you define "reproducible"...  I think most people
here don't consider it necessary to rebuild the transitive closure of
build dependencies in order to call something reproducible.  So in that
case I don't think your statement is necessarily true.  Also consider
packages that were removed because the interpretation (or definition)
of the DFSG changed over the years.

> For closed-source binary firmware blobs, the situation does seem less
> clear.  They arguably can be used as fixed inputs to a build to achieve
> identical bit-for-bit output -- but if I understand correctly, it raises a
> question of "is complete source code to all inputs required in order to
> label/certify an artifact as reproducibly buildable?".

Indeed.

And if we don't have source code for object X, how can we tell that it
is a firmware blob?  There is no simple way to know what it is, and some
methods to establish what it is (disassembly) may be illegal.

> I'm not initially sure how/whether an exception clause could be written to
> allow binary inputs under some circumstances, without reducing the
> effectiveness of the definition (because, for example, copying an entirely
> opaque blob from one directory to another could be argued as within such a
> redefinition).

Agreed!  In my mind, there is a way out of that dilemma:

1) One term, e.g., "recreatable", to cover the situation where you
neither have source code for, nor require rebuilding from source of,
the transitive closure of the set of build dependencies.  This allows
the degenerate build process of 'cp FOO BAR' to create a "recreatable"
artifact.  The Debian LiveCD would fall into this category.

2) Another term, e.g., "reproducible", to cover the situation where ALL
source code for ALL build dependencies, including their build
dependencies, and so on, is available and used to recreate the
bit-for-bit identical artifact.
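The two proposed terms can be written down as a small decision function.
This is a sketch that assumes the full transitive closure of build
inputs is already known, and that each input can be marked as having
source available and as having been rebuilt from it.  BuildInput,
classify and the "neither" label are made up here, not part of any
glossary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BuildInput:
    name: str
    source_available: bool     # is complete source for this input available?
    rebuilt_from_source: bool  # was it rebuilt, rather than reused as a binary?


def classify(inputs: List[BuildInput], output_matches: bool) -> str:
    """Classify a rebuild under the two proposed terms.

    `inputs` must be the full transitive closure of build dependencies;
    `output_matches` says whether the rebuilt artifact is bit-for-bit
    identical to the published one.
    """
    if not output_matches:
        return "neither"
    if all(i.source_available and i.rebuilt_from_source for i in inputs):
        return "reproducible"  # term 2): everything rebuilt from source
    return "recreatable"       # term 1): some inputs used as opaque binaries
```

For example, classify([BuildInput("FOO", False, False)], True) returns
"recreatable": the 'cp FOO BAR' case passes the bit-for-bit check
without any source at all, which is exactly why it cannot qualify
under 2).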

The problem is that many people seem to use the term "reproducible" to
mean 1) today, so if we settle on these definitions, there will still be
ambiguity unless everyone adopts the new definitions.

There are at least two ways to reach 2) for an OS: bootstrappable builds
or idempotent builds.  Guix shows that a bootstrappable build is
feasible; I don't know of anyone testing idempotent builds of any OS.
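For the idempotent route, one possible check (an assumed procedure for
illustration, not an established one) is to rebuild the whole package
set using the previous generation's binaries as build dependencies, and
compare per-package SHA-256 digests until two consecutive generations
agree.  The manifest format and the rebuild_all callable are
hypothetical:

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical manifest format: package name -> SHA-256 hex digest.
Manifest = Dict[str, str]


def idempotence_diff(prev: Manifest, nxt: Manifest) -> Set[str]:
    """Packages whose digest differs, or that exist in only one generation."""
    names = prev.keys() | nxt.keys()
    return {name for name in names if prev.get(name) != nxt.get(name)}


def check_idempotent(
    start: Manifest,
    rebuild_all: Callable[[Manifest], Manifest],
    max_rounds: int = 3,
) -> Optional[int]:
    """Rebuild generation after generation until the output stops changing.

    Returns the round in which a rebuild first reproduced its own inputs
    bit-for-bit, or None if that did not happen within max_rounds.
    """
    current = start
    for round_no in range(1, max_rounds + 1):
        nxt = rebuild_all(current)  # build everything using `current` as build deps
        if not idempotence_diff(current, nxt):
            return round_no
        current = nxt
    return None
```

With a toy rebuild_all such as lambda m: {k: "aa" for k in m}, starting
from {"cc": "00"}, the first round changes the digest and the second
reproduces it, so check_idempotent returns 2.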

/Simon