Introducing: Semantically reproducible builds

David A. Wheeler dwheeler at
Mon May 29 02:56:01 UTC 2023

On Sun, 28 May 2023 13:04:40 +0100, James Addison via rb-general <rb-general at> wrote:

> Hi David,
> Thanks for sharing this.
> I think that the problems with this idea and name are:
> - That it does not allow two or more people to share and confirm that
> they have the same build of some software.

Sure they can; they just use the same process (e.g., the same tool to
verify it). For example, if you rebuild it and the two builds are the same
EXCEPT for the datetime stamps, it's semantically reproducible (though not
fully reproducible).
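As a rough illustration of "the same EXCEPT for the datetime stamps", here is a minimal sketch that assumes the build outputs are zip archives. It hashes only member names and contents, so two archives that differ only in stored timestamps produce the same digest. The helper names `semantic_digest` and `make_archive` are hypothetical and are not taken from any existing tool; a real comparison would likely need to normalize more than timestamps.

```python
import hashlib
import io
import zipfile


def semantic_digest(archive_bytes: bytes) -> str:
    """Hash member names and contents of a zip archive, ignoring timestamps."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        # Sort names so that member order does not affect the result.
        for name in sorted(archive.namelist()):
            data = archive.read(name)
            digest.update(name.encode("utf-8"))
            digest.update(b"\0")
            # Length prefix keeps adjacent members from blurring together.
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()


def make_archive(files: dict, date_time: tuple) -> bytes:
    """Build an in-memory zip whose members all carry the given timestamp."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            info = zipfile.ZipInfo(name, date_time=date_time)
            archive.writestr(info, data)
    return buffer.getvalue()
```

Two people who each run the same digest function over their own rebuild can then compare a single short string, much as they would compare checksums for a fully reproducible build.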

> - That it does not allow tests to fail early, catching and preventing
> reproducibility regressions (semantic or otherwise).

It's *possible* to fail early, though the CPU requirements are admittedly
higher (because you have to do much more than a bit-for-bit test).
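One way to keep the extra cost down in a fail-early check is to try the cheap bit-for-bit comparison first and only fall back to the more expensive normalized comparison when that fails. The sketch below assumes, purely for illustration, that the only permitted difference is a line beginning with a hypothetical `Build-Date:` field; the function names are likewise invented for this example.

```python
def strip_build_date(data: bytes) -> bytes:
    """Drop lines that start with the hypothetical 'Build-Date:' field."""
    lines = data.splitlines(keepends=True)
    return b"".join(line for line in lines if not line.startswith(b"Build-Date:"))


def classify(build_a: bytes, build_b: bytes) -> str:
    """Classify a pair of build outputs, cheapest check first."""
    # Bit-for-bit test: cheap, and sufficient for fully reproducible builds.
    if build_a == build_b:
        return "fully reproducible"
    # More expensive test: compare after normalizing the allowed differences.
    if strip_build_date(build_a) == strip_build_date(build_b):
        return "semantically reproducible"
    return "not reproducible"


def check_or_fail(build_a: bytes, build_b: bytes) -> str:
    """Fail early (raise) if the builds differ in more than the allowed ways."""
    verdict = classify(build_a, build_b)
    if verdict == "not reproducible":
        raise RuntimeError("reproducibility regression detected")
    return verdict
```

A test job could call `check_or_fail` on two independent rebuilds and let the raised exception fail the run.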

But I expect that in practice, "semantically reproducible builds" will be
used long *after* any CI/CD process of the package being analyzed.
The problem, in many cases, is that the package was not created in a way
that supports reproducible builds, so the goal is to try to estimate the
risk of the package when it is *not* a reproducible build.

> - That the naming terminology conflates with true reproducible builds,
> therefore creating the potential for misunderstanding to consumers.

Naming is hard. As long as the term is carefully defined, I think it works.
You can use "fully reproducible build" when you want to contrast, and
that makes it clear that a normal "reproducible build" is the stronger test
(at the cost of being sometimes harder to achieve).

--- David A. Wheeler
