Introducing: Semantically reproducible builds

David A. Wheeler dwheeler at
Mon May 29 03:25:49 UTC 2023

On Sun, 28 May 2023 08:02:18 +0200, "Bernhard M. Wiedemann via rb-general"
 <rb-general at> wrote:

> I agree, that it is good to give it a name (I have called it 
> semi-reproducible before), but we should be clear on communicating the 
> disadvantages.


> However, while working with the tool, I already found three (3) bugs in 
> build-compare that made it report packages with significant differences 
> as 'identical'.

Obviously that's bad. However, my current alternative is
"hoping for the best when downloading from PyPI" and I'm not
a fan of that process either.

If you have tips on common likely errors, please post them; I think
they would be of interest to many.

> And if you don't rely on such tools, you need expensive manual reviews 
> every time that cannot be automated and might also miss issues.

Manual review every time is impractical. In 2019 it was reported that a new application
created using "create-react-app" version 2.1.5 had 1,568 dependencies.
React is really popular in the JavaScript ecosystem.
Yes, this is "create-react-app" not React itself, and there are many
caveats, but the *reality* is that often users are trying to deal with
thousands of dependencies & we need *some* way to flag the
most concerning ones.

> Another disadvantage of such binaries is that you don't have a single 
> correct SHAsum that can be signed, communicated and compared easily.
> You always need the full binary to compare to your rebuild.

Agreed, that's a problem. To be fair, usually there's a
"canonical binary package" that people are using & for which people
can use a hash. In fact, many package managers can specify a hash.
The problem is that there are few ways for people to gain confidence that
this package *is* generated from the putative source code.
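
As a rough illustration of the hash check mentioned above, here is a minimal
Python sketch. The function names and the idea of passing in a pinned hex
digest are assumptions made for the example, not any package manager's actual
interface. Note that it only confirms you received the expected bytes; it says
nothing about whether those bytes were built from the putative source.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large packages need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path, expected_hex):
    """Hypothetical helper: compare a file's digest with a pinned value."""
    actual = sha256_of_file(path)
    # Normalize case and whitespace; compare_digest is a constant-time compare.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A recipient would call matches_pinned_hash() on the downloaded file with the
digest recorded in a lock file or requirements file, and refuse to install on
a mismatch.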

> The cleaner way is to use strip-nondeterminism to remove all these 
> insignificant bits during build and make the resulting bit-reproducible 
> output the official binary.

For a Linux *distributor* this makes sense. If you have control over the
build process, a more rigorous build process is great, and hardening that
build process against attacks is a wonderful idea (e.g., OpenSSF's SLSA).

As a *recipient* who has no control over the build process used by
someone else to create their package, I need some workable
alternatives to estimate risk.
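
One such alternative, sketched very roughly: rebuild the package yourself and
compare the contents of the two archives while ignoring metadata such as
timestamps and member order. The Python below is only an illustration for
zip-based packages (e.g., wheels); it is not how build-compare or any other
existing tool works, and the function names are made up for the example. It
deliberately ignores file permissions and other archive metadata, which is
exactly the kind of shortcut that can hide a significant difference.

```python
import hashlib
import zipfile


def archive_fingerprint(path):
    """Map each file member's name to the SHA-256 of its uncompressed bytes."""
    fingerprint = {}
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            content = zf.read(info)
            fingerprint[info.filename] = hashlib.sha256(content).hexdigest()
    return fingerprint


def semantic_diff(path_a, path_b):
    """Return sorted member names that are missing or differ in content.

    Timestamps, member order, compression level, and permissions are
    ignored (an assumption of this sketch, and a possible blind spot).
    """
    a = archive_fingerprint(path_a)
    b = archive_fingerprint(path_b)
    return sorted(name for name in a.keys() | b.keys()
                  if a.get(name) != b.get(name))
```

An empty result from semantic_diff() means the two archives carry the same
files with the same bytes, even if the archives themselves are not
bit-for-bit identical. Any name in the result is a candidate for closer
review.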


--- David A. Wheeler
