Introducing: Semantically reproducible builds

Vagrant Cascadian vagrant at
Mon May 29 20:19:52 UTC 2023

On 2023-05-29, Bernhard M. Wiedemann via rb-general wrote:
> On 29/05/2023 06.10, Vagrant Cascadian wrote:
>> Do such tools actually exist, or are we talking about something
>> theoretical here?
> is in use for 13 years.
> And strip-nondeterminism can be used to build another such tool.

Sure, I am well aware of strip-nondeterminism and somewhat aware of the
openSUSE build-compare. Debian uses strip-nondeterminism extensively, as
it is part of the majority of standard debhelper-based packages ... so I
almost forget it is even there sometimes!

> They will only ever be able to normalize or ignore certain known classes 
> of differences. It is good enough to avoid review of many diffs.

Exactly. You can do this sort of thing, and it is useful in many cases,
but there are limits. It can only modify things in very specific
contexts. Each context is essentially a feature, and great care needs to
be taken to make sure it does not break things or normalize too much.
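
To make the "very specific contexts" point concrete, here is a minimal
sketch in Python of one such context: ignoring the timestamp stored in a
gzip header when comparing two files. It is only an illustration, not how
strip-nondeterminism or build-compare are implemented; the function names
are made up for this example, and the header layout is taken from RFC 1952.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def normalize_gzip_mtime(data: bytes) -> bytes:
    """Return a copy of a gzip stream with the MTIME header field zeroed.

    Per RFC 1952 the fixed header is 10 bytes long and bytes 4..7 hold
    the modification time. Nothing else is touched.
    (Hypothetical helper, written for this illustration only.)
    """
    if len(data) < 10 or data[:2] != GZIP_MAGIC:
        raise ValueError("not a gzip stream")
    return data[:4] + b"\x00\x00\x00\x00" + data[8:]


def equal_ignoring_mtime(a: bytes, b: bytes) -> bool:
    """Compare two gzip streams, ignoring only the header timestamp."""
    return normalize_gzip_mtime(a) == normalize_gzip_mtime(b)


payload = b"hello\n"
# Same content compressed at two different (fake) build times.
first = gzip.compress(payload, mtime=1)
second = gzip.compress(payload, mtime=2)

print("bit-for-bit identical:", first == second)
print("identical ignoring mtime:", equal_ignoring_mtime(first, second))
```

Note how narrow this is: it knows about exactly one field in one file
format, and any other difference (compression level, embedded file name,
the payload itself) still shows up as a mismatch. Every additional kind of
difference needs its own carefully reviewed rule like this one.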

Though strip-nondeterminism (and presumably similar tooling) will have
occasional bugs that break the resulting binaries, artifacts,

> e.g. has
> not-bit-by-bit-identical: 673
> build-compare-failed: 483
> So for 190 packages build-compare found that they only had insignificant 
> diffs and were considered semantically equivalent, so I could spend more 
> time, debugging the other 483 diffs.

These approaches are definitely useful to troubleshoot reproducibility
issues, by stripping out all the things that are deemed safe to
sanitize, normalize, etc. and leaving only the more inscrutable things
to scrutinize. I do not dispute that!

>> I very much worry that the meaning of Reproducible Builds may gradually
>> get whittled down
> I share this concern, which is why I have been calling this 
> semi-reproducible to distinguish it from bit-reproducible / 
> fully-reproducible.
> That 'semi-' prefix should give people a good hint of what it is and if 
> not, encourage them to ask for details. "sort-of-reproducible" or 
> "almost-but-not-quite-reproducible" could also be an option :-)

semi-reproducible still leaves me a bit nervous, but is definitely
clearer than semantic. :)

live well,