[rb-general] auto-analyzing indeterminism

Bernhard M. Wiedemann bernhardout at lsmod.de
Thu Sep 7 22:24:03 CEST 2017

On 2017-07-17 03:27, Bernhard M. Wiedemann wrote:
> during the r-b summit in Berlin I heard of an idea to
> automatically 'bisect' sources of indeterminism to make it easier
> to fix software.
> Yesterday (when I could not sleep), I did a quick proof of that 
> concept in 60 lines of code in 
> https://github.com/bmwiedemann/reproducibleopensuse/blob/devel/autoclassify
>  This will get further refined, but already seems useful enough so
> that I'm currently running it with the ~500 smaller unreproducible
> packages in openSUSE.

Since the topic came up in today's IRC meeting, here is a quick update:
I have now run it on all 622 unreproducible openSUSE packages and have
already submitted many easy fixes upstream.
Unfortunately, a large portion of the packages do not become
reproducible even with all determinism tweaks applied.
fontforge, octave, xemacs, mono and latex come to mind.
I guess patches already exist for some of them.

includes the detected bits in the "opensuseautoclassify" value string
to give it some up-to-date visibility without me writing emails about it.

> The basic idea is that there is only a limited number of sources
> of indeterminism. In openSUSE we already have a rather normalized
> build environment with constant user, path, locale, timezone.
> And now it is possible to not vary some of the others. Which of
> them gets more or less indeterminism is a small number of bits in
> my script:
> 1. date
> 2. hostname
> 3. filesystem readdir order (using disorderfs sort mode)
> 4. date+time (when called via 'date' command)
> 5. apply strip-nondeterminism to all files after build
> 6. use some experimental r-b-patched package versions
added 3 more bits:
7. use reproducible-faketools-tar (includes gzip -n)
8. build with 1 core / make -j1
9. disable ASLR in build system

and maybe soon a 10th bit: use reproducible-faketools-pid
 (= the build always starts with PID 15000, in case someone uses getpid,
$$ or equivalent)

interestingly #9 found

and #8 helped with ghc (Haskell) packages
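
The bit scheme from the list above can be sketched roughly as follows.
This is only an illustration in Python, not the autoclassify script
itself: the names TWEAKS, classify, names and the is_reproducible
callback are hypothetical, the bit positions simply follow the numbering
in this mail, and the greedy one-bit-at-a-time search is an assumption
about how one might narrow down the relevant tweaks.

```python
from typing import Callable, List, Optional

# Tweak names follow the numbered list in this mail (bit 0 = item 1).
# The mapping of names to bit positions is an assumption for illustration.
TWEAKS = [
    "date", "hostname", "readdir-order", "date+time",
    "strip-nondeterminism", "rb-patched-packages",
    "faketools-tar", "single-core", "no-aslr",
]
ALL_BITS = (1 << len(TWEAKS)) - 1


def classify(is_reproducible: Callable[[int], bool]) -> Optional[int]:
    """Return a reduced mask of tweaks that still gives identical builds.

    is_reproducible(mask) is a hypothetical callback: it would build the
    package twice with the tweaks in `mask` enabled and compare results.
    Returns None if even all tweaks together do not help.
    """
    if not is_reproducible(ALL_BITS):
        return None
    mask = ALL_BITS
    for bit in range(len(TWEAKS)):
        candidate = mask & ~(1 << bit)
        if is_reproducible(candidate):
            mask = candidate  # this tweak was not needed, leave it off
    return mask


def names(mask: int) -> List[str]:
    """Translate a bit mask back into the tweak names it contains."""
    return [name for i, name in enumerate(TWEAKS) if mask & (1 << i)]
```

The bits left in the returned mask point at the sources of indeterminism
that matter for that package; a None result corresponds to the packages
that stay unreproducible with every tweak applied.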

There are also a few packages that have roughly 1 bit of entropy in
their build and thus are rather hard to autoclassify, because two
builds sometimes happen to produce the same result by chance.
One could counter that by doing more than 2 builds to determine whether
something is reproducible, but that is not yet implemented.
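
To get a feeling for how many builds would be needed, here is a small
sketch in Python. It assumes the builds are independent and that each of
the possible outputs is equally likely, which real packages need not
satisfy; the function names are hypothetical and not part of any
existing tool.

```python
from typing import Sequence


def chance_of_false_match(builds: int, outcomes: int = 2) -> float:
    """Probability that `builds` independent builds all agree by luck.

    Assumes `outcomes` equally likely build results; outcomes=2 models
    roughly 1 bit of entropy.
    """
    if builds < 2 or outcomes < 1:
        raise ValueError("need at least 2 builds and 1 outcome")
    # The first build is free; each later one must match it.
    return float(outcomes) ** (1 - builds)


def looks_reproducible(checksums: Sequence[str]) -> bool:
    """True if every build produced the same checksum."""
    return len(set(checksums)) == 1
```

Under these assumptions, 2 builds of a 1-bit package agree by chance
half of the time, while 5 builds agree by chance in about 6% of cases.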

Bernhard M.