mitigating non-determinism

Bernhard M. Wiedemann bernhardout at lsmod.de
Tue Jun 18 10:57:33 UTC 2024


Dear fellow rb-lers

In https://github.com/bmwiedemann/theunreproduciblepackage/ I have 
collected the many issues that introduce non-determinism.

Today it is time to talk about mitigations - how can we avoid whole 
classes of problems that would prevent verification of the 
source->binary relation, without patching an infinite number of 
individual packages.

env:
One part is to normalize as much of the build environment as possible.
E.g. it is easy enough to have your build scripts clear the whole 
environment and then set the variables for locales (LANG / LC_ALL=C) 
and TZ=UTC. The umask can be fixed as well.
If you build in a chroot/container/VM, paths and the username can also 
be constant, e.g. /home/abuild for openSUSE.
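
A minimal sketch of such a wrapper, in case it helps. run_normalized is 
a made-up helper name, and the PATH and HOME values are only 
assumptions for illustration; adjust them to your build root.

```shell
#!/bin/sh
# run_normalized (hypothetical helper): run a command with a cleared
# environment, fixed locale and timezone, and a fixed umask.
run_normalized() {
    (
        # Set the umask in a subshell so the caller's umask is untouched.
        umask 022
        # env -i drops every inherited variable; only the ones listed
        # here reach the build. PATH and HOME are assumed values.
        env -i \
            PATH=/usr/bin:/bin \
            HOME=/home/abuild \
            LANG=C \
            LC_ALL=C \
            TZ=UTC \
            "$@"
    )
}

# Show what a build step would see.
run_normalized sh -c 'echo "LC_ALL=$LC_ALL TZ=$TZ umask=$(umask)"'
```

A real build would be started the same way, with the build command in 
place of the sh -c example.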

ASLR:
Influences from address-space-layout-randomization(ASLR) can be avoided 
with setarch -R COMMAND or globally with echo 0 > 
/proc/sys/kernel/randomize_va_space . This also helps with some cases of 
uninitialized memory.
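
A small sketch of the per-command variant. run_no_aslr is a made-up 
helper name. The fallback branch is my own addition, because some 
containers do not allow changing the personality flags; in that case 
the command simply runs with ASLR still active.

```shell
#!/bin/sh
# run_no_aslr (hypothetical helper): run a command with ASLR disabled
# via setarch -R, or unchanged if that is not permitted here.
run_no_aslr() {
    if setarch "$(uname -m)" -R true 2>/dev/null; then
        setarch "$(uname -m)" -R "$@"
    else
        echo "warning: cannot disable ASLR here, running unchanged" >&2
        "$@"
    fi
}

# Rough check: without ASLR the stack mapping of a fresh process
# should be at the same address in every run.
a=$(run_no_aslr grep -F '[stack]' /proc/self/maps)
b=$(run_no_aslr grep -F '[stack]' /proc/self/maps)
if [ "$a" = "$b" ]; then
    echo "stack mapping identical across runs"
else
    echo "stack mapping differs (ASLR still active?)"
fi
```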

uname -a / kernel-version / hostname:
Then there are packages such as perl that embed the kernel version from 
the build machine. Building in kvm helps there.
`hostname` can be normalized the same way.

CPU:
Some packages such as calibre vary by CPU type, because Qt6's 
QImage.scaled method optimizes differently depending on the 
availability of certain CPU instructions. KVM helps here again with its 
-cpu parameter, which presents the same virtual CPU on different 
physical hosts - even on completely different architectures, at a 10x 
performance penalty for emulation.

readdir:
Many packages get non-determinism from filesystem-readdir-order. In 
official builds of openSUSE we avoid that with the prjconf line 
'BuildFlags: vmfsoptions:nodirindex' for ext4, which maps to 
`mkfs.ext4 -O ^dir_index`. It is also possible to override the random 
seed with a constant instead. ext2/3 don't even need that.
And if you want to be more independent from the underlying filesystem, 
there is `disorderfs --sort-dirents=yes`

parallelism/race:
Influences from parallelism / race-conditions can often (but not always) 
be avoided with a 1-core-VM or with `taskset 1 COMMAND`. dettrace[1] can 
also help avoid this.
Note that some packages fail to build with this mitigation as their 
testsuites expect a certain amount of CPU cores / threads to execute in 
parallel.
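
A sketch of the taskset variant. run_single_core is a made-up helper 
name. The mask 1 means "CPU 0 only"; the fallback is my own addition 
for environments where CPU 0 is not in the allowed set or taskset is 
missing.

```shell
#!/bin/sh
# run_single_core (hypothetical helper): pin a command and all of its
# children to CPU 0, or run it unpinned if pinning is not possible.
run_single_core() {
    if taskset 1 true 2>/dev/null; then
        taskset 1 "$@"
    else
        echo "warning: cannot pin to CPU 0, running unpinned" >&2
        "$@"
    fi
}

# nproc honours the CPU affinity mask, so where pinning worked this
# should report a single core.
run_single_core nproc
```

Build tools that size their job count from the affinity mask should 
then also see a single core, which is the same thing that upsets 
the testsuites mentioned above.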

date:
Some influences from the date can be avoided by using SOURCE_DATE_EPOCH 
to always start builds at the same time, e.g. with 
`qemu-kvm -rtc base=2024-06-16T00:00:00`
There is also libfaketime and dettrace but they can be unreliable or 
have undesirable side-effects that break a build.
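
One way to read the above is to derive the VM start time from the 
SOURCE_DATE_EPOCH value. A sketch under that assumption: 
rtc_base_from_epoch is a made-up helper name, the default epoch is just 
the example date from above, and `date -d @N` assumes GNU date.

```shell
#!/bin/sh
# rtc_base_from_epoch (hypothetical helper): turn a Unix timestamp into
# the UTC string format accepted by qemu's "-rtc base=" option.
rtc_base_from_epoch() {
    date -u -d "@$1" +%Y-%m-%dT%H:%M:%S
}

# Use the caller's SOURCE_DATE_EPOCH, or the example date as a default.
SOURCE_DATE_EPOCH=${SOURCE_DATE_EPOCH:-1718496000}
export SOURCE_DATE_EPOCH

# Print the option that would be added to the qemu-kvm command line.
echo "-rtc base=$(rtc_base_from_epoch "$SOURCE_DATE_EPOCH")"
```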

random:
Some randomness can be avoided by replacing /dev/random and /dev/urandom 
with /dev/zero. But it also breaks some builds, especially Rust builds. 
dettrace replaces more randomness with a deterministic PRNG sequence. 
Care must be taken here with cases where cryptographic secrets are 
created at build-time (e.g. in libcamera and tigervnc) because with 
deterministic random numbers it becomes possible for others to recreate 
the same ephemeral/temporary private keys and sign malicious content.

hash:
Many hash implementations use a randomized seed to defend against DoS ( 
http://ocert.org/advisories/ocert-2011-003.html ). For some, the seed 
can be made constant with
export QT_HASH_SEED=0
export PERL_HASH_SEED=42
export PYTHONHASHSEED=0
This is considered safe, because a) build inputs are trusted and b) a 
DoS there would only slow down a build process.
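
To check that a seed actually takes effect, something like the 
following sketch can be used. hash_is_stable is a made-up helper name 
and the check assumes python3 is installed; it only covers the Python 
case.

```shell
#!/bin/sh
# Fixed seeds as listed above.
export QT_HASH_SEED=0
export PERL_HASH_SEED=42
export PYTHONHASHSEED=0

# hash_is_stable (hypothetical helper): hash the same string in two
# separate interpreter runs and succeed only if both results agree.
hash_is_stable() {
    first=$(python3 -c 'print(hash("reproducible"))') || return 2
    second=$(python3 -c 'print(hash("reproducible"))') || return 2
    [ "$first" = "$second" ]
}

if ! command -v python3 >/dev/null 2>&1; then
    echo "python3 not found, skipping check"
elif hash_is_stable; then
    echo "python string hashes are stable across runs"
else
    echo "python string hashes differ between runs"
fi
```

Without PYTHONHASHSEED set, the same check is expected to report 
differing hashes, since Python randomizes string hashes per process by 
default.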

PID:
For the rare case where a process-ID (PID) number gets embedded (e.g. 
tar), we can `echo 1234 > /proc/sys/kernel/ns_last_pid` to influence the 
next PID number used.


---
With all mitigations applied, there are not many sources of randomness 
left. What remains is (rare) compile-time benchmarking, some 
uninitialized memory, mtimes and other timestamps that include seconds 
or minutes, and some remaining raciness.

How to continue from here? I'd like to see some of this added to docs in 
a structured fashion under https://reproducible-builds.org/docs/ - any 
volunteer?

If you have ideas for improvement, you can also contribute in
https://etherpad.opensuse.org/p/reproduciblebuilds-mitigations


Ciao
Bernhard M.

[1] https://build.opensuse.org/package/show/devel:tools/dettrace

