success-story: emacs
Bernhard M. Wiedemann
bernhardout at lsmod.de
Sat Dec 21 02:49:16 UTC 2024
Greetings fellow RB-ings
This is the writeup of my (Bernhard's) long journey towards a
bit-reproducible emacs package.
In the beginning, I had identified two parts that were unreproducible
for different reasons.
1. .eln files
2. .pdmp files
Part 1: since the first appeared to be easier, I started an upstream
discussion. [1]
It even turned out that someone else had discussed the same diffs 4
months earlier [3] - and the diff format in those messages is that of
openSUSE's build-compare :-)
Upstream created patches for the master branch, but I could not test
those, because the latest release was way behind and backporting was too
hard.
However, the workaround of setarch -R was easy enough to apply, so I
moved on to the harder issue.
Part 2: [2] .pdmp files were created by "temacs" by loading a lot of
LISP code and then dumping its state.
There were so many things I tried. The easiest was to use setarch -R
taskset 1 to limit parallelism and address-randomness, but that was not
enough.
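
A minimal sketch of how such a wrapper could be put together, here in
Python. Only setarch -R and taskset 1 come from the text above; the
helper name pinned_command and the example make invocation are
hypothetical. The architecture is passed explicitly on the assumption
that an older setarch may require it.

```python
import platform
import shlex


def pinned_command(build_cmd):
    """Prefix build_cmd so it runs without ASLR and pinned to CPU 0."""
    return [
        "setarch", platform.machine(), "-R",  # -R: disable address randomization
        "taskset", "1",                        # affinity mask 0x1: CPU 0 only
    ] + list(build_cmd)


if __name__ == "__main__":
    # Hypothetical build step; the command is printed, not executed.
    print(shlex.join(pinned_command(["make", "-j1"])))
```

As described above, this alone did not make the .pdmp file reproducible.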
I used strace to look for differences between these runs and tried to
patch out all observed getrandom and sysinfo syscalls. That was also not
enough. The best (random) diffs still had 13 differing bytes. I even
explored using dd to overwrite those remaining bytes with constants. It
did not help as there was too much variation between runs.
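
Counting the differing bytes between two build results can be sketched
as follows. This is an illustration only, not the tooling that was
actually used (build-compare is mentioned above for the package-level
diffs); the function names are hypothetical.

```python
def differing_offsets(a: bytes, b: bytes):
    """Return the offsets at which a and b differ.

    If the lengths differ, every offset past the shorter input counts
    as a difference.
    """
    common = min(len(a), len(b))
    diffs = [i for i in range(common) if a[i] != b[i]]
    diffs.extend(range(common, max(len(a), len(b))))
    return diffs


def compare_files(path_a, path_b):
    """Read two files fully and return their differing byte offsets."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return differing_offsets(fa.read(), fb.read())
```

Running compare_files on the .pdmp files from several build pairs would
show whether the differing offsets stay the same from run to run, which
is the precondition for overwriting them with constants.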
The breakthrough came when I went back to a tool that had been presented
at a previous r-b summit. It is called dettrace and uses seccomp to
trigger ptrace events for relevant syscalls. I had packaged it for
openSUSE earlier and added 'theunreproduciblepackage' as test-suite.
However, it only worked on old openSUSE Leap 15.5, not on modern Linux
with new glibc which used clone3 and other new syscalls.
Fabian, a coworker at SUSE, helped me to adapt it to new kernels [4] and
I was finally able to get the first pair of bit-identical emacs packages.
But the journey was not quite done yet. I found that on another machine,
it produced different binaries, because the CPU type matters. And
dettrace got stuck when running with more than 1 CPU core available.
From the strace output, the latter seems to happen because it calls into
libImageMagick, which uses libomp, which checks
/sys/devices/system/cpu/possible to do multi-processing. Dettrace
implements its own scheduler to ensure non-racy deterministic
processing, but these forked processes wait on one another - causing an
eternal wait.
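
To illustrate how the core count leaks into a build through that file:
it holds a CPU list such as "0-7". The sketch below parses that format
in Python. It is not how libomp reads the file, and the function names
are hypothetical.

```python
def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-3,8' into sorted CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def possible_cpus(path="/sys/devices/system/cpu/possible"):
    """Return the CPU numbers listed in the sysfs file at path."""
    with open(path) as f:
        return parse_cpu_list(f.read())
```

Any code that sizes a thread pool from such a list behaves differently
on a 1-core and an 8-core build VM.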
I tried unsuccessfully to set MAGICK_THREAD_LIMIT=1 and to use taskset 1.
I tried unsuccessfully to return -ENOENT for
/sys/devices/system/cpu/possible via dettrace.
In the end, I patched our build tool [5] to call qemu with -smp 1 and
-cpu qemu64 for the emacs package [6] - and that mechanism also helped
two other packages with rare issues:
- colord with its CPU-dependent .icc profiles
- python-lxml that produced different binaries on VMs with more
than 4 cores
Ciao
Bernhard M.
[1] https://mail.gnu.org/archive/html/emacs-devel/2024-01/msg00464.html
https://mail.gnu.org/archive/html/emacs-devel/2024-02/msg00417.html
[2] https://mail.gnu.org/archive/html/emacs-devel/2024-10/msg00004.html
[3] https://mail.gnu.org/archive/html/emacs-devel/2023-10/msg00127.html
[4] https://github.com/bmwiedemann/dettrace/tree/suse
[5] https://github.com/openSUSE/obs-build/pull/1037
[6] https://build.opensuse.org/projects/home:bmwiedemann:reproducible:distribution:ring1/packages/emacs/files/_buildparams