success-story: emacs

Bernhard M. Wiedemann bernhardout at lsmod.de
Sat Dec 21 02:49:16 UTC 2024


Greetings fellow RB-ings

This is the writeup of my (Bernhard's) long journey towards a 
bit-reproducible emacs package.

In the beginning, I had identified two parts that were unreproducible 
for different reasons.
1. .eln files
2. .pdmp files


Part 1: Since the .eln files appeared to be the easier problem, I
started an upstream discussion about them. [1]
It even turned out that someone else had discussed the same diffs 4 
months earlier [3] - and the diff format in those messages is that of 
openSUSE's build-compare :-)
Upstream created patches for the master branch, but I could not test 
those, because the latest release was way behind and backporting was too 
hard.

However, the workaround of setarch -R was easy enough to apply, so I 
moved on to the harder issue.
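
For illustration only (this is not the actual packaging change), the
setarch -R workaround amounts to prefixing a build command so that it
runs with address-space randomization disabled. The helper name and the
"make" command below are hypothetical. The optional taskset 1 prefix is
the CPU pinning mentioned under Part 2. Older util-linux versions may
need an explicit architecture argument, e.g. setarch x86_64 -R.

```python
import shlex


def wrap_no_aslr(cmd, single_cpu=False):
    """Prefix cmd so it runs with address-space randomization disabled.

    wrap_no_aslr is a hypothetical helper, not part of any build tool.
    """
    prefix = ["setarch", "-R"]  # -R is short for --addr-no-randomize
    if single_cpu:
        prefix += ["taskset", "1"]  # affinity mask 0x1: CPU 0 only
    return prefix + list(cmd)


# Hypothetical build step, printed rather than executed.
print(shlex.join(wrap_no_aslr(["make", "install"], single_cpu=True)))
```

Building the command as a list avoids shell quoting problems when it is
later handed to something like subprocess.run.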


Part 2: [2] .pdmp files are created by "temacs", which loads a lot of
Lisp code and then dumps its state.

I tried many things. The easiest was to use setarch -R taskset 1 to
disable address randomization and limit parallelism, but that was not
enough.
I used strace to look for differences between runs and tried to patch
out all observed getrandom and sysinfo syscalls. That was not enough
either. The best (random) diffs still had 13 differing bytes. I even
explored using dd to overwrite those remaining bytes with constants,
but that did not help because there was too much variation between runs.
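
A minimal sketch of how such run-to-run variation could be measured,
assuming the dumps have equal length. The function names and the sample
bytes are made up for illustration and are not real .pdmp data. The
point it shows: offsets found by comparing one pair of builds need not
cover the differences seen in the next build.

```python
from typing import List


def differing_offsets(a: bytes, b: bytes) -> List[int]:
    """Return every offset at which two equally long byte strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs differ in length; offsets would not line up")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def unstable_offsets(dumps: List[bytes]) -> List[int]:
    """Return offsets that differ between the first dump and any other one."""
    unstable = set()
    for other in dumps[1:]:
        unstable.update(differing_offsets(dumps[0], other))
    return sorted(unstable)


# Synthetic stand-ins for dumps from three builds.
runs = [b"\x00\x11\x22\x33", b"\x00\x99\x22\x33", b"\x00\x11\x22\x77"]
print(unstable_offsets(runs))  # [1, 3]
```

Each pair here differs in a single byte, yet the set of unstable offsets
keeps growing as more runs are added, which is why patching a fixed list
of offsets is fragile.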

The breakthrough came when I went back to a tool that had been presented
at a previous r-b summit. It is called dettrace and uses seccomp to
trigger ptrace events for relevant syscalls. I had packaged it for
openSUSE earlier and added 'theunreproduciblepackage' as its test suite.
However, it only worked on the older openSUSE Leap 15.5, not on modern
Linux with a newer glibc that uses clone3 and other new syscalls.
Fabian, a coworker at SUSE, helped me adapt it to new kernels [4] and
I was finally able to get the first pair of bit-identical emacs packages.
But the journey was not quite done yet. I found that on another machine,
it produced different binaries, because the CPU type matters. And
dettrace got stuck when running with more than 1 CPU core available.
From the strace output, it seems the latter is because it calls into
libImageMagick, which uses libomp, which in turn checks
/sys/devices/system/cpu/possible to set up multi-processing. Dettrace
implements its own scheduler to ensure non-racy deterministic
processing, but these forked processes wait on one another - causing an
endless wait.
I tried unsuccessfully to set MAGICK_THREAD_LIMIT=1 and to use taskset 1.
I tried unsuccessfully to return -ENOENT for 
/sys/devices/system/cpu/possible via dettrace.
In the end, I patched our build tool [5] to call qemu with -smp 1 and 
-cpu qemu64 for the emacs package [6] - and that mechanism also helped 
two other packages with rare issues:
     - colord with its CPU-dependent .icc profiles
     - python-lxml that produced different binaries on VMs with more 
than 4 cores
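
As a rough aid for checking what a build VM exposes, the file
/sys/devices/system/cpu/possible mentioned above holds a kernel CPU list
such as "0-7" or "0-1,4". The sketch below parses that format.
parse_cpu_list is a hypothetical helper name, and the expectation that a
VM started with -smp 1 reports just CPU 0 is an assumption, not something
verified here.

```python
def parse_cpu_list(text):
    """Parse a kernel CPU list like "0-3,8" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


try:
    with open("/sys/devices/system/cpu/possible") as f:
        print(len(parse_cpu_list(f.read())), "possible CPU(s)")
except (OSError, ValueError):
    print("CPU list not available on this system")
```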

Ciao
Bernhard M.


[1] https://mail.gnu.org/archive/html/emacs-devel/2024-01/msg00464.html 
https://mail.gnu.org/archive/html/emacs-devel/2024-02/msg00417.html
[2] https://mail.gnu.org/archive/html/emacs-devel/2024-10/msg00004.html
[3] https://mail.gnu.org/archive/html/emacs-devel/2023-10/msg00127.html
[4] https://github.com/bmwiedemann/dettrace/tree/suse
[5] https://github.com/openSUSE/obs-build/pull/1037
[6] https://build.opensuse.org/projects/home:bmwiedemann:reproducible:distribution:ring1/packages/emacs/files/_buildparams
