Reproducible Arch Linux in 2024/Q1 (irregular status update)

kpcyrd kpcyrd at archlinux.org
Tue Mar 12 23:04:24 UTC 2024


hello,

since there's currently a lengthy discussion about the relevance of 
build paths in reproducible builds, I took some time to do a status 
update on the implementation of reproducible builds in Arch Linux.

There has been a system in place since late 2017 (run by 
reproducible-builds.org) that builds Arch Linux packages twice and 
compares the resulting binary packages. Since early 2020 there's also a 
system that tries to 100%-match the official packages (as distributed
to and installed by users) using buildinfo files:

https://reproducible.archlinux.org
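
As a rough illustration of what "100%-match" means here (this is not the
code the rebuilders actually run, and the function names are made up for
this sketch): a rebuilt package only counts as reproduced if it is
bit-for-bit identical to the official one, which boils down to comparing
digests of the two files.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(official: Path, rebuilt: Path) -> bool:
    """True only if both artifacts are bit-for-bit identical."""
    return sha256_of(official) == sha256_of(rebuilt)
```

A single differing byte (an embedded timestamp, a hostname, ...) is
enough for this check to fail.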

The last few years have been fairly uneventful (aka things are going
mostly fine):

- In February 2022 there was a regression that made the pacman
packaging tools no longer reproducible in many cases
- In August 2023 I submitted a fix that was accepted by pacman upstream
- The fix was released in Arch Linux in February 2024, with pacman 6.0.2-9

I also mentioned this fix the last time I wrote about reproducible Arch
Linux, in August 2023:

https://lists.reproducible-builds.org/pipermail/rb-general/2023-August/003059.html

The email back then mentions 86% reproducible, while Arch Linux is now 
overall 88.8% reproducible (hopefully reaching 89% soon).

There are also other groups rebuilding Arch Linux. This is the full
list of known instances, each with its own build servers:

- https://reproducible.archlinux.org/
- https://reproducible.crypto-lab.ch/
- https://wolfpit.net/rebuild/
- https://rebuilder.pitastrudl.me/
- https://r-b.engineering.nyu.edu/

As of today, it's not yet possible to install a fully reproducible Arch
Linux system. However, out of the packages used in
docker.io/library/archlinux, there's only one unreproducible package left
(according to the instance at reproducible.archlinux.org):

- libcap 2.69-3

It was built with pacman 6.0.2-8 (prior to the fix).

To make the connection back to the original topic (how much do
build paths matter): there are still a few other problems Arch Linux
struggles with that can't be solved with a normalized build path.
Besides the issue mentioned above, this is the current list of top root
causes of Linux software not being reproducible in practice (in no
particular order):

1) Build outputs of ghc, the Haskell compiler, are currently not
deterministic with concurrency enabled. This bug has a lot of impact on
the total-percentage number of Arch Linux, but it's possible to install
and use Arch Linux without having any Haskell packages installed.
https://gitlab.haskell.org/ghc/ghc/-/issues/12935

2) Build outputs of cgo (which Arch Linux uses for most Go packages)
often have a mismatching `GO BUILDID`.

3) Timestamps embedded in .jar files (unreproducible zip files are a big 
thing for some reason).

4) Missing dependency lockfiles (Cargo.lock, yarn.lock, ...). Some 
distros like Debian do not make use of these files, but in Arch Linux 
they are used to declare a resolved dependency tree to another 
ecosystem, like crates.io or npm. If this file is missing, the 
dependency tree might get resolved at build time (which is not 
guaranteed to match the versions you'd get when resolving dependencies 
in a week or three).

5) Binaries with the build time embedded in them (as part of their
version output or user-agent strings).

6) Binaries with the hostname of the build server embedded in them.

7) Binaries with the Linux kernel version of the build server embedded 
in them.

8) Ordering issues, e.g. a list of strings being embedded in a different 
order each build.

9) Documentation with the build time embedded in it.
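
To make 3) and 8) a bit more concrete, here is a minimal sketch of
writing a zip (and therefore .jar) archive deterministically. It is not
taken from any Arch Linux tooling. It assumes the build honours the
SOURCE_DATE_EPOCH convention from reproducible-builds.org, and the
function names are hypothetical. The idea is to pin every member's
timestamp and permissions and to add the members in sorted order.

```python
import io
import os
import time
import zipfile


def source_date_epoch(default: int = 0) -> int:
    """Read SOURCE_DATE_EPOCH from the environment (assumed convention)."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", default))


def build_zip(files: dict[str, bytes], epoch: int) -> bytes:
    """Write a zip whose bytes depend only on `files` and `epoch`."""
    # Zip timestamps cannot represent dates before 1980, so clamp.
    stamp = max(time.gmtime(epoch)[:6], (1980, 1, 1, 0, 0, 0))
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # fixed member order, see 8)
            info = zipfile.ZipInfo(name, date_time=stamp)  # fixed mtime, see 3)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed permissions
            zf.writestr(info, files[name])
    return buf.getvalue()
```

Calling build_zip(files, source_date_epoch()) twice with the same inputs
should then give identical bytes, as long as the same zlib version is
used for both builds.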

Most unreproducible packages fall into one of those buckets. The only
build-path-related problems in Arch Linux are randomized filenames or
directory names that sometimes get embedded into the binary.

Anyway, cheers
kpcyrd

