Reproducible Arch Linux in 2024/Q1 (irregular status update)
kpcyrd
kpcyrd at archlinux.org
Tue Mar 12 23:04:24 UTC 2024
hello,
since there's currently a lengthy discussion about the relevance of
build paths in reproducible builds, I took some time to write a status
update on the implementation of reproducible builds in Arch Linux.
There has been a system in place since late 2017 (run by
reproducible-builds.org) that builds Arch Linux packages twice and
compares the resulting binary packages. Since early 2020 there's also a
system that tries to match the official packages (as distributed to and
installed by users) bit for bit, using buildinfo files:
https://reproducible.archlinux.org
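Both systems ultimately reduce to a bit-for-bit comparison: a package
only counts as reproduced if the rebuilt file is byte-identical to the
reference. A minimal Python sketch of that final check, assuming both
package files are already on local disk (the helper names
`sha256_of` and `is_reproduced` are hypothetical, and recreating the
build environment from the buildinfo file, which is the hard part, is
not shown):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(official: Path, rebuilt: Path) -> bool:
    """A rebuild only counts if it is bit-for-bit identical to the official package."""
    return sha256_of(official) == sha256_of(rebuilt)
```

Comparing digests instead of the raw bytes means only the hash of the
official package has to be stored or published; when the digests
differ, a tool like diffoscope is what you'd reach for to find out why.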
The last few years have been fairly uneventful (i.e. things are going
mostly fine):
- In February 2022 there was a regression, causing the pacman
packaging tools to no longer be reproducible in many cases
- In August 2023 I submitted a fix that was accepted by pacman upstream
- The fix was released in Arch Linux in February 2024, with pacman 6.0.2-9
I also mentioned this fix the last time I wrote about reproducible Arch
Linux, in August 2023:
https://lists.reproducible-builds.org/pipermail/rb-general/2023-August/003059.html
The email back then mentioned 86% reproducible, while Arch Linux is now
88.8% reproducible overall (hopefully reaching 89% soon).
There are also other groups rebuilding Arch Linux; in total, this is the
list of known instances, each with its own build servers:
- https://reproducible.archlinux.org/
- https://reproducible.crypto-lab.ch/
- https://wolfpit.net/rebuild/
- https://rebuilder.pitastrudl.me/
- https://r-b.engineering.nyu.edu/
As of today, it's not yet possible to install a fully reproducible Arch
Linux system; however, out of the packages used in
docker.io/library/archlinux there's only one unreproducible package left
(according to the instance at reproducible.archlinux.org):
- libcap 2.69-3
It was built with pacman 6.0.2-8 (prior to the fix).
To make the connection back to the original topic (how much do
build paths matter): there are still a few other problems Arch Linux
struggles with that can't be solved with a normalized build path.
Besides the issue mentioned above, this is the current list of top root
causes of Linux software not being reproducible in practice (in no
particular order):
1) Build outputs of ghc, the Haskell compiler, are currently not
deterministic with concurrency enabled. This bug has a big impact on
the overall reproducibility percentage of Arch Linux, but it's possible
to install and use Arch Linux without having any Haskell packages
installed.
https://gitlab.haskell.org/ghc/ghc/-/issues/12935
2) Build outputs of cgo (which Arch Linux uses for most Go packages)
often have a mismatched `GO BUILDID`.
3) Timestamps embedded in .jar files (unreproducible zip files are a big
thing for some reason).
4) Missing dependency lockfiles (Cargo.lock, yarn.lock, ...). Some
distros like Debian do not make use of these files, but in Arch Linux
they are used to declare a resolved dependency tree for another package
ecosystem, like crates.io or npm. If this file is missing, the
dependency tree may get resolved at build time, which is not
guaranteed to match the versions you'd get when resolving dependencies
a week or three later.
5) Binaries with the build time embedded in them (as part of their
version output or user-agent strings).
6) Binaries with the hostname of the build server embedded in them.
7) Binaries with the Linux kernel version of the build server embedded
in them.
8) Ordering issues, e.g. a list of strings being embedded in a different
order each build.
9) Documentation with the build time embedded in it.
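Several of these buckets (timestamps in .jar files, build times in
version strings or documentation, ordering issues) share a common
mitigation: take the time from the SOURCE_DATE_EPOCH environment
variable instead of the clock, and sort anything whose order isn't
otherwise fixed. A minimal Python sketch of a zip (.jar-style) writer
that does both; the function names are hypothetical, and this is not
how any particular Java build tool implements it:

```python
import os
import time
import zipfile

# Earliest timestamp a ZIP entry can store: 1980-01-01T00:00:00Z.
ZIP_EPOCH = 315532800


def build_timestamp() -> int:
    """Prefer SOURCE_DATE_EPOCH over the wall clock, per the reproducible-builds spec."""
    value = os.environ.get("SOURCE_DATE_EPOCH")
    return int(value) if value is not None else int(time.time())


def write_deterministic_zip(out_path: str, files: dict[str, bytes]) -> None:
    """Write a zip archive whose bytes depend only on `files` and the epoch."""
    date_time = time.gmtime(max(build_timestamp(), ZIP_EPOCH))[:6]
    with zipfile.ZipFile(out_path, "w") as zf:
        # Sort entry names so the archive order never depends on how the
        # input was collected (directory listing order, set iteration, ...).
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3  # always record "Unix", whatever the build host
            info.external_attr = 0o644 << 16  # fixed permissions, ignoring umask
            zf.writestr(info, files[name])
```

The same `build_timestamp()` idea applies to a build time embedded in a
`--version` output or in generated documentation: if the build tooling
exports SOURCE_DATE_EPOCH, every rebuild embeds the same value.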
Most unreproducible packages fall into one of those buckets. The only
build-path-related problem in Arch Linux is randomized filenames or
directory names that sometimes get embedded into the binary.
Anyway, cheers
kpcyrd