Arch Linux minimal container userland 100% reproducible - now what?

kpcyrd kpcyrd at archlinux.org
Tue Mar 26 11:47:23 UTC 2024


On 3/22/24 18:52, John Gilmore wrote:
> Congratulations on closing in toward Arch Linux reproducibility!!!

Thanks! :)

> I have no experience with Arch -- am just reading what's on their
> website.  From a quick glance at their docs, the Arch distribution
> *only* distributes binary packages.  They only offer URLs for source
> code, requiring that users depend on a working Internet connection and
> what could be a large, arbitrary set of HTTPS servers that in theory
> contain the matching source code.  See:
> 
>    https://wiki.archlinux.org/title/Arch_build_system
> 
> (I'm not sure how that even meets the requirements of the GPL for
> binary distributors to make the matching source code available to
> recipients of the binaries.)

There's a mirror for source code at: https://sources.archlinux.org/

The build instructions are located at 
https://gitlab.archlinux.org/archlinux/packaging/packages/, this is 
linked as "Source Files" at e.g. 
https://archlinux.org/packages/extra/x86_64/arti/.

There's currently no integration for sources.archlinux.org with Arch
Linux's devtools (that I'm aware of), so if an upstream decides they
don't want to renew their domain anymore, the package would indeed not be
reproducible anymore. This happens occasionally, and there are also other
things that can lead to "a package used to be reproducible but now isn't
anymore".

With the model of reproducible builds, as currently implemented by the 
Arch Linux community, each group tries to reproduce the binary package 
from source repeatedly until they get an exact match. A group listing a 
package as 'GOOD' is claiming they have managed (at some point) to 
confirm the binary can be built when executing the given build 
instructions on the given source code.
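
To illustrate what "an exact match" boils down to: at the end of a
rebuild attempt, the rebuilt package is compared byte for byte with the
one that was distributed. A minimal sketch in Python (this is not how
rebuilderd is implemented; the function names and the 'BAD' label are
placeholders chosen for this example):

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verdict(official: Path, rebuilt: Path) -> str:
    """'GOOD' if both artifacts are bit-for-bit identical, else 'BAD'."""
    # 'BAD' is an illustrative label, not taken from any real tool's output.
    return "GOOD" if sha256_file(official) == sha256_file(rebuilt) else "BAD"
```

Note that a 'GOOD' result only records that one attempt succeeded at
that point in time; it says nothing about whether a later attempt would
still succeed.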

This means we sometimes also get away with embedded timestamps and 
missing dependency lockfiles (Cargo.lock/package-lock.json/...), if all 
groups built and confirmed the binary within a few hours of the 
package's release. If somebody tries later on and notices the package is
not reproducible anymore, this would of course be taken seriously, but it's
still a pretty good baseline against a build server compromise or a 
rogue package maintainer uploading tampered binaries[2] that nobody 
would ever be able to reproduce.

[2]: https://github.com/kpcyrd/sh4d0wup

The practical solution to reproducible builds regressing would be
occasional retries to ensure the package can still be reproduced from
source code. We're not doing this yet because it'd cause a
non-trivial amount of CO2, and up until recently there were too many
unreproducible packages anyway to do anything useful with these results.
Back in 2020, somebody from one of the first rebuilder groups suggested
allowing the first attempt to be delayed[3] by e.g. 24h, but this feature
hasn't been a priority the last few years.

[3]: https://github.com/kpcyrd/rebuilderd/issues/27

> It seems to me that the next step in making the Arch release ISOs
> reproducible is to have the Arch release engineering team create a
> source-code release ISO that matches each binary release ISO.  Then you
> (or anyone) could test the reproducibility of the release by having
> merely those two ISO images and a bare amd64 computer (without even an
> Internet connection).  (Someone other than their releng team could do
> this shortly after the binary release, hoping that none of the URLs
> becomes inaccessible in the meantime.  But the right time to gather the
> full source code for reproducibility is when they themselves pull in the
> source code to BUILD those binary packages that they will put in their
> release ISO.)

I think this falls under "bootstrappable builds": a bare amd64 computer
still needs something to boot into (a CD with only source code won't do
the trick).

Implementing this can get quite involved and as of 2024 is not a
personal priority for me (I'm just side-questing this along with a few
other people on top of our actual day jobs). If anybody is interested in
working on this, they are welcome to join #archlinux-reproducible on
Libera, but I'm also not aware of any other distro having integrated
with https://bootstrappable.org/ yet.

> Making users reproduce an ISO full of binary packages by downloading the
> sources from all over the Internet seems highly prone to fail -- in the
> first few months, let alone five or ten years later.

As far as I'm concerned, the sources for Arch Linux should be considered
content-addressed: the PKGBUILD contains a list of sha256sums= (and/or
similar) that references the source code by cryptographic checksum. The
source= array that points to 3rd party https servers is merely
provenance documentation (and coincidentally you can also download from
there).
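
A rough sketch of what "content-addressed" means in practice: it doesn't
matter where the bytes come from, as long as they hash to the value
pinned in the PKGBUILD. The function below is hypothetical (it is not
part of makepkg or devtools) and takes a list of callables, so that any
download location can be plugged in:

```python
import hashlib
from typing import Callable, Iterable, Optional


def fetch_by_checksum(
    expected_sha256: str,
    fetchers: Iterable[Callable[[], bytes]],
) -> Optional[bytes]:
    """Return the first blob whose SHA-256 matches, wherever it came from."""
    for fetch in fetchers:
        try:
            data = fetch()
        except OSError:
            # e.g. the upstream domain expired; try the next location
            continue
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
    return None  # no location served the pinned content
```

The fetchers could for example wrap urllib.request.urlopen() for the
upstream URL and for a mirror; a source that serves different bytes is
skipped just like one that is offline.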

It's also important to note that build scripts are allowed to download
from the internet, so you still need to read the source code of your actual
programs to make sure they do so in a safe and sound way. For example,
cargo downloads from crates.io (this is opaque to the Arch Linux build 
system), but Cargo.lock pins the content with cryptographic checksums 
that are expected to match. With these checksums you could also find the 
corresponding source code at softwareheritage[4]:

[4]: https://docs.softwareheritage.org/user/software-origins/crates.html

> Even Arch's binary releases are only available from Arch for three
> (monthly) release cycles.  Then you're on your own if you want to find a
> copy of what they released, like the one that was current last
> Christmas.  See:
> 
>    https://archlinux.org/releng/releases/
> 
> Arch may do great release engineering (I hope they do!), but it's
> apparently not *archival* release engineering.

The relevant resource is https://archive.archlinux.org/. It has both an
archive of old ISOs (which aren't that interesting imo) and copies
of old packages. This kind of service is crucial for implementing
reproducible builds (because it is used to set up the build environment
described in BUILDINFO files), and reproducible-builds.org has recently
received $350k to implement an analogous service for Debian (to be able
to catch up with Arch Linux).
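
To give an idea of how a BUILDINFO file describes the build environment,
here is a minimal parsing sketch. It assumes the file consists of
"key = value" lines in which some keys (such as "installed") repeat; the
function name is hypothetical and this is not the parser used by the
actual tooling.

```python
from collections import defaultdict


def parse_buildinfo(text: str) -> dict[str, list[str]]:
    """Parse 'key = value' lines; a key may occur more than once."""
    fields: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # skip lines that are not 'key = value'
            fields[key.strip()].append(value.strip())
    return dict(fields)
```

The values collected under "installed" would then be the exact package
versions that need to be fetched from the archive to recreate the
environment the package was originally built in.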

cheers,
kpcyrd

