Arch Linux minimal container userland 100% reproducible - now what?
kpcyrd
kpcyrd at archlinux.org
Tue Mar 26 11:47:23 UTC 2024
On 3/22/24 18:52, John Gilmore wrote:
> Congratulations on closing in toward Arch Linux reproducibility!!!
Thanks! :)
> I have no experience with Arch -- am just reading what's on their
> website. From a quick glance at their docs, the Arch distribution
> *only* distributes binary packages. They only offer URLs for source
> code, requiring that users depend on a working Internet connection and
> what could be a large, arbitrary set of HTTPS servers that in theory
> contain the matching source code. See:
>
> https://wiki.archlinux.org/title/Arch_build_system
>
> (I'm not sure how that even meets the requirements of the GPL for
> binary distributors to make the matching source code available to
> recipients of the binaries.)
There's a mirror for source code at: https://sources.archlinux.org/
The build instructions are located at
https://gitlab.archlinux.org/archlinux/packaging/packages/, this is
linked as "Source Files" at e.g.
https://archlinux.org/packages/extra/x86_64/arti/.
There's currently no integration for sources.archlinux.org with Arch
Linux's devtools (that I'm aware of), so if an upstream decides not to
renew their domain, the package would indeed no longer be reproducible.
This happens occasionally, and there are also other things that can lead
to "a package used to be reproducible but now isn't anymore".
With the model of reproducible builds, as currently implemented by the
Arch Linux community, each group tries to reproduce the binary package
from source repeatedly until they get an exact match. A group listing a
package as 'GOOD' is claiming they have managed (at some point) to
confirm the binary can be built when executing the given build
instructions on the given source code.
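A minimal sketch of that "retry until exact match" check, assuming a
hypothetical `rebuild` callable that runs the build instructions and
returns the resulting package bytes (this is an illustration, not how
any actual rebuilder is implemented; the "BAD" label is a placeholder
for "no match so far"):

```python
import hashlib
from typing import Callable


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def verdict(official_pkg: bytes,
            rebuild: Callable[[], bytes],
            max_attempts: int = 3) -> str:
    """Rebuild up to max_attempts times; "GOOD" needs a bit-for-bit match.

    `rebuild` is a hypothetical stand-in for executing the given build
    instructions on the given source code.
    """
    official = sha256_hex(official_pkg)
    for _ in range(max_attempts):
        if sha256_hex(rebuild()) == official:
            return "GOOD"
    return "BAD"
```

A single successful attempt is enough for "GOOD" here, which mirrors the
"at some point" wording above.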
This means we sometimes also get away with embedded timestamps and
missing dependency lockfiles (Cargo.lock/package-lock.json/...), if all
groups built and confirmed the binary within a few hours of the
package's release. If somebody tries later on and notices the package is
not reproducible anymore this would of course be taken seriously, but it's
still a pretty good baseline against a build server compromise or a
rogue package maintainer uploading tampered binaries[2] that nobody
would ever be able to reproduce.
[2]: https://github.com/kpcyrd/sh4d0wup
The practical solution to reproducible builds regressing would be
occasional retries to ensure the package can still be reproduced from
source code. We're not doing this yet because it'd cause a non-trivial
amount of CO2 emissions, and up until recently there were too many
unreproducible packages anyway to do anything useful with these results.
Back in 2020, somebody from one of the first rebuilder groups suggested
allowing the first attempt to be delayed[3] by e.g. 24h, but this feature
hasn't been a priority the last few years.
[3]: https://github.com/kpcyrd/rebuilderd/issues/27
> It seems to me that the next step in making the Arch release ISOs
> reproducible is to have the Arch release engineering team create a
> source-code release ISO that matches each binary release ISO. Then you
> (or anyone) could test the reproducibility of the release by having
> merely those two ISO images and a bare amd64 computer (without even an
> Internet connection). (Someone other than their releng team could do
> this shortly after the binary release, hoping that none of the URLs
> becomes inaccessible in the meantime. But the right time to gather the
> full source code for reproducibility is when they themselves pull in the
> source code to BUILD those binary packages that they will put in their
> release ISO.)
I think this falls under "bootstrappable builds": a bare amd64 computer
still needs something to boot into (a CD with only source code won't do
the trick).
Implementing this can get quite involved and as of 2024 is not a
personal priority for me (I'm just side-questing this along with a few
other people on top of our actual day jobs). If anybody is interested in
working on this they are welcome to join #archlinux-reproducible on
libera. That said, I'm also not aware of any other distro having
integrated with https://bootstrappable.org/ yet.
> Making users reproduce an ISO full of binary packages by downloading the
> sources from all over the Internet seems highly prone to fail -- in the
> first few months, let alone five or ten years later.
As far as I'm concerned the sources for Arch Linux should be considered
content-addressed - the PKGBUILD contains a list of sha256sums= (and/or
similar) that references the source code by cryptographic checksum. The
source= array that points to 3rd party https servers is merely
provenance documentation (and coincidentally you can also download from
there).
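To illustrate what "content-addressed" means in practice, here is a
rough sketch, assuming hypothetical `fetchers` (one callable per
possible download location, e.g. the upstream URL or a mirror). Any
location is acceptable as long as the bytes hash to the pinned
sha256sum; this is not the actual makepkg logic:

```python
import hashlib
from typing import Callable, Iterable, Optional


def pick_verified_source(expected_sha256: str,
                         fetchers: Iterable[Callable[[], bytes]]
                         ) -> Optional[bytes]:
    """Return the first download whose SHA-256 matches the pinned value.

    `fetchers` are hypothetical callables returning the file content;
    where the bytes come from does not matter, only their checksum.
    """
    for fetch in fetchers:
        try:
            data = fetch()
        except OSError:
            # Location is gone (e.g. expired domain); try the next one.
            continue
        if hashlib.sha256(data).hexdigest() == expected_sha256.lower():
            return data
    return None
```

If the upstream server disappears or serves something else, any other
copy with the same checksum is an equally valid source.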
It's also important to note that build scripts are allowed to download
from the internet, so you still need to read the source code of your
actual programs to make sure they do so in a safe and sound way. For
example, cargo downloads from crates.io (this is opaque to the Arch Linux
build system), but Cargo.lock pins the content with cryptographic
checksums that are expected to match. With these checksums you could also
find the corresponding source code at softwareheritage[4]:
[4]: https://docs.softwareheritage.org/user/software-origins/crates.html
> Even Arch's binary releases are only available from Arch for three
> (monthly) release cycles. Then you're on your own if you want to find a
> copy of what they released, like the one that was current last
> Christmas. See:
>
> https://archlinux.org/releng/releases/
>
> Arch may do great release engineering (I hope they do!), but it's
> apparently not *archival* release engineering.
The relevant resource is https://archive.archlinux.org/, it has both an
archive of old ISOs (which aren't that interesting imo) and copies of old
packages. This kind of service is crucial for implementing reproducible
builds (because it is used to set up the build environment described in
BUILDINFO files), and reproducible-builds.org has recently received $350k
to implement an analogous service for Debian (to be able to catch up with
Arch Linux).
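A rough sketch of how a BUILDINFO file leads to archive downloads. It
assumes `installed = name-pkgver-pkgrel-arch` lines and an archive
layout of `/packages/<first letter>/<name>/<file>`; both the URL
pattern and the `.pkg.tar.zst` extension are assumptions here (older
packages may use a different compression), so treat this as
illustration only:

```python
def installed_packages(buildinfo_text: str) -> list[tuple[str, str, str]]:
    """Extract (name, version, arch) from `installed = ...` lines."""
    pkgs = []
    for line in buildinfo_text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep and key.strip() == "installed":
            # pkgver, pkgrel and arch contain no hyphens, so split from
            # the right; the package name itself may contain hyphens.
            name, pkgver, pkgrel, arch = value.strip().rsplit("-", 3)
            pkgs.append((name, f"{pkgver}-{pkgrel}", arch))
    return pkgs


def archive_url(name: str, version: str, arch: str,
                ext: str = ".pkg.tar.zst") -> str:
    """Build a download URL under the assumed archive layout."""
    return (f"https://archive.archlinux.org/packages/"
            f"{name[0]}/{name}/{name}-{version}-{arch}{ext}")
```

Installing exactly these package versions is what recreates the build
environment a package was originally built in.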
cheers,
kpcyrd