Arch Linux minimal container userland 100% reproducible - now what?

kpcyrd kpcyrd at archlinux.org
Fri Mar 29 18:00:05 UTC 2024


On 3/29/24 6:48 AM, John Gilmore wrote:
> John Gilmore <gnu at toad.com> wrote:
> Bootstrappable builds are a different thing.  Worthwhile, but not
> what I was asking for.  I just wanted provable reproducibility from two
> ISO images and nothing more.
> 
> I was asking that a bare amd64 be able to boot from an Arch Linux
> *binary* ISO image.  And then be fed a matching Arch Linux *source* ISO
> image.  And that the scripts in the source image would be able to
> reproduce the binary image from its source code, running the binaries
> (like the kernel, shell, and compiler) from the binary ISO image to do
> the rebuilds (without Internet access).
> 
> This should be much simpler than doing a bootstrap from bare metal
> *without* a binary ISO image.

I think this project would still be somewhat involved:

1) There's currently no way to tell if a package can be built offline 
(without trying yourself). Some distros have `options=(!net)`-like 
settings, but pacman currently doesn't. Needing network access for 
things like `cargo fetch` or `go mod download` is considered acceptable 
in Arch Linux, since these extra inputs are pinned by cryptographic hash 
(the PKGBUILD acts as a merkle-tree root).

Gentoo and OpenBSD Ports in particular have solutions for this that I
really like: they store a generated list of URLs, each with a
cryptographic checksum, in a separate file. That list includes the
crates referenced in e.g. a project's Cargo.lock. Once these are
unpacked to the right location, the build itself needs no additional
network resources and can run fully offline.

This concept currently does not exist in pacman. One would potentially
need to generate 100+ lines into the source= array of a PKGBUILD (and
another 200+ lines for checksums if 2 checksum algorithms are used).
This is currently considered bad style, because the PKGBUILD is supposed
to be short, simple and easy to read/understand/audit.

2) The official ISO is meant for installation and maintenance. It does
not contain a compiler, and I'm not sure it should. Many of the other
base-devel packages are also missing. Since you also need the build
dependencies of all the packages you're using (recursively?), this
should likely be its own ISO (at which point you could also include the
source code).

3) All of this doesn't take BUILDINFO files into account. You can use
Arch Linux as a source-based distro, but if you want exact matches with
the official packages you would need to match the compiler version that
was used for each respective package.

I did some digging and downloaded the BUILDINFO files for each package
that is present in the archlinux-2024.03.01 ISO (using the
archlinux-userland-fs-cmp tool). In total these gcc versions were used
(gcc7 is part of the usb_modeswitch build environment, but I didn't
bother investigating why):

gcc7-7.4.1+20181207-3-x86_64
gcc-9.2.0-4-x86_64
gcc-9.3.0-1-x86_64
gcc-10.1.0-1-x86_64
gcc-10.1.0-2-x86_64
gcc-10.2.0-3-x86_64
gcc-10.2.0-4-x86_64
gcc-10.2.0-6-x86_64
gcc-11.1.0-1-x86_64
gcc-11.2.0-4-x86_64
gcc-12.1.0-2-x86_64
gcc-12.2.0-1-x86_64
gcc-12.2.1-1-x86_64
gcc-12.2.1-2-x86_64
gcc-12.2.1-4-x86_64
gcc-13.1.1-1-x86_64
gcc-13.1.1-2-x86_64
gcc-13.2.1-3-x86_64
gcc-13.2.1-4-x86_64
gcc-13.2.1-5-x86_64

And these versions of the Rust compiler:

rust-1:1.74.0-1-x86_64
rust-1:1.75.0-2-x86_64
rust-1:1.76.0-1-x86_64

In total the build environment of all packages consists of 3704 
different (pkgname, pkgver) tuples.
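
A survey like this can be approximated with a short Python sketch. This
is not the archlinux-userland-fs-cmp code, only an illustration that
assumes the BUILDINFO text is already available as strings and relies on
its "installed = name-pkgver-pkgrel-arch" lines. The function names and
the two sample files are made up.

```python
import re


def installed_packages(buildinfo_text):
    """Yield (pkgname, version) for every 'installed = ...' line."""
    for line in buildinfo_text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep and key.strip() == "installed":
            # pkgver, pkgrel and arch never contain '-', so split from the right.
            name, pkgver, pkgrel, _arch = value.strip().rsplit("-", 3)
            yield name, f"{pkgver}-{pkgrel}"


def build_environment(buildinfo_texts):
    """Union of all (pkgname, version) tuples across many BUILDINFO files."""
    env = set()
    for text in buildinfo_texts:
        env.update(installed_packages(text))
    return env


def compilers(env):
    """Keep only gcc (including versioned names such as gcc7) and rust."""
    return sorted(t for t in env if re.fullmatch(r"gcc\d*|rust", t[0]))


# Made-up BUILDINFO fragments, reusing version strings from the lists above.
SAMPLES = [
    "format = 2\npkgname = example-a\n"
    "installed = gcc-13.2.1-5-x86_64\n"
    "installed = gcc-libs-13.2.1-5-x86_64\n",
    "format = 2\npkgname = example-b\n"
    "installed = gcc7-7.4.1+20181207-3-x86_64\n"
    "installed = rust-1:1.76.0-1-x86_64\n",
]

environment = build_environment(SAMPLES)
for name, version in compilers(environment):
    print(name, version)
```

The size of the set returned by build_environment() corresponds to the
tuple count mentioned above, here over the sample data only.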

If you disregard this, the packages you build with such an ISO wouldn't
match the official packages, but two groups with the same ISO could
likely produce matching binary packages (assuming they have a way to
derive a deterministic SOURCE_DATE_EPOCH value from that ISO).
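
One possible way to derive such a value, purely as an assumption on my
side, is to take the release date embedded in the ISO name and use
midnight UTC of that day. The function name below is hypothetical, and
both groups would have to agree on the same rule for this to work.

```python
import re
from datetime import datetime, timezone


def source_date_epoch_from_iso_name(iso_name):
    """Map a name like 'archlinux-2024.03.01' to midnight UTC of that date."""
    match = re.search(r"(\d{4})\.(\d{2})\.(\d{2})", iso_name)
    if match is None:
        raise ValueError(f"no YYYY.MM.DD date found in {iso_name!r}")
    year, month, day = map(int, match.groups())
    # An explicit UTC timezone keeps the result independent of the local one.
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


epoch = source_date_epoch_from_iso_name("archlinux-2024.03.01")
print(f"SOURCE_DATE_EPOCH={epoch}")
```

Any other rule works as long as it depends only on the ISO contents or
name and not on when or where the rebuild happens.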

From there on you'd "only" need to bootstrap a path to these binary
seeds, but that's also why I pointed out this is more relevant to 
bootstrappable builds. Using plenty of different gcc versions looks 
annoying, but is only an issue for bootstrapping, not for reproducible 
builds (as long as everything is fully documented).

> If someday an Electromagnetic Pulse weapon destroys all the running
> computers, we'd like to bootstrap the whole industry up again, without
> breadboarding 8-bit micros and manually toggling in programs.  Instead,
> a chip foundry can take these two ISOs and a bare laptop out of a locked
> fire-safe, reboot the (Arch Linux) world from them, and then use that
> Linux machine to control the chip-making and chip-testing machines that
> can make more high-function chips.  (This would depend on the
> chip-makers keeping good offline fireproof backups of their own
> application software -- but even if they had that, they can't reboot and
> maintain the chip foundry without working source code for their
> controller's OS.)

I'm personally not interested in this scenario, but I'm aware Allan
McRae is looking for funding for pacman development. Maybe somebody
could sponsor development of a "build without network" feature in
pacman, or support for auto-generated additional sources like the
Gentoo and OpenBSD Ports mechanisms mentioned above.

http://allanmcrae.com/about/

cheers,
kpcyrd

