Arch Linux minimal container userland 100% reproducible - now what?
kpcyrd
kpcyrd at archlinux.org
Fri Mar 29 18:00:05 UTC 2024
On 3/29/24 6:48 AM, John Gilmore wrote:
> John Gilmore <gnu at toad.com> wrote:
> Bootstrappable builds are a different thing. Worthwhile, but not
> what I was asking for. I just wanted provable reproducibility from two
> ISO images and nothing more.
>
> I was asking that a bare amd64 be able to boot from an Arch Linux
> *binary* ISO image. And then be fed a matching Arch Linux *source* ISO
> image. And that the scripts in the source image would be able to
> reproduce the binary image from its source code, running the binaries
> (like the kernel, shell, and compiler) from the binary ISO image to do
> the rebuilds (without Internet access).
>
> This should be much simpler than doing a bootstrap from bare metal
> *without* a binary ISO image.
I think this project would still be somewhat involved:
1) There's currently no way to tell if a package can be built offline
(without trying yourself). Some distros have `options=(!net)`-like
settings, but pacman currently doesn't. Needing network access for
things like `cargo fetch` or `go mod download` is considered acceptable
in Arch Linux, since these extra inputs are pinned by cryptographic hash
(the PKGBUILD acts as a Merkle-tree root).
Specifically, Gentoo and OpenBSD Ports have solutions for this that I
really like: they store a generated list of URLs along with a
cryptographic checksum in a separate file, which includes crates
referenced in e.g. a project's Cargo.lock. Once these are unpacked to
the right location, the build itself does not need any additional
network resources and can run fully offline.
This concept currently does not exist in pacman. One would potentially
need to generate 100+ lines into the source= array of a PKGBUILD (and
another 200+ lines for checksums if 2 checksum algorithms are used).
This is currently considered bad style, because the PKGBUILD is supposed
to be short, simple and easy to read/understand/audit.
2) The official ISO is meant for installation and maintenance, but does
not contain a compiler, and I'm not sure it should. Many of the other
base-devel packages are also missing, but since you also need the build
dependencies of all the packages you're using (recursively?), this
should likely be its own ISO (at which point you could also include the
source code, however).
3) All of this doesn't take BUILDINFO files into account. You can use
Arch Linux as a source-based distro, but if you want exact matches with
the official packages you would need to match the compiler version that
was used for each respective package.
I did some digging and downloaded the buildinfo files for each package
that is present in the archlinux-2024.03.01 iso (using the
archlinux-userland-fs-cmp tool) and in total these gcc versions have
been used (gcc7 being part of the usb_modeswitch build environment, but
I didn't bother investigating why):
gcc7-7.4.1+20181207-3-x86_64
gcc-9.2.0-4-x86_64
gcc-9.3.0-1-x86_64
gcc-10.1.0-1-x86_64
gcc-10.1.0-2-x86_64
gcc-10.2.0-3-x86_64
gcc-10.2.0-4-x86_64
gcc-10.2.0-6-x86_64
gcc-11.1.0-1-x86_64
gcc-11.2.0-4-x86_64
gcc-12.1.0-2-x86_64
gcc-12.2.0-1-x86_64
gcc-12.2.1-1-x86_64
gcc-12.2.1-2-x86_64
gcc-12.2.1-4-x86_64
gcc-13.1.1-1-x86_64
gcc-13.1.1-2-x86_64
gcc-13.2.1-3-x86_64
gcc-13.2.1-4-x86_64
gcc-13.2.1-5-x86_64
And these versions of the Rust compiler:
rust-1:1.74.0-1-x86_64
rust-1:1.75.0-2-x86_64
rust-1:1.76.0-1-x86_64
In total the build environment of all packages consists of 3704
different (pkgname, pkgver) tuples.
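A survey like this can be approximated with a short script. The sketch
below is not the archlinux-userland-fs-cmp tool; it only assumes you
already have the text of each .BUILDINFO file, and it relies on the
`installed = pkgname-pkgver-pkgrel-arch` lines found in that format.
The function names and the toolchain regex are made up for this example,
and versions are reported as `pkgver-pkgrel`.

```python
import re

# Hypothetical filter: "gcc", versioned variants such as "gcc7", and "rust".
TOOLCHAIN = re.compile(r"^(gcc\d*|rust)$")


def installed_packages(buildinfo_text):
    """Return the (pkgname, "pkgver-pkgrel") tuples listed in one .BUILDINFO."""
    result = set()
    for line in buildinfo_text.splitlines():
        key, sep, value = line.partition(" = ")
        if not sep or key.strip() != "installed":
            continue
        # pkgname may contain dashes itself, so split from the right.
        parts = value.strip().rsplit("-", 3)
        if len(parts) != 4:
            continue  # skip entries that do not match the expected shape
        name, pkgver, pkgrel, _arch = parts
        result.add((name, f"{pkgver}-{pkgrel}"))
    return result


def summarize(buildinfo_texts):
    """Merge several BUILDINFO files; return (all tuples, sorted compiler tuples)."""
    all_tuples = set()
    for text in buildinfo_texts:
        all_tuples |= installed_packages(text)
    compilers = sorted(t for t in all_tuples if TOOLCHAIN.match(t[0]))
    return all_tuples, compilers
```

Calling `summarize()` on the BUILDINFO text of every package in an image
gives the size of the combined build environment via `len(all_tuples)`
and the list of compilers via `compilers`.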
If you disregard the BUILDINFO data, the packages you build with such an
ISO wouldn't match the official packages, but two groups with the same
ISO could likely produce matching binary packages (assuming they have a
way to derive a deterministic SOURCE_DATE_EPOCH value from that ISO).
From there on you'd "only" need to bootstrap a path to these binary
seeds, but that's also why I pointed out this is more relevant to
bootstrappable builds. Using plenty of different gcc versions looks
annoying, but is only an issue for bootstrapping, not for reproducible
builds (as long as everything is fully documented).
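As for deriving a deterministic SOURCE_DATE_EPOCH from the ISO, one
possible convention (purely an assumption, nothing Arch Linux defines)
would be to take the release date embedded in the image name and use
midnight UTC of that day. The file name pattern and the function name
below are hypothetical.

```python
import re
from datetime import datetime, timezone


def epoch_from_iso_name(iso_name):
    """Derive a Unix timestamp from a YYYY.MM.DD date embedded in an ISO name."""
    match = re.search(r"(\d{4})\.(\d{2})\.(\d{2})", iso_name)
    if match is None:
        raise ValueError(f"no YYYY.MM.DD date found in {iso_name!r}")
    year, month, day = map(int, match.groups())
    # Midnight UTC, so the result does not depend on the local time zone.
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


if __name__ == "__main__":
    # Assumed file name; only the date part matters for this sketch.
    print(epoch_from_iso_name("archlinux-2024.03.01-x86_64.iso"))
```

Both groups would then export the same value as SOURCE_DATE_EPOCH before
building. Any other rule works equally well as long as it depends only
on the ISO and is documented.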
> If someday an Electromagnetic Pulse weapon destroys all the running
> computers, we'd like to bootstrap the whole industry up again, without
> breadboarding 8-bit micros and manually toggling in programs. Instead,
> a chip foundry can take these two ISOs and a bare laptop out of a locked
> fire-safe, reboot the (Arch Linux) world from them, and then use that
> Linux machine to control the chip-making and chip-testing machines that
> can make more high-function chips. (This would depend on the
> chip-makers keeping good offline fireproof backups of their own
> application software -- but even if they had that, they can't reboot and
> maintain the chip foundry without working source code for their
> controller's OS.)
I'm personally not interested in this scenario, but I'm aware Allan
McRae is looking for funding for pacman development. Maybe somebody
could sponsor development of a "build without network" feature in
pacman, or support for auto-generated additional sources, like the
Gentoo and OpenBSD Ports approach mentioned above.
http://allanmcrae.com/about/
cheers,
kpcyrd