Arch Linux minimal container userland 100% reproducible - now what?

John Gilmore gnu at toad.com
Fri Mar 29 19:29:42 UTC 2024


kpcyrd <kpcyrd at archlinux.org> wrote:
> 1) There's currently no way to tell if a package can be built offline 
> (without trying yourself).

Packages that can't be built offline are not reproducible, by
definition.  Whether a third party can reproduce them successfully
depends on outside events and circumstances.

So, fixing that in each package would be a prerequisite to making a
reproducible Arch distro (in my opinion).

I don't understand why a "source tree" would store a checksum of a
source tarball or source file, rather than storing the actual source
tarball or source file.  You can't compile a checksum.
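
To make the point concrete: a recorded checksum only lets you verify
bytes you have already obtained from somewhere else.  A rough sketch in
Python (the function and argument names here are just my illustration):

    import hashlib

    def matches_recorded_checksum(tarball_path, recorded_sha256):
        # The source tree records only a digest; the tarball itself
        # still has to come from an outside server before anything
        # can be compiled.
        with open(tarball_path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest() == recorded_sha256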

kpcyrd <kpcyrd at archlinux.org> wrote:
> Specifically Gentoo and OpenBSD Ports have solutions for this that I 
> really like, they store a generated list of URLs along with a 
> cryptographic checksum in a separate file, which includes crates 
> referenced in e.g. a project's Cargo.lock.

I don't know what a crate or a Cargo.lock is, but rather than fix the
problem at its source (include the source files), you propose to add
another complex circumvention alongside the existing package building
infrastructure?  What is the advantage of that over merely doing the
"cargo fetch" early rather than late and putting all the resulting
source files into the Arch source package?
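
What I have in mind is roughly the following, done once when the source
package is assembled rather than at build time.  This is only a sketch
in Python; the (url, checksum) list format and the directory name are
my own illustration, not anything Arch, Gentoo, or Cargo actually uses:

    import hashlib, os, urllib.request

    def prefetch(entries, destdir="sources"):
        # Fetch every (url, sha256) pair from the generated list,
        # verify it, and keep the verified file so it can be shipped
        # inside the source package instead of just the list.
        os.makedirs(destdir, exist_ok=True)
        for url, sha256 in entries:
            with urllib.request.urlopen(url) as response:
                data = response.read()
            if hashlib.sha256(data).hexdigest() != sha256:
                raise ValueError("checksum mismatch for " + url)
            with open(os.path.join(destdir, os.path.basename(url)), "wb") as f:
                f.write(data)

After that one step, the build itself never has to touch the network,
which is the property that matters for reproducing it later.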

> 3) All of this doesn't take BUILDINFO files into account

The BUILDINFO files are part of the source distribution needed
to reproduce the binary distribution.  So they would go on the
source ISO image.
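
As far as I can tell, a BUILDINFO file is just "key = value" lines,
with a repeated "installed" key naming every package in the build
environment.  Assuming that layout (which I have not verified myself),
something like this Python sketch would pull out which gcc a package
was built with:

    def read_buildinfo(path):
        # Collect every value for each key; keys such as "installed"
        # repeat once per package in the build environment.
        info = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                if " = " in line:
                    key, value = line.rstrip("\n").split(" = ", 1)
                    info.setdefault(key, []).append(value)
        return info

    def gcc_used(path):
        # Crude match: also catches gcc-libs and the like, but good
        # enough to see how many compiler versions show up.
        return [p for p in read_buildinfo(path).get("installed", [])
                if p.startswith("gcc")]

That would make it easy to tabulate, across all the packages on the
ISO, how many distinct compiler versions were involved.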

> I did some digging and downloaded the buildinfo files for each package 
> that is present in the archlinux-2024.03.01 iso

Thank you for doing that digging!

>                           Using plenty of different gcc versions looks 
> annoying, but is only an issue for bootstrapping, not for reproducible 
> builds (as long as everything is fully documented).

I agree that it's annoying.  It compounds the complexity of reproducing
the build.  Does Arch get some benefit from doing so?

Ideally, a binary release ISO would be built with a single set of
compiler tools.  Why is Arch using a dozen compiler versions?  Just to
avoid rebuilding binary packages once the binary release's engineers
decide what compiler is going to be this release's gold-standard
compiler?  (E.g., the one that gets installed when the user runs pacman
to install gcc.)  Or do the release engineers never actually standardize
on a compiler -- perhaps new ones get thrown onto some server whenever
someone likes, and suddenly all the users who install a compiler just
start using that one?

It currently seems that there is no guarantee that if, on day X, you
install gcc on Arch (from the Internet) and on the same day pull in
the source code of pacman package Y, it will even build with the
day-X version of gcc.  Is that true?

	John


