Arch Linux minimal container userland 100% reproducible - now what?

HW42 hw42 at ipsumj.de
Fri Mar 29 20:21:15 UTC 2024


John Gilmore:
> kpcyrd <kpcyrd at archlinux.org> wrote:
>> 1) There's currently no way to tell if a package can be built offline 
>> (without trying yourself).
> 
> Packages that can't be built offline are not reproducible, by
> definition.  They depend on outside events and circumstances
> in order for a third party to reproduce them successfully.
> 
> So, fixing that in each package would be a prerequisite to making a
> reproducible Arch distro (in my opinion).

I don't agree. For example, the r-b.o [1] definition doesn't mandate who
needs to archive what. We can probably agree that we mean a "verifiable
path from source to binary code" (and not just repeatability, which is
also sometimes what "reproducible builds" means in other contexts), but
beyond that the details and motivations will differ depending on whom
you ask.

To be clear, I'm not saying that what you would like to see isn't
worthwhile. Actually, I'm very sympathetic to such archiving goals. But
if Arch Linux, as kpcyrd's mails suggest, right now just wants to verify
their builder output soon-ish after upload, that's fine too and can be
called reproducible, in my opinion.

[1]: https://reproducible-builds.org/docs/definition/

> I don't understand why a "source tree" would store a checksum of a
> source tarball or source file, rather than storing the actual source
> tarball or source file.  You can't compile a checksum.

How distros store their source code differs, due to different needs,
historic circumstances, etc. The approach of having just the packaging
definition and patches, and then referring to the "original" source, is
common, and I certainly see its advantages.
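
To make that model a bit more concrete, here is a minimal sketch (in
Python, with a placeholder URL and hash rather than any real package,
and not any distro's actual tooling) of what "fetch the original source
and verify it against the checksum stored in the recipe" boils down to:

import hashlib
import urllib.request

# Placeholder values: a hypothetical upstream tarball and the checksum
# that the packaging recipe would record for it.
SOURCE_URL = "https://example.org/foo-1.0.tar.gz"
PINNED_SHA256 = "0" * 64

def fetch_and_verify(url, expected_sha256):
    """Download a source artifact and refuse it unless the hash matches."""
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"checksum mismatch: expected {expected_sha256}, got {actual}")
    return data

if __name__ == "__main__":
    tarball = fetch_and_verify(SOURCE_URL, PINNED_SHA256)
    print(f"verified {len(tarball)} bytes")

The recipe itself then only needs to carry the URL and the hash, not
the tarball.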

> kpcyrd <kpcyrd at archlinux.org> wrote:
>> Specifically Gentoo and OpenBSD Ports have solutions for this that I 
>> really like, they store a generated list of URLs along with a 
>> cryptographic checksum in a separate file, which includes crates 
>> referenced in e.g. a project's Cargo.lock.
> 
> I don't know what a crate or a Cargo.lock is,

It's how Cargo (Rust's package/dependency manager) pins specific
dependencies, including their hashes.
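
To illustrate what such a generated URL + checksum list could look
like, here is a rough sketch (Python 3.11+ for tomllib; the
static.crates.io URL pattern and reading the "checksum" field as the
sha256 of the .crate file are my assumptions about current Cargo
conventions, not something taken from Gentoo's or OpenBSD's actual
tooling):

import sys
import tomllib  # Python 3.11+

CRATES_IO = "registry+https://github.com/rust-lang/crates.io-index"

def crate_sources(cargo_lock_path):
    """Yield (download URL, sha256) for every registry crate pinned in Cargo.lock."""
    with open(cargo_lock_path, "rb") as f:
        lock = tomllib.load(f)
    for pkg in lock.get("package", []):
        # Only registry crates carry a checksum; the top-level crate and
        # path/git dependencies are skipped here.
        if pkg.get("source") != CRATES_IO or "checksum" not in pkg:
            continue
        name, version = pkg["name"], pkg["version"]
        url = f"https://static.crates.io/crates/{name}/{name}-{version}.crate"
        yield url, pkg["checksum"]

if __name__ == "__main__":
    for url, sha256 in crate_sources(sys.argv[1]):
        print(f"{sha256}  {url}")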

> but rather than fix the problem at its source (include the source
> files), you propose to add another complex circumvention alongside the
> existing package building infrastructure?  What is the advantage of
> that over merely doing the "cargo fetch" early rather than late and
> putting all the resulting source files into the Arch source package?

I'm not an Arch developer, but probably because a package source repo
like [2] is much easier for them to handle than committing the source of
all (transitive) dependencies [3].

[2]: https://gitlab.archlinux.org/archlinux/packaging/packages/rage-encryption
[3]: https://github.com/str4d/rage/blob/v0.10.0/Cargo.lock

(Note that Arch made the decision, currently rather unusual for a
classic Linux distro, to build Rust programs with the exact dependencies
upstream has defined rather than packaging those libraries separately.)

>> 3) All of this doesn't take BUILDINFO files into account
> 
> The BUILDINFO files are part of the source distribution needed
> to reproduce the binary distribution.  So they would go on the
> source ISO image.
> 
>> I did some digging and downloaded the buildinfo files for each package 
>> that is present in the archlinux-2024.03.01 iso
> 
> Thank you for doing that digging!
> 
>>                           Using plenty of different gcc versions looks 
>> annoying, but is only an issue for bootstrapping, not for reproducible 
>> builds (as long as everything is fully documented).
> 
> I agree that it's annoying.  It compounds the complexity of reproducing
> the build.  Does Arch get some benefit from doing so?
> 
> Ideally, a binary release ISO would be built with a single set of
> compiler tools.  Why is Arch using a dozen compiler versions?  Just to
> avoid rebuilding binary packages once the binary release's engineers
> decide what compiler is going to be this release's gold-standard
> compiler?  (E.g. The one that gets installed when the user runs pacman
> to install gcc.)  Or do the release-engineers never actually standardize
> on a compiler -- perhaps new ones get thrown onto some server whenever
> someone likes, and suddenly all the users who install a compiler just
> start using that one?

If you look at classic Linux distros, it's the norm to iteratively add
packages to the repo and build new packages with whatever is in the
(development) repo at that time. So a single snapshot of a repo will in
nearly all cases not contain all the versions needed to reproduce the
packages in that snapshot.

You will find this in Arch, Debian, Fedora, .... Some other distros,
like Yocto, might make different decisions, but those are rather the
exception.
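
As an aside, the kind of survey kpcyrd did needs very little code. Here
is a small sketch (assuming the usual "installed = name-version-release-arch"
lines of the BUILDINFO format; the directory layout and file naming are
hypothetical) that counts which gcc versions the packages in a snapshot
were built with:

import collections
import pathlib
import sys

def installed_entries(buildinfo_path):
    """Yield the 'installed = ...' values from one .BUILDINFO file."""
    for line in buildinfo_path.read_text().splitlines():
        if line.startswith("installed = "):
            yield line.removeprefix("installed = ").strip()

def gcc_versions(buildinfo_dir):
    """Count the gcc versions recorded across a directory of .BUILDINFO files."""
    counts = collections.Counter()
    for path in pathlib.Path(buildinfo_dir).glob("*.BUILDINFO"):
        for entry in installed_entries(path):
            parts = entry.rsplit("-", 3)  # name-version-release-arch
            if len(parts) == 4 and parts[0] == "gcc":
                counts[f"{parts[1]}-{parts[2]}"] += 1
    return counts

if __name__ == "__main__":
    for version, n in gcc_versions(sys.argv[1]).most_common():
        print(f"{n:5d} packages built with gcc {version}")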

> It currently seems that there is no guarantee that on day X, if you
> install gcc on Arch (from the Internet) and on the same day you pull in
> the source code of pacman package Y, that it will even build with the
> Day X version of gcc.  Is that true?

As described above, for the rolling development repos of most distros
that's true.

> [from a previous mail:]
> If someday an Electromagnetic Pulse weapon destroys all the running
> computers, we'd like to bootstrap the whole industry up again, without
> breadboarding 8-bit micros and manually toggling in programs.  Instead,
> a chip foundry can take these two ISOs and a bare laptop out of a locked
> fire-safe, reboot the (Arch Linux) world from them, and then use that
> Linux machine to control the chip-making and chip-testing machines that
> can make more high-function chips.  (This would depend on the
> chip-makers keeping good offline fireproof backups of their own
> application software -- but even if they had that, they can't reboot and
> maintain the chip foundry without working source code for their
> controller's OS.)

In such a case, reproducible builds, in the sense of ensuring that a
binary matches its source, are actually not important (though they can
be convenient for checking your recovery build environment).

Simon