Arch Linux minimal container userland 100% reproducible - now what?

Wed Apr 3 11:13:47 UTC 2024

On Tue, 2024-04-02 at 10:11 -0700, John Gilmore wrote:
> James Addison wrote that local storage can contain errors.  I agree.
> 
> > My guess is that we could get into near-unsolvable philosophical territory
> > along this path, but I think it's worth being skeptical of the notions that
> > local-storage is always trustworthy and that the network should always be
> > avoided.
> 
> For me, the distinction is that the local storage is under the direct
> control of the person trying to rebuild, while the network and the
> servers elsewhere in the network are not.  If local storage is
> unreliable, you can fix or replace it, and continue with your work.
> 
> I am looking for reproducibility that is completely doable by the person
> trying to do it, at any time after when they obtain a limited number of
> key items by any means: the bootable binary of the OS release, and what
> the GPL calls the "Corresponding Source".
> 
> And, I am very happy to be seeing lots of incremental progress along the way!

FWIW Yocto Project/OpenEmbedded is able to do something like this.

The builds are "cross" and sufficiently isolated from the host that the
host OS doesn't influence the output. By that I mean we build a cross
compiler and then use the cross compiler to build the target. 

Whilst the intermediate cross compiler may differ bitwise depending on
the host compiler, the generated target output should always be the
same. I say "should" because contamination from the host is still
possible in theory, but we test this on our infrastructure with diverse
hosts (Debian, Ubuntu, Fedora, Alma, Rocky and openSUSE systems of
differing versions) and check we always get the same output. This is
what our reproducibility claim measures: that the output doesn't differ
between those systems.
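
To illustrate the kind of cross-host check described above, here is a
minimal sketch in Python. It is not the project's actual test tooling;
the function names are made up for illustration, and it assumes that
comparing file contents by SHA-256 is enough (it ignores permissions,
ownership and timestamps, which a real comparison may also care about).

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, "rb") as f:
                digests[rel] = hashlib.sha256(f.read()).hexdigest()
    return digests


def differing_paths(root_a, root_b):
    """Return sorted paths that are missing on one side or differ in content."""
    a, b = tree_digests(root_a), tree_digests(root_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

Run against the output directories of two builds made on different
hosts, an empty list from differing_paths() would mean the two outputs
are bitwise identical at the file level.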

The build system doesn't allow network access outside the initial
"fetch" step and it verifies some form of checksum of every external
source input.

The inputs can be fetched from their upstream location, or from a
mirror. The project maintains a mirror but users can also have a local
one of their own. Since the inputs are checksum verified, it doesn't
really matter where they come from.
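
As a rough sketch of why the location doesn't matter, the following
Python tries a list of locations in order and accepts the first copy
whose checksum matches. This is not the build system's fetcher: the
function names are hypothetical, and SHA-256 is an assumption (the text
above only says "some form of checksum").

```python
import hashlib


def sha256_hex(data):
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(expected_sha256, fetchers):
    """Try each fetcher in order; return the first payload whose SHA-256 matches.

    `fetchers` is a list of zero-argument callables returning bytes, for
    example one reading from upstream and one reading from a local mirror.
    """
    for fetch in fetchers:
        try:
            data = fetch()
        except OSError:
            continue  # this location is unavailable, try the next one
        if sha256_hex(data) == expected_sha256:
            return data
        # checksum mismatch: reject this copy and try the next location
    raise ValueError("no location provided data matching the expected checksum")
```

Because the expected checksum lives in the metadata rather than with the
mirror, a corrupted or tampered copy is rejected regardless of which
location served it.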

So the things needed to build a given output are:

* the metadata (build instructions)
* the build system itself
* sources or a sources mirror (which is verified against the metadata)
* some kind of host to run the build

The host that runs the build can be an off-the-shelf
Ubuntu/Debian/Fedora/whatever, or it can be one of our own output
images, which effectively makes the system self-hosting.

Each of the above is something that can easily be archived and restored
without significant difficulty or specialist knowledge.

This can therefore all be done by anyone, meaning someone building a
product using embedded Linux (our target users) can, for example,
rebuild their output years from now, incorporating any security fixes
needed.

I'd note this isn't theoretical: there are companies doing this today
using the self-hosting images, so there isn't a dependency on any other
distro either.

Cheers,

Richard