Reproducible OS images

John Gilmore gnu at toad.com
Fri Mar 28 22:56:26 UTC 2025


Vagrant Cascadian <vagrant at reproducible-builds.org> wrote:
> > During development, the code would be built by some earlier release's
> > tools, built piecemeal, etc, like current build processes do.  Anytime
> > before release, the developers can test whether a draft source release
> > builds into a binary release that itself can build the sources into the
> > same binary release.  And fix any discrepancies, ideally long before
> > release.
> 
> Doing this for a specific software project is one thing, doing this for
> a hugely complicated intersection of interdependent build
> dependencies... it would be nice, but not sure Debian is up to the
> challenge in a reasonable timeframe. We do not even necessarily get
> every package rebuilt every ~2 year release cycle, as packages are
> maintained by a handful of volunteers...

Thanks for exploring this with me.  I'm confused.  Is this a simple
matter of lack of CPU cycles, or lack of storage?  Surely those are both
cheap and plentiful these days.  Or a lack of automation?

Why can't a script enumerate every package in Debian, and then rebuild
each one, running inside a booted copy of the binary alpha OS release?
Then reassemble a new alpha OS release from the results.  Then boot that,
and run the whole process once more to check that it's reproducible.
Investigate any place that isn't 100% identical, lather, rinse, and
repeat.
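
Something like this, in rough Python (the helper names and the
/srv/rebuild path are made up for illustration; a real run would use
sbuild in a clean chroot rather than apt-get building on the live
system, and the reassemble-boot-and-compare steps are only hinted at in
the comments):

    #!/usr/bin/env python3
    # Rough sketch of the enumerate-and-rebuild pass, meant to run (as
    # root) inside the booted alpha image, with deb-src lines enabled.
    import pathlib
    import subprocess

    def source_packages():
        # List the source packages behind every installed binary package.
        out = subprocess.run(
            ["dpkg-query", "-W", "-f", "${source:Package}\n"],
            capture_output=True, text=True, check=True).stdout
        return sorted(set(filter(None, out.splitlines())))

    def rebuild(pkg, workdir):
        # Install the build dependencies, then fetch and build the source.
        # A real run would do this in a clean chroot (sbuild or similar)
        # instead of directly on the running system.
        subprocess.run(["apt-get", "-y", "build-dep", pkg], check=True)
        subprocess.run(["apt-get", "source", "--compile", pkg],
                       cwd=workdir, check=True)

    def rebuild_world(workdir="/srv/rebuild"):
        workdir = pathlib.Path(workdir)
        workdir.mkdir(parents=True, exist_ok=True)
        failures = []
        for pkg in source_packages():
            try:
                rebuild(pkg, workdir)
            except subprocess.CalledProcessError:
                failures.append(pkg)
        return failures

    if __name__ == "__main__":
        # After this pass: assemble a new alpha image from the resulting
        # .debs, boot it, run the same script again, and compare the two
        # sets of .debs bit for bit.  Anything that differs goes on the
        # fix list; lather, rinse, repeat.
        print("failed to build:", rebuild_world())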

By the time you'd get to the beta OS release, all the easily fixed
glitches should be gone.  And you'd have a list of the tougher
reproducibility problems, too hard to fix immediately, that would go into
the task list for developers to handle before the next major OS release.

> Even if we narrowed it down to a minimal Debian base system that is
> still well over 5000 packages when you account for build dependencies
> alone ... multiplied across 9+ architectures (some of which are a bit
> sluggish)... *sigh*

If the average package takes 20 seconds to build on a fast PC, then
building 5000 packages would take about 28 hours.  To cover 9
architectures, use 9 independent PCs; there's no dependency among them.
If the laggards take too long, most of the problems will already have
been caught on the fast architectures, and the laggards can eventually be
finished by anybody out in the world whose slow machine has a month or
two to spend.
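
For the record, the back-of-the-envelope arithmetic behind that 28-hour
figure (the 20-second average build time is, of course, an assumption):

    # 5000 packages at an assumed 20 seconds each, on one fast PC.
    packages = 5000
    seconds_per_build = 20
    hours = packages * seconds_per_build / 3600
    print(f"{hours:.1f} hours per architecture")   # ~27.8 hours
    # Nine architectures on nine independent machines finish in about the
    # same wall-clock time, since no architecture's builds wait on
    # another's.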

At Cygnus we had some sluggish architectures to deal with too.  We
cross-compiled those binary releases from a fast architecture, and then
verified (1) that the produced binaries ran correctly (slowly) on the
sluggish architectures (we wrote test suites using DejaGnu), and (2) that
the binaries produced by cross-compiling from different host
architectures were identical.  Some of our target architecture boards
didn't have an OS; they only supported serial-port download of each test,
and serial upload of the results from running it; we automated that, too.
Debian has an advantage there: the target has an OS and storage, and
probably gigabit Ethernet (I recommend 10-gig if such a board exists)
back to the host.
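
Check (2) boils down to hashing both build trees and comparing.  A
minimal sketch, assuming the artifacts cross-built on the two host
machines have been collected into two directories given on the command
line:

    #!/usr/bin/env python3
    # Compare two trees of cross-built artifacts bit for bit: binaries
    # built for the same target on different hosts should be identical.
    import hashlib
    import pathlib
    import sys

    def digests(root):
        root = pathlib.Path(root)
        return {str(p.relative_to(root)):
                hashlib.sha256(p.read_bytes()).hexdigest()
                for p in root.rglob("*") if p.is_file()}

    a, b = digests(sys.argv[1]), digests(sys.argv[2])
    bad = sorted(set(a) ^ set(b))              # present in only one tree
    bad += sorted(f for f in a if f in b and a[f] != b[f])  # contents differ
    for f in bad:
        print("differs or missing:", f)
    sys.exit(1 if bad else 0)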

Getting all the dependencies right for cross-compiling was finicky; we
had to get all the include files and binary libraries into the build
environment for the target architecture.  But we automated it with
scripts as we proceeded.  Repeated testing, and learning from our
mistakes, produced scripts that didn't have glitches.  And those scripts
worked, with minor tweaks, for subsequent releases.

Those automated build scripts would be part of the OS's source code --
like Makefiles are part of the source code of packages.  And anyone with
a PC at home could run the scripts that had arrived in the source-code
release, rebuilding the world on their own machine, to verify that the
release really is reproducible.

> That said, it sure would be good to try!

Let us know how it goes!

	John
	

