Verification Builds and Snapshots For Debian

Tue Sep 19 20:52:24 UTC 2023

I experimented with verification builds of packages that were
recently built by the Debian buildd infrastructure... relatively soon
after the .buildinfo files are made available, without relying on
snapshot.debian.org... with the goal of getting bit-for-bit identical
verification of newly added packages in the Debian archive.

Overall, I think the results are promising and we should actually try
something kind of like this in a more systematic way!

Fair warning, this has turned into quite a long email...

* Background

For the most part in Debian, we have been doing CI builds, where a
package is built twice and the results compared, but this does not verify
the packages in the official Debian archive. It is useful, especially for
catching regressions in toolchains and such, but verifying the packages
people actually use is obviously desirable.

In order to actually perform a verification build, you need the exact
same packages installed in the build environment as were used for the
original build...

There was a beta project performing verification builds that appears to
have stalled sometime in 2022:

  https://beta.tests.reproducible-builds.org/

From what I recall, one of the main challenges was the reliability of
the snapshot.debian.org service, which led to the development of an
alternative snapshotting service, although that is not yet
completed...

At some point, debsnapshot was used to perform some limited testing, but
this was also dependent on a reliable snapshot.debian.org.

There have been several other attempts at rebuilders for Debian, but
the main challenge usually seems to come down to needing a working
snapshot service in order to sufficiently reproduce the build
environment a package was originally built in...

* Summary of approach for this experiment

Copy a .buildinfo file from either
coccia.debian.org:/srv/ftp-master.debian.org/buildinfo/2023/09/16 or
https://buildinfos.debian.net/ftp-master.debian.org/buildinfo/2023/09/16
or other dates, but something fairly recent for best results...

Create a package-specific snapshot of all the exact versions of packages
in the .buildinfo file (Installed-Build-Depends).
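Pulling the exact versions out of a .buildinfo is the first step of the
snapshot creation. A minimal sketch in Python, assuming the usual deb822
layout where Installed-Build-Depends is a comma-separated list of
"name (= version)" entries on indented continuation lines; the sample
input below is made up for illustration, and a real tool might prefer
an existing deb822 parser over this hand-rolled one:

```python
import re

# One entry looks like "autoconf (= 2.71-3)"; epochs such as "1:" stay in the version.
ENTRY_RE = re.compile(r"^([^\s(]+)\s*\(=\s*([^)\s]+)\s*\)$")


def parse_installed_build_depends(buildinfo_text):
    """Return (package, version) pairs from the Installed-Build-Depends field."""
    pairs = []
    in_field = False
    for line in buildinfo_text.splitlines():
        if line.startswith("Installed-Build-Depends:"):
            in_field = True
            rest = line.split(":", 1)[1]
        elif in_field and line[:1] in (" ", "\t"):
            # Continuation lines of a deb822 field start with whitespace.
            rest = line
        else:
            # Any other field (or a blank line) ends Installed-Build-Depends.
            in_field = False
            continue
        for entry in rest.split(","):
            m = ENTRY_RE.match(entry.strip())
            if m:
                pairs.append((m.group(1), m.group(2)))
    return pairs


# Made-up sample, not taken from a real .buildinfo.
sample = """Format: 1.0
Source: hello
Installed-Build-Depends:
 autoconf (= 2.71-3),
 automake (= 1:1.16.5-1.3),
 base-files (= 13)
Environment:
 LANG="C.UTF-8"
"""
print(parse_installed_build_depends(sample))
```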

Build a package with the exact versions from the .buildinfo file added
as build-dependencies, with the package-specific snapshot added to the
available repositories (as well as a bunch of others), leveraging
"sbuild --build-dep-resolver=aptitude" to resolve the potentially
complicated build dependencies.
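As a rough illustration of that step, the pinned versions can be turned
into an sbuild command line. This sketch assumes sbuild's --add-depends
and --extra-repository options (check sbuild(1) for your version); the
.dsc name and the snapshot URL are hypothetical, and supplying the
custom apt keyring for the snapshot is left out entirely:

```python
import shlex


def sbuild_argv(dsc, dist, pinned, extra_repositories):
    """Build an sbuild command line that pins every (package, version) pair.

    pinned: (package, version) pairs, e.g. from Installed-Build-Depends.
    extra_repositories: sources.list-style lines, e.g. a package-specific snapshot.
    """
    argv = ["sbuild", f"--dist={dist}", "--build-dep-resolver=aptitude"]
    for repo in extra_repositories:
        argv.append(f"--extra-repository={repo}")
    for package, version in pinned:
        # Exact-version build dependency, so the resolver must pick these versions.
        argv.append(f"--add-depends={package} (= {version})")
    argv.append(dsc)
    return argv


argv = sbuild_argv(
    "hello_2.10-3.dsc",  # hypothetical source package
    "unstable",
    [("autoconf", "2.71-3"), ("automake", "1:1.16.5-1.3")],
    # Hypothetical location of a package-specific snapshot.
    ["deb http://snapshots.example.org/hello_2.10-3/ ./"],
)
print(shlex.join(argv))
```

Letting aptitude do the resolving matters here because a plain resolver
tends to give up when exact-version pins conflict with what the chroot
already has installed.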

This supports sid and experimental reasonably well, including binNMUs.
It also supports the few bookworm-proposed-updates and
bookworm-backports .buildinfo files to some degree. Not sure where to
get .buildinfo files from debian-security, but would love to test those
as well! In theory it supports trixie as well, but nearly all packages
for trixie currently get built in sid/unstable rather than directly in
trixie.

I found that building sid and experimental worked best starting with a
slightly out-of-date trixie tarball, as it was almost always easier to
upgrade packages than to downgrade them. Currently bookworm-proposed-updates
and bookworm-backports are fairly stable, although the same issue may
apply.

* Package specific snapshots vs. complete snapshots

I have mixed feelings on the package-specific snapshots. It solves the
problem of getting old versions of packages to verify the build (or at
least could, with a bit more work), but with some drawbacks (custom apt
keyring, redundant information in *many* little snapshots, kind of
complicated).

Having explored package-specific snapshots, I think a better approach
might be to make forward-looking snapshots of ftp.debian.org,
incoming.debian.org and ideally security.debian.org (in addition to
snapshot.debian.org or a replacement)...

With locally available complete snapshots, each .buildinfo can be
processed as soon as possible to find the list of snapshots that would
satisfy the dependencies (to reduce the likelihood of having to rummage
through older snapshots to find dependencies)... and make an addendum to
the .buildinfo file that includes enough information to fully resolve
all the build dependencies... allowing the build to be performed at some
other time. This addendum might also need to recommend a snapshot for
the build chroot or base tarball, though that might be a bit trickier.
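One way the "find the list of snapshots" step might work: start from the
newest snapshot taken at or before the build and walk backwards, keeping
any snapshot that provides a still-missing (package, version) pair. This
is a greedy sketch, not necessarily the minimal set of snapshots; the
in-memory model (snapshot timestamp mapped to a set of pairs) and the
example data are hypothetical:

```python
from bisect import bisect_right


def resolve_snapshots(required, snapshots, build_time):
    """Pick snapshots (newest first, not after build_time) covering `required`.

    snapshots: dict of sortable timestamp string -> set of (package, version).
    Returns (chosen timestamps, set of pairs no snapshot could provide).
    """
    times = sorted(snapshots)
    idx = bisect_right(times, build_time)  # only snapshots up to the build time
    missing = set(required)
    chosen = []
    for ts in reversed(times[:idx]):
        provided = missing & snapshots[ts]
        if provided:
            chosen.append(ts)
            missing -= provided
        if not missing:
            break
    return chosen, missing


# Hypothetical snapshot index; timestamps follow a YYYYMMDDTHHMMSSZ layout.
snapshots = {
    "20230915T000000Z": {("gcc-13", "13.2.0-3"), ("binutils", "2.41-4")},
    "20230916T000000Z": {("gcc-13", "13.2.0-4"), ("binutils", "2.41-4")},
    "20230916T060000Z": {("gcc-13", "13.2.0-4"), ("binutils", "2.41-5")},
    "20230917T000000Z": {("gcc-13", "13.2.0-4"), ("binutils", "2.41-5")},
}
required = {("gcc-13", "13.2.0-3"), ("binutils", "2.41-5")}
print(resolve_snapshots(required, snapshots, "20230916T120000Z"))
```

The resulting list of timestamps is the kind of thing the .buildinfo
addendum could record, so a later rebuild does not have to search again.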

This could avoid having to rely on something like metasnap.debian.net,
which can process a .buildinfo and spit out the relevant snapshots.

* The Code

My proof-of-concept collection of scripts, configuration and total
lack of documentation:

  https://salsa.debian.org/reproducible-builds/debian-verification-build-experiment

In retrospect, I should clearly have started by poking more at
debrebuild and other prior art... oops!

This also did not handle the syncing of the .buildinfo files at all,
which I did manually for this experiment, but that is a fairly
straightforward problem, and buildinfos.debian.net does this already.

* Some actual results!

Testing only arch:all and arch:amd64 .buildinfos, I had decent luck with
2023/09/16:

total buildinfos to check: 538
attempted/building: 535

unreproducible: 28      5 %
reproducible:   461     85 %
failed:         46      8 %
unknown:        3       0 %

Overall, reasonable results. This day had quite a large number of
.buildinfos to process relative to most days (most days have fewer than
300; more on that below). I have not verified that these packages
actually match the checksums of .deb packages in the archive, but they
match the -buildd.buildinfos, which is close enough for now.

There may be a small amount of double-counting for builds that were for
one reason or another performed multiple times, potentially marked as
multiples of failed, reproducible and unreproducible. And probably other
smallish discrepancies... but the overall numbers seem representative.

Some of the failures were due to missing or unresolvable
build-dependencies, some just regular build failures. Recent version
changes of glibc, binutils, and gcc* caused some build-dependency
resolution problems.

The "unknown" count is simply the difference between the number of
builds performed and the number of *-buildd.buildinfos available for
that day.
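The tally above can be recomputed from the per-status counts. In this
small sketch, "unknown" is derived as total buildinfos minus builds
performed, and percentages use truncating integer division, which
reproduces the 5/85/8/0 figures for 2023/09/16 (rounding would give 86
and 9 instead). The function name is just for illustration:

```python
def summarize(total_buildinfos, results):
    """results maps status -> number of builds actually performed."""
    summary = dict(results)
    # "unknown" = buildinfos available that day minus builds performed.
    summary["unknown"] = total_buildinfos - sum(results.values())
    # Integer (truncating) percentages of all buildinfos for the day.
    return {status: (n, n * 100 // total_buildinfos) for status, n in summary.items()}


summary = summarize(538, {"unreproducible": 28, "reproducible": 461, "failed": 46})
for status, (n, pct) in summary.items():
    print(f"{status + ':':<16}{n:<8}{pct} %")
```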

I also had similar results for 2023-09-15 and 2023-09-17, but ... this
morning most of those results mysteriously disappeared!?! No idea what
happened to them. I had also done some earlier testing before I settled
on this particular approach, and was getting reasonably good results
with those earlier experiments too.

* Partially reproducible?

A significant number of source packages produce multiple binary
packages, of which frequently some of those are reproducible, even if
all of them are not. It would be worth tracking that, as people do not
necessarily use all the binary packages of a given source package.
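Tracking partial reproducibility could be as simple as comparing the
per-.deb checksums of the official buildd .buildinfo with those of the
rebuild. A sketch, assuming both files carry a Checksums-Sha256 field
with "sha256 size filename" continuation lines; the hashes in the
example are placeholders, not real checksums:

```python
def deb_checksums(buildinfo_text):
    """Map .deb/.udeb filename -> sha256 from the Checksums-Sha256 field."""
    sums = {}
    in_field = False
    for line in buildinfo_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
            continue
        if in_field and line[:1] in (" ", "\t"):
            parts = line.split()
            if len(parts) == 3 and parts[2].endswith((".deb", ".udeb")):
                sha, _size, name = parts
                sums[name] = sha
        else:
            in_field = False
    return sums


def classify(official_text, rebuild_text):
    """Return (status, sorted list of binary packages that matched)."""
    official = deb_checksums(official_text)
    rebuilt = deb_checksums(rebuild_text)
    matching = {name for name, sha in official.items() if rebuilt.get(name) == sha}
    if official and len(matching) == len(official):
        status = "reproducible"
    elif matching:
        status = "partially reproducible"
    else:
        status = "unreproducible"
    return status, sorted(matching)


# Placeholder checksums for illustration only.
official = "Checksums-Sha256:\n aaa 100 foo_1_amd64.deb\n bbb 200 libfoo1_1_amd64.deb\n"
rebuild = "Checksums-Sha256:\n aaa 100 foo_1_amd64.deb\n zzz 200 libfoo1_1_amd64.deb\n"
print(classify(official, rebuild))
```

The list of matching binaries is also exactly what a "reproduced-only"
partial mirror would need to know.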

I still want to someday make a partial mirror using packages that were
successfully reproduced and matching the ones in the official
archive... as a very inefficient and unreliable rsync implementation!

* Number of .buildinfos per day

When I started this experiment, I thought of focusing only on a reduced
subset of Debian, but quickly realized that a moderately powerful machine
or two can usually handle the workload of all of the .buildinfos
produced on a given day.

Just to get an idea of how many builds per day this is, looking at the
number of *(all|amd64)-buildd.buildinfos per day since 2021, the
vast majority of days have 299 or fewer buildinfos per day (888 days out
of 990 days), with 599 or fewer being most of the remaining days
(93), and a handful of days with more builds (8).
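Counting this only needs the buildinfo paths. A sketch, assuming paths
relative to the buildinfo directory in the YYYY/MM/DD/ layout shown
earlier and filenames matching the *(all|amd64)-buildd.buildinfo glob;
the example paths are made up. Days with no matching files simply do
not appear in the counts:

```python
import re
from collections import Counter

# YYYY/MM/DD/<anything>(all|amd64)-buildd.buildinfo
PATTERN = re.compile(r"^(\d{4})/(\d{2})/(\d{2})/.*(?:all|amd64)-buildd\.buildinfo$")


def per_day_counts(paths):
    """Count matching buildd .buildinfo files per YYYY-MM-DD."""
    counts = Counter()
    for path in paths:
        m = PATTERN.match(path)
        if m:
            counts["-".join(m.groups())] += 1
    return counts


def bucket(counts):
    """Group days into the same ranges used above."""
    buckets = Counter()
    for n in counts.values():
        if n <= 299:
            buckets["<=299"] += 1
        elif n <= 599:
            buckets["300-599"] += 1
        else:
            buckets[">=600"] += 1
    return buckets


# Made-up paths for illustration.
paths = [
    "2023/09/16/a_1_amd64-buildd.buildinfo",
    "2023/09/16/b_1_all-buildd.buildinfo",
    "2023/09/16/c_1_arm64-buildd.buildinfo",
    "2023/09/17/d_1_amd64-buildd.buildinfo",
]
counts = per_day_counts(paths)
print(counts, bucket(counts))
```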

Our current CI tests.reproducible-builds.org amd64 builders test
thousands of packages per day most days, and that is building each
package twice.

I excluded builds that were performed by maintainers, as they do not
migrate to testing, are a small minority of .deb related uploads, and
are probably trickier to validate (e.g. arbitrary build paths, built on
arbitrary days in the past due to NEW processing delays, possibly
arch:amd64+all builds, etc.) ... maybe important to validate for some of
the same reasons, but outside the scope for now.

* Time Troubles

One concern I have is that by building relatively close in time, it may
produce false positives for general reproducibility due to building in
the same year, month or day. I am not sure of the value of a
verification build that can only be verified if performed in the same
day, month or year. I guess verification builds could be retried at a
later date to be more sure with some sort of snapshot service.

Since it is hard to control the time in the build environment, building
in a VM with a future clock (+398 days) could work around this to get
more confident reproducibility results, although that may trigger
other time-related failures.
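The post does not say why +398 days in particular, but one property
that may explain it: shifting any date by 398 days always changes the
year, the month and the day of the month at once, so a build that leaks
any of the three should show a difference. A brute-force check over a
range of dates (the range itself is an arbitrary choice):

```python
from datetime import date, timedelta


def all_fields_differ(d, offset_days):
    """True if d + offset has a different year, month and day-of-month."""
    future = d + timedelta(days=offset_days)
    return future.year != d.year and future.month != d.month and future.day != d.day


def check_offset(offset_days, start=date(2020, 1, 1), end=date(2030, 12, 31)):
    """Return the first date where some field is unchanged, or None."""
    d = start
    while d <= end:
        if not all_fields_differ(d, offset_days):
            return d
        d += timedelta(days=1)
    return None


print(check_offset(398))  # no counterexample expected
print(check_offset(365))  # a plain one-year shift keeps month and day
```

The arithmetic behind it: 398 days is more than a year, and it is longer
than any gap between the same month one year later (at most 366 + 30
days), yet no run of consecutive months adds up to exactly 398 days
(twelve months give 365 or 366, thirteen give 393 to 397), so the day
of the month can never line up either.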

* Package-specific caveats and doubts

The little package-specific snapshots for each .buildinfo do not
recursively resolve the dependencies, instead relying for the most part
on those landing in the official archive. With a bit more work, I
suspect those dependencies could get fully resolved in these
package-specific snapshots and it could be made more reliable... but I
also think this might be the wrong approach.

A big downside to this approach is that it requires trusting another
apt keyring, as these package-specific snapshots are not signed by the
official Debian builders.

One of the big advantages is that they can handle packages that depend
on versions from a mix of ftp.debian.org dinstall runs, since builds
also pull in packages from incoming.debian.org. But maybe that can be
resolved in other ways.

For someone to be able to independently verify these builds, these
package-specific snapshots would need to be published somehow.

* Looking forward and backwards at snapshots

I do think that a more complete snapshot approach is probably better
than package-specific snapshots, and it might be worth doing
forward-looking snapshots of ftp.debian.org (and security.debian.org and
incoming.debian.org), in addition to trying to fill out all the missing
past snapshots to be able to attempt verification builds of older
packages, such as all of bookworm.

Snapshotting the archive(s) multiple times per day, today, tomorrow, and
going forward will at least enable doing verification rebuilds of
packages starting from this point, with less immediate overhead than
trying to replicate the entire functionality or more complete history of
snapshot.debian.org.
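With regular timestamped snapshots, pointing a build environment at a
particular set of them is just a matter of generating sources.list
lines. This sketch uses the snapshot.debian.org URL layout
(archive/debian/YYYYMMDDTHHMMSSZ/) as an assumed model; a new
forward-looking service could use a different base URL, which is why it
is a parameter. check-valid-until=no is needed because old Release files
have expired Valid-Until dates:

```python
def snapshot_sources(timestamps, suite, components=("main",),
                     base="http://snapshot.debian.org/archive/debian"):
    """Return one sources.list line per snapshot timestamp."""
    comps = " ".join(components)
    return [
        # Old snapshots carry expired Valid-Until fields, so skip that check.
        f"deb [check-valid-until=no] {base}/{ts}/ {suite} {comps}"
        for ts in timestamps
    ]


for line in snapshot_sources(["20230916T000000Z", "20230916T060000Z"], "sid"):
    print(line)
```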

I wonder if having multiple snapshot.debian.org implementations might
actually be a desirable thing, as snapshots are so essential to the
ability to do long-term reproducible builds verification, and having
additional independent snapshots could provide redundancy and the
ability to repair breakages if one of the services fails in some way.

* In closing...

To me it seems viable to successfully do verification builds of most of
the packages recently built.

There are approaches to publishing snapshots of the archive that would
make it possible for independent third parties to verify these builds
at a later time.

Everything is always harder than it looks, but maybe we can get *some*
real-world verification sooner rather than later? :)

live well,
  vagrant