Verification Builds and Snapshots For Debian

Vagrant Cascadian vagrant at reproducible-builds.org
Sat Sep 30 23:59:33 UTC 2023


On 2023-09-20, Lucas Nussbaum wrote:
> On 19/09/23 at 13:52 -0700, Vagrant Cascadian wrote:
>> * Looking forward and backwards at snapshots
>> 
>> I do think that a more complete snapshot approach is probably better
>> than package-specific snapshots, and it might be worth doing
>> forward-looking snapshots of ftp.debian.org (and security.debian.org and
>> incoming.debian.org), in addition to trying to fill out all the missing
>> past snapshots to be able to attempt verification builds of older
>> packages, such as all of bookworm.
>> 
>> Snapshotting the archive(s) multiple times per day, today, tomorrow, and
>> going forward will at least enable doing verification rebuilds of
>> packages starting from this point, with less immediate overhead than
>> trying to replicate the entire functionality or more complete history of
>> snapshot.debian.org.

In the meantime, I worked on a naive implementation of this, using
debmirror and btrfs snapshots (zfs or xfs are other likely candidates
for filesystem-level snapshots). It is working better than I expected!

It currently has snapshots for debian amd64 on bookworm,
bookworm-backports, bookworm-proposed-updates, trixie, sid and
experimental (or I guess, rc-buggy...), and debian-security for
bookworm-security. This might be a little redundant, but just in case,
it also covers incoming.debian.org for most of the above codenames.
The snapshots start between September 20th and 22nd (with some gaps as
I was sorting out what was worth capturing; it currently does not
include debian-installer images, for example, and some generations
missed .udebs). Soon it will start capturing October, and beyond! The machine
it is running on happens to be very close to a debian mirror, which is
helpful! It also seems to have caught some snapshot generations that
snapshot.debian.org missed!
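
For illustration only, a sync-then-snapshot run along these lines might
look roughly like the following Python sketch. The mirror host, the
paths, the single "main" section and the exact choice of debmirror
options are assumptions made for the example, not the configuration
actually used here. It also assumes the mirror directory is itself a
btrfs subvolume.

```python
from datetime import datetime, timezone
import subprocess

# Suites named in this message.
SUITES = ["bookworm", "bookworm-backports", "bookworm-proposed-updates",
          "trixie", "sid", "experimental"]


def debmirror_cmd(mirror_dir, host="ftp.debian.org", root="debian",
                  arches=("amd64",), suites=SUITES):
    """Build a debmirror command line (host and root are assumptions)."""
    # debmirror checks Release signatures by default, so a suitable
    # keyring has to be available to it.
    return [
        "debmirror",
        "--method=http",
        f"--host={host}",
        f"--root={root}",
        "--dist=" + ",".join(suites),
        "--arch=" + ",".join(arches),
        "--section=main",
        "--source",
        mirror_dir,
    ]


def snapshot_cmd(mirror_dir, snapshot_root, now=None):
    """Build a read-only btrfs snapshot command named by UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return ["btrfs", "subvolume", "snapshot", "-r",
            mirror_dir, f"{snapshot_root}/{stamp}"]


def sync_and_snapshot(mirror_dir, snapshot_root, dry_run=True):
    """Sync the mirror, then snapshot it; only print when dry_run is set."""
    cmds = [debmirror_cmd(mirror_dir),
            snapshot_cmd(mirror_dir, snapshot_root)]
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds
```

Calling sync_and_snapshot("/srv/mirror/debian", "/srv/snapshots/debian")
with the default dry_run=True only prints the two commands, which makes
it easy to review them before running this from cron or a timer several
times per day.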

I also tried to backfill out some snapshots from snapshot.debian.org for
"debian" and "debian-security" for roughly the same codenames, with more
success than I expected, capturing all of September and edging into
August so far. I hope to get as far back as maybe June, so that anything
built since the bookworm release has relevant snapshots. It mostly works,
although once in a while I appear to trip some download limits and it
stalls out.

The whole thing currently weighs in at about 550GB, while each snapshot
of the archive for amd64+all+source weighs in under 330GB if I recall
correctly... so that is over a month's worth of snapshots for the cost
of about two full snapshots. Obviously, adding more architectures would
dramatically increase the space used (I would probably add arm64, armhf,
i386, ppc64el and riscv64 if I were to do this again).
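
To make the arithmetic behind "about two full snapshots" explicit, here
is a small Python sketch. The 550GB and 330GB figures are the rough
numbers above; the snapshot count passed in is purely an assumption for
illustration, since the exact number of generations is not stated here.

```python
def dedup_summary(total_gb, full_gb, snapshot_count):
    """Compare shared-block snapshots against naive full copies.

    Returns (equivalent_full_snapshots, naive_total_gb, gb_per_extra_snapshot).
    """
    # How many full copies the current total would pay for.
    equivalent_full = total_gb / full_gb
    # What the same number of snapshots would cost without sharing.
    naive_gb = full_gb * snapshot_count
    # Average space added by each snapshot after the first one.
    per_extra = (total_gb - full_gb) / (snapshot_count - 1)
    return equivalent_full, naive_gb, per_extra


if __name__ == "__main__":
    # 120 snapshots is a hypothetical count (e.g. 4 per day for 30 days).
    eq, naive, extra = dedup_summary(550, 330, 120)
    print(f"equivalent full snapshots: {eq:.2f}")
    print(f"naive full copies: {naive} GB")
    print(f"average growth per extra snapshot: {extra:.2f} GB")
```

Since 330GB is an upper bound and 550GB an approximation, the results
are only ballpark figures.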


I'm in the process of using this snapshot mirror, calling out to
grep-dctrl and dose-builddebcheck (look mom, no database!), to generate
apt sources.list entries pointing to the appropriate snapshots for each
.buildinfo from September, and eventually to perform verification builds
for each of these. I think it covers roughly 6000 .buildinfo files,
which is not nothing!
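
As a rough illustration of the idea (not the grep-dctrl and
dose-builddebcheck pipeline itself), the Python sketch below reads the
Installed-Build-Depends field of a .buildinfo, looks up which snapshot
generations carry the exact versions, and prints matching sources.list
lines. The in-memory snapshot index, the base URL and the default suite
are all hypothetical stand-ins. The sketch does not check
installability, which is what dose-builddebcheck would be for.

```python
import re


def installed_build_depends(buildinfo_text):
    """Return {package: version} from the Installed-Build-Depends field."""
    deps = {}
    in_field = False
    for line in buildinfo_text.splitlines():
        if line.startswith("Installed-Build-Depends:"):
            in_field = True
            line = line.split(":", 1)[1]
        elif in_field and not line.startswith((" ", "\t")):
            # A non-indented line starts the next field.
            in_field = False
        if in_field:
            for m in re.finditer(r"(\S+) \(= ([^)]+)\)", line):
                deps[m.group(1)] = m.group(2)
    return deps


def snapshots_needed(deps, snapshot_index):
    """Pick the earliest snapshot holding each exact package version.

    snapshot_index is a hypothetical mapping of
    timestamp -> {package: version}, e.g. built from Packages files.
    Returns (sorted timestamps, packages not found in any snapshot).
    """
    needed, missing = set(), []
    for pkg, ver in sorted(deps.items()):
        hits = [ts for ts, pkgs in snapshot_index.items()
                if pkgs.get(pkg) == ver]
        if hits:
            needed.add(min(hits))
        else:
            missing.append(pkg)
    return sorted(needed), missing


def sources_list(base_url, timestamps, suite="sid", component="main"):
    """Format one apt sources.list line per snapshot timestamp."""
    return [f"deb [check-valid-until=no] {base_url}/{ts}/ {suite} {component}"
            for ts in timestamps]
```

Timestamps in the form 20230920T000000Z sort correctly as plain
strings, which is why min() and sorted() are enough here. Any package
reported as missing would mean the relevant snapshot generation was
never captured.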


>> I wonder if having multiple snapshot.debian.org implementations might
>> actually be a desireable thing, as it is so essential to the ability to
>> do long-term reproducible builds verification builds, and having
>> additional independent snapshots could provide redundancy and the
>> ability to repair breakages if one of the services fails in some way.
>
> What is the state of efforts regarding alternate snapshot.d.o
> implementations?

The main one I was aware of:

  https://github.com/fepitre/debian-snapshot

I believe snapshot.reproducible-builds.org which used this is currently
on hiatus, but I hope to see that picked up again in 2024, possibly with a
different implementation...


> Has someone explored an implementation backed by S3-compatible storage,
> which would easily allow hosting it in a cloud?

No idea, but multiple options would be good! Would probably want to use
a lot of redundancy (multiple S3 providers, multiple "local" mirrors,
etc.), just because this sort of thing is so difficult to fix
retroactively (if possible at all)...

How difficult is it to implement deduplication with S3 storage? Saw a
few hits with a quick search...


live well,
  vagrant

More information about the rb-general mailing list