Two questions about build-path reproducibility in Debian

Vagrant Cascadian vagrant at reproducible-builds.org
Wed Mar 6 18:08:53 UTC 2024


On 2024-03-05, John Gilmore wrote:
> A quick note:
> Vagrant Cascadian <vagrant at reproducible-builds.org> wrote:
>> It would be pretty impractical, at least for Debian tests, to test
>> without SOURCE_DATE_EPOCH, as dpkg has set SOURCE_DATE_EPOCH from
>> debian/changelog for quite a few years now.
>
> Making a small patch to the local dpkg to alter or remove the value of
> SOURCE_DATE_EPOCH, then trying to reproduce all the packages from source
> using that version of dpkg, would tell you which of them (newly) fail to
> reproduce because they depend on SOURCE_DATE_EPOCH.
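
For anyone following along, here is a rough sketch of the idea behind
that value: take the date from the trailer line of the newest
debian/changelog entry and turn it into seconds since the epoch. This is
only an illustration in Python, not dpkg's actual code; the sample
changelog and the name epoch_from_changelog are made up for the example.

```python
import re
from email.utils import parsedate_to_datetime

# Hypothetical changelog excerpt, invented purely for illustration.
SAMPLE_CHANGELOG = """\
hello (1.0-1) unstable; urgency=medium

  * Example entry.

 -- Example Maintainer <maintainer@example.org>  Wed, 06 Mar 2024 18:08:53 +0000
"""

# Trailer line: " -- Name <email>  RFC-2822-style date" (two spaces before the date).
TRAILER = re.compile(r"^ -- .*<[^>]*>  (.+)$", re.MULTILINE)


def epoch_from_changelog(text: str) -> int:
    """Return the timestamp of the first (newest) changelog entry as Unix seconds."""
    match = TRAILER.search(text)
    if match is None:
        raise ValueError("no changelog trailer line found")
    return int(parsedate_to_datetime(match.group(1)).timestamp())


if __name__ == "__main__":
    # A build could export this value as SOURCE_DATE_EPOCH.
    print(epoch_from_changelog(SAMPLE_CHANGELOG))
```

A patched build that drops or changes this value would then expose
packages whose output depends on it.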

Sure... which brings us to...

>> Sounds like an interesting project for someone with significant spare
>> time and computing resources to take on!
>
> It looks to me like the whole Ubuntu source code (that gets into the
> standard release) fits in about 25 GB.  The Debian 12.0.0 release
> sources fit in 83GB (19 DVD images).  Both of these are under 1% of a
> 10TB disk drive that runs about $200.  A recent Ryzen mini-desktop,
> with a 0.5TB SSD that could cache it all, costs about $300.  Is this
> significant computing resources?  For another $40 we could add a better
> heat sink and a USB fan.  How many days would recompiling a whole
> release take on this $540 worth of hardware?

You also notably left out RAM requirements, which are almost more
important than CPU, from what I've seen!

You were not talking about a single pass through the archive; you asked
for a combinatorially explosive comparison (e.g. with and without build
paths, with and without SOURCE_DATE_EPOCH, with and without locale
differences, with and without username variations, etc.) ... and for it
to continue to be useful, you'd have to keep doing it... indefinitely.

Debian currently tests over 25 variations (most of which have actually
resulted in differences in the wild):

  https://tests.reproducible-builds.org/debian/index_variations.html

To systematically identify these "simply" through building each possible
combination for any significant set of software... is a much larger
task. Obviously, you could narrow it to only the set of variations you
want to research, or for a limited package set.
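
To give a feel for how fast this grows, here is a small Python sketch
that enumerates every combination for a handful of variations and counts
the combinations for 25. It assumes each variation is an independent
on/off toggle, which is a simplification (some real variations have more
than two possible values); the names in VARIATIONS are just the examples
mentioned above, not the actual test configuration.

```python
from itertools import product

# Example variation names from this thread; the real test setup has more.
VARIATIONS = ["build_path", "source_date_epoch", "locale", "username"]


def all_combinations(names):
    """Yield one dict per on/off combination of the given variations."""
    for flags in product([False, True], repeat=len(names)):
        yield dict(zip(names, flags))


def combination_count(n: int) -> int:
    """Number of combinations for n independent on/off variations."""
    return 2 ** n


if __name__ == "__main__":
    print(len(list(all_combinations(VARIATIONS))))  # combinations for 4 toggles
    print(combination_count(25))                    # combinations for 25 toggles
```

Every one of those combinations would be a separate build of the same
package, which is why restricting the set of variations or packages
matters so much.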

At least for Debian, even with what I would guess is significantly more
computing power than you've described, we usually did no better than
about 30 days from the oldest build, meaning some packages were always
behind. We also blacklist some packages that just take too much RAM, disk
or time, though that is considerably less than 1% of ~35k packages. More
importantly, that is with only two builds per package, not testing all
2^25 (over 33 million) on/off combinations of 25 interacting variations
per package.


> (I agree that the "spare" time to set it up and configure the build
> would be the hard part. This is why I advocate for writing and
> releasing, directly in the source release DVDs, the tools that would
> automate the recompilation and binary comparison.  The end user should
> be able to boot the matching binary release DVD, download or copy in the
> source DVD images, and type "reproduce-release".)

Automation can help significantly, although at some point you need to
write all that automation, write the code that processes the results
meaningfully, and verify that it is working correctly... and continue to
verify it as new package versions come in, and so on.
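
Even the "binary comparison" step alone needs some code. A minimal
sketch in Python, assuming two directories that each hold the artifacts
of one build; the function names digest_tree and differing_artifacts are
hypothetical, and a real setup would also want to explain *why* files
differ, not just list them:

```python
import hashlib
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def differing_artifacts(first: Path, second: Path) -> list[str]:
    """List files that are missing from one build or differ between the two."""
    a, b = digest_tree(first), digest_tree(second)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list means the two builds produced bit-for-bit identical
artifacts; anything else is where the real investigation starts.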


In short, easier said than done?


live well,
  vagrant