Re: Reproducible Builds Verification Format

Vagrant Cascadian vagrant at reproducible-builds.org
Fri Jun 5 00:12:24 UTC 2020


On 2020-05-15, Jason Zions via rb-general wrote:
> kpcyrd:
>> The argument was that a debian/arch rebuilder *always* needs to take
>> the buildinfo file as a rebuild input. That's the reason the buildinfo is
>> shipped inside the arch package, collecting detached buildinfo files is a
>> debian thing, but only the buildinfo file for the build that was actually
>> uploaded into the archive is useful for anything.

> This is one of the challenges we face today. The buildinfo file is
> required to rebuild a package. That's fine, most of the time.

I very much like nudging this conversation towards buildinfo files, and
you bring up a very interesting use-case...


> When an upstream team issues a patch (e.g. a fix for a security
> issue), I need to build the updated package immediately and get it
> into the hands of my users. It's often the case that, when I build
> that patched package, there's no buildinfo file yet because a build
> hasn't yet appeared in the Debian repo.

I do find it implausible to reproduce a .deb package without having
identical source, if for no other reason than the debian/changelog file
will contain differences...

So are you suggesting we do per-file comparisons of reproducibility
within the individual .deb files? Off the top of my head, it has some
downsides (more complicated comparison process) and some big upsides
(finer grained comparisons could have a higher reproducibility hit rate
for the bits you actually care about). Though it would be a big shift
from the current direction.
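
For what it's worth, a per-file comparison could be sketched roughly like
this in Python, assuming both .deb files have already been unpacked into
directories (e.g. with "dpkg-deb -x"). The names hash_tree and
compare_trees are made up for illustration; this is not an existing tool,
and it ignores control metadata, permissions and symlinks:

```python
import hashlib
from pathlib import Path


def hash_tree(root):
    """Map each regular file under root (by relative path) to its SHA-256."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file() and not p.is_symlink()
    }


def compare_trees(a, b):
    """Compare two path -> digest maps and report per-file results.

    The "hit rate" is the fraction of all paths (seen in either tree)
    whose contents are bit-for-bit identical in both.
    """
    paths = sorted(set(a) | set(b))
    same = [p for p in paths if p in a and p in b and a[p] == b[p]]
    differ = [p for p in paths if p in a and p in b and a[p] != b[p]]
    only_a = [p for p in paths if p not in b]
    only_b = [p for p in paths if p not in a]
    hit_rate = len(same) / len(paths) if paths else 1.0
    return {
        "same": same,
        "differ": differ,
        "only_a": only_a,
        "only_b": only_b,
        "hit_rate": hit_rate,
    }
```

With something like this, a differing changelog would show up as one
entry under "differ" while the binaries you actually care about could
still be listed under "same".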


> It might be 24-48 hours before that package appears. For security
> patches, that delay is troublesome.

Typically these days, at least the developer's .buildinfo file is
uploaded simultaneously with the source, and the .buildinfo produced by
one of the "buildd" machines follows not long after for most
architectures.

But, as you're probably aware, the official mirror archive doesn't
publish the .buildinfo files publicly (https://bugs.debian.org/763822),
so we only have services like buildinfos.debian.net and
buildinfo.debian.net (yes, two different sites!) which typically involve
some delays before the .buildinfo lands... it should be in the ballpark
of less than 6 hours, maybe 12 hours in the worst case, but not 24-48
hours.

I dream of someday moving Debian towards doing source-only builds, where
the .deb files only land in Debian if multiple builds successfully
reproduce the same result... but that would likely add even further
delays to the binary release!


> In Marrakesh, we talked about the distro maker just being one among
> multiple rebuilders; often the "first one in", but not required to be
> first in. It seems to me that we'd want each rebuilder to
>  - build without an input buildinfo for a given source if none is
>    available from the clearinghouse
>  - record its output buildinfo and checksum information in the
>    clearinghouse

I do very much like this; I prefer models where there are no "official"
.buildinfo files, just a bunch of .buildinfo files produced by various
parties which are essentially attestations of "with source X and
toolchain Y I produced artifacts Z" that can be analyzed for comparison
(ideally in an automated or semi-automated fashion), perhaps by a
"clearinghouse" service like you're suggesting?


I'm a little nervous at all the discussions about "trusted rebuilders"
with a binary reproducible/not reproducible result; it loses a lot of
potential information valuable for diagnosis. That said, if "trusted
rebuilders" means *something* actually gets implemented sooner since it
is simpler to implement, so be it!
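
To make the attestation idea a bit more concrete, here is a minimal
Python sketch of how a pile of "with source X and toolchain Y I produced
artifacts Z" statements might be analyzed without collapsing everything
into a single yes/no answer. The Attestation fields, the analyze and
conflicts functions, and the idea of summarizing the toolchain as one
string are all assumptions for illustration, not an existing format:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Attestation:
    builder: str        # who signed the statement (hypothetical field)
    source_sha256: str  # digest of the source that was built
    toolchain: str      # hypothetical summary of the build environment
    artifacts: dict     # artifact filename -> sha256 of what was produced


def analyze(attestations):
    """Group by source, then record which builders got which digest
    for each artifact, so disagreements stay visible for diagnosis."""
    grouped = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for att in attestations:
        for name, digest in att.artifacts.items():
            grouped[att.source_sha256][name][digest].append(att.builder)
    # Convert to plain dicts with sorted builder lists.
    return {
        source: {
            name: {d: sorted(bs) for d, bs in digests.items()}
            for name, digests in arts.items()
        }
        for source, arts in grouped.items()
    }


def conflicts(report):
    """List (source, artifact) pairs where builders disagree."""
    return sorted(
        (source, name)
        for source, arts in report.items()
        for name, digests in arts.items()
        if len(digests) > 1
    )
```

The report keeps every digest and every builder that produced it, so a
"trusted rebuilder" style yes/no could still be derived from it later,
but not the other way around.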


> Most of the time, the Debian build process would be first-in; it would
> build without an input buildinfo and record the buildinfo and
> checksums in the clearinghouse. Rebuilders would then rebuild, using
> the recorded buildinfo, and record the checksum they got. Any
> differences would trigger email to all the builders.

> If rebuilders got out ahead of the Debian process, they would build
> (without buildinfo), then record their buildinfo and
> checksums. Multiple rebuilders in parallel might do this; as soon as
> the second rebuilder completes, conflicts would be detected and raised
> to human eyes for resolution.

Again, I'm curious if you mean building the same published .dsc (Debian
source control) file as Debian builds, or some locally created variant of
the source?


> Marek:
>>     I have built package <X>, version <Y>, with source hash <AAA> and
>>     got binary package(s) <BBB> with hash <CCC>.
>>         -- signed by (re)builder <RRR>
>> 
>> Other information, like what rebuilder needs to know, or what 
>> environment was used etc could be optional, or even totally separate.
>> And in fact, we do have a format for that extra info already: 
>> buildinfo file. And I think that should be kept separated.

> That's insufficient for the "rebuilders are out ahead of the distro
> maker" scenario I outlined above. Rebuilders who structure their
> rebuild environment to duplicate (as much as possible) the Debian
> environment are likely to produce the same buildinfo file, increasing
> the chance that reproducibility can be demonstrated before Debian puts
> out the "official" build.

It's unlikely you'll produce the "same" .buildinfo file in a bit-for-bit
identical sense, though plausibly a very similar one with insubstantial
differences... or possibly one with significant differences, depending
on timing and state of the archive. With Debian Stable releases and
security updates, a closely matching .buildinfo becomes far more likely,
since the archive only changes for security fixes and occasional minor
updates.


> Also, the end goal isn't merely to detect that a package wasn't
> reproduced; it's to understand *why* it wasn't reproduced. Is this
> some new environment dependency? New code which introduces
> indeterminacy? Supply chain attack? The information in the buildinfo
> is vital to answering that key question. The purpose of the central
> clearinghouse is to enable us to answer that question.

Agreed!


Thanks and apologies for being a bit tardy to the party.

live well,
  vagrant