[EXTERNAL] Re: Reproducible Builds Verification Format
Jason.Zions at microsoft.com
Fri May 15 20:51:50 UTC 2020
The discussion in Marrakesh, and thus the solution we started evolving there, was broader than just "verification format". Participants had different interests and goals:
- increasing the size of the rebuilder community
- improving verification
  - simplifying the process
  - decreasing probability of false verification
- enabling cross-distro verification for "classical" package build environments
- enabling verification for distros with "non-classical" packaging or package distribution environments
Those are just the interests I remember hearing; there may have been more.
To be transparent: with respect to r-b, my goals are focused on rebuilding classical package distros (Debian today, possibly expanding more broadly in the future). My team is working in the direction of rebuilding Debian Buster (this year, anyway); a fraction of it today, building towards the entire thing. I need a solution that scales and can be parallelized.
> The argument was that a debian/arch rebuilder *always* needs to take
> the buildinfo file as a rebuild input. That's the reason the buildinfo is
> shipped inside the arch package, collecting detached buildinfo files is a
> debian thing, but only the buildinfo file for the build that was actually
> uploaded into the archive is useful for anything.
This is one of the challenges we face today. The buildinfo file is required to rebuild a package. That's fine, most of the time. When an upstream team issues a patch (e.g. a fix for a security issue), I need to build the updated package immediately and get it into the hands of my users. It's often the case that, when I build that patched package, there's no buildinfo file yet because a build hasn't yet appeared in the Debian repo. It might be 24-48 hours before that package appears. For security patches, that delay is troublesome.
In Marrakesh, we talked about the distro maker being just one among multiple rebuilders; often the "first one in", but not required to be first. It seems to me that we'd want each rebuilder to:
- build without an input buildinfo for a given source if none is available from the clearinghouse
- record its output buildinfo and checksum information in the clearinghouse
- receive an automated notification if another rebuilder recorded a different buildinfo/checksum for the same source
Most of the time, the Debian build process would be first in; it would build without an input buildinfo and record the buildinfo and checksums in the clearinghouse. Rebuilders would then rebuild using the recorded buildinfo and record the checksums they got. Any difference would trigger an email to all the builders.
If rebuilders got out ahead of the Debian process, they would build (without buildinfo), then record their buildinfo and checksums. Multiple rebuilders in parallel might do this; as soon as the second rebuilder completes, conflicts would be detected and raised to human eyes for resolution.
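A rough sketch of the flow described above, for illustration only. Everything named here is hypothetical: the `Clearinghouse` class, its methods, and the `build` callable are not an existing r-b interface, an in-memory dict stands in for the real service, and plain strings stand in for buildinfo files and checksums. It also assumes that the first recorded buildinfo is the one later rebuilders take as input, which the discussion has not settled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# (output buildinfo text, checksum of the binary package): simplified stand-ins
BuildResult = Tuple[str, str]


@dataclass
class Clearinghouse:
    """Hypothetical in-memory store: source hash -> [(rebuilder, result), ...]."""
    records: Dict[str, List[Tuple[str, BuildResult]]] = field(default_factory=dict)

    def recorded_buildinfo(self, source_hash: str) -> Optional[str]:
        # Assumption: the first recorded buildinfo becomes the rebuild input.
        entries = self.records.get(source_hash, [])
        return entries[0][1][0] if entries else None

    def record(self, source_hash: str, rebuilder: str,
               result: BuildResult) -> List[str]:
        """Store a result; return the rebuilders to notify if results differ."""
        entries = self.records.setdefault(source_hash, [])
        entries.append((rebuilder, result))
        if len({res for _, res in entries}) > 1:
            # Any difference is raised to every builder of this source.
            return sorted({name for name, _ in entries})
        return []


def rebuild(ch: Clearinghouse, source_hash: str, rebuilder: str,
            build: Callable[[str, Optional[str]], BuildResult]) -> List[str]:
    """One rebuilder's step: fetch input buildinfo (if any), build, record."""
    input_buildinfo = ch.recorded_buildinfo(source_hash)  # None: nobody built yet
    result = build(source_hash, input_buildinfo)
    return ch.record(source_hash, rebuilder, result)
```

An empty return value from `rebuild` means every result recorded so far agrees; a non-empty list names the builders who would get the notification. It makes no difference to the logic whether the Debian build or an independent rebuilder is the first to record.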
> I have built package <X>, version <Y>, with source hash <AAA> and
> got binary package(s) <BBB> with hash <CCC>.
> -- signed by (re)builder <RRR>
> Other information, like what rebuilder needs to know, or what
> environment was used etc could be optional, or even totally separate.
> And in fact, we do have a format for that extra info already:
> buildinfo file. And I think that should be kept separated.
That's insufficient for the "rebuilders are out ahead of the distro maker" scenario I outlined above. Rebuilders who structure their rebuild environment to duplicate (as much as possible) the Debian environment are likely to produce the same buildinfo file, increasing the chance that reproducibility can be demonstrated before Debian puts out the "official" build.
Also, the end goal isn't merely to detect that a package wasn't reproduced; it's to understand *why* it wasn't reproduced. Is this some new environment dependency? New code which introduces indeterminacy? Supply chain attack? The information in the buildinfo is vital to answering that key question. The purpose of the central clearinghouse is to enable us to answer that question.
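As a small illustration of the "new environment dependency?" question, the sketch below compares the `Installed-Build-Depends` field of two buildinfo files and reports packages whose versions differ. It assumes the usual one-package-per-line `name (= version)` layout of that field and ignores everything else in the file; the function names are made up for this example, and a real tool would use a proper deb822 parser.

```python
import re
from typing import Dict, Optional, Tuple


def installed_build_depends(buildinfo_text: str) -> Dict[str, str]:
    """Extract {package: version} from the Installed-Build-Depends field."""
    deps: Dict[str, str] = {}
    in_field = False
    for line in buildinfo_text.splitlines():
        if line.startswith("Installed-Build-Depends:"):
            in_field = True
            continue
        if in_field:
            if not line.startswith(" "):
                break  # a non-indented line starts the next field
            m = re.match(r"\s*(\S+) \(= ([^)]+)\)", line)
            if m:
                deps[m.group(1)] = m.group(2)
    return deps


def diff_environments(a: str, b: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each package that differs to its (version in a, version in b)."""
    da, db = installed_build_depends(a), installed_build_depends(b)
    return {
        name: (da.get(name), db.get(name))
        for name in sorted(set(da) | set(db))
        if da.get(name) != db.get(name)
    }
```

A package present in only one of the two files shows up with `None` on the other side. An empty result would suggest looking beyond the installed packages, for example at the build path, the environment variables, or the source itself.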