Reproducible Builds Verification Format

Tue May 12 21:44:17 UTC 2020

Some of these dreams and the outlines of these concepts have been around 
quite a bit longer than this year, even.  I think some differential 
diagnosis about what makes this draft different, and why it makes the 
choices it does, would be useful.

Some things I'd like to see identified and explicitly discussed more 
frequently in this concept space:

- What's the "primary key"?  In other words, how can I meaningfully 
expect to identify this one attestation record, or this one build 
instruction document?

- What are the "secondary keys" I could plausibly expect to select on if 
I have a zillion of these, and want to find those that should or should 
not align in results?

- What parts of this info do we expect to be useful, and why?  (What 
user story caused a certain piece of info to seem relevant and 
actionable enough to include?)

- What things *could* we imagine someone proposing to put in this info 
that we might reject because we don't believe they would be useful, and why?

The motivations of "a generic way to compare results" are good.  But 
good intentions can only carry us so far.  These four things are some of 
the first considerations I have when looking at a format proposal.  
Without some thought about the "keys", I don't know how it will deliver 
on "comparability" at scale.  Without some meta-documentation of not 
just the data that goes _in_, but also the kind of data that _doesn't_, 
I worry that the spec will become a kitchen sink, sopping up more data 
with time regardless of its relevance, and correspondingly becoming less 
and less useful over time.
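
To make the "keys" question a bit more concrete, here is a minimal 
sketch in Python.  Everything in it is an assumption for illustration: 
the field names ("rebuilder", "name", "version", "arch", 
"artifact_sha256") are hypothetical and not taken from the draft, and 
hashing a canonical JSON serialization is just one plausible way to get 
a primary key.

```python
import hashlib
import json
from collections import defaultdict


def record_id(record: dict) -> str:
    """Hypothetical primary key: SHA-256 over a canonical JSON form of the record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def secondary_key(record: dict) -> tuple:
    """Hypothetical secondary key: fields on which results are expected to align."""
    return (record["name"], record["version"], record["arch"])


def find_disagreements(records: list) -> dict:
    """Group records by secondary key and keep groups with differing artifact hashes."""
    groups = defaultdict(set)
    for r in records:
        groups[secondary_key(r)].add(r["artifact_sha256"])
    return {key: hashes for key, hashes in groups.items() if len(hashes) > 1}


# Made-up records from two imaginary rebuilders.
records = [
    {"rebuilder": "alpha", "name": "foo", "version": "1.0",
     "arch": "x86_64", "artifact_sha256": "aa"},
    {"rebuilder": "beta", "name": "foo", "version": "1.0",
     "arch": "x86_64", "artifact_sha256": "aa"},
    {"rebuilder": "alpha", "name": "bar", "version": "2.1",
     "arch": "x86_64", "artifact_sha256": "bb"},
    {"rebuilder": "beta", "name": "bar", "version": "2.1",
     "arch": "x86_64", "artifact_sha256": "cc"},
]

disagreements = find_disagreements(records)
```

With these made-up records, "foo" aligns across both rebuilders and 
"bar" does not.  The point is only that both kinds of key have to be 
chosen deliberately before a question like this can be asked of a 
zillion records.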

I don't know if these are the only four questions to ask, nor will I 
claim they are perfect, but they're some of the first things that come 
to my mind as heuristics, and I share them in the hope that they can be 
a useful whetstone for someone else's thoughts.

As an aside, I think what's currently listed in that github 
link as "origin_uri" may be mistaken in its conception of "URI".  The 
examples are such things as "http://ftp.us.debian.org/" and 
"https://download.docker.com/", and these are _locations_ rather than 
_identifiers_ -- URLs specifically, even though every URL is formally 
also a URI.

And I would question (begging forgiveness from anyone who knows my 
refrain already) whether "locations" as any sort of primary key are a 
sturdy idea to build upon.  They're terribly centralized, and they 
provide very little insurance against mutability events, which can make 
all other documents that refer to them instantly useless.  
Content-addressing may have some potential to address this; git (at 
least in concept) has shown us the way...
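
A minimal sketch of the difference, in Python.  The "sha256:" prefix 
and the function name are assumptions made for illustration; this is 
not git's exact scheme (git hashes a small type-and-length header 
together with the content), only the general idea of naming data by 
what it is rather than by where it lives.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the bytes themselves, not from where they are hosted."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# The same bytes fetched from two different mirrors get the same identifier.
from_mirror_a = b"package contents v1"
from_mirror_b = b"package contents v1"

# Bytes that were changed behind an unchanged URL get a different identifier.
after_mutation = b"package contents v1 (silently changed)"

same_everywhere = content_id(from_mirror_a) == content_id(from_mirror_b)
mutation_detected = content_id(from_mirror_a) != content_id(after_mutation)
```

A document that refers to the content identifier stays meaningful if a 
mirror moves or disappears, and a mutation shows up as a mismatch 
instead of passing silently.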

Cheers to all hopeful rebuilders :)

On 5/12/20 11:00 PM, Paul Spooren wrote:
> Hi all,
>
> at the RB Summit 2019 in Marrakesh there were some intense discussions about
> *rebuilders* and a *verification format*. While first discussed only with
> participants of the summit, it should now be shared with a broader audience!
>
> A quick introduction to the topic of *rebuilders*: Open source projects usually
> offer compiled packages, which is great in case I don't want to compile every
> installed application. However, it raises the question of whether distributed
> packages are what they claim to be. This is where *reproducible builds* and
> *rebuilders* join the stage. The *rebuilders* try to recreate offered binaries
> following the upstream build process as closely as necessary.
>
> To make the results accessible and storable, and to create tools around them,
> they should all follow the same schema: hello, *reproducible builds verification
> format* (rbvf). The format tries to be as generic as possible to cover all open
> source projects offering precompiled binaries. It stores the rebuilder
> results of what is reproducible and what is not.
>
> Rebuilders should publish those files publicly and sign them. Tools then collect
> those files and process them for users and developers.
>
> Ideally, multiple institutions spin up their own rebuilders so users can trust
> those rebuilders and only install packages verified by them.
>
> The format is just a draft, please join in and share your thoughts. I'm happy to
> extend, explain and discuss all the details. Please find it here[0].
>
> As a proof of concept, there is already a *collector* which compares upstream
> provided packages of Archlinux and OpenWrt with the results of rebuilders.
> Please see the frontend here[1].
>
> If you already perform any rebuilds of your project, please contact me about how
> to integrate the results in the collector!
>
> Best,
> Paul
>
>
> [0]: https://github.com/aparcar/reproducible-builds-verification-format
> [1]: https://rebuild.aparcar.org/
>