[rb-general] [jvm] How to share rebuilder attestations

Hervé Boutemy hboutemy at apache.org
Mon Jan 7 09:26:44 CET 2019


On Sunday, 6 January 2019, 20:10:50 CET, Daniel Shahaf wrote:
> 
> Hervé Boutemy wrote on Sat, Jan 05, 2019 at 08:00:49 +0100:
> > On Thursday, 3 January 2019, 11:39:08 CET, Daniel Shahaf wrote:
> > > Hervé Boutemy wrote on Thu, Jan 03, 2019 at 09:21:49 +0100:
> > > > On Wednesday, 2 January 2019, 13:11:43 CET, Arnout Engelen wrote:
> > > > > Having each successful rebuilder append his signature to a
> > > > > shared .asc would indeed be elegant
> > > > 
> > > > +1
> > > > 
> > > > > when we can expect the .buildinfo for the
> > > > > original build to always be identical to the .buildinfo of any
> > > > > rebuilder.
> > > > 
> > > > yes, what to sign? buildinfo?
> > > > if buildinfo was a pure specification of the instructions to
> > > > rebuild and only that, signing them would be ok.
> > > > But buildinfo is currently also a recording of an environment used to
> > > > build = something that I don't expect to be reproducible in the JVM
> > > > world: we're not in a Linux distribution case, I'd like to keep some
> > > > flexibility to rebuild
> > > 
> > > When multiple .buildinfo files attest the same binary artifact, there is
> > > no need for them to be identical; each .buildinfo file *individually*
> > > needs to be cryptographically signed and cryptographically linked to the
> > > artifact it attests, but it's perfectly reasonable to have one binary
> > > artifact backed by six different .buildinfo's and six different
> > > .buildinfo.asc signatures.
> > 
> > do you expect multiple signatures on 1 buildinfo file? ie multiple
> > rebuilders attesting a shared buildinfo file
> > or do you just expect 1 buildinfo file per rebuild?
> 
> The latter.
> 
> More precisely: the latter, but if someone wanted to take a buildinfo
> file, remove from it all information that rebuilders don't need, and
> sign the result, then _that_ minimized version might have multiple
> signatures.
This scenario is highly hypothetical if the buildinfo is thought of as a
recording of the current build environment, and therefore as a generated file.

I understand that you expect the vast majority of cases to be 1 buildinfo per
rebuilder, which seems reasonable from a rebuilder's perspective. It is then
the attestation sharing solution that will have to deal with many rebuilders
each needing to publish their own buildinfo file for 1 artifact (let's be
optimistic and say that there will be thousands of rebuilders: that's not for
now, but it would be a sign of success).

> 
> > > Furthermore, there are benefits to that, too: it can help trace
> > > reproducibility bugs.  For example, suppose six buildinfo files were
> > > generated in 2018, and then a seventh buildinfo appears that was
> > > generated in 2019 and has a different checksum; that could indicate
> > > a missing use of SOURCE_DATE_EPOCH somewhere in the build process.
> > 
> > in such a scenario, instead of sharing a buildinfo in the rebuilders'
> > infra, and then requiring a process to differentiate the buildinfo files
> > that prove the rebuild is ok from the buildinfo files that prove that
> > there is a reproducibility issue, I'd prefer an issue in a bugtracker
> 
> I think there's a miscommunication somewhere.  Once someone discovers
> that a build process needs to use SOURCE_DATE_EPOCH in more places, a
> bug would be filed about that, of course; but for that to happen,
> someone needs to discover the problem in the first place.  If buildinfo
> files _don't_ indicate the real-world time in addition to the
> SOURCE_DATE_EPOCH value, the problem would be harder to discover in the
> first place, so no bug would ever be filed.
> 
> So, no, buildinfo files aren't a replacement to bug trackers.  However,
> buildinfo files can be designed in such a way that makes some types of
> bugs easier to discover (and file in trackers).  Makes sense?
Yes, I now understand where we don't have the same expectation:
to me, if the rebuilder gets a local buildinfo file that does not have the
same output section as the original reference build, this buildinfo file is
not appended to the attestation log. It goes as an attachment to the
bugtracker.

But I think you're right: we can't leave the check of the buildinfo output
against the reference to users only; this will absolutely have to be checked
server side by the attestation gathering process.
Then the attestation gathering process will easily make 2 buckets:
1. buildinfo files that match the output of the original build, and so really
attest the reproduced binary result
2. buildinfo files that don't match the output of the original build: this
will be the bucket used to improve the build process for better binary
reproducibility.
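
To illustrate the 2 buckets above, here is a minimal Python sketch. It assumes
each buildinfo's output section has already been reduced to a mapping of file
name to checksum; the function name split_attestations and this data layout
are hypothetical, not part of any existing tool.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: the output section of a buildinfo file,
# reduced to a mapping of output file name -> checksum string.
Outputs = Dict[str, str]


def split_attestations(
    reference: Outputs, rebuilds: Dict[str, Outputs]
) -> Tuple[List[str], List[str]]:
    """Split rebuilders into (matching, mismatching) against the reference.

    matching:    rebuilders whose outputs equal the reference build's outputs,
                 i.e. buildinfo files that attest a reproduced binary.
    mismatching: rebuilders whose outputs differ, i.e. material for
                 investigating reproducibility issues.
    """
    matching: List[str] = []
    mismatching: List[str] = []
    # Sort by rebuilder name so the result is deterministic.
    for rebuilder, outputs in sorted(rebuilds.items()):
        if outputs == reference:
            matching.append(rebuilder)
        else:
            mismatching.append(rebuilder)
    return matching, mismatching
```

The comparison is deliberately strict: a missing or extra output file puts the
rebuild in the second bucket, just like a differing checksum.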

> 
> > > That's from the end user's point of view.  From the
> > > packager's/rebuilder's point of view, the fact that the .buildinfo
> > > contains information not necessary for reproducing the file is not,
> > > by itself, a reason not to sign it.  For example, if I generate an
> > > email message and it contains a non-standard "X-Composed-On: Debian"
> > > header, I would go ahead and sign it anyway, since I don't disavow
> > > the fact that I run Debian.  Users will verify the rfc822 message
> > > successfully and simply ignore the X-* header they don't know/care
> > > about (as rfc822 says they may).
> > > 
> > > > Why not signing the output artefact?
> > > > = attesting: "I was able to produce such a binary" (and I don't tell
> > > > precisely how my build environment was near from the official one)?
> > > 
> > > The answer to this hinges on whether collisions — i.e., two
> > > *substantively* different .buildinfo files that happen to generate
> > > identical artifacts — are possible.  I can imagine such cases:
> > > 
> > > Suppose somebody releases foo-1.0.0 and then foo-1.0.1 where the only
> > > change is that one of the generated Perl scripts shipped in the binary
> > > package has one more line when binary package is for Linux systems.
> > > Suppose further somebody builds foo-1.0.1 for BSD systems and signs it.
> > > You wouldn't want a Linux user who runs into that artifact to succeed in
> > > verifying it: on BSD the binary artifacts foo-1.0.0 and foo-1.0.1 are
> > > identical, but on Linux they wouldn't be.
> > 
> > I don't really get the example: trying to compare 2 different versions is
> > not about build reproducibility
> > 
> > I really think the key question is: do we expect multiple signatures
> > on one file, whatever this file is (the binary artifact or a buildinfo
> > file representing a typical build environment)? Then a hosting solution
> > that has to append to a .asc file
> > Or never expect multiple signatures, but only new files (then in this
> > scenario, the new files will be buildinfo files, because the reference
> > binary artifact is unique)?
> 
> I think you're mixing design and implementation here.  All these
> discussions about new files / appending to files are implementation
> considerations; there are many alternative ways to do both of these
> things. 
Sure, because in general, what we do is a mix of what we dream of and what is
easily feasible :)
To be precise, I'm trying to figure out how the workflow will happen for
binary artifacts published in JVM public repositories (like Maven Central or
Android Maven Repository): that may influence some ideas.
That said, the case of Linux distros rebuilding JVM artifacts is really a mix
that I don't yet manage to see: if someone can show me how JVM artifacts are
rebuilt for Debian, for example, I'm interested...

> I think the questions, at this point, are:
> 
> - Is a rebuild expected to reproduce the binary artifact verbatim?
AFAIK, the objective of rb is to answer: yes, isn't it?
The point is to avoid manual inspection of the output to decide whether the
result is safe (just reproducibility bugs) or not (what we really chase =
binaries that contain code that is not in the source).

> 
> - Is a rebuild expected to reproduce the buildinfo file verbatim?
Clearly not, since the intent of the buildinfo is to record the environment
used, sometimes with overly precise data (for example the patch level of the
JVM used, which in general is not really important AFAIK, or the OS, which in
general should not matter in the JVM case).
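
To illustrate comparing only the output section while ignoring environment
details, here is a minimal Python sketch. It assumes a properties-style
buildinfo in which output entries use keys starting with "outputs."; that
prefix, the sample keys and values, and the function names are assumptions
made for illustration, not a fixed format.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            props[key.strip()] = value.strip()
    return props


def output_section(props: Dict[str, str], prefix: str = "outputs.") -> Dict[str, str]:
    """Keep only the entries describing build outputs (assumed key prefix)."""
    return {k: v for k, v in props.items() if k.startswith(prefix)}


def same_outputs(buildinfo_a: str, buildinfo_b: str) -> bool:
    """True when both buildinfo texts describe identical outputs,
    regardless of differences in the recorded environment."""
    a = output_section(parse_properties(buildinfo_a))
    b = output_section(parse_properties(buildinfo_b))
    return a == b


# Illustrative data only: environment differs, outputs are identical.
REFERENCE = """\
java.version=1.8.0_191
os.name=Linux
outputs.0.filename=foo-1.0.jar
outputs.0.checksums.sha512=abc123
"""
REBUILD = """\
java.version=1.8.0_192
os.name=Mac OS X
outputs.0.filename=foo-1.0.jar
outputs.0.checksums.sha512=abc123
"""
```

With these samples, same_outputs(REFERENCE, REBUILD) is true even though the
JVM patch level and the OS differ between the two files.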

> 
> - What exactly gets PGP-signed?  (The binary artifact?  The buildinfo?
>   If the latter, how does one then establish trust in the binary
>   artifact?)
Good question:
the rebuilder's buildinfo, for sure, gets signed by the rebuilder.
Signing the binary artifact could make sense, but the workflow for that may
not be easy...
Signing the original buildinfo file does not really make sense to me: if we
sign an existing file, IMHO it's better to go with the binary artifact.

Regards,

Hervé

> 
> Cheers,
> 
> Daniel