[rb-general] Reproducible Java builds with Maven

Eric Myhre hash at exultant.us
Mon Nov 26 16:21:33 CET 2018


On 26.11.2018 03:00, Bernhard M. Wiedemann wrote:
> Hi Hervé,
>
> thanks for raising this topic.
>
> On 26/11/2018 09.08, Hervé Boutemy wrote:
>> Anybody interested in working together?
> With openSUSE we are doing all builds offline to ensure that we can
> repeat builds later (without worry about offline or hacked servers), but
> for maven this often meant we had to download 300 MB of someone else's
> binaries to use in the build.

I love all the reproducibility issues of jars enumerated in this wiki page.

However... another +1 to this issue raised by Bernhard and Julien. One 
of the biggest practical hurdles in working with Maven comes before any 
of that: there's no clear separation of "download time" vs "resolve 
time" vs "build time".

Maven seems to intermix downloads and execution operations fairly freely 
(e.g. plugin download, now plugin eval, now dep download -- download and 
execution are interleaved).  This makes it very, very difficult to 
ensure all the needed dependencies can be identified and downloaded (and 
saved locally) in advance.

Some distributions and build environments prefer to disable the network
entirely during builds, to make certain that no uncaptured information
sources or dependencies are downloaded at build time, and thus to make
rigorously sure we satisfy our core definition of reproducible: "given
the same source code, [and] build environment".
I'd love to work on making Maven as compatible with this goal as possible.

Even some features for more explicit/pre-build-phase dependency
enumeration would be a big help in this area. I chatted with some other
Maven-enthusiastic folks at our last summit, and while we found ways to
instruct Maven to yield a list of resolved dependencies, this still
didn't cover a lot of critical ground: the output was human-readable,
but not easily machine-parsable; and if I recall correctly, it
covered dependencies but not plugins, making it somewhat incomplete.  An
API for these operations would be incredibly useful.  (And then, ideally,
we'd like a way to take our resolved list of dependencies and
automatically write out a new pom file with either those fixed versions
or a fixed reference to everything needed to perform an identical
resolution process offline in the future; but that's a next step.
It sounds like Guix has a tool for that; it'd be nice if such a tool were
in mainline Maven itself.)
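For illustration, here is a minimal Java sketch (Java 16+, for records) of that "write out a pinned pom" step. It assumes the resolved list has already been captured as lines of the form groupId:artifactId:type:version:scope (or with a classifier between type and version), roughly what mvn dependency:list prints; PinDependencies, Coordinate, parse and toXml are hypothetical names, and, like that listing, it knows nothing about plugins.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: turn resolved coordinates into <dependency> elements with fixed versions.
 * ASSUMPTION: each input line looks like
 *   groupId:artifactId:type:version:scope
 * or, with a classifier,
 *   groupId:artifactId:type:classifier:version:scope
 * Lines that match neither shape are skipped.
 */
public class PinDependencies {

    /** A resolved coordinate (hypothetical helper type; classifier may be null). */
    record Coordinate(String groupId, String artifactId, String type,
                      String classifier, String version, String scope) {}

    static List<Coordinate> parse(List<String> lines) {
        List<Coordinate> out = new ArrayList<>();
        for (String raw : lines) {
            String[] f = raw.trim().split(":");
            if (f.length == 5) {
                out.add(new Coordinate(f[0], f[1], f[2], null, f[3], f[4]));
            } else if (f.length == 6) {
                out.add(new Coordinate(f[0], f[1], f[2], f[3], f[4], f[5]));
            }
            // anything else (log noise, blank lines) is ignored
        }
        return out;
    }

    /** Render one coordinate as a pom <dependency> element pinned to its resolved version. */
    static String toXml(Coordinate c) {
        StringBuilder sb = new StringBuilder();
        sb.append("<dependency>\n");
        sb.append("  <groupId>").append(c.groupId()).append("</groupId>\n");
        sb.append("  <artifactId>").append(c.artifactId()).append("</artifactId>\n");
        sb.append("  <version>").append(c.version()).append("</version>\n");
        if (!"jar".equals(c.type())) {
            // jar is the pom default, so only non-default types are written out
            sb.append("  <type>").append(c.type()).append("</type>\n");
        }
        if (c.classifier() != null) {
            sb.append("  <classifier>").append(c.classifier()).append("</classifier>\n");
        }
        sb.append("  <scope>").append(c.scope()).append("</scope>\n");
        sb.append("</dependency>");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> resolved = List.of(
            "org.slf4j:slf4j-api:jar:1.7.25:compile",
            "junit:junit:jar:4.12:test",
            "not a coordinate");
        for (Coordinate c : parse(resolved)) {
            System.out.println(toXml(c));
        }
    }
}
```

Running main prints one <dependency> element per coordinate and silently drops the non-coordinate line; the hard part remains getting a complete, machine-readable list out of Maven in the first place.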

Of course, if I'm mistaken and there already are features for dependency
enumeration and for separating the download/resolve/build phases -- I love
being wrong :) -- then this whole email can instead be: I'd love to round up
some documentation about these features and add it to these wiki pages
about reproducibility :)

---

https://github.com/signalapp/gradle-witness might be interesting in 
relation to this topic.  It is a Gradle plugin to add hash checks to 
downloads.
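The hashing core of such a check is small. Below is a minimal Java sketch of the idea (Java 11+; not gradle-witness's actual code, and HashCheck, sha256 and verify are hypothetical names): stream a downloaded file through SHA-256 and refuse it unless the digest matches a hash pinned in advance.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch of a pinned-hash check on a downloaded artifact (illustrative only). */
public class HashCheck {

    /** Lowercase hex encoding; %02x on a Byte prints negative values as unsigned. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Stream the file through SHA-256 so large jars never need to fit in memory. */
    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    /** Throw if the file's digest differs from the pinned one. */
    static void verify(Path file, String pinnedHex)
            throws IOException, NoSuchAlgorithmException {
        String actual = sha256(file);
        if (!actual.equalsIgnoreCase(pinnedHex)) {
            throw new SecurityException("hash mismatch for " + file
                + ": expected " + pinnedHex + ", got " + actual);
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("artifact", ".jar");
        Files.writeString(tmp, "abc"); // stand-in for a downloaded jar
        // Well-known SHA-256 test vector for the ASCII string "abc"
        verify(tmp, "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad");
        System.out.println("ok");
        Files.delete(tmp);
    }
}
```

The digest itself is the easy part; what gets hashed, and who hashes the hasher, is where the trouble starts.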

gradle-witness ran into a few issues that seem likely to arise again:

- It's very opt-in; you can't apply it to a project without modifying
the pom^H^H^H build.gradle file, and this limits its usefulness for folks
working from the distro perspective

- As the readme mentions, it has something of a bootstrapping problem 
(it can't fetch *itself* by hash...)

- IIUC, it doesn't work for Maven/Gradle plugins, only for the project
dependencies... which means it doesn't completely cover the build
environment.

    - It only applies the checks to dependencies listed in the
    configuration; if transitive resolution somehow adds a new
    dependency, it goes unchecked (and this does come up: for example,
    if building on a different architecture, the dependency resolution
    may yield different results *even when* all versions are pinned),
    and so, again, it doesn't give complete coverage.
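One way to close that gap is to check fail-closed: flag anything resolution actually produced that has no pinned hash, instead of checking only what was listed. A minimal Java sketch, assuming the full resolved set (plugins and transitive dependencies included) can be obtained somehow, which is exactly the missing API discussed above; CompletenessCheck, unpinned and the coordinates are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch of a fail-closed completeness check over resolved artifacts. */
public class CompletenessCheck {

    /** Resolved coordinates with no pinned hash, in resolution order. */
    static List<String> unpinned(List<String> resolved, Map<String, String> pins) {
        List<String> missing = new ArrayList<>();
        for (String coord : resolved) {
            if (!pins.containsKey(coord)) {
                missing.add(coord);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Pins recorded on the original machine (hash value is a placeholder).
        Map<String, String> pins = Map.of(
            "org.slf4j:slf4j-api:jar:1.7.25", "<sha256 pinned earlier>");
        // Resolution elsewhere pulled in an extra, hypothetical artifact.
        List<String> resolved = List.of(
            "org.slf4j:slf4j-api:jar:1.7.25",
            "com.example:native-helper:jar:1.0");
        // A real build would stop here rather than continue unchecked.
        System.out.println("unpinned: " + unpinned(resolved, pins));
        // prints "unpinned: [com.example:native-helper:jar:1.0]"
    }
}
```

The design choice is simply which way the default points: an artifact nobody listed is treated as an error, not as something to skip.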


In general, the lesson here seems to be that when trying to get a
complete view of the sources and build environment, tools built into the
core can really shine a lot brighter; when it's attempted in plugins,
things like (ironically) plugins themselves turn out to be very
difficult to handle.

---

Cheers!  Very excited for the gathering of effort.