[rb-general] Reproducible Java builds with Maven

Julien Lepiller julien at lepiller.eu
Mon Nov 26 18:08:49 CET 2018

On 2018-11-26 16:21, Eric Myhre wrote:
> On 26.11.2018 03:00, Bernhard M. Wiedemann wrote:
>> Hi Hervé,
>> thanks for raising this topic.
>> On 26/11/2018 09.08, Hervé Boutemy wrote:
>>> Anybody interested in working together?
>> With openSUSE we are doing all builds offline to ensure that we can
>> repeat builds later (without worrying about offline or hacked
>> servers), but for maven this often meant we had to download 300 MB of
>> someone else's binaries to use in the build.
> I love all the reproducibility issues of jars enumerated in this
> wiki page.
> However... another +1 to this issue raised by Bernhard and Julien. One
> of the biggest practical hurdles in working with Maven comes before
> any of that: there's no clear separation of "download time" vs
> "resolve time" vs "build time".
> Maven seems to intermix downloads and execution operations fairly
> freely (e.g. plugin download, now plugin eval, now dep download --
> download and execution are interleaved).  This makes it very, very
> difficult to ensure all the needed dependencies can be identified and
> downloaded (and saved locally) in advance.
> Some distributions and build environments prefer to completely disable
> the network during builds in order to make certain that there aren't
> uncaptured information sources or dependencies being downloaded at
> build time -- in order to make rigorously sure we satisfy our core
> definition of reproducible: "given the same source code, [and] build
> environment".  I'd love to work on making Maven as compatible with
> this goal as possible.
> Even some features for more explicit/pre-build-phase dependency
> enumeration would be a big help in this area. I chatted with some
> other Maven enthusiasts at our last summit, and while we found
> ways to instruct Maven to yield a list of resolved dependencies, this
> still didn't cover a lot of critical ground: the output was
> human-readable, but not very easily machine-parsable; and if I recall
> correctly it covered dependencies but not plugins, making it somewhat
> incomplete.  An API for these operations would be incredibly useful. 
> (And then ideally, perhaps we'd like a way to take our resolved list
> of dependencies and automatically write out a new pom file with either
> those fixed versions or a fixed reference to everything needed to
> perform an identical resolution process offline in the future; but
> that's a next step.  Sounds like Guix has a tool for that; it'd be
> nice if such a tool was in mainline Maven itself.)

Sorry, I expressed myself poorly: Guix doesn't have a tool
yet to override dependencies, but that's something we'll do
once we have the necessary plugins. I'm actually stuck at
building dependencies of plugins, so I'm far from having
started the work of implementing the maven-build-system.

This doesn't sound too difficult though: it's just reading
the pom.xml of the inputs (build dependencies as declared in
the guix package) to extract their name and version, then
overriding the versions declared in the package being built.
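
To make the idea concrete, here is a rough sketch in Python (not
Guix code, and not something that exists yet). The function names
read_coordinates and override_versions are made up for illustration,
and it assumes every pom.xml uses the standard POM 4.0.0 namespace.
It ignores properties, profiles and the multiple-versions question.

```python
import xml.etree.ElementTree as ET

POM_NS = "http://maven.apache.org/POM/4.0.0"
NS = {"m": POM_NS}


def read_coordinates(pom_text):
    """Return ((groupId, artifactId), version) declared by a pom.xml."""
    root = ET.fromstring(pom_text)
    parent = root.find("m:parent", NS)

    def inherited(name):
        # groupId and version may be omitted and inherited from <parent>.
        value = root.findtext(f"m:{name}", namespaces=NS)
        if value is None and parent is not None:
            value = parent.findtext(f"m:{name}", namespaces=NS)
        return value

    artifact_id = root.findtext("m:artifactId", namespaces=NS)
    return (inherited("groupId"), artifact_id), inherited("version")


def override_versions(pom_text, available):
    """Rewrite <dependency> versions to the ones present in `available`.

    `available` maps (groupId, artifactId) to a version string, e.g.
    collected by calling read_coordinates on each build input.
    """
    # Serialize with the POM namespace as the default one (no ns0: prefix).
    ET.register_namespace("", POM_NS)
    root = ET.fromstring(pom_text)
    # Matches every <dependency>, including those under
    # <dependencyManagement> and inside <plugin> declarations.
    for dep in root.iterfind(".//m:dependency", NS):
        key = (
            dep.findtext("m:groupId", namespaces=NS),
            dep.findtext("m:artifactId", namespaces=NS),
        )
        if key in available:
            version = dep.find("m:version", NS)
            if version is None:
                version = ET.SubElement(dep, f"{{{POM_NS}}}version")
            version.text = available[key]
    return ET.tostring(root, encoding="unicode")
```

Dependencies that are not among the inputs are left untouched here;
a real build system would probably want to fail on those instead.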

I don't really see how that could be implemented as part of
maven itself, though, because it needs information about the
build environment. What if you have multiple versions of the
same package available because of a bootstrap dependency?
How would maven choose?

> Of course if I'm misspeaking and there are more features for
> dependency enumeration and separating download/resolve/build phases --
> I love being wrong :) -- then this whole email can instead be: I'd
> love to round up some documentation about these features and add it to
> these wiki pages about reproducibility :)
> ---
> https://github.com/signalapp/gradle-witness might be interesting in
> relation to this topic.  It is a Gradle plugin to add hash checks to
> downloads.
> It ran into a few issues that seem likely to arise again:
> - It's very opt-in; you can't apply it to a project without modifying
> the pom^H^H^H build.gradle file, and this limits its usefulness to
> folk from the distro perspective
> - As the readme mentions, it has something of a bootstrapping problem
> (it can't fetch *itself* by hash...)
> - IIUC, it doesn't work for Maven/Gradle plugins, only for the project
> dependencies... which means it's not a complete coverage of the build
> environment.
>    - It only applies the checks to dependencies listed in the
>    configuration; if transitive resolution somehow adds a new
>    dependency, it goes unchecked (and this does come up: for example,
>    if building on a different architecture, the dependency resolution
>    may yield different results *even when* all versions are pinned),
>    and so again, it's not complete coverage.
> In general, the lesson here seems to be that when trying to get a
> complete view of the sources and build environment, tools built into
> the core can really shine a lot brighter; when trying to do it in
> plugins, then things like (ironically) plugins seem to end up very
> difficult to handle.
> ---
> Cheers!  Very excited for the gathering of effort.
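
On the hash-check point, a check done from outside the build tool
could look roughly like the sketch below. This is not how
gradle-witness works; verify_artifacts and the `pinned` mapping are
hypothetical names, and it assumes all jars were already downloaded
into one directory. Unlike an opt-in list, it also reports any jar
that has no pinned hash, which is the coverage gap described above.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(directory, pinned):
    """Check every .jar under `directory` against `pinned`.

    `pinned` maps a path relative to `directory` (with forward slashes)
    to the expected SHA-256 hex digest. Returns two sorted lists:
    files whose hash differs, and files that have no pinned hash.
    """
    mismatched, unpinned = [], []
    for path in sorted(Path(directory).rglob("*.jar")):
        name = path.relative_to(directory).as_posix()
        expected = pinned.get(name)
        if expected is None:
            unpinned.append(name)
        elif sha256_of(path) != expected:
            mismatched.append(name)
    return mismatched, unpinned
```

A build wrapper could refuse to start unless both lists are empty.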
