Reproducing a Maven Central Release from a single GAV coordinate
Hervé Boutemy
hboutemy at apache.org
Tue Sep 2 05:52:35 UTC 2025
Going directly from one arbitrary GAV to the project that built it from source
is not easy, as there are many cases (many build tools, many project
structures, many ways to build one project).
But once some projects have been built and their results analyzed, as I did
for Reproducible Central, what started as "build a project (from source) and
list the output files/GAVs" can be reversed to go from GAV to source.
That is what I did and made easily accessible through the "Reproducible Central
Artifact" badge https://shields.io/badges/reproducible-central-artifact
which can later be used in a report on dependencies like the "Reproducible
Central Report", for example
https://cyclonedx.github.io/cyclonedx-maven-plugin/reproducible-central.html
I feel this is close to what you are trying to do.
I can dig into more detail on this report, particularly the difference between
"groupId:artifactId unknown", "version not evaluated" and a score "<RB ok
files> / <output files count when rebuilding>", if you think it's useful.
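To illustrate the reversal described above, here is a minimal Python sketch, not the actual Reproducible Central implementation. It assumes a hypothetical table of rebuild records (source project mapped to the GAVs its build produced) and inverts it so that one GAV leads back to its project and its siblings. The repository URLs and the names BUILD_OUTPUTS, invert and release_set are placeholders made up for the example.

```python
from collections import defaultdict

# Hypothetical rebuild records: each source project (repo URL, tag) mapped to
# the GAVs its build produced. All values are placeholders.
BUILD_OUTPUTS = {
    ("https://example.org/repo-x.git", "v1"): ["G1:Aroot:V1", "G1:A1:V1", "G1:A2:V1"],
    ("https://example.org/repo-y.git", "2.0"): ["G2:B1:2.0"],
}


def invert(build_outputs):
    """Turn 'project -> output GAVs' into 'GAV -> projects that produced it'."""
    index = defaultdict(list)
    for project, gavs in build_outputs.items():
        for gav in gavs:
            index[gav].append(project)
    return dict(index)


def release_set(gav, build_outputs):
    """Return every GAV produced by the same build(s) as the given GAV."""
    index = invert(build_outputs)
    siblings = set()
    for project in index.get(gav, []):
        siblings.update(build_outputs[project])
    return siblings
```

A GAV that no recorded build produced simply yields an empty set, which is one way such a lookup could signal "unknown".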
Regards,
Hervé
On Monday, 1 September 2025, 19:53:28 CEST, Aman Sharma via rb-general wrote:
> Hi Yasser,
>
>
>
> > Given only a single GAV (e.g., G1:A1:V1), is there a reliable way
> > (tool/technique) to determine the complete set of GAVs that were
> > published as part of the same upstream release? In other words, starting
> > with G1:A1:V1, how do I discover “all other GAVs that were released
> > together with it” so I can compare them with the outputs of my local
> > build?
>
>
> I understand your question better now. Thanks for the elaboration :)
>
> I had the same question when I was analyzing data in the reproducible
> central dataset. I wanted to present the reproducibility per GAV and not
> per source project. Thus, I needed to split the rebuilt artifacts from a
> source project into multiple GAVs.
>
>
> > Inspect the POM of G1:A1:V1:
>
>
> > * Walk up the parent POM chain to locate the reactor root or a POM with
> >   a <modules> section.
> > * From the root POM, enumerate modules and map them to expected GAVs at
> >   that version, then verify presence on Central (to exclude reactor-only
> >   or non-published modules).
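The parent-chain walk quoted above can be sketched in Python as follows. This is an illustration under assumptions, not how any of the tools mentioned in this thread work: fetch_pom is a hypothetical callable that returns the POM XML for a (groupId, artifactId, version) tuple or None, and the first ancestor with a non-empty <modules> section is treated as the aggregator. An aggregator is not always in the parent chain, so this can miss the reactor root.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def _child(elem, name):
    return next((c for c in elem if _local(c.tag) == name), None)


def _text(elem, name):
    child = _child(elem, name)
    return (child.text or "").strip() if child is not None else None


def parent_gav(pom_xml):
    """Return the (groupId, artifactId, version) declared in <parent>, or None."""
    parent = _child(ET.fromstring(pom_xml), "parent")
    if parent is None:
        return None
    return (_text(parent, "groupId"), _text(parent, "artifactId"), _text(parent, "version"))


def has_modules(pom_xml):
    """True if the POM has a top-level, non-empty <modules> section."""
    modules = _child(ET.fromstring(pom_xml), "modules")
    return modules is not None and len(modules) > 0


def find_aggregator(gav, fetch_pom, max_depth=20):
    """Walk up the parent chain; return the first GAV whose POM lists modules."""
    current = gav
    for _ in range(max_depth):
        pom_xml = fetch_pom(current)  # hypothetical lookup, e.g. a download
        if pom_xml is None:
            return None
        if has_modules(pom_xml):
            return current
        current = parent_gav(pom_xml)
        if current is None:
            return None
    return None
```

The max_depth bound is only a guard against a malformed chain that loops back on itself.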
>
> I built a tool,
> maven-module-graph<https://github.com/chains-project/maven-module-graph>,
> that does this. It takes in the root POM (along with the entire source of
> the Maven project) and returns all the submodules in the project. I first
> used this tool to get a list of GAVs and then, based on the artifact ID,
> mapped the unreproducible artifacts, which look something like
> this<https://github.com/chains-project/reproducible-central/blob/master/java/unreproducible_gradle_projects_to_releases.json>.
> Here are the commands to try it out.
>
> ./gradlew build
> java -jar build/libs/maven-module-graph-1.0-SNAPSHOT.jar \
> --project-root <path/to/maven/project/root> \
> --json <path/to/output.json> \
> --plain-text <path/to/output.txt>
>
>
>
> I explain in the README why I built this tool, but basically starting up
> the Maven reactor to discover all the submodules was too slow for me (even
> though it would be more correct in edge cases such as a POM file named
> something other than pom.xml).
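As a rough illustration of the static approach described above (reading <modules> declarations from disk instead of starting the Maven reactor), here is a Python sketch. It is not the maven-module-graph implementation. It assumes every module is a directory containing a file named pom.xml, so it shares the edge-case limitation mentioned above, and the function names are made up for the example.

```python
import os
import xml.etree.ElementTree as ET
from pathlib import Path


def _local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def _children(elem, name):
    return [c for c in elem if _local(c.tag) == name]


def declared_modules(pom_path, include_profiles=True):
    """Return the <module> entries of one POM, optionally including profiles."""
    root = ET.parse(pom_path).getroot()
    containers = [root]
    if include_profiles:
        for profiles in _children(root, "profiles"):
            containers.extend(_children(profiles, "profile"))
    names = []
    for container in containers:
        for modules in _children(container, "modules"):
            names.extend((m.text or "").strip() for m in _children(modules, "module"))
    return [n for n in names if n]


def list_module_dirs(project_root, include_profiles=True):
    """Follow <modules> declarations from the root POM without running Maven."""
    root_dir = Path(project_root).resolve()
    seen, stack = [], [root_dir]
    while stack:
        directory = stack.pop()
        pom = directory / "pom.xml"
        # Skip directories already visited or without a pom.xml.
        if directory in seen or not pom.is_file():
            continue
        seen.append(directory)
        for name in declared_modules(pom, include_profiles):
            stack.append((directory / name).resolve())
    return sorted(os.path.relpath(d, root_dir) for d in seen)
```

Mapping each directory to a GAV would still require resolving inherited groupId and version values, which this sketch leaves out.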
>
>
> > then verify presence on Central (to exclude reactor-only or non-published
> > modules).
>
>
> I include all the GAVs in all profiles by default. If you want to exclude
> them, add `--exclude-profiles`. However, verification of presence is indeed
> a good heuristic.
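A presence check of this kind could look like the following Python sketch. It relies on the standard Maven repository layout on repo1.maven.org (group directories, then artifactId, version, and a file named artifactId-version.pom). The function names and the injectable opener parameter are assumptions made for the example, and a real check would probably want retries and rate limiting.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

CENTRAL = "https://repo1.maven.org/maven2"


def pom_url(group_id, artifact_id, version, base=CENTRAL):
    """Build the URL of a GAV's POM in the standard Maven repository layout."""
    group_path = group_id.replace(".", "/")
    return f"{base}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.pom"


def is_published(group_id, artifact_id, version, opener=urlopen):
    """Return True if the POM exists, False on HTTP 404; other errors propagate."""
    request = Request(pom_url(group_id, artifact_id, version), method="HEAD")
    try:
        with opener(request, timeout=30) as response:
            return response.status == 200
    except HTTPError as err:
        if err.code == 404:
            return False
        raise
```

Calling is_published on each module GAV from the source tree would then filter out reactor-only or non-published modules.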
>
> Regards,
> Aman Sharma
>
> PhD Student
> KTH Royal Institute of Technology
> School of Electrical Engineering and Computer Science (EECS)
> Department of Theoretical Computer Science (TCS)
> http://www.kth.se
> https://www.kth.se/profile/amansha
> https://algomaster99.github.io/
> ________________________________
> From: yasser lazrek <lazrekyasser1998 at gmail.com>
> Sent: Monday, September 1, 2025 1:17:30 PM
> To: General discussions about reproducible builds
> Cc: Aman Sharma
> Subject: Re: Reproducing a Maven Central Release from a single GAV
> coordinate
>
> Hello Aman and William,
>
> Thank you for the response. I think I didn’t explain my goal clearly, so
> let me restate it with more context.
> Context and goal
>
> * I’m following a top-down, build-from-source approach for Java projects.
>   When building Project_X, I sometimes need to rebuild one of its
>   dependencies, say G1:A1:V1, from source.
> * From that single GAV, tools like AROMA can often find the upstream repo
>   URL and tag/commit for the source.
> * However, a single upstream “release” (reactor build) can publish multiple
>   modules/artifacts. Depending on the build environment or command, the set
>   of produced GAVs can differ (e.g., one build produces G1:Aroot:V1,
>   G1:A1:V1, G1:A2:V1, G1:A3:V1; another build command produces an extra
>   G1:A4:V1).
> * To verify I reproduced the correct release, I want to compare:
>     * the set of GAVs produced by my local build, with
>     * the set of GAVs that were actually published upstream on Maven
>       Central for that same release.
> * Only after this “set equality” check passes (the number of GAVs produced
>   locally is the same as the number of GAVs on upstream Maven Central)
>   would I proceed to byte-for-byte checks (POMs, jars, classes, etc.).
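The "set equality" step described above can be sketched in a few lines of Python. The function name and the shape of the result are assumptions made for the example. It compares the actual sets rather than only their sizes, since two sets of equal size can still differ.

```python
def compare_release_sets(local_gavs, published_gavs):
    """Compare locally built GAVs with those published upstream."""
    local, published = set(local_gavs), set(published_gavs)
    return {
        "equal": local == published,
        "missing_locally": sorted(published - local),
        "extra_locally": sorted(local - published),
    }


# Example from the text: the second build command produces an extra G1:A4:V1.
published = ["G1:Aroot:V1", "G1:A1:V1", "G1:A2:V1", "G1:A3:V1"]
local = published + ["G1:A4:V1"]
result = compare_release_sets(local, published)
```

Here result["equal"] is False and result["extra_locally"] lists G1:A4:V1, so the byte-for-byte stage would not start until the build command is adjusted.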
>
>
> The question: Given only a single GAV (e.g., G1:A1:V1), is there a reliable
> way (tool/technique) to determine the complete set of GAVs that were
> published as part of the same upstream release? In other words, starting
> with G1:A1:V1, how do I discover “all other GAVs that were released
> together with it” so I can compare them with the outputs of my local
> build?
> What I’ve considered
>
> * Query Maven Central for all artifacts with the same groupId and version
>   (e.g., g:"G1" AND v:"V1") to list potential siblings of G1:A1:V1. This
>   works when a multi-module release uses a common groupId and version, but
>   not always (some projects split across groupIds or use different version
>   schemes).
> * Inspect the POM of G1:A1:V1:
>     * Walk up the parent POM chain to locate the reactor root or a POM
>       with a <modules> section.
>     * From the root POM, enumerate modules and map them to expected GAVs
>       at that version, then verify presence on Central (to exclude
>       reactor-only or non-published modules).
> * Use the <scm> tag (url/tag) to correlate modules in the same repo/tag and
>   confirm which ones were actually published.
> * Use APIs/indexes (e.g., search.maven.org) to enumerate artifacts and
>   classifiers for a group/version, then reconcile with modules found via
>   POM/SCM.
> * Reference projects like jvm-repo-rebuild/reproducible-central for
>   patterns, but I’m specifically looking for a way to derive the “release
>   set” starting from one known GAV.
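The groupId+version query from the first bullet above could be sketched in Python like this. It targets the search.maven.org Solr-style endpoint. The parameter names (q, core, rows, wt) and the response layout (response.docs entries with g, a, v fields) reflect my understanding of that API and should be treated as assumptions to verify. As noted in the list, the result is only a set of candidate siblings.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://search.maven.org/solrsearch/select"


def sibling_query_url(group_id, version, rows=200):
    """Build a search URL for all artifacts sharing a groupId and version."""
    query = f'g:"{group_id}" AND v:"{version}"'
    params = {"q": query, "core": "gav", "rows": rows, "wt": "json"}
    return SEARCH_URL + "?" + urlencode(params)


def gavs_from_response(payload):
    """Extract sorted 'g:a:v' strings from a decoded search response."""
    docs = payload.get("response", {}).get("docs", [])
    return sorted(f'{d["g"]}:{d["a"]}:{d["v"]}' for d in docs)


def candidate_siblings(group_id, version, opener=urlopen):
    """Query the index and return candidate sibling GAVs (needs network access)."""
    with opener(sibling_query_url(group_id, version), timeout=30) as response:
        return gavs_from_response(json.load(response))
```

The candidates would still need reconciling with the modules found via the POM and SCM information, since unrelated projects can share a groupId and version.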
> What I’m asking for
>
> * Are there existing tools or established techniques that, given a single
>   GAV, can reliably enumerate the full set of GAVs that were published with
>   it in the same upstream release?
> * If not, are the heuristics above (groupId+version query,
>   parent-POM/module traversal, SCM tag correlation, and Central presence
>   checks) the recommended approach?
> * Any pointers to tooling, scripts, or best practices you’d suggest for
>   this “release set discovery” step would be very helpful.
> Thank you for your guidance!
>
> Best regards, Yasser Lazrek
>
>
>
> On Fri, Aug 29, 2025 at 19:05, Aman Sharma via rb-general
> <rb-general at lists.reproducible-builds.org> wrote:
> Hi Yasser,
>
>
>
> > Given just a GAV coordinate, how can I reliably identify the full list of
> > related GAVs that were included in the upstream release of that single
> > GAV?
>
>
> It sounds to me like you are interested in getting all the dependencies of
> that single GAV in order to build an identical jar. But to reproduce the
> jar, you don't need to explicitly gather the full list of dependencies. You
> identify the source code of the project and build it using a Java build
> tool. The build tool gathers the dependencies for you.
>
> Infrastructure like https://github.com/jvm-repo-rebuild/reproducible-central
> does the same thing. Refer to one of the
> buildspec<https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/io/trino/trino-446.buildspec>
> files that it has. It is basically a build recipe for reproducing the build.
> Regards,
> Aman Sharma
>
> ________________________________
> From: rb-general <rb-general-bounces at lists.reproducible-builds.org> on
> behalf of William Burton via rb-general
> <rb-general at lists.reproducible-builds.org>
> Sent: Friday, August 29, 2025 12:45:39 PM
> To: General discussions about reproducible builds
> Cc: William Burton
> Subject: Re: Reproducing a Maven Central Release from a single GAV
> coordinate
> Hi Yasser,
>
> This is the focused goal of
> https://github.com/jvm-repo-rebuild/reproducible-central so that's
> definitely a good place to start!
> Additionally, our project (website: https://oss-rebuild.dev/ source:
> https://github.com/google/oss-rebuild) is in the process of adding Maven
> support which will probably leverage reproducible-central in some ways.
> That's in addition to our other supported ecosystems like npm, crates, and
> pypi.
> Comparing the two, I'd say reproducible-central is a good place to dig in on
> technical details about how/why certain GAVs are reproducible or not, while
> OSS Rebuild is a little more "batteries included" by producing signed
> attestations and ecosystem-agnostic support tooling. There's collaboration
> across the two projects so I don't think you can go wrong either way :)
>
> On Fri, Aug 29, 2025 at 11:50 AM yasser lazrek
> <lazrekyasser1998 at gmail.com> wrote:
> Hello,
>
> As part of a build-from-source initiative, I am working on a top-down
> strategy to build project dependencies from source. Often, when trying to
> build a particular dependency, the only information available is its Maven
> GAV (Group ID, Artifact ID, and Version) coordinate.
> My question is: Given just a GAV coordinate, how can I reliably identify the
> full list of related GAVs that were included in the upstream release of
> that single GAV? The goal is to reproduce the released binary artifact by
> building from the upstream source (using its repository URL and a specific
> commit hash or release tag), and to ensure that the output matches exactly
> what was published on Maven Central.
> Are there recommended tools or best practices to trace the complete set of
> artifacts and metadata associated with an original Maven Central release,
> solely from its GAV, that can cover the majority of artifacts (GAVs) on
> Maven Central? Any advice or pointers would be greatly appreciated.
> Thank you for your insights!
>
> Best regards,