Reproducing a Maven Central Release from a single GAV coordinate
Aman Sharma
amansha at kth.se
Mon Sep 1 17:53:28 UTC 2025
Hi Yasser,
> Given only a single GAV (e.g., G1:A1:V1), is there a reliable way (tool/technique) to determine the complete set of GAVs that were published as part of the same upstream release? In other words, starting with G1:A1:V1, how do I discover “all other GAVs that were released together with it” so I can compare them with the outputs of my local build?
I understand your question better now. Thanks for the elaboration :)
I had the same question when I was analyzing data in the reproducible central dataset. I wanted to present the reproducibility per GAV and not per source project. Thus, I needed to split the rebuilt artifacts from a source project into multiple GAVs.
> Inspect the POM of G1:A1:V1:
>   * Walk up the parent POM chain to locate the reactor root or a POM with a <modules> section.
>   * From the root POM, enumerate modules and map them to expected GAVs at that version, then verify presence on Central (to exclude reactor-only or non-published modules).
I built a tool, maven-module-graph<https://github.com/chains-project/maven-module-graph>, that does this. It takes the root POM (along with the entire source of the Maven project) and returns all the submodules in the project. I first used this tool to get a list of GAVs and then, based on the artifact ID, mapped the unreproducible artifacts; the result looks something like this<https://github.com/chains-project/reproducible-central/blob/master/java/unreproducible_gradle_projects_to_releases.json>. Here are the commands to try it out.
./gradlew build
java -jar build/libs/maven-module-graph-1.0-SNAPSHOT.jar \
--project-root <path/to/maven/project/root> \
--json <path/to/output.json> \
--plain-text <path/to/output.txt>
I explain in the README why I built this tool; basically, starting up the Maven reactor to discover all the submodules was too slow for me (even though it would be more correct in edge cases, such as a POM file named something other than pom.xml).
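The static approach described here (reading <modules> from the POM files instead of starting the reactor) can be illustrated with a minimal Python sketch. It is not the implementation of maven-module-graph. The function name list_gavs is hypothetical, and the sketch assumes that every module directory contains a file named pom.xml, that POMs declare the standard Maven 4.0.0 namespace, that modules declared inside profiles can be ignored, and that versions are literal (properties such as ${revision} are not resolved).

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Standard namespace declared by Maven POM files (assumed present).
NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def _text(root, path):
    """Return the stripped text of the first element matching path, or None."""
    node = root.find(path, NS)
    return node.text.strip() if node is not None and node.text else None


def list_gavs(pom_path):
    """Recursively collect (groupId, artifactId, version) for a POM and its modules."""
    pom_path = Path(pom_path)
    root = ET.parse(pom_path).getroot()
    # groupId and version may be omitted and inherited from <parent>.
    group = _text(root, "m:groupId") or _text(root, "m:parent/m:groupId")
    version = _text(root, "m:version") or _text(root, "m:parent/m:version")
    gavs = [(group, _text(root, "m:artifactId"), version)]
    for module in root.findall("m:modules/m:module", NS):
        child = pom_path.parent / module.text.strip() / "pom.xml"
        if child.is_file():  # skip modules whose POM is missing or named differently
            gavs.extend(list_gavs(child))
    return gavs
```

Calling list_gavs("path/to/project/pom.xml") returns the root GAV followed by the GAVs of all modules reachable through <modules> entries.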
> then verify presence on Central (to exclude reactor-only or non-published modules).
I include all the GAVs in all profiles by default. If you want to exclude them, add `--exclude-profiles`. However, verifying presence on Central is indeed a good heuristic.
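A presence check of this kind could be sketched as below. The sketch assumes the standard Maven 2 repository layout served at https://repo1.maven.org/maven2 and treats a GAV as published when its .pom file exists. The helper names pom_url and is_published are hypothetical, and is_published needs network access.

```python
import urllib.error
import urllib.request

CENTRAL = "https://repo1.maven.org/maven2"


def pom_url(group, artifact, version, base=CENTRAL):
    """Build the URL of a GAV's POM using the standard repository layout."""
    group_path = group.replace(".", "/")
    return f"{base}/{group_path}/{artifact}/{version}/{artifact}-{version}.pom"


def is_published(group, artifact, version):
    """Return True if the POM exists on Central, False on HTTP 404."""
    request = urllib.request.Request(pom_url(group, artifact, version), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other errors (rate limiting, outages) are not evidence of absence
```

Filtering the statically discovered GAVs through is_published would drop reactor-only modules that were never deployed.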
Regards,
Aman Sharma
PhD Student
KTH Royal Institute of Technology
School of Electrical Engineering and Computer Science (EECS)
Department of Theoretical Computer Science (TCS)
<http://www.kth.se>
<https://www.kth.se/profile/amansha>
https://algomaster99.github.io/
________________________________
From: yasser lazrek <lazrekyasser1998 at gmail.com>
Sent: Monday, September 1, 2025 1:17:30 PM
To: General discussions about reproducible builds
Cc: Aman Sharma
Subject: Re: Reproducing a Maven Central Release from a single GAV coordinate
Hello Aman and William,
Thank you for the response. I think I didn’t explain my goal clearly—let me restate it with more context.
Context and goal
* I’m following a top-down, build-from-source approach for Java projects. When building Project_X, I sometimes need to rebuild one of its dependencies, say G1:A1:V1, from source.
* From that single GAV, tools like AROMA can often find the upstream repo URL and tag/commit for the source.
* However, a single upstream “release” (reactor build) can publish multiple modules/artifacts. Depending on the build environment or command, the set of produced GAVs can differ (e.g., one build produces G1:Aroot:V1, G1:A1:V1, G1:A2:V1, G1:A3:V1; another build command produces an extra G1:A4:V1).
* To verify I reproduced the correct release, I want to compare:
    * the set of GAVs produced by my local build, with
    * the set of GAVs that were actually published upstream on Maven Central for that same release.
* Only after this “set equality” check passes (the number of GAVs produced locally is the same as the number of GAVs on upstream Maven Central) would I proceed to byte-for-byte checks (POMs, jars, classes, etc.).
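The comparison step in the list above could look like the following minimal Python sketch. The function name compare_release_sets and the "G:A:V" string encoding are assumptions made for illustration. Comparing the sets themselves is slightly stricter than comparing counts, since two builds can produce the same number of GAVs with different members.

```python
def compare_release_sets(local, upstream):
    """Compare locally built GAVs with the GAVs published upstream."""
    local, upstream = set(local), set(upstream)
    return {
        "equal": local == upstream,
        "missing_locally": sorted(upstream - local),  # published but not built
        "extra_locally": sorted(local - upstream),    # built but not published
    }


# Example from the scenario above: the local build produced an extra G1:A4:V1.
upstream = ["G1:Aroot:V1", "G1:A1:V1", "G1:A2:V1", "G1:A3:V1"]
local = upstream + ["G1:A4:V1"]
report = compare_release_sets(local, upstream)
```

Byte-for-byte checks would then only run when report["equal"] is true.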
The question: Given only a single GAV (e.g., G1:A1:V1), is there a reliable way (tool/technique) to determine the complete set of GAVs that were published as part of the same upstream release? In other words, starting with G1:A1:V1, how do I discover “all other GAVs that were released together with it” so I can compare them with the outputs of my local build?
What I’ve considered
* Query Maven Central for all artifacts with the same groupId and version (e.g., g:"G1" AND v:"V1") to list potential siblings of G1:A1:V1. This works when a multi-module release uses a common groupId and version, but not always (some projects split across groupIds or use different version schemes).
* Inspect the POM of G1:A1:V1:
    * Walk up the parent POM chain to locate the reactor root or a POM with a <modules> section.
    * From the root POM, enumerate modules and map them to expected GAVs at that version, then verify presence on Central (to exclude reactor-only or non-published modules).
* Use the <scm> tag (url/tag) to correlate modules in the same repo/tag and confirm which ones were actually published.
* Use APIs/indexes (e.g., search.maven.org<http://search.maven.org>) to enumerate artifacts and classifiers for a group/version, then reconcile with modules found via POM/SCM.
* Reference projects like jvm-repo-rebuild/reproducible-central for patterns, but I’m specifically looking for a way to derive the “release set” starting from one known GAV.
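The groupId+version query from the first item in this list can be sketched in Python as follows. It assumes the Solr-style search endpoint at https://search.maven.org/solrsearch/select with the core=gav parameter, and a JSON response whose response.docs entries carry g, a, and v fields; both should be checked against the current API documentation. The function names are hypothetical, and fetching the URL is left to the caller.

```python
import json
import urllib.parse

SEARCH = "https://search.maven.org/solrsearch/select"


def sibling_query_url(group, version, rows=200):
    """Build a search URL for all artifacts sharing a groupId and version."""
    params = {
        "q": f'g:"{group}" AND v:"{version}"',
        "core": "gav",  # assumed: one result document per GAV
        "rows": rows,
        "wt": "json",
    }
    return f"{SEARCH}?{urllib.parse.urlencode(params)}"


def gavs_from_response(body):
    """Extract sorted 'G:A:V' strings from a search response body (assumed shape)."""
    docs = json.loads(body)["response"]["docs"]
    return sorted(f'{d["g"]}:{d["a"]}:{d["v"]}' for d in docs)
```

As noted in the list, the result only approximates the release set: projects that span several groupIds or version schemes need the POM and SCM heuristics as well.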
What I’m asking for
* Are there existing tools or established techniques that, given a single GAV, can reliably enumerate the full set of GAVs that were published with it in the same upstream release?
* If not, are the heuristics above (groupId+version query, parent-POM/module traversal, SCM tag correlation, and Central presence checks) the recommended approach?
* Any pointers to tooling, scripts, or best practices you’d suggest for this “release set discovery” step would be very helpful.
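The parent-POM traversal named among these heuristics can be sketched in Python as below. The function names and the fetch_pom callback (which returns the POM text for a GAV, for example by downloading it from Central) are hypothetical. The sketch also assumes that the aggregator POM is an ancestor in the <parent> chain, which Maven does not require, and that POMs declare the standard 4.0.0 namespace.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def parent_gav(pom_xml):
    """Return the (groupId, artifactId, version) of <parent>, or None if absent."""
    parent = ET.fromstring(pom_xml).find("m:parent", NS)
    if parent is None:
        return None
    return tuple(
        parent.findtext(f"m:{tag}", default="", namespaces=NS).strip()
        for tag in ("groupId", "artifactId", "version")
    )


def has_modules(pom_xml):
    """Return True if the POM declares at least one <module>."""
    return ET.fromstring(pom_xml).find("m:modules/m:module", NS) is not None


def find_reactor_root(gav, fetch_pom):
    """Walk up the parent chain until a POM with <modules> is found."""
    chain = []
    while gav is not None:
        pom = fetch_pom(gav)
        chain.append(gav)
        if has_modules(pom):
            return gav, chain
        gav = parent_gav(pom)
    return None, chain  # no ancestor declares modules
```

The first ancestor with a <modules> section is only a candidate root; its module list still needs to be enumerated and checked against Central.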
Thank you for your guidance!
Best regards, Yasser Lazrek
On Fri, Aug 29, 2025 at 19:05, Aman Sharma via rb-general <rb-general at lists.reproducible-builds.org> wrote:
Hi Yasser,
> Given just a GAV coordinate, how can I reliably identify the full list of related GAVs that were included in the upstream release of that single GAV?
It sounds to me like you are interested in getting all the dependencies of that single GAV in order to build an identical jar. But to reproduce the jar, you don't need to explicitly gather the full list of dependencies. You identify the source code of the project and build it using a Java build tool. The build tool gathers the dependencies for you.
Infrastructure like https://github.com/jvm-repo-rebuild/reproducible-central does the same thing. Refer to one of the buildspec<https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/io/trino/trino-446.buildspec> files that it has. It is basically a build recipe for reproducing the build.
Regards,
Aman Sharma
________________________________
From: rb-general <rb-general-bounces at lists.reproducible-builds.org> on behalf of William Burton via rb-general <rb-general at lists.reproducible-builds.org>
Sent: Friday, August 29, 2025 12:45:39 PM
To: General discussions about reproducible builds
Cc: William Burton
Subject: Re: Reproducing a Maven Central Release from a single GAV coordinate
Hi Yasser,
This is the focused goal of https://github.com/jvm-repo-rebuild/reproducible-central so that's definitely a good place to start!
Additionally, our project (website: https://oss-rebuild.dev/ source: https://github.com/google/oss-rebuild) is in the process of adding Maven support which will probably leverage reproducible-central in some ways. That's in addition to our other supported ecosystems like npm, crates, and pypi.
Comparing the two, I'd say reproducible-central is a good place to dig in on technical details about how/why certain GAVs are reproducible or not, while OSS Rebuild is a little more "batteries included" by producing signed attestations and ecosystem-agnostic support tooling. There's collaboration across the two projects so I don't think you can go wrong either way :)
On Fri, Aug 29, 2025 at 11:50 AM yasser lazrek <lazrekyasser1998 at gmail.com> wrote:
Hello,
As part of a build-from-source initiative, I am working on a top-down strategy to build project dependencies from source. Often, when trying to build a particular dependency, the only information available is its Maven GAV (Group ID, Artifact ID, and Version) coordinate.
My question is: Given just a GAV coordinate, how can I reliably identify the full list of related GAVs that were included in the upstream release of that single GAV? The goal is to reproduce the released binary artifact by building from the upstream source (using its repository URL and a specific commit hash or release tag), and to ensure that the output matches exactly what was published on Maven Central.
Are there recommended tools or best practices to trace the complete set of artifacts and metadata associated with an original Maven Central release, solely from its GAV, that would cover the majority of artifacts (GAVs) on Maven Central? Any advice or pointers would be greatly appreciated.
Thank you for your insights!
Best regards,