Reproducing a Maven Central Release from a single GAV coordinate
yasser lazrek
lazrekyasser1998 at gmail.com
Mon Sep 1 17:17:30 UTC 2025
Hello Aman and William,
Thank you for the response. I think I didn’t explain my goal clearly—let me
restate it with more context.
Context and goal
- I’m following a top-down, build-from-source approach for Java
projects. When building Project_X, I sometimes need to rebuild one of its
dependencies, say G1:A1:V1, from source.
- From that single GAV, tools like AROMA can often find the upstream
repo URL and tag/commit for the source.
- However, a single upstream “release” (reactor build) can publish
multiple modules/artifacts. Depending on the build environment or command,
the set of produced GAVs can differ (e.g., one build produces G1:Aroot:V1,
G1:A1:V1, G1:A2:V1, G1:A3:V1; another build command produces an extra
G1:A4:V1).
- To verify I reproduced the correct release, I want to compare:
1. the set of GAVs produced by my local build, with
2. the set of GAVs that were actually published upstream on Maven
Central for that same release.
- Only after this “set equality” check passes (my local build produced the
same GAVs, and the same number of them, as were published on upstream Maven
Central) would I proceed to byte-for-byte checks (POMs, jars, classes, etc.).
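A minimal sketch of that set-equality check, in Python. The function names
(parse_gav, compare_release_sets) are hypothetical, and it assumes both sides
have already been collected as plain "G:A:V" strings:

```python
def parse_gav(coord: str) -> tuple[str, str, str]:
    """Split 'group:artifact:version' into a tuple; reject anything else."""
    parts = coord.strip().split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a G:A:V coordinate: {coord!r}")
    return tuple(parts)


def compare_release_sets(local, upstream):
    """Compare the GAVs from a local build with those published upstream."""
    local_set = {parse_gav(c) for c in local}
    upstream_set = {parse_gav(c) for c in upstream}
    return {
        # Published upstream but not produced by the local build.
        "missing_locally": sorted(upstream_set - local_set),
        # Produced locally but never published upstream.
        "extra_locally": sorted(local_set - upstream_set),
        "equal": local_set == upstream_set,
    }
```

With the example above, a local build that also emits G1:A4:V1 would show up
under "extra_locally", so the byte-for-byte step is skipped until the build
command is corrected.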
The question: Given only a single GAV (e.g., G1:A1:V1), is there a reliable
way (tool/technique) to determine the complete set of GAVs that were
published as part of the same upstream release? In other words, starting
with G1:A1:V1, how do I discover “all other GAVs that were released
together with it” so I can compare them with the outputs of my local build?
What I’ve considered
- Query Maven Central for all artifacts with the same groupId and
version (e.g., g:"G1" AND v:"V1") to list potential siblings of G1:A1:V1.
This works when a multi-module release uses a common groupId and version,
but not always (some projects split across groupIds or use different
version schemes).
- Inspect the POM of G1:A1:V1:
- Walk up the parent POM chain to locate the reactor root or a POM
with a <modules> section.
- From the root POM, enumerate modules and map them to expected GAVs
at that version, then verify presence on Central (to exclude reactor-only
or non-published modules).
- Use the <scm> element (url/tag) to correlate modules in the same repo/tag and
confirm which ones were actually published.
- Use APIs/indexes (e.g., search.maven.org) to enumerate artifacts and
classifiers for a group/version, then reconcile with modules found via
POM/SCM.
- Reference projects like jvm-repo-rebuild/reproducible-central for
patterns, but I’m specifically looking for a way to derive the “release
set” starting from one known GAV.
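To make the POM-inspection idea concrete, here is a rough Python sketch that
pulls the relevant fields out of one POM. The function name summarize_pom is
hypothetical. It assumes the POM declares the standard Maven 4.0.0 namespace,
and it does not resolve properties such as ${project.version} or modules
declared inside profiles. Note that module entries are relative directory
paths, not artifactIds, so each module's own POM still has to be read to get
its real coordinates:

```python
import xml.etree.ElementTree as ET

# Standard POM namespace; POMs without it would need a different lookup.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def summarize_pom(pom_xml: str) -> dict:
    """Extract coordinates, parent, module list and SCM info from a POM."""
    root = ET.fromstring(pom_xml)

    def text(path):
        node = root.find(path, NS)
        return node.text.strip() if node is not None and node.text else None

    parent = None
    if root.find("m:parent", NS) is not None:
        parent = (
            text("m:parent/m:groupId"),
            text("m:parent/m:artifactId"),
            text("m:parent/m:version"),
        )
    return {
        # groupId and version are inherited from the parent when omitted.
        "groupId": text("m:groupId") or (parent[0] if parent else None),
        "artifactId": text("m:artifactId"),
        "version": text("m:version") or (parent[2] if parent else None),
        "parent": parent,
        "modules": [
            m.text.strip()
            for m in root.findall("m:modules/m:module", NS)
            if m.text
        ],
        "scm_url": text("m:scm/m:url"),
        "scm_tag": text("m:scm/m:tag"),
    }
```

Walking the parent chain would then mean fetching the POM for the returned
"parent" tuple and repeating until a POM with a non-empty "modules" list is
found.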
What I’m asking for
- Are there existing tools or established techniques that, given a
single GAV, can reliably enumerate the full set of GAVs that were published
with it in the same upstream release?
- If not, are the heuristics above (groupId+version query,
parent-POM/module traversal, SCM tag correlation, and Central presence
checks) the recommended approach?
- Any pointers to tooling, scripts, or best practices you’d suggest for
this “release set discovery” step would be very helpful.
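For reference, the groupId+version query heuristic could look roughly like
this in Python. The endpoint URL, the core=gav parameter and the response
fields (response.docs with g/a/v keys) are my assumptions about the Central
search API and should be checked against its documentation. The helper names
are hypothetical, and the sketch only builds the URL and parses a response
body; it does not perform the HTTP request:

```python
import json
from urllib.parse import urlencode

# Assumed search endpoint; verify against the Central search API docs.
SEARCH_ENDPOINT = "https://search.maven.org/solrsearch/select"


def sibling_query_url(group_id: str, version: str, rows: int = 200) -> str:
    """Build a query URL for all artifacts sharing a groupId and version."""
    params = {
        "q": f'g:"{group_id}" AND v:"{version}"',
        "core": "gav",  # assumption: per-version index
        "rows": rows,
        "wt": "json",
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def gavs_from_response(body: str) -> set[str]:
    """Turn a JSON response body into a set of 'G:A:V' strings."""
    docs = json.loads(body).get("response", {}).get("docs", [])
    return {f'{d["g"]}:{d["a"]}:{d["v"]}' for d in docs}
```

The resulting set only covers siblings that share the groupId and version, so
it is a candidate list to reconcile against the POM/SCM findings rather than
a definitive release set.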
Thank you for your guidance!
Best regards, Yasser Lazrek
On Fri, Aug 29, 2025 at 19:05, Aman Sharma via rb-general <
rb-general at lists.reproducible-builds.org> wrote:
> Hi Yasser,
>
>
> > *Given just a GAV coordinate, how can I reliably identify the full list
> of related GAVs that were included in the upstream release of that single
> GAV?*
>
>
> It sounds to me like you are interested in getting all the
> dependencies of that single GAV in order to build an identical jar. But to
> reproduce the jar, you don't need to explicitly gather all the list of
> dependencies. You identify the source code of the project and build it
> using a Java build tool. The build tool gathers the dependencies for you.
>
>
> Infrastructure like
> https://github.com/jvm-repo-rebuild/reproducible-central does the same
> thing. Refer to one of the *buildspec*
> <https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/io/trino/trino-446.buildspec>
> files that it has. It is basically a build recipe for reproducing the build.
>
> Regards,
> Aman Sharma
>
> PhD Student
> KTH Royal Institute of Technology
> School of Electrical Engineering and Computer Science (EECS)
> Department of Theoretical Computer Science (TCS)
> <http://www.kth.se> <https://www.kth.se/profile/amansha>
> https://algomaster99.github.io/
> ------------------------------
> *From:* rb-general <rb-general-bounces at lists.reproducible-builds.org> on
> behalf of William Burton via rb-general <
> rb-general at lists.reproducible-builds.org>
> *Sent:* Friday, August 29, 2025 12:45:39 PM
> *To:* General discussions about reproducible builds
> *Cc:* William Burton
> *Subject:* Re: Reproducing a Maven Central Release from a single GAV
> coordinate
>
> Hi Yasser,
>
> This is the focused goal of
> https://github.com/jvm-repo-rebuild/reproducible-central so that's
> definitely a good place to start!
>
> Additionally, our project (website: https://oss-rebuild.dev/ source:
> https://github.com/google/oss-rebuild) is in the process of adding Maven
> support which will probably leverage reproducible-central in some ways.
> That's in addition to our other supported ecosystems like npm, crates, and
> pypi.
>
> Comparing the two, I'd say reproducible-central is a good place to dig in
> on technical details about how/why certain GAVs are reproducible or not,
> while OSS Rebuild is a little more "batteries included" by producing signed
> attestations and ecosystem-agnostic support tooling. There's collaboration
> across the two projects so I don't think you can go wrong either way :)
>
>
> On Fri, Aug 29, 2025 at 11:50 AM yasser lazrek <lazrekyasser1998 at gmail.com>
> wrote:
>
>> Hello,
>>
>> As part of a build-from-source initiative, I am working on a top-down
>> strategy to build project dependencies from source. Often, when trying to
>> build a particular dependency, the only information available is its Maven
>> GAV (Group ID, Artifact ID, and Version) coordinate.
>>
>> My question is: Given just a GAV coordinate, how can I reliably identify
>> the full list of related GAVs that were included in the upstream release of
>> that single GAV? The goal is to reproduce the released binary artifact
>> by building from the upstream source (using its repository URL and a
>> specific commit hash or release tag), and to ensure that the output matches
>> exactly what was published on Maven Central.
>>
>> Are there recommended tools or best practices to trace the complete set
>> of artifacts and metadata associated with an original Maven Central release
>> that can cover the majority of artifacts (GAVs) on Maven Central, solely
>> from its GAV? Any advice or pointers would be greatly appreciated.
>>
>> Thank you for your insights!
>>
>> Best regards,
>>
>