[rb-general] buildinfo content for JVM based build

Arnout Engelen arnout at bzzt.net
Sat Dec 22 11:58:37 CET 2018

On Sat, Dec 22, 2018 at 7:23 AM Hervé Boutemy <hboutemy at apache.org> wrote:
> After Arnout's excellent PoC [1], I'd like to discuss the buildinfo content, based on a review of the current example:
> > name=stamina-core
> > group_id=com.scalapenos
> > artifact_id=stamina-core_2.12
> > version=0.1.5-SNAPSHOT
> ok, same meaning as in the usual POM, IIUC

Yes, I think following the maven/ivy conventions here makes sense.

> > build_architecture=all
> why "all" value?
> to me, buildinfo is more a record of the conditions when the build was done than a definition of supported conditions

Agreed. If anything we could record the `-target` level, but since
AFAIK it is rare to publish the same artifact with different `-target`
levels, let's drop this field. I do think we should include the
'classifier' field, if any, though.

> For example, it could be useful to record if the build was done on Windows or any Unix system, for newlines

That might be interesting. I wonder if we should just include most of
the commonly defined JVM system properties.
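
As a rough sketch of what that could look like: the selection of keys
below is only an assumption (nothing in this thread settles on a list),
and the class and method names are made up for illustration. Newlines
are escaped so that a Windows and a Unix build can be told apart.

```java
import java.util.Map;
import java.util.TreeMap;

final class JvmPropertiesSketch {
    // Hypothetical selection of standard JVM system properties.
    static final String[] KEYS = {
        "java.version", "java.vendor", "os.name", "os.arch", "os.version", "line.separator"
    };

    /** Returns the selected system properties, sorted by key for stable output. */
    static Map<String, String> collect() {
        Map<String, String> result = new TreeMap<>();
        for (String key : KEYS) {
            String value = System.getProperty(key);
            if (value != null) {
                result.put(key, escape(value));
            }
        }
        return result;
    }

    /** Makes CR and LF visible so "\n" and "\r\n" can be distinguished. */
    static String escape(String value) {
        return value.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```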

> We'll also have to decide whether recording info like platform encoding or locale is useful: relying on platform encoding is bad practice, so I'd prefer to avoid it
> but locale may be something useful to record for generated documentation (or assume it should be reproducible as English...)

I agree it would be useful to include those: they shouldn't affect the
build, but including them in the buildinfo may make it easier to spot
when they accidentally do (combined with comparing diffoscope output
of course).
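
A minimal sketch of recording those two values, purely for
illustration: the field names 'platform.encoding' and 'platform.locale'
are hypothetical and not part of the PoC.

```java
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class PlatformDefaultsSketch {
    /** Describes a charset and locale as buildinfo-style fields (hypothetical names). */
    static Map<String, String> describe(Charset charset, Locale locale) {
        Map<String, String> fields = new TreeMap<>();
        fields.put("platform.encoding", charset.name());
        fields.put("platform.locale", locale.toLanguageTag());
        return fields;
    }

    public static void main(String[] args) {
        // Record the defaults of the JVM that runs the build.
        describe(Charset.defaultCharset(), Locale.getDefault())
            .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```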

> > source=com.scalapenos:stamina-core_2.12
> > binary=com.scalapenos:stamina-core_2.12
> > package=com.scalapenos:stamina-core_2.12
> I don't get the meaning of source vs binary vs package

I agree - these mostly came from when I was still much closer to the
Debian format, but I don't think they make much sense in the JVM
context. Perhaps let's just remove all 3 until we find a need for them.

> And as I said in my previous email, if source value is a groupId:artifactId coordinate, I expect to find a ${artifactId}-${version}-source-release.zip file in the repository

Perhaps we should promote including 4 'scm' fields corresponding to

> > java.version=1.8.0_191
> ok, this is the exact version of JVM used to run the build tool, and we expect that any 1.8.0 JDK will permit same rebuild

I'm not sure that is safe to expect, but in any case we should record it ;).

> > sbt.version=1.0.4
> > scala.version=2.12.4
> > scala.binary-version=2.12
> I suppose these are SBT build tool specific fields: I know neither Scala nor SBT, so I am not really able to judge if these 3 fields are strictly necessary
> But from an outsider perspective, this seems like a lot of fields: do you install scala independently from SBT?

Yes, possibly.

> And is binary-version also installed/defined separately?

No, the binary-version of Scala 2.12.4 is always 2.12, so strictly
speaking the binary-version field is redundant.
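
To illustrate the redundancy, here is a small sketch that derives one
from the other. It assumes a plain 'major.minor.patch' Scala 2.x release
version; pre-release versions and other versioning schemes are
deliberately out of scope, and the class name is made up.

```java
final class ScalaBinaryVersionSketch {
    /**
     * Derives the binary version from a full version such as "2.12.4".
     * Assumption: the input is a stable release of the form major.minor.patch.
     */
    static String binaryVersion(String fullVersion) {
        String[] parts = fullVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected major.minor[.patch]: " + fullVersion);
        }
        return parts[0] + "." + parts[1];
    }

    public static void main(String[] args) {
        System.out.println(binaryVersion("2.12.4")); // prints 2.12
    }
}
```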

> Perhaps we should add a basic record: build_tool=sbt
> to avoid guessing build tool from visible properties :)

Agreed. Let's make it 'build-tool=sbt' since hyphens seem to be a bit
more common than underscores in properties files?
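
One thing to watch out for if we go with a properties file:
java.util.Properties.store() always writes a comment line with the
current date, which would make the buildinfo itself differ between
runs. A simplified sketch of writing sorted 'key=value' lines by hand
instead (no escaping of special characters, names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class BuildinfoWriterSketch {
    /** Renders fields as key=value lines, sorted by key, with "\n" line endings. */
    static String render(Map<String, String> fields) {
        StringBuilder out = new StringBuilder();
        // TreeMap gives a stable order regardless of how the input map iterates.
        for (Map.Entry<String, String> entry : new TreeMap<>(fields).entrySet()) {
            out.append(entry.getKey()).append('=').append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new HashMap<>();
        fields.put("build-tool", "sbt");
        fields.put("sbt.version", "1.0.4");
        fields.put("scala.version", "2.12.4");
        System.out.print(render(fields));
    }
}
```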

> then have a defined list of known build tools and for each tool the list of build-tool-specific variables to define

I'm not sure we need to standardize this strictly, but sure.

> > date=1545329140000
> date of what?

This was the build date. I wasn't sure if 'seconds since epoch' is a
great format (the value above is actually in milliseconds). On the
other hand, SOURCE_DATE_EPOCH also uses an epoch timestamp and it's
easy to parse.
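
For reference, the sample value 1545329140000 only decodes to a
sensible 2018 date when read as milliseconds, while SOURCE_DATE_EPOCH
is specified in seconds. A short sketch of the conversion (class and
method names are made up):

```java
import java.time.Instant;

final class BuildDateSketch {
    /** Interprets the 'date' field as milliseconds since the Unix epoch. */
    static Instant fromEpochMillis(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    /** Converts milliseconds to whole seconds, the unit SOURCE_DATE_EPOCH uses. */
    static long toEpochSeconds(long millis) {
        return Math.floorDiv(millis, 1000L);
    }

    public static void main(String[] args) {
        long date = 1545329140000L;
        System.out.println(fromEpochMillis(date)); // prints 2018-12-20T18:05:40Z
        System.out.println(toEpochSeconds(date));  // prints 1545329140
    }
}
```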

> should we record source-date-epoch [2]?

I think including the date of the latest commit is useful - not sure
if it should necessarily come from a SOURCE_DATE_EPOCH environment
variable or could be determined otherwise.
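
A sketch of the 'environment variable first, something else otherwise'
approach. The fallback value is passed in by the caller; how it is
obtained (for example from the latest commit) is left open here, and
all names are hypothetical.

```java
import java.util.Map;
import java.util.OptionalLong;

final class SourceDateEpochSketch {
    /** Reads SOURCE_DATE_EPOCH (seconds) from the given environment, if valid. */
    static OptionalLong fromEnvironment(Map<String, String> env) {
        String raw = env.get("SOURCE_DATE_EPOCH");
        if (raw == null || raw.trim().isEmpty()) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(raw.trim()));
        } catch (NumberFormatException e) {
            // Treat an unparsable value as absent rather than failing the build.
            return OptionalLong.empty();
        }
    }

    /** Prefers the environment variable, else the caller-supplied fallback. */
    static long resolve(Map<String, String> env, long fallbackEpochSeconds) {
        return fromEnvironment(env).orElse(fallbackEpochSeconds);
    }

    public static void main(String[] args) {
        // 0L stands in for a timestamp determined some other way.
        System.out.println(resolve(System.getenv(), 0L));
    }
}
```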

> IIUC, that's the checksum of main build result: POM and binary jar


> I don't know if -javadoc.jar and -sources.jar checksums should also be provided, since they are also build result, but less critical

Good question. For now I'm not taking those into account (and am not
making any effort towards making those reproducible as well).
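
For whichever artifacts end up being covered, computing the checksum is
straightforward. The sketch below assumes SHA-256, which is my choice
for illustration and not something this thread has decided on; the
class name is made up as well.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ChecksumSketch {
    /** Returns the lowercase hex SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** Reads the whole file into memory; fine for a sketch, not for huge files. */
    static String sha256Hex(Path file) throws IOException {
        return sha256Hex(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        // Usage: pass the paths of the POM and jar as arguments.
        for (String arg : args) {
            System.out.println(arg + " " + sha256Hex(Paths.get(arg)));
        }
    }
}
```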

> Perhaps it's something that can be chosen by the release manager: not every published bit has to be reproducible, I suppose


> During the Reproducible Builds event in Paris [3], there was an idea of recording checksums for the effective dependencies used.
> I'm not personally convinced this is useful, since it's part of the reproducible process.

I'm not convinced yet either, but I haven't heard the motivation.

Kind regards,

