[rb-general] Reproducing tarballs under various toolchains

Eric Myhre hash at exultant.us
Wed Sep 19 19:11:27 CEST 2018


Whoops, got offlisted by dubious email client UX... re-listing...

On 19. sep. 2018 16:18, Daniel Shahaf wrote:
> Eric Myhre wrote on Wed, 19 Sep 2018 08:52 +0200:
>> This may be slightly tangential, but matching on "tarballs" and things I
>> recently learned while hashing them:
>>
> First things first, why offlist?  I'll fullquote and feel free to re-CC
> the list on reply.
>
>> Did you know github's automatically-produced tarballs of the source tree
>> when you tag a release will contain a first entry which is *not* a file,
>> but is rather type 'g', and this will contain a PAX extended header
>> called "comment", and this will contain the git commit hash?
>>
> No, but last I checked github's tarballs were identical to ones produced
> by git-archive(1).
Today I learned!  Thanks :)
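
For anyone who wants to look at that header themselves, here is a minimal
sketch in Python using only the standard library's tarfile module.  The
helper names (pax_global_comment, make_sample) are made up for
illustration, and the sample archive is built in memory rather than
downloaded, so treat this as a way to poke at the format and not as a
statement of how github or git-archive produce their output.

```python
import io
import tarfile


def pax_global_comment(fileobj):
    """Return the 'comment' field of the pax global header, or None.

    Hypothetical helper: tarfile folds a leading type 'g' entry into
    TarFile.pax_headers instead of exposing it as a regular member.
    """
    # Mode "r:" means an uncompressed tar stream; use "r:gz" for .tar.gz.
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        return tar.pax_headers.get("comment")


def make_sample(commit_id):
    """Build a tiny in-memory tar that starts with a pax global header.

    Hypothetical stand-in for a real release tarball, for testing only.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                      pax_headers={"comment": commit_id}) as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("project/README")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    sample = make_sample("0123456789abcdef0123456789abcdef01234567")
    print(pax_global_comment(sample))
```

On a real file you would pass an open binary file object instead of the
in-memory sample.  Because the global header is a separate entry at the
start of the stream, it counts toward any hash taken over the raw tarball
bytes even though most extraction tools never show it as a file.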

>> I'm not really sure what to make of this fun fact.  Arguably that
>> feature might be useful to someone, somewhere; but I also suspect
>> people probably never notice it's there.  And it makes reproducing those
>> tarballs just a bit more esoteric, and handling them just a bit harder,
>> for what I would say is really no reason.
>>
> The root problem in this case might be that there's no universal method
> of transmitting a file along with metadata, so people are forced to
> invent format extensions to store the metadata in-band.
>
>>> (If I had to market this I would say, "There's more to reproducibility
>>> than being deterministic.")
>> I think that concept has come up a lot over the years, and I like that
>> formulation.  :thumbs_up:  Occasionally I wonder if we could sum up the
>> gap with another word.  "convergence" floats by, but I don't know if
>> that really auto-explains itself to anyone not already thinking of the
>> concept.
> I see what you mean.  How about "Independently reproducible", borrowing
> from "independent confirmation" in the sciences?
I'm not a huge fan of using the word "independently" because I think
that detracts/distracts from the determinism message.

If one person frobnozes the baz, and then another does too, it's not the
uniqueness of the entity doing the frobnozing that's supposed to matter;
it's whether they were able to use different tools and get a convergent
result.  Sure, having a different person doing it *tends* to result in
minor methodological differences in the other sciences; but ideally, as
programmers, we're scripting and automating enough that we shouldn't
have enough *accidental* differences to make an interesting case,
right?  And instead, we can move to more significant structural (and
reproducibly testable) differences in build toolchain to make things
interestingly different.

But maybe my brain's lexicon is strange for first associating
"independent" with an attribute of people rather than variables.

"Diversely reproducible"?  (I think that may have been floated as a term
in one of the prior summits, but I can't remember clearly, so if that's
someone else's idea, please forgive the lack of attribution...)

And I'll quit bikeshedding words now, cheers :)
>
> Cheers,
>
> Daniel


