[rb-general] Reproducing tarballs under various toolchains

Orians, Jeremiah (DTMB) OriansJ at michigan.gov
Wed Sep 19 12:33:34 CEST 2018

> gtar is made the gold standard; one reproduces a tarball by reproducing gtar's behaviour bug for bug, or more likely, by running gtar.
I think the idea that one has to use any particular program, as opposed to any particular standard, is quite the wrong path.

> That's nice, but it also makes gtar a single point of failure.  If there's a bug in gtar then everyone who uses gtar to achieve reproducible tarballs would be affected.
Or what happens when its behavior generates different results on different hardware, operating systems, etc.?

> This is particularly jarring because BSD tar _does_ have equivalent functionality, just under a different name (s/--owner/--uid/).
The problem is getting them to harmonize on a shared standard for flags and behavior.

> One, it would be nice to be able to reproduce a tarball without having to use exactly the same toolchain. 
It's called standards-based determinism.
Every program that implements the standard will produce identical output, regardless of how it was implemented or the particulars of design or hosting system.
It is a property required by mescc-tools.
It is entirely possible in compilers too, but it tends to require greater control over optimizations and details of register behavior.
From a theoretical perspective, should the C state machine be better defined, there is no reason that MesCC, clang and gcc shouldn't produce identical output given identical input.
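
To make the tar side of this concrete, here is a minimal sketch of what pinning down the free choices could look like. It assumes Python's standard tarfile module and the ustar format; the function name build_tar and the chosen metadata values are mine, purely for illustration. It only shows one implementation agreeing with itself across runs and input orderings; getting a second implementation to emit the same bytes would additionally require that both follow the same written rules for every header field and for padding.

```python
import hashlib
import io
import tarfile


def build_tar(files: dict) -> bytes:
    """Serialize a {name: bytes} mapping into a ustar archive with fixed metadata.

    build_tar is a hypothetical helper, not part of any existing tool.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # Member order is a free choice in the format, so fix it by sorting names.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            # Pin every field that would otherwise leak the build environment.
            info.mtime = 0
            info.mode = 0o644
            info.uid = 0
            info.gid = 0
            info.uname = ""
            info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    first = build_tar({"b.txt": b"two", "a.txt": b"one"})
    second = build_tar({"a.txt": b"one", "b.txt": b"two"})
    # Same content in a different insertion order yields the same digest.
    print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())
```

No compression is applied here on purpose, since a compressor adds its own header fields and tuning choices that would need the same treatment.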

I'd even go so far as to claim that C's behavioral changes across platforms are more of a symptom of refusing to express types and behavior exactly.

> It's hard to patch upstream tarball rolling scripts to be reproducible when that would make them unportable.
One doesn't have to; one could simply offer a patch to either the GNU or BSD tar developers adding support for the alternate flag form.
