[rb-general] Reproducing tarballs under various toolchains

Daniel Shahaf danielsh at apache.org
Wed Sep 19 16:11:59 CEST 2018


Holger Levsen wrote on Wed, 19 Sep 2018 10:57 +0000:
> On Tue, Sep 18, 2018 at 11:40:37PM +0000, Daniel Shahaf wrote:
> > So I suppose what I'm saying is:
> > 
> > One, it would be nice to be able to reproduce a tarball without having
> > to use exactly the same toolchain.  (If I had to market this I would
> > say, "There's more to reproducibility than being deterministic.")
> 
> what would be your proposed solution?

I think Jeremiah hit the nail on the head: there should be
a *specification* of output which itself, rather than any particular
implementation, would be the yardstick.

For archive formats, it should be easy to define a standard output format.
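
As an illustration of what such a specification might pin down, here is
a minimal Python sketch that writes a tarball without going through
either GNU tar or BSD tar.  The rules it applies (members sorted by
path, mtime 0, uid/gid 0, empty owner names, modes reduced to 755/644,
plain ustar headers) are assumptions chosen for the example, not an
agreed standard, and the function names are made up.

```python
import io
import os
import tarfile


def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Strip metadata that varies between machines and runs (assumed rules)."""
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    # Keep only the executable/non-executable distinction, not the local umask.
    if info.isdir() or info.mode & 0o100:
        info.mode = 0o755
    else:
        info.mode = 0o644
    return info


def deterministic_tar(root: str) -> bytes:
    """Return an uncompressed ustar archive of root, members sorted by path."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for path in sorted(paths):
            arcname = os.path.relpath(path, root)
            # recursive=False: every member is added explicitly, in sorted order.
            tar.add(path, arcname=arcname, recursive=False, filter=normalize)
    return buf.getvalue()
```

Two checkouts with the same contents but different timestamps or owners
should then yield byte-identical output, whatever tar happens to be
installed.  Compression is deliberately left out; it would need its own
rules.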

For compilers, I'll add that compiling with the highest possible
optimization level would never be reproducible (there will inevitably be
a flag for "make it fast even if it's not reproducible"); being
reproducible implies using no transformation or architecture feature
that other implementations can't (learn to) replicate.  (Corollary: a
patented implementation can't be reproducible, even if it's
deterministic.)

> > Two, GNU tar and BSD tar have an instance of xkcd.com/927/ in the names
> > of their option flags.  It's hard to patch upstream tarball rolling
> > scripts to be reproducible when that would make them unportable.
> 
> what would be your proposed solution?

First let me point out that this problem isn't unique to tar.  It also
affects implementations of sh, ls, and virtually any other tool in the
POSIX set.  Why?  Because when Foo OS adds an 'ls -x' flag, neither
other OSes nor POSIX can add that flag without creating
incompatibilities.

It's a problem of namespacing; it's not specific to reproducibility but
generic to portability.
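
To show what this costs a tarball rolling script, here is a small
Python sketch of the per-implementation branching it ends up needing.
The helper name is hypothetical; it takes the output of "tar --version"
and returns the flags that zero out ownership.  It assumes that GNU tar
identifies itself with "GNU tar" and libarchive's tar with "bsdtar" on
the first line, and it covers ownership only.  Ordering and timestamps
would need their own branches.

```python
def ownership_flags(version_output: str) -> list[str]:
    """Map the first line of `tar --version` to ownership-normalizing flags."""
    first_line = version_output.splitlines()[0] if version_output else ""
    if "GNU tar" in first_line:
        # GNU tar spelling of "store uid/gid 0 and no names".
        return ["--owner=0", "--group=0", "--numeric-owner"]
    if "bsdtar" in first_line:
        # bsdtar (libarchive) spelling of the same request.
        return ["--uid", "0", "--gid", "0", "--numeric-owner"]
    # Any other implementation: refuse rather than guess at its flags.
    raise ValueError(f"unrecognized tar implementation: {first_line!r}")
```

The same request needs two spellings, and a third implementation would
need a third branch.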

> are there other tar implementations than those two?
> 

Probably.  Quoting the FreeBSD man page:

HISTORY
     A tar command appeared in Seventh Edition Unix, which was released in
     January, 1979. There have been numerous other implementations, many of
     which extended the file format. John Gilmore's pdtar public-domain
     implementation (circa November, 1987) was quite influential, and formed
     the basis of GNU tar. GNU tar was included as the standard system tar in
     FreeBSD beginning with FreeBSD 1.0.

     This is a complete re-implementation based on the libarchive(3) library.
     It was first released with FreeBSD 5.4 in May, 2005.

Cheers,

Daniel

