[rb-general] Reproducing tarballs under various toolchains

Daniel Shahaf danielsh at apache.org
Wed Sep 19 01:40:37 CEST 2018


I've been looking into how I might create reproducible *.tar.(gz|bz2|xz)
artifacts.

https://reproducible-builds.org/docs/archives/ recommends:

    # requires GNU tar 1.28+
    $ tar --sort=name \
          --mtime="@${SOURCE_DATE_EPOCH}" \
          --owner=0 --group=0 --numeric-owner \
          -cf product.tar build

which is sort of an answer: even on BSD one can install GNU tar as gtar,
and some PATH manipulation can arrange for it to be called.  This makes
gtar the gold standard: one reproduces a tarball by reproducing gtar's
behaviour bug for bug, or more likely, by running gtar.

That's nice, but it also makes gtar a single point of failure.  If
there's a bug in gtar then everyone who uses gtar to achieve
reproducible tarballs would be affected.  This is particularly jarring
because BSD tar _does_ have equivalent functionality, just under
different names (s/--owner/--uid/, s/--group/--gid/).

So I suppose what I'm saying is:

One, it would be nice to be able to reproduce a tarball without having
to use exactly the same toolchain.  (If I had to market this I would
say, "There's more to reproducibility than being deterministic.")

Two, GNU tar [2] and BSD tar [1] have an instance of xkcd.com/927/ in the
names of their option flags.  It's hard to patch upstream tarball-rolling
scripts to be reproducible when that would make them unportable.
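
For illustration, a tarball-rolling script could paper over the naming
difference along the following lines.  This is only a sketch:
tar_owner_flags is a hypothetical helper name, and it assumes the first
line of `tar --version` contains "GNU tar" for GNU tar and "bsdtar" (or
"libarchive") for BSD tar.  It covers only the ownership flags; file
ordering and timestamps would still have to be handled separately.

```shell
#!/bin/sh
# Sketch: choose ownership-normalizing flags for whichever tar is in use.
# tar_owner_flags is a hypothetical helper, not part of any tar.

tar_owner_flags() {
    # $1: first line of `tar --version` output
    case "$1" in
        *"GNU tar"*)
            # GNU tar spelling, as in the recipe quoted above
            echo "--owner=0 --group=0 --numeric-owner" ;;
        *bsdtar*|*libarchive*)
            # BSD tar (libarchive) spelling of the same thing
            echo "--uid 0 --gid 0 --numeric-owner" ;;
        *)
            echo "unrecognized tar: $1" >&2
            return 1 ;;
    esac
}

# Usage (commented out so that sourcing this file has no side effects):
#   flags=$(tar_owner_flags "$(tar --version 2>/dev/null | head -n 1)") || exit 1
#   # $flags is deliberately unquoted so it splits into separate arguments
#   tar $flags -cf product.tar build
```

Even with such a shim, the two implementations are not guaranteed to
emit byte-identical archives, so it addresses portability of the script
rather than reproducibility across toolchains.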

Cheers,

Daniel

[1] https://man.freebsd.org/tar
[2] https://manpages.debian.org/tar

