[rb-general] Reproducing tarballs under various toolchains

Eli Schwartz eschwartz at archlinux.org
Thu Sep 20 09:08:20 CEST 2018

On 9/18/18 7:40 PM, Daniel Shahaf wrote:
> I've been looking into how I might create reproducible *.tar.(gz|bz2|xz)
> artifacts.
> https://reproducible-builds.org/docs/archives/ recommends:
> .
>     # requires GNU Tar 1.28+
>     $ tar --sort=name \
>           --mtime="@${SOURCE_DATE_EPOCH}" \
>           --owner=0 --group=0 --numeric-owner \
>           -cf product.tar build
> .
> which is sort of an answer: even on BSD one can install GNU tar as gtar,
> and some PATH manipulation can arrange for it to be called.  gtar is
> made the gold standard; one reproduces a tarball by reproducing gtar's
> behaviour bug for bug, or more likely, by running gtar.
> That's nice, but it also makes gtar a single point of failure.  If
> there's a bug in gtar then everyone who uses gtar to achieve
> reproducible tarballs would be affected.  This is particularly jarring
> because BSD tar _does_ have equivalent functionality, just under a
> different name (s/--owner/--uid/).
> So I suppose what I'm saying is:
> One, it would be nice to be able to reproduce a tarball without having
> to use exactly the same toolchain.  (If I had to market this I would
> say, "There's more to reproducibility than being deterministic.")
> Two, GNU tar and BSD tar have an instance of xkcd.com/927/ in the names
> of their option flags.  It's hard to patch upstream tarball rolling
> scripts to be reproducible when that would make them unportable.
This is really just the generic question of whether two programs that
create the same *type* of output each generate their own custom,
unpredictable output.

And that doesn't seem to be the case merely because they use different
command-line flags to override the values stored in standardized fields
(the uid/gid of recorded files).

Obviously the best solution is to ensure, via upstream feature requests,
that both GNU tar and libarchive tar (and, for bonus points, the other
less popular but still existing tar implementations) support each
other's flags for compatibility.

But the secondary solution is to add a very small documentation update
to the reproducible-builds website, to change "the recommended way is to
use GNU tar with these switches" to "the recommended way is to either
use GNU tar with these switches, or use bsdtar with these other switches".
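
To illustrate, here is a rough sketch of how a tarball-rolling script
might pick the owner switches for whichever tar it finds. The TAR
variable and the tar_owner_flags name are made up for this example, and
matching on the --version output is my assumption about how the two
implementations identify themselves; treat it as a starting point, not
as tested portability code.

```shell
#!/bin/sh
# Hypothetical helper: choose owner-override flags for the tar on PATH.
# TAR and tar_owner_flags are illustrative names, not from any upstream.
TAR="${TAR:-tar}"

tar_owner_flags() {
    # Assumption: GNU tar prints "tar (GNU tar) x.y" and libarchive's
    # tar prints "bsdtar x.y - libarchive ..." for --version.
    case "$("$TAR" --version 2>/dev/null)" in
        *"GNU tar"*) echo "--owner=0 --group=0 --numeric-owner" ;;
        *bsdtar*)    echo "--uid 0 --gid 0 --numeric-owner" ;;
        *)           echo "unsupported tar: $TAR" >&2; return 1 ;;
    esac
}
```

A script would then call something like
"$TAR" $(tar_owner_flags) -cf product.tar build
with the substitution left unquoted on purpose so that the flags are
split into separate words.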

I'd like to point out that the current docs explicitly state "Tar will be
used as the main example but these tips apply to other archive formats
as well." The implication is that it's surely possible to achieve the
same results with other tar implementations, but it might be necessary
to figure out the right switches yourself.


On which note, bsdtar doesn't support --mtime, and the docs already
recommend the alternative:
    find . -exec touch -h -d "@${SOURCE_DATE_EPOCH}" {} +

bsdtar also (obviously) doesn't support --clamp-mtime, but that flag is
so misguided...

bsdtar also doesn't support --sort, but the docs already recommend the
alternative of using --null --files-from -
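
Put together, the two workarounds could look roughly like the sketch
below. The make_tarball name and its arguments are invented for the
example. It assumes SOURCE_DATE_EPOCH is already exported, a touch that
understands "-d @epoch" (GNU coreutils does), and a sort that supports
-z. Owner flags are left out because those are the part that differs
between implementations; the switches used here are ones both GNU tar
and bsdtar accept, as far as I know.

```shell
#!/bin/sh
# make_tarball SRC_DIR OUT_TAR  (hypothetical helper for illustration)
# Normalises mtimes, then feeds tar an explicitly sorted file list.
make_tarball() {
    src=$1
    out=$2
    # Replace --mtime: set every file's mtime to SOURCE_DATE_EPOCH.
    # -h touches symlinks themselves instead of their targets.
    find "$src" -exec touch -h -d "@${SOURCE_DATE_EPOCH}" {} +
    # Replace --sort=name: sort the NUL-separated list in the C locale
    # so the order does not depend on the caller's locale settings.
    # --no-recursion stops tar from re-adding directory contents
    # that the list already names explicitly.
    find "$src" -print0 | LC_ALL=C sort -z |
        tar --no-recursion --null --files-from - -cf "$out"
}
```

Running make_tarball build product.tar twice on the same tree, as the
same user, should then give byte-identical archives.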

BTW: Arch Linux packaging uses both workarounds for bsdtar, especially
--files-from, since gtar --sort=name is unsuitable anyway: we want to
pin our own metadata files as the first files in the archive.
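
The pinning idea can be sketched as follows. This is not the actual
Arch packaging code; the list_files_pinned name and the .METADATA file
are placeholders I made up, and it again assumes a find with -print0
and a sort with -z.

```shell
#!/bin/sh
# list_files_pinned: print a NUL-separated file list for the current
# directory with one (hypothetical) metadata file pinned to the front.
list_files_pinned() {
    # Pinned entry first, outside of the sorted portion.
    printf '%s\0' ./.METADATA
    # Everything else, minus "." itself and the pinned file, in
    # C-locale order.
    find . ! -path . ! -path ./.METADATA -print0 | LC_ALL=C sort -z
}
```

The output is meant to be piped into
tar --no-recursion --null --files-from - -cf ../product.tar
so that the archive order is exactly the order of the list.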

Eli Schwartz
Arch Linux Bug Wrangler and Trusted User
