git 2.38.0: Change in `git archive` output

Jeff King peff at peff.net
Mon Oct 17 00:02:19 UTC 2022


On Sun, Oct 16, 2022 at 11:57:40PM +0200, kpcyrd wrote:

> multiple people in Arch Linux noticed the output of our `git archive`
> command doesn't match the tarball served by github anymore.
> 
> First I suspected an update in our gzip package until I found this line in
> the git 2.38.0 release notes:
> 
> > * Teach "git archive" to (optionally and then by default) avoid
> >   spawning an external "gzip" process when creating ".tar.gz" (and
> >   ".tgz") archives.
> 
> I've then found this commit that could be considered a breaking change in
> `git archive`:
> 
> https://github.com/git/git/commit/4f4be00d302bc52d0d9d5a3d4738bb525066c710
> 
> I don't know if there's some kind of gzip standard that could be used to
> align the git internal gzip implementation with gnu gzip.

Interesting. For a small input, they seem to produce the same file for
me:

  git init repo
  cd repo
  seq 1000 >file
  git add file
  git commit -m foo
 
  git -c tar.tar.gz.command='git archive gzip' \
    archive --format=tar.gz HEAD >internal.tar.gz
  git -c tar.tar.gz.command='gzip -cn' \
    archive --format=tar.gz HEAD >external.tar.gz
  cmp internal.tar.gz external.tar.gz && echo ok

but if I instead do "seq 10000", then the files differ. I didn't dig
into the actual binary to see the source of the change. It might be
something we can tweak (e.g., if it's how a header is represented, or if
we can change the zlib parameters to produce the same compressed output).
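
A rough sketch of how one might narrow that down, assuming POSIX od and
cksum plus a gzip binary are available. compare_gz is a hypothetical
helper name, not part of git or gzip. It only reports whether the fixed
10-byte gzip header and the decompressed bytes match; it does not explain
why the compressed streams differ.

```shell
# compare_gz is a hypothetical helper, not part of git or gzip.
# Usage: compare_gz a.gz b.gz
compare_gz () {
  # The fixed part of a gzip header is 10 bytes (RFC 1952):
  # magic, method, flags, mtime, extra flags, OS.
  ha=$(od -An -tx1 -N10 "$1")
  hb=$(od -An -tx1 -N10 "$2")
  if test "$ha" = "$hb"
  then
    echo "header: same"
  else
    echo "header: differ"
    echo "  $1:$ha"
    echo "  $2:$hb"
  fi

  # Checksum the decompressed bytes; if they match, the difference
  # is only in how the data was compressed or framed.
  sa=$(gzip -dc "$1" | cksum)
  sb=$(gzip -dc "$2" | cksum)
  if test "$sa" = "$sb"
  then
    echo "payload: same"
  else
    echo "payload: differ"
  fi
}
```

With the files from the example above, this would be run as
"compare_gz internal.tar.gz external.tar.gz". Running cmp on the two
files without "&& echo ok" also prints the offset of the first byte
that differs.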

> I'm not saying this is necessarily a bug or regression but it makes it
> harder to reproduce github tar balls from a git repository. Just sharing
> what I've debugged. :)

I don't think we make promises about stable output from "git archive".
We've fixed bugs in the tar-generating side before that led to changes.
But if we can easily make them the same, that might be worth doing.

In the meantime, you can use the config option I showed above to get the
old, external behavior. At some point GitHub will probably update their
version, though, at which point you'd want the internal (they may also
try to retain the old one, though; lots of distro/packaging projects get
broken when GitHub's archives aren't byte-for-byte identical).
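
To avoid passing -c on every invocation, the same setting can be stored
in the repository's config. This is a sketch that reuses the
tar.tar.gz.command key from the example above; the second line assumes
you also want the ".tgz" format mentioned in the release notes to use
external gzip.

```shell
# Run inside the repository you archive from (or add --global).
git config tar.tar.gz.command 'gzip -cn'
git config tar.tgz.command 'gzip -cn'
```

Running "git config --unset tar.tar.gz.command" (and likewise for tgz)
goes back to the default.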

-Peff

