git 2.38.0: Change in `git archive` output
Jeff King
peff at peff.net
Mon Oct 17 17:03:45 UTC 2022
On Mon, Oct 17, 2022 at 12:51:25AM +0000, brian m. carlson wrote:
> > but if I instead do "seq 10000", then the files differ. I didn't dig
> > into the actual binary to see the source of the change. It might be
> > something we can tweak (e.g., if it's how a header is represented, or if
> > we can change the zlib parameters to find the same compressions).
>
> I will say that trying to make two compression implementations produce
> identical output is likely futile because it's almost always the case
> that there are multiple identical ways to encode the same data. Most
> implementations are going to prefer improving size over consistency, so
> there's little incentive to copy the same algorithm across
> implementations. I believe even GNU gzip has changed its output in the
> past as better optimizations were implemented.
>
> I mean, don't let me stop you from trying to tweak things to see if you
> can make it work, but in general I think it's likely that some
> divergence is going to occur between implementations no matter what.
Yeah, I definitely don't think it's something we ought to be promising,
or put a lot of work into. But if there's low-hanging fruit to reduce
immediate pain in practice, it seems worth considering.
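
For anyone who wants to see the "multiple valid encodings" point concretely,
here is a minimal sketch. It uses Python's zlib module rather than anything
git itself does, and the helper name `deflate` and the two levels (1 and 9)
are just assumptions picked for illustration. It compresses the same
"seq 10000" input twice and checks that the compressed bytes differ while
both streams decompress to the identical input:

```python
import zlib

# Same input as "seq 10000": the numbers 1..10000, one per line.
data = "".join(f"{n}\n" for n in range(1, 10001)).encode("ascii")


def deflate(payload: bytes, level: int) -> bytes:
    """Compress payload as a zlib stream at the given level (hypothetical helper)."""
    return zlib.compress(payload, level)


# Two illustrative parameter choices, not the settings git or gzip use.
fast = deflate(data, 1)
best = deflate(data, 9)

# The compressed streams are different byte sequences...
streams_differ = fast != best
# ...yet each one decodes back to exactly the same input.
roundtrip_ok = zlib.decompress(fast) == data and zlib.decompress(best) == data

print("compressed bytes differ:", streams_differ)
print("both decompress to the input:", roundtrip_ok)
```

So a checksum over the compressed archive depends on the compressor and its
parameters, while a checksum over the decompressed contents does not.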
-Peff