unreproducible zlib/deflate compression in ZIP/APK files
Fay Stegerman
flx at obfusk.net
Fri Sep 6 15:20:20 UTC 2024
Hi!
I started a thread [1] on the fediverse (mastodon) about this, which got some
interesting replies, but no clear answers yet. Here's a summary. Further input
is of course very much appreciated :)
My original post:
> I have a weird issue: an APK [2] (i.e. ZIP file) with deflate-compressed data
> which I cannot recreate using python + zlib, even with code [3] that tries
> every possible combination of parameters (with the obvious exception of
> providing a predefined compression dictionary) instead of just varying the
> compression level.
> One file's compressed data can be reproduced using strategy Z_FILTERED, but
> for all the other files I've tried the script doesn't return any matches (it
> works fine with the APK we built ourselves, which should have had identical
> compressed data as the build is supposed to be deterministic).
> Anyone have any idea what's going on here?
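For reference, the brute-force check described above can be sketched in Python roughly as follows. This is not the script from [3]. `read_raw_entry` and `find_zlib_params` are hypothetical helpers. The sketch assumes a deflate-compressed entry (raw deflate, i.e. negative `wbits`, as ZIP uses) and no preset dictionary.

```python
import struct
import zipfile
import zlib

# Strategies exposed by Python's zlib module (Z_RLE and Z_FIXED need 3.6+).
STRATEGIES = (zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED, zlib.Z_HUFFMAN_ONLY,
              zlib.Z_RLE, zlib.Z_FIXED)


def read_raw_entry(fh, name):
    """Return the raw, still-compressed bytes of ZIP entry `name`.

    `fh` is a seekable binary file object. The 30-byte local file header
    is parsed to skip the file name and extra field that precede the data.
    """
    with zipfile.ZipFile(fh) as zf:
        info = zf.getinfo(name)
    fh.seek(info.header_offset)
    sig, *_, name_len, extra_len = struct.unpack("<IHHHHHIIIHH", fh.read(30))
    if sig != 0x04034B50:
        raise ValueError("bad local file header signature")
    fh.seek(name_len + extra_len, 1)  # relative seek past name + extra field
    return fh.read(info.compress_size)


def find_zlib_params(raw, data):
    """Return every (level, wbits, memLevel, strategy) for which this
    zlib turns `data` into exactly `raw` (no preset dictionary)."""
    matches = []
    for level in range(10):
        for wbits in range(-15, -8):  # raw deflate, windows 2**15 .. 2**9
            for mem_level in range(1, 10):
                for strategy in STRATEGIES:
                    co = zlib.compressobj(level, zlib.DEFLATED, wbits,
                                          mem_level, strategy)
                    if co.compress(data) + co.flush() == raw:
                        matches.append((level, wbits, mem_level, strategy))
    return matches
```

An empty result from `find_zlib_params(raw, zf.read(name))` means the zlib that Python is linked against cannot reproduce that entry with any standard parameter combination.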
A follow-up later:
> [...] 99.9% of the time the data matches exactly what python + zlib would
> produce. But in this case the upstream APK doesn't match the APK from our
> rebuilder, but I can perfectly recreate the data from the APK from our
> rebuilder with Python + zlib. But not the upstream APK.
> The main issue isn't that I can't recreate it from Python + zlib (though that
> would break my tooling meant to remove nondeterminism from APKs). The problem
> is the Android toolchain producing different results on different systems,
> which is not supposed to happen.
> So I'd like to know what the cause is [so] I can report it to Google and
> hopefully get it fixed. Or be able to create a workaround at least, like
> making sure all builds are using the same deflate implementation if that's the
> issue.
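To narrow down which entries are affected, two APKs can be compared entry by entry. The goal is to find entries whose content is identical but whose compressed bytes are not. Below is a minimal sketch with a hypothetical `recompressed_entries` helper. It treats a matching CRC-32 and uncompressed size as "same content", which is a shortcut rather than a proof.

```python
import struct
import zipfile


def _raw_bytes(fh, info):
    """Raw (compressed) bytes of one entry, read via its local file header."""
    fh.seek(info.header_offset)
    *_, name_len, extra_len = struct.unpack("<IHHHHHIIIHH", fh.read(30))
    fh.seek(name_len + extra_len, 1)  # skip file name and extra field
    return fh.read(info.compress_size)


def recompressed_entries(fh_a, fh_b):
    """Names of entries whose content (CRC-32 and size) matches in both
    archives but whose compression method or stored bytes differ."""
    differing = []
    with zipfile.ZipFile(fh_a) as za, zipfile.ZipFile(fh_b) as zb:
        infos_b = {i.filename: i for i in zb.infolist()}
        for ia in za.infolist():
            ib = infos_b.get(ia.filename)
            if ib is None or (ia.CRC, ia.file_size) != (ib.CRC, ib.file_size):
                continue  # missing from b, or genuinely different content
            if (ia.compress_type != ib.compress_type
                    or _raw_bytes(fh_a, ia) != _raw_bytes(fh_b, ib)):
                differing.append(ia.filename)
    return differing
```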
And another one:
> But I have seen a similar issue -- though limited to .ttf files -- twice
> before. And I don't recall there was anything known to be odd about those
> build systems. We never figure[d] out what the cause was, the workaround was
> telling the toolchain to not compress the file.
> I have also seen it with Tor Browser for Android, but they repack the APK with
> 7-Zip, which has its own deflate implementation.
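In ZIP terms, not compressing a file means storing it with method 0 (stored). Its bytes then no longer depend on any deflate implementation. The workaround itself was configured in the Android toolchain, and the Python sketch below only illustrates the idea. `NO_COMPRESS` and `write_archive` are hypothetical. A real APK would also need alignment and signing, which this sketch ignores.

```python
import zipfile

# Hypothetical list of suffixes to keep uncompressed.
NO_COMPRESS = (".ttf",)


def write_archive(fh, files):
    """Write `files` (name -> bytes) to a ZIP, storing NO_COMPRESS suffixes
    uncompressed and deflating everything else."""
    with zipfile.ZipFile(fh, "w") as zf:
        for name, data in sorted(files.items()):
            # Fixed timestamp so the archive headers are reproducible too.
            info = zipfile.ZipInfo(name, date_time=(1980, 1, 1, 0, 0, 0))
            info.compress_type = (zipfile.ZIP_STORED
                                  if name.endswith(NO_COMPRESS)
                                  else zipfile.ZIP_DEFLATED)
            zf.writestr(info, data)
```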
And finally:
> Unfortunately, we've only ever seen the issue on developers' machines, never
> in our rebuilders. So we haven't [been] able to reproduce it. And since we had a
> workaround for the TTF files, we never investigated those environments closely
> enough to catch something like a different system zlib.
> I do know at least one of the APKs with a weird TTF [4] was confirmed to have
> been buil[t] with Android Studio using OpenJDK 11. Which should indeed be
> using the system zlib.
> But [...] having only a single file differ -- a TTF file in both cases -- is
> very mysterious.
The other APK [5], it turns out, was built on macOS Sonoma.
As @dougall at mastodon.social mentioned [6]:
> There are a lot of forks of zlib (zlib-ng, cloudflare), and separate
> implementations (zopfli, miniz, libdeflate). Forks of zlib typically change
> the data structure used to find LZ matches, which changes the compressed data.
> This may also change across versions.
> They could also be using zlib differently, e.g. passing
> Z_SYNC_FLUSH/Z_FULL_FLUSH to the API during compression. But idk if zlib makes
> any guarantees at all, so I couldn't rule out anything.
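To illustrate the flush point, the sketch below feeds the same input to zlib with and without periodic `Z_SYNC_FLUSH`/`Z_FULL_FLUSH` calls. It uses a hypothetical `deflate_raw` helper at level 6 with raw deflate. All variants decompress to the same data, but the compressed bytes differ. Each such flush ends the current block and emits an empty stored block (`00 00 ff ff`). Whether any Android tooling actually does this is not known.

```python
import zlib


def deflate_raw(data, flush_every=None, mode=zlib.Z_SYNC_FLUSH):
    """Raw-deflate `data` at level 6; if `flush_every` is set, call
    flush(mode) after each chunk of that many input bytes."""
    co = zlib.compressobj(6, zlib.DEFLATED, -15)
    out = []
    step = flush_every or len(data) or 1
    for i in range(0, len(data), step):
        out.append(co.compress(data[i:i + step]))
        if flush_every:
            out.append(co.flush(mode))  # Z_SYNC_FLUSH or Z_FULL_FLUSH
    out.append(co.flush())  # Z_FINISH terminates the stream
    return b"".join(out)
```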
@dougall then echoed my suspicions as to the potential cause [7]:
> My guess would be that the android toolchain uses system zlib, and the system
> zlib provider is different – maybe there are changes in Apple's zlib fork, or
> maybe someone installed zlib-ng as the system zlib provider. (I'm guessing
> those are the two most widely deployed forks.)
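One way to check whether two build machines have deflate implementations that behave differently is to compare the output for a fixed input. Version strings alone are not enough, since a fork may report a zlib-compatible version number. Below is a minimal sketch. `SAMPLE`, `zlib_fingerprint` and `describe_zlib` are hypothetical. The sketch only covers the zlib that Python itself is linked against, not necessarily the one used by OpenJDK or other tools.

```python
import hashlib
import zlib

# Fixed, arbitrary test input; any deterministic data would do.
SAMPLE = b"".join(b"%d:%s\n" % (i, b"abcdefghij"[: i % 11])
                  for i in range(5000))


def zlib_fingerprint(sample=SAMPLE):
    """SHA-256 over the raw-deflate output of `sample` at levels 0-9."""
    h = hashlib.sha256()
    for level in range(10):
        co = zlib.compressobj(level, zlib.DEFLATED, -15)
        h.update(co.compress(sample) + co.flush())
    return h.hexdigest()


def describe_zlib():
    """Versions Python's zlib module reports, plus the output fingerprint."""
    return {
        "compiled_against": zlib.ZLIB_VERSION,
        "runtime": zlib.ZLIB_RUNTIME_VERSION,
        "fingerprint": zlib_fingerprint(),
    }
```

To use it, run `describe_zlib()` on each machine and compare the fingerprints. If they differ, the implementations produce different output for at least this input. If they match, that does not prove the implementations agree on every input.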
And pointed out a concrete example I was not aware of yet [8]:
> Huh, yeah, I guess Fedora has been [shipping a drop-in replacement for zlib]
> for a few months now? [9, 10]
We don't know if this is related to the APK issues. But if zlib-ng does indeed
produce different output than classic zlib, it might also break the
reproducibility of other things (though it seems gzip doesn't link against zlib
and would thus be unaffected).
An interesting addition by @snowfox at tech.lgbt [11] points out that Google
computes app update patches on the uncompressed data. This only works if the
device can perfectly recreate the original compressed data (otherwise the
signature would become invalid):
> It also makes patches bigger, since apparently the efficient patching
> algorithm requires the device to be able to reconstruct and compress the data
> and get byte-identical output [12]
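Roughly, the constraint is a round trip: decompress the original entry, recompress it, and require byte-identical output. The sketch below shows only that check, not Google's actual patching code. It uses a hypothetical `recompress_matches` helper and assumes raw deflate with known original parameters.

```python
import zlib


def recompress_matches(raw, level=6, wbits=-15, mem_level=8,
                       strategy=zlib.Z_DEFAULT_STRATEGY):
    """Decompress raw-deflate `raw`, recompress it with the given
    parameters, and return True only if the bytes are identical."""
    data = zlib.decompress(raw, -15)  # max window accepts any raw stream
    co = zlib.compressobj(level, zlib.DEFLATED, wbits, mem_level, strategy)
    return co.compress(data) + co.flush() == raw
```

If the local deflate implementation differs from the one that produced `raw`, this returns False even with the "right" parameters. That is exactly the situation in which such patching cannot work.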
And a post by @retr0id at retr0.id [13] pointed out that:
> [...] I don't think zlib itself ever makes promises about being deterministic
> between release versions, or at all. I remember seeing one of the alternative
> libraries having an explicit "deterministic" mode (but can't remember which).
> Anyone doing reproducible builds should ideally be using a zlib impl that
> tries to guarantee determinism.
To which I replied that:
> Unfortunately, we can't control other people's build environments and e.g.
> OpenJDK and Python will use the system zlib.
> And even if we could switch to a deterministic system zlib in our rebuilder it
> would be pointless if upstream uses a deflate implementation that generates
> different output.
- Fay
[1] https://tech.lgbt/@obfusk/113081697577399562
[2] https://github.com/enteraname74/SoulSearching/issues/31#issuecomment-2329932093
[3] https://gist.github.com/obfusk/d9d1223cd1fa1a875e6cf49827621e69
[4] https://github.com/alialbaali/Noto/releases/download/v2.2.3/Noto.apk
[5] https://github.com/danilkinkin/buckwheat/releases/download/4.5.2/buckwheat_v4.5.2.apk
[6] https://mastodon.social/@dougall/113085178299289419
[7] https://mastodon.social/@dougall/113087304443972892
[8] https://mastodon.social/@dougall/113087372196958352
[9] https://fedoraproject.org/wiki/Changes/ZlibNGTransition
[10] https://bugzilla.redhat.com/show_bug.cgi?id=2252767
[11] https://tech.lgbt/@snowfox/113085457596477594
[12] https://android-developers.googleblog.com/2016/12/saving-data-reducing-the-size-of-app-updates-by-65-percent.html
[13] https://retr0.id/objects/51bd6113-4fd1-4caa-8b74-44f38fec3f2e