unreproducible zlib/deflate compression in ZIP/APK files

Fay Stegerman flx at obfusk.net
Sat Sep 28 22:32:07 UTC 2024


Hi!

Following up on [1]:

> I have a weird issue: an APK (i.e. ZIP file) with deflate-compressed data
> which I cannot recreate using python + zlib [...]

After getting confirmation that the APK was built on Fedora 40 -- and now armed
with the knowledge that Fedora 40 uses zlib-ng as system zlib -- I did some
further investigating.

I ran the following python3 script -- which is deterministic as long as zlib
output is identical -- on Debian and Fedora:

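  import zipfile

  with zipfile.ZipFile("foo.zip", "w") as zf:
      # ZipInfo defaults to a fixed 1980-01-01 timestamp, so metadata is stable
      zi = zipfile.ZipInfo("foo")
      zi.compress_type = zipfile.ZIP_DEFLATED  # == 8
      zi._compresslevel = 6  # private attribute; 6 is zlib's default level
      zf.writestr(zi, "The quick brown fox jumps over the lazy dog")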

diff-zip-meta shows the files are identical except for the compressed data:

  --- foo-debian.zip
  +++ foo-fedora.zip
  entry foo:
  - compress_size=44
  + compress_size=45
  - compresslevel=9|6|4|1
  + compresslevel=unknown
  - compress_crc=0x4dd0a967
  + compress_crc=0xd64979c2
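
A minimal sketch of the kind of check behind the compress_size, compresslevel
and compress_crc lines above.  This is not diff-zip-meta's actual code: the
function names are made up for illustration, and it assumes the entry is
deflate-compressed and that compress_crc is a CRC-32 over the compressed bytes.

```python
import io
import struct
import zipfile
import zlib


def raw_entry_data(zip_bytes, name):
    """Return the still-compressed bytes of one ZIP entry."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        info = zf.getinfo(name)
    # The fixed part of a local file header is 30 bytes; the file name
    # length and extra field length are the two uint16 values at 26..30.
    header = zip_bytes[info.header_offset:info.header_offset + 30]
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    start = info.header_offset + 30 + name_len + extra_len
    return zip_bytes[start:start + info.compress_size]


def matching_levels(compressed, uncompressed):
    """List the levels at which the local zlib reproduces `compressed`."""
    levels = []
    for level in range(1, 10):
        # wbits=-15 selects a raw deflate stream, as stored in ZIP entries.
        comp = zlib.compressobj(level, zlib.DEFLATED, -15)
        if comp.compress(uncompressed) + comp.flush() == compressed:
            levels.append(level)
    return levels


def describe_entry(zip_bytes, name):
    """Summarise the compressed data of one deflated entry."""
    compressed = raw_entry_data(zip_bytes, name)
    uncompressed = zlib.decompress(compressed, -15)
    return {
        "compress_size": len(compressed),
        "compress_crc": hex(zlib.crc32(compressed)),
        "levels": matching_levels(compressed, uncompressed),
    }
```

An empty "levels" list would correspond to compresslevel=unknown: the zlib that
Python is using cannot recreate the data at any level from 1 to 9.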

xxd diff output:

  @@ -1,10 +1,10 @@
   00000000: 504b 0304 1400 0000 0800 0000 2100 39a3  PK..........!.9.
  -00000010: 4f41 2c00 0000 2b00 0000 0300 0000 666f  OA,...+.......fo
  +00000010: 4f41 2d00 0000 2b00 0000 0300 0000 666f  OA-...+.......fo
   00000020: 6f0b c948 5528 2ccd 4cce 5648 2aca 2fcf  o..HU(,.L.VH*./.
   00000030: 5348 cbaf 50c8 2acd 2d28 56c8 2f4b 2d52  SH..P.*.-(V./K-R
  -00000040: 2801 4ae7 2456 552a a4e4 a703 0050 4b01  (.J.$VU*.....PK.
  -00000050: 0214 0314 0000 0008 0000 0021 0039 a34f  ...........!.9.O
  -00000060: 412c 0000 002b 0000 0003 0000 0000 0000  A,...+..........
  -00000070: 0000 0000 0080 0100 0000 0066 6f6f 504b  ...........fooPK
  -00000080: 0506 0000 0000 0100 0100 3100 0000 4d00  ..........1...M.
  -00000090: 0000 0000                                ....
  +00000040: 28c9 4855 c849 acaa 5448 c94f 0700 504b  (.HU.I..TH.O..PK
  +00000050: 0102 1403 1400 0000 0800 0000 2100 39a3  ............!.9.
  +00000060: 4f41 2d00 0000 2b00 0000 0300 0000 0000  OA-...+.........
  +00000070: 0000 0000 0000 8001 0000 0000 666f 6f50  ............fooP
  +00000080: 4b05 0600 0000 0001 0001 0031 0000 004e  K..........1...N
  +00000090: 0000 0000 00                             .....
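
To take the ZIP container out of the picture, the same comparison can be done
on the raw deflate stream alone.  A minimal sketch, with an illustrative helper
name; it assumes the zlib that Python is linked against reports itself via
zlib.ZLIB_RUNTIME_VERSION (how zlib-ng identifies itself there is not something
I'd rely on without checking):

```python
import zlib

DATA = b"The quick brown fox jumps over the lazy dog"


def raw_deflate(data, level=6):
    """Compress to a raw deflate stream (no zlib header), as used in ZIP."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


if __name__ == "__main__":
    # Version of the zlib library loaded at runtime, not at build time.
    print("zlib runtime:", zlib.ZLIB_RUNTIME_VERSION)
    out = raw_deflate(DATA)
    print(len(out), out.hex())
```

Going by the compress_size values above, this should print a length of 44 with
zlib and 45 with zlib-ng, with the hex matching the entry data in the xxd diff.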

And indeed, on Fedora 40 I can now reproduce the compressed data from the APK
with Python, confirming it was built with zlib-ng instead of zlib.  With
one exception: AndroidManifest.xml can be reproduced with zlib but not with
zlib-ng (see below).  And some files can be reproduced with both.

Oddly, whilst Python's zipfile output clearly changed with zlib-ng, using rbtlog
to rebuild an Android app on both Debian bookworm and Fedora 40 didn't show any
differences, so I looked at why that might be.

As it turns out, OpenJDK bundles a copy of zlib.  The Debian package doesn't use
the bundled zlib code but instead links to the system zlib.  But the Fedora
package seems to be using the bundled zlib, explaining why the Fedora OpenJDK
packages are unaffected by the switch to zlib-ng.  This suggests the APK in
question was built using a different JDK that does link to system zlib instead.

As for the AndroidManifest.xml, I suspect that is not generated using the Java
part of the Android toolchain but by aapt2 from android build-tools, which is
native (C++) code and seems to be statically linked against zlib as well (like
java from Fedora's OpenJDK, ldd doesn't show it using system zlib, unlike aapt2
from the Debian package).
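
The ldd check can be scripted along these lines.  A rough sketch, assuming a
Linux system with ldd on PATH; the function names are illustrative.  Note that
a missing libz entry is only consistent with a bundled or statically linked
zlib, it doesn't prove it (a library could also be loaded at runtime).

```python
import subprocess


def links_system_zlib(ldd_output):
    """True if ldd output lists a shared libz (e.g. libz.so.1)."""
    for line in ldd_output.splitlines():
        fields = line.split()
        # Each dependency line starts with the library's soname.
        if fields and fields[0].startswith("libz.so"):
            return True
    return False


def check_binary(path):
    """Run ldd on a binary and report whether it lists a shared libz."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return links_system_zlib(result.stdout)
```

Calling check_binary() on the java and aapt2 binaries of each distribution
would then give the comparison described above.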

Whilst solving the mystery is great, it's unfortunate that (re)building APKs has
become a lot more complicated, especially if Debian also switches to zlib-ng
[2], with various parts of the Android toolchain, the JDK, and my own RB tooling
being affected, and not all equally, when an alternative system zlib is used.

Meanwhile the mystery of the two TTF files from other APKs is still unsolved: I
can't recreate their data on Fedora 40 either.  It seems likely they were
somehow added to the APK by a part of the Android toolchain using yet another
zlib/deflate implementation.

- Fay

[1] https://lists.reproducible-builds.org/pipermail/rb-general/2024-September/003526.html
[2] https://lists.reproducible-builds.org/pipermail/rb-general/2024-September/003543.html

