Three bytes in a zip file
larry at doolittle.boa.org
Thu Apr 6 21:59:17 UTC 2023
On Thu, Apr 06, 2023 at 12:11:38PM +0200, Michael Schierl wrote:
> Am 06.04.2023 um 10:28 schrieb Larry Doolittle:
> > I'm trying to make a process to generate byte-for-byte reproducible zip files.
> > I got the contents identical, including timestamps and permissions.
> > But three bytes at the 98.08% mark (bytes 5543078 to 5543081,
> > out of a file size 5651451) differ between my run and a friend's run.
> Looking at the zip entry starting at 0x00549481:
> Let's dissect the fields:
> ID 0x5455 ("UT") Length 0x0009 Data 03 68 ca 2c 64 XX XX XX 64
> ID 0x7875 ("ux") Length 0x000b Data 01 04 e8 03 00 00 04 e8 03 00 00
> 0x5455 is Info-Zip's "extended timestamp" field:
> As the flags are 03, mod time and access time are present, and the
> different bits are within access time.
Thanks! That helps a lot.
If I'm careful, I can even see the difference between the two zip files
by unpacking them and comparing the access times:
$ diff <(ls --full-time -u fab-ea2bb52c-ld) <(ls --full-time -u fab-ea2bb52c-mb)
< -rw-r--r-- 1 redacted redacted 644661 2023-04-04 18:10:00.000000000 -0700 marble-ipc-d-356.txt
> -rw-r--r-- 1 redacted redacted 644661 2023-04-06 00:25:03.000000000 -0700 marble-ipc-d-356.txt
Do you know of any tooling that can help decode zip file contents in general?
Ideally something that could be absorbed into diffoscope?
Maybe that one-liner above would be a useful addition to diffoscope.
I took a quick look for the documentation you quoted.
That's proginfo/extrafld.txt in Debian's zip source package, right?
It sure looks reverse-engineered. I guess I shouldn't expect anything
different for a package where upstream source ends in 2008. :-/
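In the meantime, here is a rough Python sketch of the kind of decoding I have
in mind. It follows the field layout Michael dissected above (flags byte, then
4-byte little-endian times in the order mod, access, create). The function
names are made up for illustration, and I'm assuming the times are signed
32-bit values. It reads the local file header rather than the central
directory, since the 9-byte "UT" field with the access time is what showed up
at the entry offset.

```python
import struct
import zipfile


def parse_extra(extra):
    """Split a raw extra-field blob into (header_id, payload) pairs."""
    fields = []
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        fields.append((header_id, extra[pos + 4 : pos + 4 + size]))
        pos += 4 + size
    return fields


def decode_ut(payload):
    """Decode a 0x5455 ("UT") payload into a dict of Unix times."""
    if not payload:
        return {}
    flags = payload[0]
    times = {}
    pos = 1
    # Flag bits 1, 2, 4 announce mod, access and create time, in that order.
    for bit, name in ((1, "mtime"), (2, "atime"), (4, "ctime")):
        if flags & bit and pos + 4 <= len(payload):
            (times[name],) = struct.unpack_from("<i", payload, pos)
            pos += 4
    return times


def local_extra(path, info):
    """Read the extra field from the local file header of one entry."""
    with open(path, "rb") as f:
        f.seek(info.header_offset)
        header = f.read(30)  # fixed-size part of the local file header
        if header[:4] != b"PK\x03\x04":
            raise ValueError("not a local file header")
        name_len, extra_len = struct.unpack_from("<HH", header, 26)
        f.seek(name_len, 1)  # skip the file name
        return f.read(extra_len)


def dump_times(path):
    """Print the decoded "UT" times for every entry in a zip file."""
    with zipfile.ZipFile(path) as z:
        infos = z.infolist()
    for info in infos:
        for header_id, payload in parse_extra(local_extra(path, info)):
            if header_id == 0x5455:
                print(info.filename, decode_ut(payload))
```

Running dump_times() on both zip files and diffing the output should point
straight at the entries whose access times differ.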
> I have no experience with the various zip tools used on Unix/Linux, but
> probably you can avoid including those extra fields by using the -X option.
Good: smaller file
Good: less to go wrong with reproducibility
Bad: the only time stamps left in the file are DOS-style, with an implied
local timezone. So a zip file prepared with TZ=UTC (as needed for reproducibility)
will unpack to files with future timestamps (if unpacked shortly after being
created) for non-expert users in the half of the globe that is behind UTC.
The correct unpacking instruction on *nix to avoid that becomes
TZ=UTC unzip foo.zip
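
To make the timezone problem concrete, here is a small Python sketch. It
assumes the unpacker treats the DOS wall-clock fields as local time, as
described above. The function name is made up, and fixed UTC offsets stand in
for real timezones. The stored fields are the mod time of the file in my
listing, as a TZ=UTC zip run would record it.

```python
from datetime import datetime, timedelta, timezone


def unpacked_instant(dos_fields, unpacker_utc_offset_hours):
    """Return the UTC instant an unpacker assigns to a DOS timestamp
    when it interprets the stored fields in its own local timezone."""
    local = timezone(timedelta(hours=unpacker_utc_offset_hours))
    return datetime(*dos_fields, tzinfo=local).astimezone(timezone.utc)


# Wall-clock fields as written by zip running with TZ=UTC.
stored = (2023, 4, 5, 1, 10, 0)

for offset in (0, -7, 2):
    print(offset, unpacked_instant(stored, offset).isoformat())
```

With an offset of -7 the file lands seven hours later than its true mod time,
which is in the future if the zip was created less than seven hours ago. With
TZ=UTC on the unpacking side (offset 0) the instant comes out unchanged.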
Again, thanks for your prompt and constructive response!