Three bytes in a zip file

Larry Doolittle larry at doolittle.boa.org
Thu Apr 6 08:28:17 UTC 2023


Friends -

I'm trying to make a process to generate byte-for-byte reproducible zip files.

I got the contents identical, including timestamps and permissions.
But three bytes at the 98.08% mark (bytes 5543078 to 5543081,
out of a file size of 5651451) differ between my run and a friend's run.
Velocity-dependent?  His was done on a train.  ;-)
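
For anyone who wants to list the differing offsets directly, here is a minimal sketch using cmp -l. It runs on two tiny stand-in files with made-up contents (a.bin and b.bin are hypothetical names), not on the real zip files. Note that cmp counts offsets from 1, while hexdump counts from 0.

```shell
# Stand-in files (hypothetical data): 12 bytes each, differing in three bytes.
tmpdir=$(mktemp -d)
printf '\000\003\150\312\054\144\317\163\056\144\165\170' > "$tmpdir/a.bin"
printf '\000\003\150\312\054\144\150\312\054\144\165\170' > "$tmpdir/b.bin"

# cmp -l prints one line per differing byte: the 1-based decimal offset,
# then the byte from each file in octal. cmp exits non-zero when the
# files differ, hence the "|| true".
cmp_out=$(cmp -l "$tmpdir/a.bin" "$tmpdir/b.bin" || true)
printf '%s\n' "$cmp_out"

# Count the differing bytes.
ndiff=$(printf '%s\n' "$cmp_out" | wc -l | tr -d ' ')
echo "differing bytes: $ndiff"
rm -rf "$tmpdir"
```

Pointing the same cmp -l call at the two zip files should list each differing byte on its own line.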

try.diffoscope.org is no help.  It reports:
"Format-specific differences are supported for ZIP archives but no file-specific differences were detected; falling back to a binary diff."

I can get the same info as provided by diffoscope with
$ diff <(hexdump marble-ea2bb52c-mb-fab.zip) <(hexdump marble-ea2bb52c-ld-fab.zip)
346443c346443
< 05494a0 0300 ca68 642c 73cf 642e 7875 000b 0401
---
> 05494a0 0300 ca68 642c ca68 642c 7875 000b 0401

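That is, 73cf642e becomes ca68642c.  (hexdump prints 16-bit words in
host byte order, so on a little-endian machine the bytes on disk are
cf 73 2e 64 versus 68 ca 2c 64.)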
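
One way to probe what those words might mean is a minimal sketch that reassembles each pair of hexdump words into a 32-bit little-endian integer and prints it as a date. Two things here are assumptions, not established facts: that the dump came from a little-endian machine, and that the values are Unix timestamps. The helper le_words_to_u32 is hypothetical, and the date calls need GNU date.

```shell
# le_words_to_u32 LOW HIGH: combine two 16-bit words, as printed by
# hexdump on a little-endian host (low word first), into one 32-bit value.
# Hypothetical helper, not part of manufacturing.sh.
le_words_to_u32() {
    echo $(( (0x$2 << 16) | 0x$1 ))
}

mb_value=$(le_words_to_u32 73cf 642e)   # from the mb file ("<" line)
ld_value=$(le_words_to_u32 ca68 642c)   # from the ld file (">" line)
echo "mb: $mb_value  ld: $ld_value"

# If these are Unix timestamps (an assumption), GNU date can render them.
date -u -d "@$mb_value" '+%Y-%m-%d %H:%M:%S UTC'
date -u -d "@$ld_value" '+%Y-%m-%d %H:%M:%S UTC'
```

Read that way, the mb value is 1680765903 (2023-04-06 07:25:03 UTC) and the ld value is 1680657000 (2023-04-05 01:10:00 UTC). Whether the field really holds a timestamp is still a guess.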

The diff is so small, it seems silly to post both files, but I'll
do that anyway.
7cbdcc8b2fed002ed73017ff55e574b654fb82d061658534b4287de22339df64  marble-ea2bb52c-ld-fab.zip
573fe7e8cb662fb3e22e16c1ab4d3520f8275a0ab3dd2064df841e108a08af0e  marble-ea2bb52c-mb-fab.zip
http://recycle.lbl.gov/~ldoolitt/marble-ea2bb52c-ld-fab.zip
http://recycle.lbl.gov/~ldoolitt/marble-ea2bb52c-mb-fab.zip

Are there any zip file format experts here who can explain where this
comes from and, more importantly, suggest how to fix the environment
to prevent it?

The script making this file is at
https://github.com/BerkeleyLab/Marble/blob/main/design/scripts/manufacturing.sh
but because I got the _contents_ to match already, I assert
the only important lines for the purposes of this question are

export LC_COLLATE=C
umask 0022
touch --date="@$SOURCE_DATE_EPOCH" fab/*
TZ=UTC zip --latest-time "$zipfile" fab/*
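
To check what the touch line leaves behind, here is a minimal sketch that runs it on a throwaway directory and reads the times back. It assumes GNU touch and GNU stat. The epoch value and the file name example.txt are hypothetical stand-ins, not taken from the real build.

```shell
# Hypothetical epoch value standing in for SOURCE_DATE_EPOCH.
SOURCE_DATE_EPOCH=1680657000
tmpdir=$(mktemp -d)
mkdir "$tmpdir/fab"
echo "placeholder" > "$tmpdir/fab/example.txt"   # hypothetical file name

# Without -a or -m, touch sets both the access and modification times.
touch --date="@$SOURCE_DATE_EPOCH" "$tmpdir/fab"/*

# GNU stat: %X is the access time, %Y the modification time,
# both in seconds since the epoch.
atime=$(stat -c '%X' "$tmpdir/fab/example.txt")
mtime=$(stat -c '%Y' "$tmpdir/fab/example.txt")
echo "atime=$atime mtime=$mtime"
rm -rf "$tmpdir"
```

Running the same stat calls on the real fab/* files just before the zip step would show whether both times still match on each machine.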

Side note: the "ea2bb52c" in the file names above refers
to the commit ID in the GitHub repo.

  - Larry

