Three bytes in a zip file
Michael Schierl
schierlm at gmx.de
Thu Apr 6 10:11:38 UTC 2023
Hello,
On 06.04.2023 at 10:28, Larry Doolittle wrote:
> Friends -
>
> I'm trying to make a process to generate byte-for-byte reproducible zip files.
>
> I got the contents identical, including timestamps and permissions.
> But three bytes at the 98.08% mark (bytes 5543078 to 5543081,
> out of a file size 5651451) differ between my run and a friend's run.
> Velocity-dependent? His was done on a train. ;-)
> Any zip file format experts here, who can explain where this comes from?
> And more importantly, can suggest how to fix the environment to prevent it?
Looking at the local header of the zip entry, starting from its file name length field at 0x00549481:
>| 00549481 18 00 1c 00 66 61 62 2f 6d 61 72 62 6c 65 2d | ....fab/marble-|
>| 00549490 69 70 63 2d 64 2d 33 35 36 2e 74 78 74 55 54 09 |ipc-d-356.txtUT.|
>| 005494a0 00 03 68 ca 2c 64 XX XX XX 64 75 78 0b 00 01 04 |..h.,dXXXdux....|
>| 005494b0 e8 03 00 00 04 e8 03 00 00 d4 fd 4b 73 5d bb ae |...........Ks]..|
18 00 File name length (0x0018 = 24 bytes)
1c 00 File extra data length (0x001c = 28 bytes)
File name: fab/marble-ipc-d-356.txt
File extra data:
>| 0054949d 55 54 09 | UT.|
>| 005494a0 00 03 68 ca 2c 64 XX XX XX 64 75 78 0b 00 01 04 |..h.,dXXXdux....|
>| 005494b0 e8 03 00 00 04 e8 03 00 00 |......... |
What follows is the compressed data.
Extra data consists of multiple fields. Each field starts with a 2-byte
ID, followed by a 2-byte length (both little-endian), followed by that
many bytes of data.
Let's dissect the fields:
ID 0x5455 ("UT") Length 0x0009 Data 03 68 ca 2c 64 XX XX XX 64
ID 0x7875 ("ux") Length 0x000b Data 01 04 e8 03 00 00 04 e8 03 00 00
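The same dissection can be done mechanically. This is a minimal sketch, assuming the bytes from 0x00549481 onward as shown in the dump, with the three differing bytes (XX) replaced by 00 as a placeholder; the helper names split_name_and_extra and extra_fields are hypothetical. It reads the two little-endian length fields, slices out the file name and extra data, and then walks the ID/length/data fields.

```python
import struct

# Bytes from offset 0x00549481 in the dump: both length fields, the file name,
# and the extra data. The differing bytes (XX) are replaced by 00 as a placeholder.
TAIL = (
    bytes.fromhex("18 00 1c 00")
    + b"fab/marble-ipc-d-356.txt"
    + bytes.fromhex("55 54 09 00 03 68 ca 2c 64 00 00 00 64")
    + bytes.fromhex("75 78 0b 00 01 04 e8 03 00 00 04 e8 03 00 00")
)

def split_name_and_extra(tail: bytes):
    """Split file name and extra data using the two little-endian length fields."""
    name_len, extra_len = struct.unpack_from("<HH", tail, 0)
    name = tail[4:4 + name_len]
    extra = tail[4 + name_len:4 + name_len + extra_len]
    return name, extra

def extra_fields(extra: bytes):
    """Walk extra data: 2-byte ID, 2-byte length, then `length` bytes of data."""
    fields = []
    pos = 0
    while pos + 4 <= len(extra):
        field_id, size = struct.unpack_from("<HH", extra, pos)
        fields.append((field_id, extra[pos + 4:pos + 4 + size]))
        pos += 4 + size
    return fields

name, extra = split_name_and_extra(TAIL)
for field_id, data in extra_fields(extra):
    print(f"ID 0x{field_id:04x} Length 0x{len(data):04x} Data {data.hex(' ')}")
```

The two printed lines match the dissection above, except that the placeholder 00 bytes stand in for XX.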
0x5455 is Info-ZIP's "extended timestamp" field:
>> Extended Timestamp Extra Field:
>> ==============================
>>
>> The following is the layout of the extended-timestamp extra block.
>> (Last Revision 970118)
>>
>> Local-header version:
>>
>> Value Size Description
>> ----- ---- -----------
>> 0x5455 Short tag for this extra block type
>> TSize Short total data size for this block
>> Flags Byte info bits
>> (ModTime) Long time of last modification (UTC/GMT)
>> (AcTime) Long time of last access (UTC/GMT)
>> (CrTime) Long time of original creation (UTC/GMT)
>> If "Flags" indicates that Modtime is present in the local header
>> field, it MUST be present in the central header field, too!
>> This correspondence is required because the modification time
>> value may be used to support trans-timezone freshening and
>> updating operations with zip archives.
>>
>> The time values are in standard Unix signed-long format, indicating
>> the number of seconds since 1 January 1970 00:00:00. The times
>> are relative to Coordinated Universal Time (UTC), also sometimes
>> referred to as Greenwich Mean Time (GMT). To convert to local time,
>> the software must know the local timezone offset from UTC/GMT.
>>
>> The lower three bits of Flags in both headers indicate which time-
>> stamps are present in the LOCAL extra field:
>>
>> bit 0 if set, modification time is present
>> bit 1 if set, access time is present
>> bit 2 if set, creation time is present
>> bits 3-7 reserved for additional timestamps; not set
>>
>> Those times that are present will appear in the order indicated, but
>> any combination of times may be omitted. (Creation time may be
>> present without access time, for example.) TSize should equal
>> (1 + 4*(number of set bits in Flags)), as the block is currently
>> defined. Other timestamps may be added in the future.
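A decoder for the local-header layout quoted above might look like the following. This is a sketch, not Info-ZIP's code: parse_ut_local is a hypothetical name, it applies only to the local-header version (the central-header copy carries just ModTime even when Flags says more), and it treats a TSize mismatch as an error although the spec only says TSize "should" match. The input is the UT data from the dump with XX replaced by 00.

```python
import struct
from datetime import datetime, timezone

# Flags bits 0, 1, 2, in the order the times appear (per the spec excerpt).
UT_TIMES = ("mtime", "atime", "ctime")

def parse_ut_local(data: bytes) -> dict:
    """Decode the data of a local-header 0x5455 field into {name: Unix time}."""
    flags = data[0]
    present = [name for bit, name in enumerate(UT_TIMES) if flags & (1 << bit)]
    expected = 1 + 4 * len(present)  # TSize = 1 + 4 * (number of set time bits)
    if len(data) != expected:
        raise ValueError(f"TSize is {len(data)}, expected {expected}")
    # Signed 32-bit little-endian seconds since 1970-01-01 00:00:00 UTC.
    values = struct.unpack_from("<" + "i" * len(present), data, 1)
    return dict(zip(present, values))

# UT data from the dump, with the three differing bytes (XX) set to 00.
fields = parse_ut_local(bytes.fromhex("03 68 ca 2c 64 00 00 00 64"))
for name, seconds in fields.items():
    print(name, seconds, datetime.fromtimestamp(seconds, tz=timezone.utc))
```

For the bytes in the dump, ModTime (68 ca 2c 64) decodes to 1680657000, i.e. 2023-04-05 01:10:00 UTC; the printed AcTime is meaningless here because its low bytes are placeholders.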
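As the flags byte is 03, both the modification time (68 ca 2c 64) and
the access time (XX XX XX 64) are present. The values are little-endian,
so the differing bytes are the three low-order bytes of the access time,
which is expected to differ, since it records when the file was last
read on each machine.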
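The offsets in the dump can be checked against the byte positions in the original report with a little arithmetic. This sketch uses only values taken from the dump and the report; the variable names are illustrative. It shows that the three differing bytes are exactly the low-order bytes of AcTime, so the report's "5543078 to 5543081" reads as an end-exclusive range.

```python
# Worked offset arithmetic for the entry shown in the hex dump.
NAME_LEN_OFFSET = 0x00549481  # where "18 00" (file name length) starts
NAME_LEN = 0x18               # 24-byte file name
FILE_SIZE = 5651451           # total archive size from the original report

extra_start = NAME_LEN_OFFSET + 2 + 2 + NAME_LEN  # skip both length fields and the name
ut_flags = extra_start + 2 + 2                    # skip UT's 2-byte ID and 2-byte length
mtime_start = ut_flags + 1                        # 4-byte ModTime follows the flags byte
atime_start = mtime_start + 4                     # 4-byte AcTime follows ModTime

# Times are little-endian, so AcTime's three low-order bytes come first and
# its high-order byte (0x64, identical in both runs) comes last.
low_order_atime = list(range(atime_start, atime_start + 3))

print(hex(extra_start))                         # 0x54949d
print(low_order_atime)                          # [5543078, 5543079, 5543080]
print(f"{low_order_atime[0] / FILE_SIZE:.2%}")  # 98.08%
```

Because the shared high byte is 0x64, both access times fall within the same 2^24-second (roughly 194-day) window, which is why only three bytes differ rather than four.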
I have no experience with the various zip tools used on Unix/Linux, but
you can probably avoid writing those extra fields by passing the -X
(--no-extra) option to Info-ZIP's zip.
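To confirm that a rebuilt archive no longer carries these fields, a small checker like the one below could be run on it. This is a sketch under stated assumptions: entries_with_unix_extras and extra_ids are hypothetical names, and Python's zipfile exposes only the central-directory copy of each entry's extra data (ZipInfo.extra), not the local-header copy where the access time lives. It therefore assumes the archiver also records the field IDs in the central directory, which the spec excerpt above requires for UT whenever ModTime is present.

```python
import struct
import sys
import zipfile

# Info-ZIP extra-field IDs seen in the dissection above.
UNIX_EXTRA_IDS = {0x5455: "UT (extended timestamp)", 0x7875: "ux (Info-ZIP Unix UID/GID)"}

def extra_ids(extra: bytes):
    """Yield the ID of each field in an extra-data blob (ID, length, data)."""
    pos = 0
    while pos + 4 <= len(extra):
        field_id, size = struct.unpack_from("<HH", extra, pos)
        yield field_id
        pos += 4 + size

def entries_with_unix_extras(zip_file) -> dict:
    """Map entry name -> labels of UT/ux fields in its central-directory extra data.

    zip_file may be a path or a binary file object.
    """
    found = {}
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            labels = [UNIX_EXTRA_IDS[i] for i in extra_ids(info.extra) if i in UNIX_EXTRA_IDS]
            if labels:
                found[info.filename] = labels
    return found

if __name__ == "__main__":
    # Report every entry that still has UT/ux fields, for each archive given.
    for path in sys.argv[1:]:
        for name, labels in entries_with_unix_extras(path).items():
            print(f"{path}: {name}: {', '.join(labels)}")
```

If -X behaves as expected, running this on an archive built with it should print nothing.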
Regards,
Michael
More information about the rb-general mailing list