Three bytes in a zip file

Michael Schierl schierlm at gmx.de
Thu Apr 6 10:11:38 UTC 2023


Hello,


On 06.04.2023 at 10:28, Larry Doolittle wrote:
> Friends -
>
> I'm trying to make a process to generate byte-for-byte reproducible zip files.
>
> I got the contents identical, including timestamps and permissions.
> But three bytes at the 98.08% mark (bytes 5543078 to 5543081,
> out of a file size 5651451) differ between my run and a friend's run.
> Velocity-dependent?  His was done on a train.  ;-)

> Any zip file format experts here, who can explain where this comes from?
> And more importantly, can suggest how to fix the environment to prevent it?


Looking at the local file header of the affected zip entry, from the
file name length field at 0x00549481:

>| 00549481     18 00 1c 00 66 61 62  2f 6d 61 72 62 6c 65 2d  | ....fab/marble-|
>| 00549490  69 70 63 2d 64 2d 33 35  36 2e 74 78 74 55 54 09  |ipc-d-356.txtUT.|
>| 005494a0  00 03 68 ca 2c 64 XX XX  XX 64 75 78 0b 00 01 04  |..h.,dXXXdux....|
>| 005494b0  e8 03 00 00 04 e8 03 00  00 d4 fd 4b 73 5d bb ae  |...........Ks]..|

18 00    File name length (0x0018)
1c 00    File extra data length (0x001c)

File name: fab/marble-ipc-d-356.txt

File extra data:

>| 0054949d                                          55 54 09  |             UT.|
>| 005494a0  00 03 68 ca 2c 64 XX XX  XX 64 75 78 0b 00 01 04  |..h.,dXXXdux....|
>| 005494b0  e8 03 00 00 04 e8 03 00  00                       |.........       |

What follows is the compressed data.

Extra data consists of multiple fields. Each field starts with a 2-byte
ID, followed by a 2-byte length (both little-endian), and ends with the
data.

Let's dissect the fields:

ID 0x5455 ("UT") Length 0x0009 Data 03 68 ca 2c 64 XX XX XX 64
ID 0x7875 ("ux") Length 0x000b Data 01 04 e8 03 00 00 04 e8 03 00 00
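
The same dissection can be sketched in a few lines of Python. This is only
an illustration: the function name split_extra_fields is made up for this
example, and the three differing bytes (XX in the dump) are replaced by 00
placeholders, since their real values are not shown above.

```python
import struct


def split_extra_fields(extra: bytes):
    """Yield (field_id, payload) for each field in a zip extra-data block."""
    offset = 0
    while offset + 4 <= len(extra):
        # Each field: 2-byte ID, 2-byte length (little-endian), then the data.
        field_id, size = struct.unpack_from("<HH", extra, offset)
        payload = extra[offset + 4 : offset + 4 + size]
        if len(payload) != size:
            raise ValueError("truncated extra field")
        yield field_id, payload
        offset += 4 + size


# Extra data from the dump; the XX bytes are replaced by 00 (placeholder).
EXTRA = bytes.fromhex(
    "5554 0900 03 68ca2c64 00000064"
    "7578 0b00 01 04 e8030000 04 e8030000"
)

FIELDS = list(split_extra_fields(EXTRA))
for field_id, payload in FIELDS:
    print(f"ID 0x{field_id:04x} Length 0x{len(payload):04x} Data {payload.hex(' ')}")
```

The block is 0x1c bytes long in total, matching the extra data length in
the header.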

0x5455 is Info-ZIP's "extended timestamp" field:

>> Extended Timestamp Extra Field:
>> ==============================
>>
>> The following is the layout of the extended-timestamp extra block.
>> (Last Revision 970118)
>>
>> Local-header version:
>>
>> Value		Size		Description
>> -----		----		-----------
>> 0x5455	Short		tag for this extra block type
>> TSize	Short		total data size for this block
>> Flags	Byte		info bits
>> (ModTime)	Long		time of last modification (UTC/GMT)
>> (AcTime)	Long		time of last access (UTC/GMT)
>> (CrTime)	Long		time of original creation (UTC/GMT)

>>     If "Flags" indicates that Modtime is present in the local header
>>     field, it MUST be present in the central header field, too!
>>     This correspondence is required because the modification time
>>     value may be used to support trans-timezone freshening and
>>     updating operations with zip archives.
>>
>> The time values are in standard Unix signed-long format, indicating
>> the number of seconds since 1 January 1970 00:00:00.  The times
>> are relative to Coordinated Universal Time (UTC), also sometimes
>> referred to as Greenwich Mean Time (GMT).  To convert to local time,
>> the software must know the local timezone offset from UTC/GMT.
>>
>> The lower three bits of Flags in both headers indicate which time-
>> stamps are present in the LOCAL extra field:
>>
>> bit 0		if set, modification time is present
>> bit 1		if set, access time is present
>> bit 2		if set, creation time is present
>> bits 3-7		reserved for additional timestamps; not set
>>
>> Those times that are present will appear in the order indicated, but
>> any combination of times may be omitted.  (Creation time may be
>> present without access time, for example.)  TSize should equal
>> (1 + 4*(number of set bits in Flags)), as the block is currently
>> defined.  Other timestamps may be added in the future.

As the flags are 03, mod time and access time are present. Both are
stored little-endian, so the three differing bytes are the low-order
bytes of the access time.
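
To make that concrete, here is a rough Python sketch that decodes the "UT"
payload according to the layout quoted above. The function name
parse_extended_timestamp is invented for this example, and the XX bytes are
again replaced by 00 placeholders, so the access time it prints is not the
real one; only the mod time is taken entirely from the dump.

```python
import struct
from datetime import datetime, timezone


def parse_extended_timestamp(payload: bytes) -> dict:
    """Decode the data part of a 0x5455 extra field into {name: unix_time}."""
    flags = payload[0]
    times = {}
    offset = 1
    # Bits 0, 1, 2 of Flags announce mod, access and creation time, in order.
    for bit, name in enumerate(("mtime", "atime", "ctime")):
        # The length check stops cleanly if an announced time is not stored.
        if flags & (1 << bit) and offset + 4 <= len(payload):
            (value,) = struct.unpack_from("<i", payload, offset)  # signed 32-bit
            times[name] = value
            offset += 4
    return times


# Payload from the dump; the XX bytes are replaced by 00 (placeholder).
UT_PAYLOAD = bytes.fromhex("03 68ca2c64 00000064")

TIMES = parse_extended_timestamp(UT_PAYLOAD)
MTIME_UTC = datetime.fromtimestamp(TIMES["mtime"], tz=timezone.utc)
print(sorted(TIMES))          # which timestamps are present
print(MTIME_UTC.isoformat())  # mod time as UTC
```

The access time occupies payload bytes 5 to 8, and its first three bytes
are exactly the XX positions in the dump.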


I have no experience with the various zip tools used on Unix/Linux, but
with Info-ZIP's zip you can probably avoid writing those extra fields by
using the -X option.
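
If the archive is built from a script instead, another possible route is
Python's standard zipfile module. The sketch below is untested against
your setup and rests on a few assumptions: the helper name build_zip, the
fixed timestamp and the 0644 permissions are arbitrary choices for this
example, and the entries are small enough that no Zip64 extra field is
needed. Entries created this way carry no "UT" or "ux" field, so no access
time ends up in the archive.

```python
import io
import zipfile

# Arbitrary fixed timestamp for every entry (an assumption for this sketch).
FIXED_TIME = (2023, 4, 5, 1, 10, 0)


def build_zip(files: dict[str, bytes]) -> bytes:
    """Build a zip in memory from {name: content} with fixed metadata."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # fixed entry order
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3                 # always claim "Unix"
            info.external_attr = 0o100644 << 16    # regular file, mode 0644
            zf.writestr(info, files[name])
    return buf.getvalue()


SAMPLE = {"fab/b.txt": b"second\n" * 100, "fab/a.txt": b"first\n" * 100}
FIRST = build_zip(SAMPLE)
SECOND = build_zip(dict(reversed(list(SAMPLE.items()))))
print(FIRST == SECOND)
```

Note that the compressed data itself can still differ between zlib
versions, so both sides would need to use the same Python/zlib build.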


Regards,


Michael


