SOURCE_DATE_EPOCH and when to clamp timestamps
John Thorvald Wodder II
jwodder at gmail.com
Fri Jun 10 13:56:24 UTC 2022
On 2022 Jun 10, at 02:43, Chris Lamb <chris at reproducible-builds.org> wrote:
> Hi John,
>> https://reproducible-builds.org/specs/source-date-epoch/ says:
>>> Where build processes embed timestamps that are not "current", but
>>> are nevertheless still specific to one execution of the build
>>> process, they MUST use a timestamp no later than the value of this
>>> variable. This is often called "timestamp clamping".
>> However, I'm not entirely clear on what counts as "specific to one
>> execution of the build process." Specifically, I am authoring a
>> program that (among other things) takes the latest commit date from a
>> Git repository and formats it alongside the current time. If
>> SOURCE_DATE_EPOCH is set, it will replace the latter value, but I'm
>> unclear on whether the former value should be "clamped" in this case.
>> I originally read the quote as saying that "clamping" should happen,
>> but on further reflection, I'm about 90+% sure that it shouldn't, and I
>> thought I'd e-mail here just to be sure.
> The short answer is that you needn't clamp either timestamp in
> your program.
> To provide a little more background, timestamp clamping is mostly
> motivated by the desire to *avoid* discarding metadata that might be
> informative. For instance, if a package ships a very old data file
> with a timestamp from 1997 (as it hasn't been updated since then), that
> might be useful to know when trying to fix a bug: it would be a shame
> (and may be misleading) to change all files' timestamps to the value of
> SOURCE_DATE_EPOCH as, with all of the files now displaying the same
> metadata, such an old file would lose its informative distinction.
> The "no later" part of timestamp clamping is perhaps the 'clever' bit
> as it means we merely need to inspect the timestamps of generated
> files: if we find they are *older* than SOURCE_DATE_EPOCH, then they
> might be meaningful and therefore should not be touched. In any case,
> these older timestamps were likely inherited directly from the
> original source files, and so retaining them in the generated
> artefacts is unlikely to affect reproducibility.
> Conversely, if a timestamp is *newer* than SOURCE_DATE_EPOCH, then it
> was almost certainly generated as a side-effect of the build process.
> Often these are simply offset from the current time: a file generated
> a couple of minutes after the build started, for example, will have a
> timestamp that reflects that.
> We need to normalise these timestamps, of course, otherwise the build
> will not be reproducible. Yes, it is true that we are "throwing away"
> information here, but it is likely not terribly interesting that it
> took, say, 34 seconds for us to get around to generating some part of
> the build. It is these timestamps that are, as the spec rather
> gnostically puts it, "specific to one execution of the build
> process."
> Neither of the timestamps you mention is affected in this way, so
> there is no need to clamp one of them to the other, and it
> might be quite misleading if you did so. Or, putting it another way,
> those values seem independent of each other, at least from a
> reproducibility perspective.
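The clamping rule described above boils down to taking the minimum of a timestamp and SOURCE_DATE_EPOCH. Older timestamps, such as those inherited from source files, pass through unchanged, and newer ones produced as build side-effects are pulled back. The following is a minimal Python sketch, assuming Unix epoch seconds and an integer-valued SOURCE_DATE_EPOCH as the spec describes. The function names `clamp_timestamp` and `clamp_tree_mtimes` are hypothetical and not taken from any tool mentioned in this thread.

```python
import os


def clamp_timestamp(ts, env=None):
    """Return ts, but no later than SOURCE_DATE_EPOCH if that variable is set.

    Hypothetical helper for illustration; env defaults to os.environ.
    """
    if env is None:
        env = os.environ
    sde = env.get("SOURCE_DATE_EPOCH")
    if sde is None:
        # Variable unset: leave the timestamp alone.
        return ts
    # Older timestamps (likely inherited from source files) are kept as-is;
    # newer ones (likely build side-effects) are clamped to SOURCE_DATE_EPOCH.
    return min(ts, int(sde))


def clamp_tree_mtimes(root, env=None):
    """Clamp the mtime of every file under root (hypothetical helper)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            clamped = clamp_timestamp(st.st_mtime, env)
            if clamped != st.st_mtime:
                # Sets both atime and mtime to the clamped value.
                os.utime(path, (clamped, clamped))
```

Only timestamps newer than SOURCE_DATE_EPOCH are rewritten, so a file last touched in 1997 keeps its informative date.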
OK, I see why you wouldn't want to clamp the Git commit date, but why shouldn't the current timestamp be replaced with SOURCE_DATE_EPOCH when it's set? For clarity (perhaps I should have mentioned this earlier), the project in question uses the current timestamp, if so configured, as part of an autogenerated version number for packages, and it is in the same vein as another, more popular project that also uses the current timestamp or SOURCE_DATE_EPOCH in the same way. Being able to control the date that appears in a version string, when nothing other than the clock has changed, seems like an important part of reproducibility.
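The setup being asked about might look roughly like the sketch below: the build time comes from SOURCE_DATE_EPOCH when it is set, otherwise from the clock, while the Git commit date is used as-is and never clamped. This is a minimal Python illustration only. The project itself isn't named here, so `build_time`, `last_commit_time`, `version_suffix` and the version format are all hypothetical. `git log -1 --format=%ct` prints the committer date of HEAD as Unix seconds.

```python
import os
import subprocess
from datetime import datetime, timezone


def build_time(env=None):
    """Current UTC time, or SOURCE_DATE_EPOCH when it is set (hypothetical helper)."""
    if env is None:
        env = os.environ
    sde = env.get("SOURCE_DATE_EPOCH")
    if sde is not None:
        return datetime.fromtimestamp(int(sde), tz=timezone.utc)
    return datetime.now(timezone.utc)


def last_commit_time(repo="."):
    """Committer date of HEAD, deliberately NOT clamped (hypothetical helper)."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%ct"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout.strip()
    return datetime.fromtimestamp(int(out), tz=timezone.utc)


def version_suffix(commit, built):
    # Hypothetical format: commit date, then build timestamp.
    return f"{commit:%Y%m%d}.{built:%Y%m%d%H%M%S}"
```

With SOURCE_DATE_EPOCH fixed, two builds of the same commit produce the same suffix; without it, only the build-time half changes between runs.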
-- John Wodder
More information about the rb-general mailing list