SOURCE_DATE_EPOCH and when to clamp timestamps

Chris Lamb chris at reproducible-builds.org
Fri Jun 10 06:43:40 UTC 2022


Hi John,

> https://reproducible-builds.org/specs/source-date-epoch/ says:
>
>> Where build processes embed timestamps that are not "current", but
>> are nevertheless still specific to one execution of the build
>> process, they MUST use a timestamp no later than the value of this
>> variable. This is often called "timestamp clamping".
>
> However, I'm not entirely clear on what counts as "specific to one 
> execution of the build process."  Specifically, I am authoring a 
> program that (among other things) takes the latest commit date from a 
> Git repository and formats it alongside the current time.  If 
> SOURCE_DATE_EPOCH is set, it will replace the latter value, but I'm 
> unclear on whether the former value should be "clamped" in this case.  
> I originally read the quote as saying that "clamping" should happen, 
> but on further reflection, I'm about 90+% sure that it shouldn't, and I 
> thought I'd e-mail here just to be sure.

The short answer is that you needn't clamp either timestamp in
your program.

To provide a little more background, timestamp clamping is mostly
motivated by the desire to *avoid* discarding metadata that might be
informative. For instance, if a package ships a very old data file
with a timestamp from 1997 (as it hasn't been updated since then), that
might be useful to know when trying to fix a bug. It would be a shame
(and might be misleading) to change every file's timestamp to the
value of SOURCE_DATE_EPOCH: with all of the files now displaying the
same metadata, such an old file would lose its informative distinction.

The "no later" part of timestamp clamping is perhaps the 'clever' bit
as it means we merely need to inspect the timestamps of generated
files: if we find they are *older* than SOURCE_DATE_EPOCH, then they
might be meaningful and therefore should not be touched. In any case,
these older timestamps were likely inherited directly from the
original source files, and so retaining them in the generated
artefacts is unlikely to affect reproducibility.

Conversely, if a timestamp is *newer* than SOURCE_DATE_EPOCH, then it
was almost certainly generated as a side-effect of the build process.
Often these are simply offset from the current time: a file generated
a couple of minutes after the build started, for example, will have a
timestamp that reflects that.

We need to normalise these timestamps, of course, otherwise the build
will not be reproducible. Yes, it is true that we are "throwing away"
information here, but it is likely not terribly interesting that it
took, say, 34 seconds for us to get around to generating some part of
the build. It is these timestamps that are, as the spec rather
gnomically puts it, "specific to one execution of the build
process."
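
To make the "no later than" rule concrete, here is a minimal sketch in
Python of clamping file modification times. The helper names (clamp,
clamp_tree_mtimes) are made up for illustration and are not taken from
the spec or from any particular tool; it also assumes that file mtimes
are the timestamps being embedded.

```python
import os


def clamp(timestamp: float, source_date_epoch: int) -> float:
    """Return timestamp unchanged if it is no later than the epoch,
    otherwise return the epoch itself."""
    return min(timestamp, source_date_epoch)


def clamp_tree_mtimes(root: str, source_date_epoch: int) -> None:
    """Clamp the mtime of every regular file under root.

    Hypothetical helper: older files keep their (possibly meaningful)
    mtime, newer ones are normalised to SOURCE_DATE_EPOCH.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leave symlinks alone in this sketch
            st = os.stat(path)
            clamped = clamp(st.st_mtime, source_date_epoch)
            if clamped != st.st_mtime:
                # Keep the access time, rewrite only the mtime.
                os.utime(path, (st.st_atime, clamped))
```

A data file last touched in 1997 passes through untouched, whereas an
object file written a couple of minutes into the build ends up with
exactly SOURCE_DATE_EPOCH as its mtime.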

Neither of the timestamps you mention is affected in this way: the
commit date comes from the source itself, and the current time is
replaced outright by SOURCE_DATE_EPOCH when it is set. There is
therefore no need to clamp one of them to the other, and it might be
quite misleading if you did so. Or, putting it another way, those
values are independent of each other, at least from a reproducibility
perspective.
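
As a rough sketch of what that could look like in your program (in
Python, with made-up function names, and assuming the commit date has
already been obtained as a Unix timestamp, for instance from
"git log -1 --format=%ct"):

```python
import os
import time
from datetime import datetime, timezone

FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_timestamp(environ=os.environ) -> int:
    """Use SOURCE_DATE_EPOCH in place of the current time if it is set."""
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        return int(value)
    return int(time.time())


def describe(commit_timestamp: int, environ=os.environ) -> str:
    """Format the commit date alongside the build time.

    The commit date is passed through as-is: it is not clamped to
    SOURCE_DATE_EPOCH, even if it happens to be later.
    """
    commit = datetime.fromtimestamp(commit_timestamp, tz=timezone.utc)
    built = datetime.fromtimestamp(build_timestamp(environ), tz=timezone.utc)
    return f"commit {commit.strftime(FORMAT)}, built {built.strftime(FORMAT)}"
```

Both values are deterministic for a given source tree and a given
SOURCE_DATE_EPOCH, so the output is reproducible without either one
being adjusted against the other.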


Best wishes,

-- 
      o
    ⬋   ⬊      Chris Lamb
   o     o     reproducible-builds.org 💠
    ⬊   ⬋
      o



More information about the rb-general mailing list