[rb-general] Source code timestamps

Profpatsch mail at profpatsch.de
Mon Dec 5 22:54:39 CET 2016


On 16-12-05 08:09pm, Ximin Luo wrote:
> Profpatsch:
> > FYC: nix sets the timestamps of all files in the store to 1-1-1970, 1:00.
> > Also, it has a special archiving format, .nar, which is similar to
> > tar, minus nondeterminism (in ordering, I think).
> > 
> > You can find more in-depth information in Eelco Dolstra’s PhD thesis
> > (https://nixos.org/~eelco/pubs/phd-thesis.pdf), section 5.2.
> > 
> 
> The thing is 281 pages long. On page 112 it says they set all timestamps to 0
> but I can't find an explanation of the rationale. "X does Y" is not an
> argument in favour of anything, and it doesn't convince people who want to
> take the time to think about a problem. Do you know what the rationale is?

The rationale is in section 5.2(.1), as I wrote, end of page 91f.

A build process is basically a pure function f from Set of Things
to Bytes.
The „Set of Things“ consists of stuff like locale, current time,
configuration flags, environment and a file tree of sources.
Now, since it’s a function, the same input will always produce the
same output, while outputs for different inputs may be the same or
different.
The function may also ignore some of its inputs entirely.
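
To make that concrete, here is a rough sketch in Python. All names
(BuildInputs, build) are made up by me for illustration and are not
taken from nix or the thesis. The toy build hashes flags and sources
and simply ignores locale and current time:

```python
import hashlib
from typing import NamedTuple


class BuildInputs(NamedTuple):
    """Hypothetical stand-in for the "Set of Things"."""
    locale: str
    current_time: int
    flags: tuple      # e.g. ("-O2",)
    sources: tuple    # ((path, contents), ...)


def build(inputs: BuildInputs) -> bytes:
    """Toy build function from inputs to bytes.

    It uses flags and sources, and ignores locale and current_time,
    so those two inputs cannot influence the output.
    """
    h = hashlib.sha256()
    for flag in inputs.flags:
        h.update(flag.encode())
    for path, contents in inputs.sources:
        h.update(path.encode())
        h.update(contents)
    return h.digest()
```

Two inputs that differ only in locale or current time give the same
bytes here; two inputs that differ in flags do not.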

Reproducible builds are the art of minimizing the number of factors
that influence the output of build functions.
This can be achieved by changing each function so that it does not
use certain inputs. That is a lot of work, since there are a lot of
functions (packages).
A more efficient way is to reduce the number of items in the Set of
Things or to reduce the amount of information these items carry.

For locale, if build environments are set to always use a certain
locale (e.g. LC_ALL=C), that input vanishes.
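
A minimal sketch of what that could look like, assuming the build is
started from Python with an explicit environment. The helper name
pinned_env is my own invention. LC_ALL already takes precedence over
LANG and the other LC_* variables, so dropping them is only to keep
the environment small:

```python
import os


def pinned_env(base=None):
    """Return a copy of the environment with the locale pinned to C.

    Hypothetical helper: removes LANG, LANGUAGE and every LC_*
    variable, then sets LC_ALL=C, so the caller's locale no longer
    reaches the build.
    """
    env = dict(os.environ if base is None else base)
    for name in list(env):
        if name in ("LANG", "LANGUAGE") or name.startswith("LC_"):
            del env[name]
    env["LC_ALL"] = "C"
    return env
```

The result can be handed to something like
subprocess.run(cmd, env=pinned_env()).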

For source code timestamps, if there is a data type SourceTree
which consists of Files, and we reduce the complexity of File by
removing time stamps, we also reduce the domain of the build function.
That means if there were two SourceTrees a and b with different
timestamps but the same file contents and f(a) != f(b) before,
afterwards it will be f(a) == f(b), because the build cannot use
the timestamps any more to change the output.
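
Sketched in Python, with File, leaky_build and strip_timestamps being
hypothetical names of mine (this is not how nix implements it). The
toy build embeds the newest mtime in its output, and setting every
mtime to 0 takes that input away:

```python
from typing import NamedTuple


class File(NamedTuple):
    path: str
    contents: bytes
    mtime: int  # seconds since the epoch


def leaky_build(tree) -> bytes:
    """Toy build that leaks the newest mtime into its output."""
    newest = max(f.mtime for f in tree)
    ordered = sorted(tree, key=lambda f: f.path)
    body = b"".join(f.contents for f in ordered)
    return body + b"\nbuilt-from-mtime=" + str(newest).encode()


def strip_timestamps(tree):
    """Shrink the input domain: every file gets mtime 0."""
    return frozenset(f._replace(mtime=0) for f in tree)
```

With a and b holding the same contents but different mtimes,
leaky_build(a) != leaky_build(b), while the stripped trees are equal
and therefore build to the same bytes.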

It’s a question of interface design, really. Should the interface
of builds contain the time stamps of files?
The smaller the input domain, the smaller the amount of possible
variation in outputs.

-- 
Proudly written in Mutt with Vim on NixOS.
May take up to five days to read your message. If it’s urgent, call me.

