[rb-general] GNU coding standards discussion

Ximin Luo infinity0 at debian.org
Fri Dec 2 16:52:00 CET 2016


Ian Jackson:
> [..]
> 
> Inevitably this means that reproducibility of a source build in a
> non-UTF locale might depend on the locale.
> 
> In practice I think all widely-distributed binaries should (and will)
> be built with a UTF-8 LC_CTYPE, so this is not a problem for third
> party verification of distro binaries.
> 
> Whether to handle this case, in the GNU coding standards, by a
> definition of "input" or by restricting the set of aspects of the
> output which are supposed to be identical, is just a difference in
> wording, not a difference in meaning.  The GNU coding standards
> necessarily have a more complicated definition of `output' than a
> distro build.

OK, I understand now. I guess this is a fundamental difference between source-installation and binary-installation. I'd suggest definitions something along these lines:

1. Development inputs: source code, dependencies, and tools.
2. User-configuration inputs: locale, "./configure" arguments, user database, filesystem timestamps.

For source-installations like `make install DESTDIR=xxx`, the inputs would be (1, 2), and the output should be reproducible given the same set of (1, 2).

For binary-installations like Debian packages, we effectively fix (2), so the inputs would be (1) and the output should be reproducible given the same set of (1), and implicitly (2).

However, we/you still have to be careful in defining (2), as well as what the reproducible part of the output is. If you define it in too much detail, then the result would be "reproducible in theory" but probably not in practice.

Example: suppose my user plugdev has uid 123 and your plugdev has uid 234. The definition that you gave previously says nothing about how the outputs of these two situations would be related. Each situation is reproducible in itself, but the two are different. Further: if the installation doesn't contain a file owned by user "plugdev", then our installations should be identical and reproducible - but your definition doesn't require that.

This means that fewer people in the world would be reproducing each other's things; we would be splitting ourselves into smaller reproducible "islands". But a large part of the security aspects of reproducibility is to have independent builders *actually reproduce* the same things as each other.

That is why it may be simpler to instead talk about (e.g.) the bit-to-bit reproducibility of the tarballs of the installation trees - assuming you store symbolic user/group names and not numeric ids. A non-tar solution would also be OK.
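
A rough sketch of what such a normalised tarball could look like, in Python. The names `normalised_tarball` and `tree_digest` are made up for illustration, and zeroing the numeric ids and fixing the mtime to 0 are my assumptions, not something any standard currently specifies.

```python
import hashlib
import io
import os
import tarfile


def normalised_tarball(root, mtime=0):
    """Pack the tree under `root` into an in-memory tar with stable bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # traversal order must not depend on the filesystem
            for name in sorted(dirnames + filenames):
                path = os.path.join(dirpath, name)
                info = tar.gettarinfo(path, arcname=os.path.relpath(path, root))
                if info is None:
                    continue  # unsupported file type (e.g. a socket)
                # Keep the symbolic owner (info.uname / info.gname) but drop
                # the numeric ids, which differ between machines.
                info.uid = info.gid = 0
                # Assumption: timestamps are not part of the compared output.
                info.mtime = mtime
                if info.isreg():
                    with open(path, "rb") as f:
                        tar.addfile(info, f)
                else:
                    tar.addfile(info)
    return buf.getvalue()


def tree_digest(root):
    """SHA-256 of the normalised tarball; this is what two builders compare."""
    return hashlib.sha256(normalised_tarball(root)).hexdigest()
```

Two people running `tree_digest("/tmp/destdir")` on their own `make install DESTDIR=/tmp/destdir` trees would then compare one hash each, whatever uid "plugdev" happens to have on their machines.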

Another example is the source timestamps thing. Different people extracting tarballs will get the same reproducible timestamps, but that's not true for a git checkout (I think). But ideally we'd like them to reproduce the same binary outputs, and not different-but-reproducible outputs.
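
One possible way to take the timestamps out of the picture is to pin them before building, so that a tarball extraction and a git checkout start from the same state. This is only a sketch: `pin_mtimes` is a hypothetical helper, and where the agreed `epoch` value comes from (release date, last commit, ...) is left open.

```python
import os


def pin_mtimes(root, epoch):
    """Set atime/mtime of every file and directory under `root` to `epoch`.

    Symlinks are skipped, since os.utime would follow them by default.
    Returns the number of paths touched.
    """
    touched = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            os.utime(path, (epoch, epoch))
            touched += 1
    os.utime(root, (epoch, epoch))
    return touched + 1
```

Whether the build should do this itself, or whether timestamps should simply be declared part of (2), is exactly the kind of thing the definition has to settle.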

(Of course, the actual *behaviour* of the 123/234 situations would be different. This is a whole other topic but I'm not sure if we have time to talk about that here. For binary distros, we insist on bit-to-bit reproducibility of the package file, because that's what is distributed to other people, and is what "security" applies to. It's not clear what the counterpart is for the source-installation case.)

As for the documentation locale issue, I would suggest that if the user wants documentation built in a non-UTF-8 locale then they should pass a flag to ./configure to set this up. But perhaps this would cause too much effort. Anyway, this does not matter too much - as long as you do enumerate exactly what you consider to be (2) "user-configuration" inputs, so that binary distributions like Debian know what to fix as constants.

> [..] it obviously can't contradict the definition of
> `input' because it does not try to define `input' or anything that
> `input' depends on.
> 

OK, I just meant that for clarity, instead of writing in a standard:

Input = X
Output = always Y(X), reproducibly
(.. Output might change based on Z1)
(.. Output might change based on Z2)

it would be better to write:

Input = X, Z
Output = always Y(X, Z), reproducibly

That's what I tried to do with (1, 2) above.
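
To make the (1, 2) split a bit more concrete, here is a small Python sketch. Everything in it is illustrative: the class names, the field names and the example values are my own assumptions and do not come from any existing standard or tool.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DevInputs:
    """(1) Development inputs."""
    source_digest: str
    dependencies: tuple
    tools: tuple


@dataclass(frozen=True)
class UserConfig:
    """(2) User-configuration inputs (only two of them, to keep it short)."""
    locale: str
    configure_args: tuple


def input_id(dev, cfg):
    """One identifier for the whole input (X, Z): same id, same expected output."""
    blob = json.dumps({"dev": asdict(dev), "cfg": asdict(cfg)}, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


# A binary distribution fixes (2) once, as a constant (hypothetical values).
DISTRO_CONFIG = UserConfig(locale="C.UTF-8", configure_args=("--prefix=/usr",))
```

A source-installation would compare outputs only between builds with equal `input_id(dev, cfg)`; a binary distribution always passes `DISTRO_CONFIG`, so in effect only `dev` varies.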

> My goal was to legitimise the common pattern of using install(8) to
> install source files directly into $DESTDIR.
> 
> [..]

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git

