Sphinx: localisation changes / reproducibility

Tue Apr 18 03:54:39 UTC 2023

James Addison <jay at jp-hosting.net> wrote:
> When the goal is to build the software as it was available to the
> author at the time of code commit/check-in - and I think that that is
> a valid use case - then that makes sense.

I think of the goal as being less related to the author, and more
related to the creator of a widespread binary release (such as a Linux
distribution, or an app that goes into an app-store).

The goal is then that the recipient of that binary release can verify
that the source code they obtained from the same place rebuilds that
exact widespread binary release.  This proves that the source code can
be trusted for some purposes: reading it to understand what the binary
does, making small bug-fixes to it, or using it as the base for further
evolution of the project if the maintainer is suddenly "hit by a bus"
and stops making further releases.
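
To make that check concrete, here is a minimal sketch of the comparison
a recipient might do once they have both the released artifact and their
own rebuild in hand.  The function names are hypothetical, and a real
check would read the two files from disk rather than take bytes directly.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of some artifact content as hex text."""
    return hashlib.sha256(data).hexdigest()


def rebuild_matches(released: bytes, rebuilt: bytes) -> bool:
    """True only if the local rebuild is byte-for-byte the released artifact.

    Hypothetical helper for illustration: comparing digests is equivalent
    to comparing the bytes, but digests are what a distributor can publish.
    """
    return sha256_hex(released) == sha256_hex(rebuilt)
```

Any difference at all, even a single embedded timestamp, makes the
digests disagree, which is why the remaining build-time dependencies
matter so much.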

James Addison <jay at jp-hosting.net> wrote:
> Inverting the question somewhat: if a single source-base is rebuilt
> using two different SOURCE_DATE_EPOCH values (let's say, 1970-01-01
> and 2023-04-18), then what are expected/valid differences in the
> resulting output?

In the ideal circumstances, the resulting output would be identical,
because the build process would have no dependencies on
SOURCE_DATE_EPOCH.  In these ideal circumstances, the code is "portable",
in the same sense that "portable" code is understood to build and run
the same on an ARM machine running macOS as it does on an x86 machine
running Windows.  There are many ways to make code portable, but the
most robust of them is to *eliminate* dependencies.

A more fragile way would be to #ifdef your code to adjust for every
supported build or run environment.  That fragile way breaks as soon as
it needs to build or run in a new environment, whereas the robust way
has already made it likely to "just work" in a new environment that it
has never encountered before (or to have only one or two minor things
that need adjusting).  Note that if it built fine on Linux system
version X, then a later Linux system version Y is a "new environment"
and might break the code.  The robust version is again less likely to
break, because it inherently, by design, cares less about the
nitty-gritty details of its environment.

Much code in Linux does not reach that ideal (yet!).  Instead, builds of
non-ideal code use SOURCE_DATE_EPOCH as a crutch to limit their
dependencies on the local build environment, replacing those
dependencies with a dependency on SOURCE_DATE_EPOCH.
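
As a rough illustration of that crutch (a sketch, not taken from Sphinx
or any particular package), a build step that insists on embedding a
date can take it from SOURCE_DATE_EPOCH when it is set, and fall back to
the wall clock only when it is not.  The names `build_timestamp` and
`footer_line` are hypothetical.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ):
    """Return the date to embed in build output, as an aware UTC datetime.

    Prefers SOURCE_DATE_EPOCH (seconds since the Unix epoch) so that two
    builds of the same source agree; otherwise uses the current time.
    """
    raw = environ.get("SOURCE_DATE_EPOCH")
    seconds = int(raw) if raw is not None else int(time.time())
    # Always format in UTC, so the local timezone is not a hidden input.
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def footer_line(environ=os.environ):
    """Hypothetical page footer that embeds the build date."""
    return "Built on " + build_timestamp(environ).strftime("%Y-%m-%d")
```

The output still depends on something outside the source tree, but that
something is now a single value that a rebuilder can set to match the
original build.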

So, if you rebuild a non-ideal package with two different values of
SOURCE_DATE_EPOCH, you will get two different binaries that differ in
the areas of dependency.  For example, if the documentation embeds a
build-date in its page footer, you'd expect every page of the built
documentation to differ.  If the "--version" output of the program
embeds the build date, then the code that produces that output would
differ.  Etc.  In fact, "fuzzing" their code with different values
of SOURCE_DATE_EPOCH can help a maintainer identify where those
dependencies still remain.
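
A minimal sketch of that kind of fuzzing, under the assumption that the
build can be wrapped as a function from an environment to a mapping of
output file names to contents.  `toy_build` is a hypothetical stand-in;
a real harness would run the package's actual build command with the
variable set and compare the resulting trees (for instance with
diffoscope).

```python
from datetime import datetime, timezone


def toy_build(environ):
    """Hypothetical stand-in for a real build: returns {filename: content}."""
    epoch = int(environ["SOURCE_DATE_EPOCH"])
    date = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d")
    return {
        # This page has no date in it, so it should never vary.
        "index.html": b"<p>Hello</p>\n",
        # This page embeds the build date in its footer.
        "about.html": ("<p>About</p><footer>Built " + date + "</footer>\n").encode(),
    }


def epoch_dependent_files(build, epochs=(0, 1681776000)):
    """Build once per epoch value and list the outputs that differ."""
    outputs = [build({"SOURCE_DATE_EPOCH": str(e)}) for e in epochs]
    first = outputs[0]
    names = set().union(*outputs)  # every file name seen in any build
    return sorted(
        name
        for name in names
        if any(other.get(name) != first.get(name) for other in outputs[1:])
    )
```

With the toy build above, `epoch_dependent_files(toy_build)` reports
only `about.html`, which points straight at the footer as the remaining
dependency.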

We try to talk package authors out of such dependencies, but ultimately
it's their package and they make the architectural decisions.  To some
of them it's incredibly important that the build date appears in the
man-page.  Reproducibility usually features lower among their priorities
than it does among ours.

	John