Sphinx: localisation changes / reproducibility

Wed Apr 26 14:22:26 UTC 2023

On Tue, 18 Apr 2023 at 18:51, Vagrant Cascadian
<vagrant at reproducible-builds.org> wrote:
>
> > James Addison <jay at jp-hosting.net> wrote:
> >> Inverting the question somewhat: if a single source-base is rebuilt
> >> using two different SOURCE_DATE_EPOCH values (let's say, 1970-01-01
> >> and 2023-04-18), then what are expected/valid differences in the
> >> resulting output?
> >
> > In the ideal circumstances, the resulting output would be identical,
> > because the build process would have no dependencies on
> > SOURCE_DATE_EPOCH.
> ...
> > ...
> > So, if you rebuild a non-ideal package with two different values of
> > SOURCE_DATE_EPOCH, you will get two different binaries that differ in
> > the areas of dependency.  For example, if the documentation embeds a
> > build-date in its page footer, you'd expect every page of the built
> > documentation would differ.  If the "--version" output of the program
> > embeds the build date, then the code that produces that output would
> > differ.  Etc.  In fact, "fuzzing" their code with different values
> > of SOURCE_DATE_EPOCH can help a maintainer identify where those
> > dependencies still remain.
>
> Nice explanations!
>
> This is why in the reproducible builds documentation on timestamps,
> there is a paragraph "Timestamps are best avoided":
>
>   https://reproducible-builds.org/docs/timestamps/
>
> Or as I like to say "There are no timestamps quite like NO timestamps!"
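
The "fuzzing" idea in the quoted text can be sketched in Python. This is a minimal illustration, not an existing tool. The helper names are hypothetical, and the sketch assumes the build command accepts its output directory as its last argument. It builds twice, with SOURCE_DATE_EPOCH set to 1970-01-01 and 2023-04-18, hashes every output file, and lists the files whose contents differ.

```python
import hashlib
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def build_with_epoch(build_cmd: list[str], epoch: int) -> dict[str, str]:
    """Run build_cmd with SOURCE_DATE_EPOCH=epoch into a fresh directory.

    Assumption of this sketch: the command takes its output directory
    as its final argument.
    """
    with tempfile.TemporaryDirectory() as out:
        env = dict(os.environ, SOURCE_DATE_EPOCH=str(epoch))
        subprocess.run([*build_cmd, out], env=env, check=True)
        return hash_tree(Path(out))


def epoch_sensitive_files(build_cmd: list[str], epochs=(0, 1681776000)) -> list[str]:
    """List output files whose contents depend on SOURCE_DATE_EPOCH.

    The default epochs are 1970-01-01 and 2023-04-18 (00:00 UTC).
    """
    first, second = (build_with_epoch(build_cmd, e) for e in epochs)
    return sorted(
        name
        for name in first.keys() | second.keys()
        if first.get(name) != second.get(name)
    )


# Hypothetical toy "build": index.html is date-independent,
# footer.html embeds the epoch it was built with.
toy_build = [
    sys.executable,
    "-c",
    "import os, sys, pathlib; out = pathlib.Path(sys.argv[1]);"
    "(out / 'index.html').write_text('hello');"
    "(out / 'footer.html').write_text('built ' + os.environ['SOURCE_DATE_EPOCH'])",
]
print(epoch_sensitive_files(toy_build))  # ['footer.html']
```

Tools such as diffoscope can then show how the flagged files actually differ.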

I see a parallel between the use of timestamps as a key for data
lookup (as in Holger's developers-reference package) and the use of
locale as a similar data-lookup key (as in the case of localised
documentation builds).

I agree that no timestamps is a good objective, and I imagine it's
already achieved by many software packages and is likely to be
achievable for many more (especially if we can encourage programming
languages and software tooling in that direction, which seems
sensible).

I'm not sure what the equivalent approach is for localisation, though.
Command-line software, for example, requires at least one written
natural language to be usable.  As a second use case, providing
natural-language documentation with software is highly recommended.
(Is it part of the software?  Maybe not.  But a sufficiently confusing,
poorly translated error message could be as serious as a code-related
bug, I think?)
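
At runtime, gettext-style lookup is one existing answer to the "at least one language" requirement: the source-language string (the msgid) is compiled into the program, and translations are looked up from catalogs shipped alongside it. Below is a minimal Python sketch, assuming every available catalog is installed under one locale directory. The domain `myprog`, the messages, and the `write_mo` helper (a small stand-in for `msgfmt`, so the example runs on its own) are hypothetical. The language is chosen only when the program runs, so the installed files need not depend on the build machine's locale.

```python
import gettext
import struct
import tempfile
from pathlib import Path


def write_mo(path: Path, messages: dict[str, str]) -> None:
    """Write a minimal GNU .mo catalog (no hash table).

    Hypothetical stand-in for msgfmt so this sketch is self-contained.
    """
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n", **messages}
    keys = sorted(entries)  # msgids sorted, as the .mo format expects
    n = len(keys)
    orig_table = 7 * 4  # the file header is seven 32-bit fields
    trans_table = orig_table + 8 * n
    data_start = trans_table + 8 * n
    data = bytearray()
    pairs = []  # (length, offset) for every msgid, then every msgstr
    for text in [*keys, *(entries[k] for k in keys)]:
        raw = text.encode("utf-8")
        pairs.append((len(raw), data_start + len(data)))
        data += raw + b"\0"  # strings are NUL-terminated
    header = struct.pack("<7I", 0x950412DE, 0, n, orig_table, trans_table, 0, data_start)
    tables = b"".join(struct.pack("<2I", length, offset) for length, offset in pairs)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(header + tables + bytes(data))


def translator(localedir: Path, domain: str, lang: str):
    """Look up messages for `lang` at runtime, falling back to the msgid."""
    t = gettext.translation(domain, localedir=str(localedir), languages=[lang], fallback=True)
    return t.gettext


with tempfile.TemporaryDirectory() as tmp:
    localedir = Path(tmp)
    # "Build": ship every available catalog, whatever the build locale is.
    for lang, text in {"de": "Datei nicht gefunden", "fr": "Fichier introuvable"}.items():
        write_mo(localedir / lang / "LC_MESSAGES" / "myprog.mo", {"File not found": text})
    # "Runtime": the user's language is picked only now.
    for lang in ("de", "fr", "es"):
        print(lang, translator(localedir, "myprog", lang)("File not found"))
```

A language with no catalog (here `es`) falls back to the untranslated msgid, which is the one written language the program always carries.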

Linking back to my recent experience with Sphinx, and from the
perspective of allowing users to verify their software, I'd tend to
think that an ideally produced, reproducible, localised software
package would include _all_ available translations in the build
artifact.  Some of those could be retrieved at runtime (gettext, for
example), and some could be static (file-backed HTML documentation,
where runtime lookups might not be so straightforward).
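
For the static side, one way to ship all available translations of HTML documentation is to run one Sphinx build per language into separate subdirectories of a single artifact. This is a sketch, assuming a project whose translations are already set up for Sphinx's `language` setting; the directory names and language list are hypothetical. `-b html` selects the builder and `-D language=...` overrides the `language` value from `conf.py`. The fixed SOURCE_DATE_EPOCH is there so that date-derived output, wherever the tooling respects that variable, does not vary between rebuilds.

```python
import os
import subprocess


def sphinx_build_commands(srcdir: str, outdir: str, languages: list[str]) -> list[list[str]]:
    """One sphinx-build invocation per language, each into outdir/<lang>."""
    return [
        ["sphinx-build", "-b", "html", "-D", f"language={lang}", srcdir, os.path.join(outdir, lang)]
        for lang in languages
    ]


def build_all_languages(srcdir: str, outdir: str, languages: list[str], epoch: int = 0) -> None:
    """Build every translation with the same fixed SOURCE_DATE_EPOCH."""
    env = dict(os.environ, SOURCE_DATE_EPOCH=str(epoch))
    for cmd in sphinx_build_commands(srcdir, outdir, languages):
        subprocess.run(cmd, env=env, check=True)


# Example (hypothetical paths and languages; needs Sphinx installed and
# the project's translation catalogs in place):
# build_all_languages("docs", "build/html", ["en", "de", "fr"])
```

Keeping one output tree per language is only one possible layout; its advantage here is that each language's output can be compared independently across rebuilds.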

That contrasts with the timestamp approach, though.  Timestamp
minimization, localisation maximization?

(I'm rambling a bit, I admit: the direction I'm trying to head in is
finding a recommended, easy-to-communicate strategy for making
internationalizable software reproducible.)