Sphinx: localisation changes / reproducibility
James Addison
jay at jp-hosting.net
Wed Apr 26 19:05:26 UTC 2023
On Wed, 26 Apr 2023 at 18:48, Vagrant Cascadian
<vagrant at reproducible-builds.org> wrote:
>
> On 2023-04-26, James Addison wrote:
> > On Tue, 18 Apr 2023 at 18:51, Vagrant Cascadian
> > <vagrant at reproducible-builds.org> wrote:
> >> > James Addison <jay at jp-hosting.net> wrote:
> >> This is why in the reproducible builds documentation on timestamps,
> >> there is a paragraph "Timestamps are best avoided":
> >>
> >> https://reproducible-builds.org/docs/timestamps/
> >>
> >> Or as I like to say "There are no timestamps quite like NO timestamps!"
> >
> > I see a parallel between the use of timestamps as a key for
> > data-lookup (as in Holger's developers-reference package), and the use
> > of locale as a similar data-lookup key (as in the case of localised
> > documentation builds).
> >
> > I'm not sure what the equivalent approach is for localisation, though.
> > Command-line software, for example, requires at least one written
> > natural-language to be usable, and as a second use case, providing
> > natural-language documentation with software is highly recommended (is
> > it part of the software? maybe not. but a sufficiently-confusing
> > poorly-translated error message could be as serious as a code-related
> > bug, I think?).
> >
> > Linking back to my recent experience with Sphinx, and from the
> > perspective of allowing-users-to-verify-their-software, I'd tend to
> > think that an ideally-produced, reproducible, localised software would
> > include _all_ available translations in the build artifact. Some of
> > that could be retrieved at runtime (gettext, for example), and some
> > could be static (file-backed HTML documentation, where runtime lookups
> > might not be so straightforward).
>
> I struggle to see the parallel. A timestamp is an arbitrary value based
> on when you built it, whereas the locale-rendered document should be
> reproducibly translated based on the translations you have available at
> the time you run whatever process generates the translated version of
> the document/binary, and regardless of the locale of the build
> environment.
Ok, I think I understand. Please check my understanding, though: I
interpret your perspective as matching the ideal-world scenario that
John outlined, where the SOURCE_DATE_EPOCH value has no effect at all
on the output of the build.

Until then, I see both the build-time (SOURCE_DATE_EPOCH) and
build-locale as inputs that do affect the output of software build
systems, and believe that relevant guidance could help projects
migrate towards reproducibility.
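
To illustrate what I mean by treating both as inputs, here is a minimal
sketch (not taken from Sphinx or any real build system). The function
name render_footer and the two label strings are hypothetical; the only
real convention used is the SOURCE_DATE_EPOCH environment variable. The
point is that the output depends only on the values passed in, not on
the clock or locale of the build machine.

```python
import time


def render_footer(environ, language):
    """Render a footer line from explicit inputs only (hypothetical helper)."""
    # Hypothetical translations, keyed by an explicitly requested language
    # rather than by the build machine's LANG/LC_ALL settings.
    labels = {"en": "Last updated", "fr": "Dernière mise à jour"}

    # SOURCE_DATE_EPOCH is seconds since the Unix epoch; fail loudly if it
    # is missing instead of silently falling back to the current time.
    epoch = int(environ["SOURCE_DATE_EPOCH"])

    # gmtime() keeps the build machine's timezone out of the output, and a
    # purely numeric format avoids locale-dependent month names.
    stamp = time.strftime("%Y-%m-%d", time.gmtime(epoch))
    return f"{labels[language]}: {stamp}"
```

A build would call this with os.environ and a language chosen by the
build configuration, so two builders with different system locales get
the same bytes.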
> With runtime translation, you would be desiring translation from the
> source language to the operating locale of the environment you've called
> it in... but that should still be systematic, no?
Runtime translation should be systematic, yes. So recommending that
projects use runtime translation (instead of compiling-in separate
source files for each language) is good advice.
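
As a rough sketch of the runtime approach, using Python's standard
gettext module: the helper name translator_for, the "messages" domain
and the "locale" directory are assumptions for illustration only. The
lookup is keyed on an explicit language argument, and with
fallback=True a missing catalog returns the source string unchanged
rather than raising an error.

```python
import gettext


def translator_for(language, localedir="locale", domain="messages"):
    """Return a translations object for one explicitly named language."""
    # languages=[...] overrides the LANGUAGE/LC_ALL/LANG environment
    # variables, so the result does not depend on the caller's locale.
    # fallback=True yields a NullTranslations object (identity lookup)
    # when no compiled catalog is found for that language.
    return gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True
    )
```

For example, translator_for("fr").gettext("File not found") returns the
French string if a catalog is installed, and the original English
string otherwise.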
> While there almost certainly might be more than one legitimate
> translation for a given work, your process for rendering it should
> really only have one particular output given a particular input
> (e.g. the source language input and the descriptions of how to translate
> it to the desired language)... barring, of course, bugs in the system
> ... or am i missing something entirely?
No, I don't think you missed anything, and I think we have the same
understanding of the components. We're likely arriving from different
perspectives on the problem space.

My question is approximately this: for some source software developed
in a natural language that I don't read or understand, and that
includes statically-built documentation (say, HTML files for example),
could I determine that the distributed software (an installer file
downloaded from the web, for example), recommended to me because it
includes support for a natural language that I _do_ understand, is
identical to the one distributed in the developers' own natural
language?

(And I think that yes, it's possible: build the source to include the
content from all available languages, and distribute that single copy;
the translations may be better or worse in some areas, but we can all
agree that it is not only the same source, but the same build of that
source.)
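
A toy sketch of that idea follows. The names build_artifact and digest,
and the in-memory catalog layout, are hypothetical and much simpler than
a real documentation build. Every available language goes into one
artifact in a fixed order, so anyone can compare a single checksum
regardless of which language they read.

```python
import hashlib
import json


def build_artifact(source_text, catalogs):
    """Bundle the rendering for every available language into one artifact.

    catalogs maps a language code to a {source string: translation} dict
    (a hypothetical stand-in for real translation catalogs).
    """
    pages = {}
    # Iterate in sorted order so the result does not depend on the order
    # in which the catalogs happened to be discovered.
    for language in sorted(catalogs):
        # Fall back to the untranslated source text when a string is missing.
        pages[language] = catalogs[language].get(source_text, source_text)
    # sort_keys gives a canonical serialisation of the bundle.
    return json.dumps(pages, sort_keys=True, ensure_ascii=False).encode("utf-8")


def digest(artifact):
    """SHA-256 checksum that every user can compare, whatever their language."""
    return hashlib.sha256(artifact).hexdigest()
```

Two people building from the same source and the same catalogs should
then get the same digest, whether they intend to read the French or the
German pages.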
> Unless, I guess, you're using some Machine Learning model to produce
> your translations?
... well, in honesty I think that Machine Learning could -- and in
many cases, perhaps should -- be encouraged towards
deterministic/repeatable behaviour. But that's probably a
conversation for another thread.