Sphinx: localisation changes / reproducibility

Wed Apr 26 17:47:54 UTC 2023

On 2023-04-26, James Addison wrote:
> On Tue, 18 Apr 2023 at 18:51, Vagrant Cascadian
> <vagrant at reproducible-builds.org> wrote:
>> > James Addison <jay at jp-hosting.net> wrote:
>> This is why in the reproducible builds documentation on timestamps,
>> there is a paragraph "Timestamps are best avoided":
>>
>>   https://reproducible-builds.org/docs/timestamps/
>>
>> Or as I like to say "There are no timestamps quite like NO timestamps!"
>
> I see a parallel between the use of timestamps as a key for
> data-lookup (as in Holger's developers-reference package), and the use
> of locale as a similar data-lookup key (as in the case of localised
> documentation builds).
>
> I'm not sure what the equivalent approach is for localisation, though.
> Command-line software, for example, requires at least one written
> natural-language to be usable, and as a second use case, providing
> natural-language documentation with software is highly recommended (is
> it part of the software?  maybe not.  but a sufficiently-confusing
> poorly-translated error message could be as serious as a code-related
> bug, I think?).
>
> Linking back to my recent experience with Sphinx, and from the
> perspective of allowing-users-to-verify-their-software, I'd tend to
> think that an ideally-produced, reproducible, localised software would
> include _all_ available translations in the build artifact.  Some of
> that could be retrieved at runtime (gettext, for example), and some
> could be static (file-backed HTML documentation, where runtime lookups
> might not be so straightforward).

I struggle to see the parallel. A timestamp is an arbitrary value that
depends on when you ran the build, whereas the locale-rendered document
should be reproducibly translated from whatever translations are
available when you run the process that generates the translated
document/binary, regardless of the locale of the build environment.

With runtime translation, you would want translation from the source
language to the operating locale of the environment you've called it
in... but that should still be systematic, no?

In a traditional translation process, as I understand it, you have the
source language and some system for translating that document or bit of
text into another language (maybe by words, strings, partial or whole
documents, etc.).

While there may well be more than one legitimate translation for a
given work, your process for rendering it should really only have one
particular output given a particular input (e.g. the source language
input and the descriptions of how to translate it to the desired
language)... barring, of course, bugs in the system ... or am I missing
something entirely?
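
As a rough sketch of what I mean (Python, with a made-up catalog and a
hypothetical render() helper; this is not Sphinx's or gettext's actual
API), the translated output is a pure function of the source strings and
the translation catalog, and never consults the build environment:

```python
import hashlib

def render(source_strings, catalog):
    """Translate each source string via the catalog.

    Strings missing from the catalog fall back to the source text.
    Nothing here reads LANG, LC_ALL, or the clock.
    """
    return "\n".join(catalog.get(s, s) for s in source_strings)

def digest(text):
    """Checksum of the rendered output, for comparing two builds."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Illustrative inputs only.
SOURCE = ["Hello", "File not found"]
CATALOG_FR = {"Hello": "Bonjour", "File not found": "Fichier introuvable"}

first = render(SOURCE, CATALOG_FR)
# Same catalog entries, loaded in a different order: same output.
second = render(SOURCE, dict(reversed(list(CATALOG_FR.items()))))

assert digest(first) == digest(second)
```

If two builds with the same source and the same catalog produce
different checksums, then something other than those inputs (the build
locale, for instance) is leaking into the output.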

Unless, I guess, you're using some Machine Learning model to produce
your translations?

live well,
  vagrant