Sphinx: localisation changes / reproducibility
vagrant at reproducible-builds.org
Wed Apr 26 19:40:09 UTC 2023
On 2023-04-26, James Addison wrote:
> On Wed, 26 Apr 2023 at 18:48, Vagrant Cascadian
> <vagrant at reproducible-builds.org> wrote:
>> On 2023-04-26, James Addison wrote:
>> > On Tue, 18 Apr 2023 at 18:51, Vagrant Cascadian
>> > <vagrant at reproducible-builds.org> wrote:
>> >> > James Addison <jay at jp-hosting.net> wrote:
>> >> This is why in the reproducible builds documentation on timestamps,
>> >> there is a paragraph "Timestamps are best avoided":
>> >> https://reproducible-builds.org/docs/timestamps/
>> >> Or as I like to say "There are no timestamps quite like NO timestamps!"
>> > I see a parallel between the use of timestamps as a key for
>> > data-lookup (as in Holger's developers-reference package), and the use
>> > of locale as a similar data-lookup key (as in the case of localised
>> > documentation builds).
>> > I'm not sure what the equivalent approach is for localisation, though.
>> > Command-line software, for example, requires at least one written
>> > natural-language to be usable, and as a second use case, providing
>> > natural-language documentation with software is highly recommended (is
>> > it part of the software? maybe not. but a sufficiently-confusing
>> > poorly-translated error message could be as serious as a code-related
>> > bug, I think?).
>> > Linking back to my recent experience with Sphinx, and from the
>> > perspective of allowing-users-to-verify-their-software, I'd tend to
>> > think that an ideally-produced, reproducible, localised software would
>> > include _all_ available translations in the build artifact. Some of
>> > that could be retrieved at runtime (gettext, for example), and some
>> > could be static (file-backed HTML documentation, where runtime lookups
>> > might not be so straightforward).
>> I struggle to see the parallel. A timestamp is an arbitrary value based
>> on when you built it, whereas the locale-rendered document should be
>> reproducibly translated based on the translations you have available at
>> the time you run whatever process generates the translated version of
>> the document/binary, and regardless of the locale of the build
> Ok, I think I understand. Please check my understanding, though: I
> interpret your perspective as matching the ideal-world scenario that
> John outlined, where the SOURCE_DATE_EPOCH value has no effect at all
> on the output of the build
Yes, ideally SOURCE_DATE_EPOCH does not matter. It is a workaround to
embed a (hopefully meaningful) timestamp, when from a reproducible
builds perspective, ideally there would be no timestamp at all in the
resulting artifacts. SOURCE_DATE_EPOCH is a tolerable compromise when
leaving out timestamps entirely is too difficult to achieve
(technically, politically, emotionally, logistically ...).
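As a minimal sketch of that compromise (Python and the helper names
build_timestamp and format_build_date are assumptions for illustration,
not something taken from Sphinx or from this thread), a build tool that
must embed a date can prefer SOURCE_DATE_EPOCH over the current time:

```python
import os
import time


def build_timestamp(environ=None):
    """Return the Unix timestamp to embed in build output.

    Prefers SOURCE_DATE_EPOCH when it is set, and only falls back to
    the current time (which is not reproducible) when it is absent.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        return int(value)
    return int(time.time())


def format_build_date(timestamp):
    # Format in UTC so the timezone of the build host cannot leak
    # into the artifact.
    return time.strftime("%Y-%m-%d", time.gmtime(timestamp))
```

Leaving the date out of the artifact entirely remains the better option
where that is feasible.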
> Until then, I see both the build-time (SOURCE_DATE_EPOCH) and
> build-locale as inputs that do affect the output of software build
> systems, and believe that relevant guidance could help projects
> migrate towards reproducibility.
I would say a build should be reproducible regardless of the build
environment's locale.
If you want to generate, say, README.fr.txt, the build process
translating that from README.txt should force the locale to use to
generate that document (e.g. LC_ALL=fr_FR.UTF-8), ignoring the locale of
the host system (e.g. C.UTF-8) and the locale of the user logged into
that system (e.g. es_ES.UTF-8); in this case, the locale of the build
environment should be made irrelevant by whatever build process is
used. Maybe the build logs respect the user or system locale in some
ways, but the resulting build artifact (e.g. README.fr.txt) should be
immune to the system and user locale settings.
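One way this could look in practice, as a rough sketch only: the build
step constructs its own environment for the translation command instead
of inheriting the caller's locale. The function name forced_locale_env
is hypothetical. LANGUAGE is dropped as well because GNU gettext
consults it ahead of LC_ALL when choosing message catalogs.

```python
import os


def forced_locale_env(target_locale, environ=None):
    """Return a copy of the environment with the locale pinned.

    Every inherited locale setting (LANG, LANGUAGE and all LC_*
    variables) is removed, then LC_ALL is set to the locale the
    output document is meant to be generated in.
    """
    env = dict(os.environ if environ is None else environ)
    for name in list(env):
        if name in ("LANG", "LANGUAGE") or name.startswith("LC_"):
            del env[name]
    env["LC_ALL"] = target_locale
    return env
```

The returned mapping could then be passed as env= to subprocess.run for
whichever command generates README.fr.txt, so that the same artifact
comes out whether the builder's own locale is C.UTF-8 or es_ES.UTF-8.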
>> While there almost certainly might be more than one legitimate
>> translation for a given work, your process for rendering it should
>> really only have one particular output given a particular input
>> (e.g. the source language input and the descriptions of how to translate
>> it to the desired language)... barring, of course, bugs in the system
>> ... or am i missing something entirely?
> No, I don't think you missed anything, and I think we have the same
> understanding of the components. We're likely arriving from different
> perspectives on the problem space.
> My question is approximately this: for some source software developed
> in a natural language that I don't read or understand, and that
> includes statically-built documentation (say, HTML files for example),
> could I determine that the distributed software (an installer file
> downloaded from the web, for example) recommended to me because it
> includes support for a natural language that I _do_ understand is
> identical to the one in the developers' own natural language?
You have confused me here...
Text in two different languages cannot be bit-for-bit identical... at
least, in the vast majority of cases for any significantly large
content; sometimes individual words or even short phrases may be
identical between two similar languages.
So no, I do not think it is correct or possible to say it is identical;
reproducible builds does not help with confirming the accuracy of the
meaning of the translation.
> (and I think that yes, it's possible: build the source to include the
> content from all available languages, and distribute that single copy;
> the translations may be better or worse in some areas, but we can all
> agree that it is not only the same source, but the same build of that
Yes, given the same input files, translation files, etc. it should
produce a bit-for-bit identical reproducible result; that is what
reproducible builds can promise: a strong connection between a
built artifact and the source from which it was built.
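A hedged sketch of how someone might check that promise for themselves:
build the same source twice (or compare a local build against the
distributed file) and compare checksums. The helper names sha256_of and
builds_match are assumptions for illustration.

```python
import hashlib


def sha256_of(path):
    """Return the SHA-256 hex digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large artifacts need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(first_artifact, second_artifact):
    """True when both artifacts have the same SHA-256 digest."""
    return sha256_of(first_artifact) == sha256_of(second_artifact)
```

A match says the artifact corresponds to the given source and
translation files; it says nothing about whether the translations
themselves are accurate.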