Sphinx: localisation changes / reproducibility

Tue Apr 18 12:47:04 UTC 2023

John Gilmore <gnu at toad.com> wrote:
> James Addison <jay at jp-hosting.net> wrote:
> > John Gilmore <gnu at toad.com> wrote:
> >
> > > I think that SOURCE_DATE_EPOCH generally refers to the check-IN time of
> > > each of the source package(s) being rebuilt.  You can retrieve the
> > > packages anytime later than that, and you can do the build at any time
> > > later, and SOURCE_DATE_EPOCH should not change (and the built binaries
> > > and docs should also not change).
> >
> > When the goal is to build the software as it was available to the
> > author at the time of code commit/check-in - and I think that that is
> > a valid use case - then that makes sense.
>
> I think of the goal as being less related to the author, and more
> related to the creator of a widespread binary release (such as a Linux
> distribution, or an app that goes into an app-store).

From both developer and user perspectives, I'd certainly like to know
that a source codebase corresponds to the delivered application.  I'm
not sure whether the size of the audience is necessarily relevant,
though.

> The goal is then that the recipient of that binary release can verify
> that the source code they obtained from the same place is able to
> rebuild that exact widespread binary release.  This proves that the
> source code can be trusted for some purposes, such as being used to read
> it to understand what the binary does.  Or to make small bug-fixes to it.

As above; agreed that allowing recipients to verify and inspect the
software provided to them is an important goal.

Although this strays into licensing territory, I think it would be
better in almost all cases if any fixes are made available, whatever
their size.  It's hard to philosophize about the scale of a fix: a
one-instruction or one-line edit can translate into anything from a
small adjustment to an order-of-magnitude efficiency change.  That view
rests on trusting maintainers and developers to have their users'
interests at heart (with the incentive that users can go elsewhere if
software doesn't seem to respect them), and on it being easier to
improve software when provided with user and developer feedback, as
long as the volume of bugs and patches is manageable.

> Or to become the base for further evolution of the project if the
> maintainer is suddenly "hit by a bus" and stops making further releases.

Is there a risk if only the code and not a corresponding reproducible
binary build is available?

> James Addison <jay at jp-hosting.net> wrote:
> > Inverting the question somewhat: if a single source-base is rebuilt
> > using two different SOURCE_DATE_EPOCH values (let's say, 1970-01-01
> > and 2023-04-18), then what are expected/valid differences in the
> > resulting output?
>
> In the ideal circumstances, the resulting output would be identical,
> because the build process would have no dependencies on
> SOURCE_DATE_EPOCH.  In these ideal circumstances, the code is "portable",
> in the same sense that people understand "portable" code will build and
> run the same on an ARM running MacOS as it does on an x86 running
> Windows.  There are many ways to make code portable, but the most robust
> of them is to *eliminate* dependencies.
>
> A more fragile way would be to #ifdef your code to adjust for every
> supported build or run environment.  That fragile way breaks as soon as
> it needs to build or run in a new environment, whereas the robust way
> has already made it likely to "just work" in a new environment that it
> has never encountered before (or to have only one or two minor things
> that need adjusting).  Note that if it built fine in a Linux system
> version X, then a later Linux system version Y is a "new environment"
> and might break the code.  The robust version is again less likely to
> break, because it inherently, by design, cares less about the nitty
> gritty details of its environment.
>
> Much code in Linux does not reach that ideal (yet!).  Instead, builds of
> non-ideal code use SOURCE_DATE_EPOCH as a crutch to limit their
> dependencies on the local build environment, replacing those
> dependencies with a dependency on SOURCE_DATE_EPOCH.
>
> So, if you rebuild a non-ideal package with two different values of
> SOURCE_DATE_EPOCH, you will get two different binaries that differ in
> the areas of dependency.  For example, if the documentation embeds a
> build-date in its page footer, you'd expect every page of the built
> documentation would differ.  If the "--version" output of the program
> embeds the build date, then the code that produces that output would
> differ.  Etc.  In fact, "fuzzing" their code with different values
> of SOURCE_DATE_EPOCH can help a maintainer identify where those
> dependencies still remain.
>
> We try to talk package authors out of such dependencies, but ultimately
> it's their package and they make the architectural decisions.  To some
> of them it's incredibly important that the build date appears in the
> man-page.  Reproducibility usually features lower among their priorities
> than it does in ours.

Thanks - it's nice to imagine a possible future where architectural and
platform differences are both abstracted away, and where each
abstraction has also become bug-free (the latter matters because
otherwise some users would remain at a disadvantage).
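
To make the "fuzzing" suggestion quoted above concrete, here is a
minimal sketch in Python.  It is not taken from Sphinx or from any real
package: build_page, footer_date and depends_on_epoch are hypothetical
names, and the "build" is just a string with an optional date footer.
It only assumes the usual convention that SOURCE_DATE_EPOCH holds
seconds since the Unix epoch, in UTC.

```python
import hashlib
import os
import time


def footer_date(environ=os.environ):
    """Return the date a (hypothetical) build would embed in a footer.

    Uses SOURCE_DATE_EPOCH when it is set, otherwise the current time.
    """
    epoch = environ.get("SOURCE_DATE_EPOCH")
    timestamp = int(epoch) if epoch is not None else int(time.time())
    return time.strftime("%Y-%m-%d", time.gmtime(timestamp))


def build_page(body, environ=os.environ, embed_date=True):
    """Stand-in for a documentation build step (illustrative only)."""
    if embed_date:
        footer = f"Built on {footer_date(environ)}"
    else:
        footer = "Built from source"
    return f"{body}\n--\n{footer}\n"


def digest(text):
    """Hash the build output so that two builds can be compared."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def depends_on_epoch(build, epochs=("0", "1681776000")):
    """Run the build once per epoch value; report whether output differs.

    The defaults correspond to 1970-01-01 and 2023-04-18 (UTC).
    """
    digests = {digest(build({"SOURCE_DATE_EPOCH": e})) for e in epochs}
    return len(digests) > 1
```

Calling depends_on_epoch(lambda env: build_page("text", env)) reports a
dependency, while the same call with embed_date=False does not; the
second case is the "ideal" one described in the quoted text, where the
output is identical whatever value SOURCE_DATE_EPOCH has.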

In the presence of source dependencies that may update over time, my
(Debian-biased) sense remains that it's more practical to provide a
timestamp-based source archive (allowing the sources to be selected as
of a time 's') in combination with a single build timestamp (time 't'),
if we keep both the software-delivery and the bug-reproducibility use
cases in mind.
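
As a rough illustration of the two-timestamp idea, the sketch below
treats 's' and 't' as independent inputs to a build.  Everything here is
an assumption made for the example: select_snapshot and build_inputs are
hypothetical helpers, snapshots are represented only by their
timestamps, and no real archive service or API is being described.

```python
from bisect import bisect_right


def select_snapshot(snapshot_times, s):
    """Return the latest snapshot time at or before s, or None."""
    ordered = sorted(snapshot_times)
    index = bisect_right(ordered, s)
    return ordered[index - 1] if index else None


def build_inputs(snapshot_times, s, t):
    """Pair the source snapshot chosen for time s with build timestamp t.

    The build timestamp is passed on through SOURCE_DATE_EPOCH; the
    snapshot decides which versions of the source dependencies are used.
    """
    snapshot = select_snapshot(snapshot_times, s)
    if snapshot is None:
        raise ValueError("no source snapshot at or before time s")
    return {"snapshot": snapshot, "env": {"SOURCE_DATE_EPOCH": str(t)}}
```

With snapshots at times 100, 200 and 300, asking for s=250 selects the
snapshot from time 200, whichever t is chosen.  Keeping the two values
separate means that a bug report can record the pair (s, t) needed to
repeat a particular build.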