[rb-general] rb formalism

Eric Myhre hash at exultant.us
Tue Dec 18 11:48:32 CET 2018

Thank you for this excellent writeup! (And for finding appropriate unicode! :D)

I'd offer a few tiny amendments:

h(➡) → h(■) vs
h(➡) → h(➡■) --

*Both* are useful.

Testing whether h(➡) → h(➡■) holds is testing that a particular build instruction is reproducible.

Testing whether h(➡) → h(■) converges onto the *same* h(■) for *multiple distinct* values of h(➡) is testing whether we have diverse reproducibility.

Perhaps it's just me that thinks "reproducible" and "diversely reproducible" deserve distinct nomenclature and treatment, but... I do :)  And perhaps this notation is helpful for expressing the distinction!

Storing things in a form like h(➡■) might also have some utility when it comes to immutable audit logging?  Needs more exploration... but I think it could be very interesting.  Most "binary transparency" logging concepts I've heard of seem to focus on logging h(■) values only, and have not yet grown to regard the provenance path...


Also, I must put on my "hash all the things" hat...

I think it's quite nice to see 'h(■)' as a git commit hash or say an IPFS content hash or other precise value rather than a potentially ambiguous human-labelled version string... 👼

(I was toying with using the '□' sigil to describe such a human-readable label which has yet to be resolved into a concrete 'h(■)' value, but that got a lot more opinionated, so, hard to write up here.)


On 12/14/18 10:12 PM, Bernhard M. Wiedemann wrote:
> Hash: SHA1
> Hi,
> yesterday I stumbled into a session (terminologyII?) with Eric and
> some others that did not stop at defining words. They went on to write
> up some math-like formalism using characters like
> □ maybe some placeholder for data
> ■ some concrete data - e.g. gcc-4.7-4.7.2-5
> ⇨ a "build" function with variables - think f(a, b) = a+b
> ➡ a "build" function with variables filled in with concrete data
>     - think f(2, 3) = 2+3  - but not yet evaluated/computed.
> and with these, it was possible to write things like
> ■ + ■ = ■ (composition/aggregation of data)
> ■ + ⇨ = ➡
> and define a reproducible build as an unambiguous mapping of hashes "h"
> +
> We also thought, that typo-fixes in libfoo-doc and diverse compilation
> would then be modeled by different h(➡) mapping to the same build
> output h(➡■)
> But when I thought more about it, that cannot be, since ➡ is part of
> the right hand hash, it would always be unique per input.
> So it could rather be
> h(➡) → h(■)
> meaning, a reproducible build is when you take all input data and the
> build function and evaluating it, always yields the same output data.
> We also found that the model did not really cover unreproducible
> builds (with side-inputs like time and parallelism-race-results [1])
> and just declared them as bugs that needed to be fixed to be covered
> by the model.
> Pretty abstract and high-level, but maybe useful?
> Have phun
> Bernhard M.
> [1]
> https://github.com/bmwiedemann/theunreproduciblepackage/blob/master/race/race.sh
> ZPmsAJ0WYLDnxkz1HMg2XPyEj94HXbwttQCg9l2BvD4LFwiETW2WaxaBw6LLHnc=
> =jsao
> _______________________________________________
> rb-general at lists.reproducible-builds.org  mailing list
> To change your subscription options, visithttps://lists.reproducible-builds.org/listinfo/rb-general.
> To unsubscribe, send an email torb-general-unsubscribe at lists.reproducible-builds.org.

More information about the rb-general mailing list