[rb-general] rb formalism
hash at exultant.us
Tue Dec 18 11:48:32 CET 2018
Thank you for this excellent writeup! (And for finding appropriate unicode! :D)
I'd offer a few tiny amendments:
h(➡) → h(■) vs
h(➡) → h(➡■) --
*Both* are useful.
Testing whether h(➡) → h(➡■) holds is testing that a particular build instruction is reproducible.
Testing whether h(➡) → h(■) converges onto the *same* h(■) for *multiple distinct* values of h(➡) is testing whether we have diverse reproducibility.
Perhaps it's just me that thinks "reproducible" and "diversely reproducible" deserve distinct nomenclature and treatment, but... I do :) And perhaps this notation is helpful for expressing the distinction!
Storing things in a form like h(➡■) might also have some utility when it comes to immutable audit logging? Needs more exploration... but I think it could be very interesting. Most "binary transparency" logging concepts I've heard of seem to focus on logging h(■) values only, and have not yet grown to regard the provenance path...
Also, I must put on my "hash all the things" hat...
I think it's quite nice to see 'h(■)' as a git commit hash or say an IPFS content hash or other precise value rather than a potentially ambiguous human-labelled version string... 👼
(I was toying with using the '□' sigil to describe such a human-readable label which has yet to be resolved into a concrete 'h(■)' value, but that got a lot more opinionated, so, hard to write up here.)
On 12/14/18 10:12 PM, Bernhard M. Wiedemann wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> yesterday I stumbled into a session (terminologyII?) with Eric and
> some others that did not stop at defining words. They went on to write
> up some math-like formalism using characters like
> □ maybe some placeholder for data
> ■ some concrete data - e.g. gcc-4.7-4.7.2-5
> ⇨ a "build" function with variables - think f(a, b) = a+b
> ➡ a "build" function with variables filled in with concrete data
> - think f(2, 3) = 2+3 - but not yet evaluated/computed.
> and with these, it was possible to write things like
> ■ + ■ = ■ (composition/aggregation of data)
> ■ + ⇨ = ➡
> and define a reproducible build as an unambiguous mapping of hashes "h"
> We also thought, that typo-fixes in libfoo-doc and diverse compilation
> would then be modeled by different h(➡) mapping to the same build
> output h(➡■)
> But when I thought more about it, that cannot be, since ➡ is part of
> the right hand hash, it would always be unique per input.
> So it could rather be
> h(➡) → h(■)
> meaning, a reproducible build is when you take all input data and the
> build function and evaluating it, always yields the same output data.
> We also found that the model did not really cover unreproducible
> builds (with side-inputs like time and parallelism-race-results )
> and just declared them as bugs that needed to be fixed to be covered
> by the model.
> Pretty abstract and high-level, but maybe useful?
> Have phun
> Bernhard M.
> -----BEGIN PGP SIGNATURE-----
> -----END PGP SIGNATURE-----
> rb-general at lists.reproducible-builds.org mailing list
> To change your subscription options, visithttps://lists.reproducible-builds.org/listinfo/rb-general.
> To unsubscribe, send an email torb-general-unsubscribe at lists.reproducible-builds.org.
More information about the rb-general