How could we accelerate *deployment* of verified reproducible builds?

Bernhard M. Wiedemann bernhardout at lsmod.de
Sat Jan 30 20:59:59 UTC 2021



On 30/01/2021 17.27, David A. Wheeler wrote:
> Technically correct, the best kind of correct :-). And to be fair, there *are* some reproducible builds (as others have noted).

On that topic, openSUSE is somewhere around 96% verifiable (modulo some
missing mtime normalization), and I am also constantly verifying it with my
rebuilder. Verified packages are visible as "verified" : 1
in https://rb.zq1.de/compare.factory/reproducible.json


> But I want to see them accelerated into more key places. An unfair counter statement could be “you’ve been at this a while, why aren’t you done?”. I think that’s unfair because it’s not so easy; there are many little things that have to be done (timestamps set, collections forced into specific orders, etc.). But what would it take to accelerate things?

There are some hard problems where reproducibility collides with other
desired properties of software.

E.g.
1) performance: gcc PGO makes gcc run 8% faster

2) security:
   - tigervnc signs .jar files with a random temporary private key
   - libcamera also uses signatures so that it does not have to trust
     third-party modules
   - openbuildservice signs kernel modules with a secret key for secure-boot
   - other packages generate random DH-params, and re-using them can make
     attacks easier through pre-computation

3) simplicity/maintainability/portability/reliability: e.g. when
   software needs to work with non-GNU date, patches can get rather messy
   and introduce problems. In some places we also added a y2038 problem
   (strtol/time_t) with SOURCE_DATE_EPOCH (SDE) patches.
   Many upstreams also don't like the concept of the SDE environment
   variable influencing results, paradoxically because they consider
   that less reproducible than explicit command-line args or config entries.
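To make the SDE point concrete, here is a minimal sketch of how a build script might honour SOURCE_DATE_EPOCH. The function names (build_timestamp, build_date, clamp_mtime) are made up for illustration and are not taken from any of the patches mentioned above. Python integers are arbitrary precision, so this variant does not hit the y2038 truncation that a strtol-into-32-bit-time_t patch can.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=os.environ):
    """Return the build time as integer seconds since the Unix epoch.

    Uses SOURCE_DATE_EPOCH when it is set, otherwise the current time.
    """
    raw = environ.get("SOURCE_DATE_EPOCH")
    if raw is None:
        return int(time.time())
    # int() raises ValueError on garbage instead of silently truncating.
    return int(raw)


def build_date(environ=os.environ):
    """Format the build time as a UTC date, independent of the local timezone."""
    ts = build_timestamp(environ)
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")


def clamp_mtime(mtime, environ=os.environ):
    """Hypothetical helper: cap a file mtime at SOURCE_DATE_EPOCH if it is set."""
    if "SOURCE_DATE_EPOCH" not in environ:
        return mtime
    return min(mtime, build_timestamp(environ))
```

Passing the environment as a parameter is only there to make the sketch easy to try out, e.g. build_date({"SOURCE_DATE_EPOCH": "1611964800"}).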


https://rb.zq1.de/compare.factory/graph.png shows another aspect of why
we are not done yet. The number of reproducible packages is
constantly increasing, and I write about a dozen patches each month to make
unreproducible packages reproducible, but that just keeps the number of
unreproducible packages constant at around 500.


To make progress, we wanted to concentrate on core packages first.
For openSUSE, e.g., we have
https://rb.zq1.de/compare.factory-20210129/unreproduciblerings.txt
which shows just these for ring0 (bootstrap):
bison
gcc10
python38

All of them suffer from PGO [1]; python also suffers from
non-determinism in .pyc files [3].
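As an illustration of how a .pyc file can vary between builds, the sketch below compiles the same source twice with different source mtimes. This shows only one well-known source of variation (the mtime embedded in the .pyc header) and is not necessarily the issue tracked in [3]. The helper name compile_twice is made up for this example.

```python
import os
import py_compile
import tempfile


def compile_twice(mode):
    """Compile one source file twice, changing only its mtime in between.

    Returns the two resulting .pyc files as byte strings.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        with open(src, "w") as f:
            f.write("ANSWER = 42\n")
        results = []
        for mtime in (1_000_000_000, 1_600_000_000):
            os.utime(src, (mtime, mtime))  # simulate a different checkout time
            out = os.path.join(tmp, f"mod-{mtime}.pyc")
            py_compile.compile(src, cfile=out, doraise=True,
                               invalidation_mode=mode)
            with open(out, "rb") as f:
                results.append(f.read())
        return results


# TIMESTAMP mode stores the source mtime in the header; CHECKED_HASH stores
# a hash of the source instead, so the mtime no longer leaks into the output.
stamp_a, stamp_b = compile_twice(py_compile.PycInvalidationMode.TIMESTAMP)
hash_a, hash_b = compile_twice(py_compile.PycInvalidationMode.CHECKED_HASH)
print("timestamp mode identical:", stamp_a == stamp_b)
print("hash mode identical:", hash_a == hash_b)
```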

Our (SUSE) compiler guys said that the merges of the counters in the .gcda
files used for PGO are non-commutative, so if you do A and then B in a
profiling run, you get different optimizations than if you first did B
and then A. Redesigning that could improve PGO determinism in many places.
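The order dependence can be shown with a toy model. This is not gcc's actual .gcda merge logic; it is only an assumed single-slot "hottest value" counter with a made-up tie rule, chosen to show how a non-commutative merge makes the final profile depend on the order of the runs.

```python
from functools import reduce


def merge(acc, run):
    """Merge one run's (value, count) counter into the accumulated counter.

    Toy rule: equal values add their counts; otherwise the larger count
    wins, and the incumbent wins ties. The tie rule makes it non-commutative.
    """
    if acc is None:
        return run
    if acc[0] == run[0]:
        return (acc[0], acc[1] + run[1])
    return run if run[1] > acc[1] else acc


def merged_profile(runs):
    """Fold the per-run counters in the order the runs happened."""
    return reduce(merge, runs, None)


# Two profiling runs that saw different hot call targets equally often.
run_a = ("target_foo", 3)
run_b = ("target_bar", 3)

print(merged_profile([run_a, run_b]))  # A then B
print(merged_profile([run_b, run_a]))  # B then A
```

If an optimizer specialised code for the value left in the slot, the two orders would produce different binaries from the same source and the same set of runs.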


Another approach could use a variant of dettrace [2] to make
non-deterministic behaviour reproducible in build envs.
For that, we would first need working OS packages of dettrace.
That would again result in a trade-off (especially for large packages
like libreoffice or gcc), because it will slow down the build, and that
slows down update cycles for users.

I also wanted to use dettrace in autoclassify
(https://trello.com/c/yKaKMjNq/67-use-dettrace-in-autoclassify)
to better pin down the source of non-determinism, but any extra
contributor could do these things.

There are also another dozen or so things listed in my trello board (java
and python toolchain improvements could make a big impact). That could get
us beyond 98%, maybe even to 100% verifiable for the core packages.

Once packages are verifiable, verifying is easy.
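
A minimal sketch of what "verifying" can boil down to, assuming the official and the rebuilt artifact are expected to be bit-identical. In practice, signatures or other known-variable metadata might have to be stripped or normalized first. The function names are illustrative only and do not describe how my rebuilder works.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_verified(official_path, rebuilt_path):
    """True if an independent rebuild matches the published artifact."""
    return sha256_of(official_path) == sha256_of(rebuilt_path)
```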


Ciao
Bernhard M.


[1] https://github.com/bmwiedemann/theunreproduciblepackage/tree/master/pgo
[2] https://github.com/dettrace/dettrace
[3] https://trello.com/c/I9voedvB/7-pyc-rb
