[rb-general] Reproducible builds and distributed CI

Bernhard M. Wiedemann bernhardout at lsmod.de
Wed Jun 19 11:24:10 UTC 2019


I share Arnout's concerns.

In openSUSE we have the Open Build Service (OBS,
https://openbuildservice.org/), and I had previously entertained
similar thoughts there.


On 19/06/2019 12.50, Arnout Engelen wrote:
> On Wed, Jun 19, 2019 at 12:29 PM Lars Wirzenius <liw at liw.fi> wrote:
>> On Sun, May 19, 2019 at 01:09:40PM +0300, Lars Wirzenius wrote:
>> * Is the approach of at-least-N bitwise identical builds sensible,
>>   assuming sufficient build workers being available? Or are there
>>   security aspects and risks there that I am missing?
> 
> This is indeed an aspect that needs thought here. In its simplest
> implementation, where anyone can freely join the builder pool,
> this will obviously not work: an attacker could start a ton of build
> nodes (buying them, using a botnet, ...), and inject its malware
> when it controls at-least-N of the build nodes.
> 
> A "trust but verify" approach where you put your reputation on the
> line when providing build nodes (or get penalized in some other way
> when foul play is detected) could perhaps work.
> 
> Perhaps there are other creative mitigations?

If you want to be sure, then among the N there needs to be at least 1
that you trust. In a way, that is similar to the Debian and openSUSE
model where you have one trusted official build and rebuilders that
verify it.
Or you do it airport-security style and randomly check 10% of the
builds, so anyone doing anything malicious has a *risk* of being
detected and penalized.
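
To put a rough number on that risk, here is a minimal sketch in Python.
It assumes each build is selected for checking independently with the
same probability; the function name and parameters are made up for
illustration and are not part of OBS or any existing tool.

```python
def detection_probability(check_rate: float, malicious_builds: int) -> float:
    """Chance that at least one of `malicious_builds` gets spot-checked.

    Assumption: every build is checked independently with probability
    `check_rate` (e.g. 0.1 for the 10% mentioned above).
    """
    if not 0.0 <= check_rate <= 1.0:
        raise ValueError("check_rate must be between 0 and 1")
    if malicious_builds < 0:
        raise ValueError("malicious_builds must not be negative")
    # Probability that all malicious builds slip through unchecked
    undetected = (1.0 - check_rate) ** malicious_builds
    return 1.0 - undetected


if __name__ == "__main__":
    for k in (1, 10, 50):
        print(k, round(detection_probability(0.1, k), 3))
```

Under these assumptions a single malicious build is caught 10% of the
time, while an attacker who tampers with 10 builds is caught at least
once in roughly 65% of cases.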

I guess some trust-karma system could help to reduce the risk.
It assumes that builders with a long track record of producing
correct builds are more likely to do the next build correctly.
And if such a builder is hacked and produces malicious results, there
are still others that hopefully produce a different result.
You still need a good way to handle disagreement: just because N-1
builders produce one result does not mean that the remaining one is
wrong.
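
A minimal sketch of such an acceptance rule in Python: accept a result
only if at least N builders report the same hash and at least one of
them is trusted, and treat any disagreement as something to investigate
instead of letting the majority win. All names here (artifact_digest,
evaluate_builds, the status strings) are hypothetical and only meant
for illustration.

```python
import hashlib
from typing import Mapping, Set


def artifact_digest(data: bytes) -> str:
    """SHA-256 hex digest used to compare build outputs bit for bit."""
    return hashlib.sha256(data).hexdigest()


def evaluate_builds(results: Mapping[str, str], trusted: Set[str], n: int) -> str:
    """Decide what to do with a set of build results.

    results: builder name -> digest of the artifact it produced
    trusted: names of builders we trust
    n:       minimum number of identical results required

    Returns "accepted", "disagreement" or "insufficient".
    """
    distinct = set(results.values())
    if len(distinct) > 1:
        # No majority vote: the minority result is not necessarily the
        # wrong one, so a human (or further rebuilds) has to decide.
        return "disagreement"
    has_trusted = any(builder in trusted for builder in results)
    if len(results) >= n and has_trusted:
        return "accepted"
    return "insufficient"


if __name__ == "__main__":
    good = artifact_digest(b"hello world binary")
    bad = artifact_digest(b"hello world binary + malware")
    print(evaluate_builds({"a": good, "b": good, "c": good}, {"a"}, 3))
    print(evaluate_builds({"a": good, "b": good, "c": bad}, {"a"}, 3))
```

The trust-karma idea would plug in where the fixed trusted set is used
here, for example by replacing it with a score threshold.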


> Another attack vector you should think about is how to isolate the
> build itself: it'd be bad if someone could hack the build nodes by
> submitting a malicious build. I bet there's prior art on this though,
> as this is something basically all PAAS providers have to deal with
> somehow.

In OBS, build workers run builds inside qemu/KVM virtual machines so
that they do not have to trust the software they build. You still have
to keep the virtualization stack updated, because there have been
qemu/KVM bugs (e.g. in the emulated floppy controller) that allowed
code to break out of the VM.

> On the one
>   hand, how difficult is it to build something reproducibly?

Not that hard. You just have to avoid the 10 sources of non-determinism
that I collected in
https://github.com/bmwiedemann/theunreproduciblepackage

Most hello world programs have a reproducible build.
Most huge software collections (Firefox, LibreOffice, python3, OpenJDK)
have issues.
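
As an illustration of avoiding two commonly cited sources of
non-determinism, embedded timestamps and file ordering, here is a
minimal Python sketch that packs files into a tar archive
deterministically. It is not taken from the repository above; the
function name is made up, and reading the timestamp from the
SOURCE_DATE_EPOCH environment variable follows the reproducible-builds
convention.

```python
import io
import os
import tarfile
from typing import Mapping


def deterministic_tar(files: Mapping[str, bytes], mtime: int) -> bytes:
    """Pack `files` (name -> content) into a tar with stable bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        # Sort names so the result does not depend on dict or
        # filesystem iteration order.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime          # fixed timestamp, not "now"
            info.mode = 0o644
            info.uid = info.gid = 0     # do not leak the build user
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))
    a = deterministic_tar({"b.txt": b"2", "a.txt": b"1"}, epoch)
    b = deterministic_tar({"a.txt": b"1", "b.txt": b"2"}, epoch)
    print(a == b)
```

Larger projects tend to hit many more such spots at once (compilers,
documentation generators, archive tools), which fits the observation
that big software collections are the ones with issues.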

