[rb-general] Reproducible builds and distributed CI

Sun Aug 11 09:06:41 UTC 2019

Thank you for your thoughtful feedback!

I've been pondering this topic again. It's important to me (meaning
that this is part of my main hobby project), but due to life reasons,
it's not an urgent one for me.

I came up with the initial threat modelling below. Feedback on that is
welcome, and I hope it can be of use for others who are thinking about
the same things. My conclusion after discussions here, elsewhere, and
my own thinking and research, is that "distributed CI" is plausible,
and that reproducible builds would be an important building block for
it, but that there are a lot of other details to get right, too. (Which
is not a blocker, but makes it a more interesting problem to solve.)

The two use cases for distributed CI I'm thinking of are:

* providing a massive CI build farm for free software development, by
  recruiting thousands of people to donate worker time

* enabling companies to make use of spare capacity in their employees'
  work computers for CI; some companies already do this, but it's a
  little awkward

Threat modelling for distributed CI
=============================================================================

* outline of system
    * version control system holds source code
    * IDP authenticates and authorizes users, system components
    * controller co-ordinates builds, collects build logs
    * artifact store holds build artifacts
    * workers (many) do the actual building, are told by controller
      what to do, fetch source from version control system, upload
      artifacts to artifact store

* entities in the system that need to be protected:
    * the person using CI
    * the person running the IDP, controller, and artifact store (for
      simplicity, assume they're all run by the same person, although
      they could each be run by separate people)
    * the people running workers

* threats to person using CI
    * malicious workers, which embed unwanted code in build artifacts
        * mitigation: use reproducible builds and build on at least
          two workers to detect unwanted changes in artifacts; this
          would work OK, if there are relatively few malicious workers
    * many malicious workers, or workers that become malicious after a
      long period of working fine
        * mitigation: have at least one trusted worker, which might be
          slow, but whose output is required for a build to be trusted
            * artifacts from maybe-trusted workers can't be used for
              deployment, but could be used with sufficient isolation
              to speed things up, e.g., to do heavy testing: if the
              trusted worker later confirms the binaries are
              trustworthy (bitwise identical), then the test results
              can be trusted, too
        * variant of mitigation: require at least N maybe-trusted
          workers to produce bitwise identical build artifacts, where
          N is set by the person running the CI or whose project is
          being built
        * rejected: a karma or reputation system based on past
          behaviour: this makes long-lived workers valuable targets,
          and years of good behaviour won't protect if the worker gets
          hijacked
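
The mitigations above (bitwise identical artifacts from at least N
workers, plus a required trusted worker) could be sketched roughly as
follows. This is only an illustration under stated assumptions: the
names BuildResult and accept_build are hypothetical, SHA-256 stands in
for whatever comparison the controller would really use, and any
disagreement between workers is treated as a reason to reject the
build rather than to pick a majority.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildResult:
    """One worker's output for a build (hypothetical structure)."""
    worker_id: str
    artifact: bytes
    trusted: bool = False


def digest(artifact: bytes) -> str:
    """Identical digests stand in for bitwise identical artifacts."""
    return hashlib.sha256(artifact).hexdigest()


def accept_build(results, min_agreeing=2, require_trusted=True):
    """Return the agreed artifact digest, or None if the build is rejected.

    Assumed policy: every reported artifact must be identical, at least
    min_agreeing distinct workers must have reported, and (optionally)
    at least one of them must be a trusted worker.
    """
    digests = {digest(r.artifact) for r in results}
    if len(digests) != 1:
        # No results at all, or workers disagree: do not trust anything.
        return None
    workers = {r.worker_id for r in results}
    if len(workers) < min_agreeing:
        # The same worker reporting twice does not count twice.
        return None
    if require_trusted and not any(r.trusted for r in results):
        return None
    return digests.pop()


if __name__ == "__main__":
    results = [
        BuildResult("volunteer-1", b"binary"),
        BuildResult("volunteer-2", b"binary"),
        BuildResult("trusted-1", b"binary", trusted=True),
    ]
    print(accept_build(results, min_agreeing=2))
```

With require_trusted=False this corresponds to the "at least N
maybe-trusted workers" variant; how to handle disagreement more
gracefully than outright rejection is left open here.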

* threats to person running IDP, controller, artifact store
    * the distributed nature of CI adds no new threats to these
    * all the usual threats apply, of course

* threats to those running workers
    * build uses too much CPU or RAM
        * mitigation: enable person running worker to set limits and
          priorities so that the build doesn't use resources needed by
          other things
    * build attacks remote hosts (e.g., DDoS)
        * mitigation: prevent build from accessing any network hosts,
          except version control server, controller, artifact store
    * build attacks host where worker runs
        * mitigation: run build in a VM, using the best available
          isolation techniques, such as carefully configured qemu/KVM
          to implement the VM, and keeping all related software up to
          date
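
As a rough illustration of the "set limits and priorities" mitigation
for CPU and RAM, a worker on a Unix-like host could cap a build
process as sketched below. The function name run_limited and the
default limits are hypothetical, and this uses only per-process limits
from Python's standard library; it does not replace the VM isolation
or the network restrictions listed above.

```python
import os
import resource
import subprocess
import sys


def _limit_child(cpu_seconds, memory_bytes, niceness):
    """Build a function that runs in the child process just before exec."""

    def lower(kind, wanted):
        # Never ask for more than the hard limit we already have.
        _soft, hard = resource.getrlimit(kind)
        if hard != resource.RLIM_INFINITY:
            wanted = min(wanted, hard)
        resource.setrlimit(kind, (wanted, wanted))

    def apply():
        os.nice(niceness)                        # lower scheduling priority
        lower(resource.RLIMIT_CPU, cpu_seconds)  # total CPU time, in seconds
        lower(resource.RLIMIT_AS, memory_bytes)  # address space, in bytes

    return apply


def run_limited(argv, cpu_seconds=3600, memory_bytes=4 * 1024**3, niceness=10):
    """Run a build command with CPU, memory, and priority limits applied."""
    return subprocess.run(
        argv,
        preexec_fn=_limit_child(cpu_seconds, memory_bytes, niceness),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    done = run_limited([sys.executable, "-c", "print('build step ran')"])
    print(done.returncode, done.stdout.strip())
```

In a real worker these limits would more likely be applied to the VM
as a whole (for example through its configured CPU and memory
allocation) than to a single process.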

On Wed, Jun 19, 2019 at 01:24:10PM +0200, Bernhard M. Wiedemann wrote:
> I share Arnout's concerns.
> 
> With openSUSE we have the https://openbuildservice.org/  (OBS)
> and there I had previously entertained similar thoughts.
> 
> 
> On 19/06/2019 12.50, Arnout Engelen wrote:
> > On Wed, Jun 19, 2019 at 12:29 PM Lars Wirzenius <liw at liw.fi> wrote:
> >> On Sun, May 19, 2019 at 01:09:40PM +0300, Lars Wirzenius wrote:
> >> * Is the approach of at-least-N bitwise identical builds sensible,
> >>   assuming sufficient build workers being available? Or are there
> >>   security aspects and risks there that I am missing?
> > 
> > This is indeed an aspect that needs thought here. In its simplest
> > implementation, where anyone can freely join the builder pool,
> > this will obviously not work: an attacker could start a ton of build
> > nodes (buying them, using a botnet, ...), and inject its malware
> > when it controls at-least-N of the build nodes.
> > 
> > A "trust but verify" approach where you put your reputation on the
> > line when providing build nodes (or get penalized in some other way
> > when foul play is detected) could perhaps work.
> > 
> > Perhaps there are other creative mitigations?
> 
> If you want to be sure, then among the N there needs to be at least 1
> that you trust. In a way, that is similar to the Debian and openSUSE
> model where you have one trusted official build and rebuilders that
> verify it.
> Or you do it airport-security style of randomly checking 10% so anyone
> doing anything malicious has a *risk* of being detected and penalized.
> 
> I guess, some trust-karma system could help there to reduce risk.
> It assumes that those builders with a long track record of producing
> correct builds are more likely to do the next build correctly.
> And if it is hacked and produces malicious results, there are still
> others that hopefully produce another result.
> You still need a good way to handle disagreement - e.g. just because N-1
> produce one result, it does not have to mean that the other one is wrong.
> 
> 
> > Another attack vector you should think about is how to isolate the
> > build itself: it'd be bad if someone could hack the build nodes by
> > submitting a malicious build. I bet there's prior art on this though,
> > as this is something basically all PAAS providers have to deal with
> > somehow.
> 
> In OBS, build workers use qemu/KVM to not have to trust the software
> they build. You still have to ensure to keep it updated because there
> sometimes were kvm bugs (e.g. in emulated floppy controller) that
> allowed code to break out.
> 
> > On the one
> >   hand, how difficult is it to build something reproducibly?
> 
> Not that hard. You just have to avoid the 10 sources of non-determinism
> that I collected in
> https://github.com/bmwiedemann/theunreproduciblepackage
> 
> Most hello world programs have a reproducible build.
> Most huge software collections (Firefox, Libreoffice, python3, openjdk)
> have issues.

-- 
I want to build worthwhile things that might last. --joeyh