Research on Reproducible Builds

Omar Navarro Leija omarsa at seas.upenn.edu
Wed Feb 12 16:50:11 UTC 2020


Hello Julien,

I'm glad you enjoyed the work!

Your understanding of DetTrace is correct. Our container abstraction is
very lightweight in the sense that we just piggyback off Linux namespaces +
chroot to provide isolation. Currently, to provide reproducibility, someone
using DetTrace should download a chroot image (e.g. via debootstrap) and
use this as the canonical filesystem image to use for the build. This is a
bit clunky, and I believe it is not a 100% satisfactory solution. I don't
know of any other ways to "normalize" the filesystem environment though.
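
To make the namespaces + chroot idea concrete, here is a minimal sketch in Python that only assembles a command line for the util-linux `unshare` tool followed by `chroot`. It is not DetTrace's actual code: the helper name `isolated_build_argv`, the image path and the build command are all hypothetical, and it covers only the isolation part, not the determinism enforcement.

```python
import shlex


def isolated_build_argv(rootfs, build_cmd):
    """Return an argv that runs build_cmd inside rootfs with fresh namespaces.

    Hypothetical helper for illustration; it builds the command but does
    not run it.
    """
    if not build_cmd:
        raise ValueError("build_cmd must not be empty")
    return [
        "unshare",
        "--user", "--map-root-user",  # appear as root inside the namespace
        "--mount",                    # private mount table
        "--pid", "--fork",            # new PID namespace for the child
        "--uts", "--ipc",             # private hostname and System V IPC
        "chroot", rootfs,             # use the image as the filesystem root
        *build_cmd,
    ]


if __name__ == "__main__":
    # Assumed paths: an image unpacked (e.g. by debootstrap) at /srv/build-root.
    argv = isolated_build_argv("/srv/build-root", ["make", "-C", "/src"])
    print(shlex.join(argv))
```

Everything the build can read then comes from the chosen image, which is why the image itself has to be agreed on as the canonical input.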

This may seem a little heavy-handed, so I'm curious: how does Guix handle a
build process that tries to read arbitrary filesystem data? I'm reading
more about Guix now, so I'll have smarter things to say about it later
(hopefully).

For DetTrace we set out to see if it was feasible to create a 100%
(foolproof) dynamic determinism enforcement system. I believe we succeeded
at this goal (modulo some CPU instructions). However, I don't believe the
foolproof solution is necessary or practical (tangent: our solution
attempts to be foolproof for mostly academic reasons, not practical
concerns about solving real problems; this is part of the fun of being in
academia).

The point being: we attempt to sequentialize execution of threads. This is
extremely difficult, and we can't do it properly. The biggest sources of
unsupported packages (more details in the paper!) are Java, sockets, and
intra-process signals. Java always ends up deadlocking due to our attempts
to sequentialize thread execution in the JVM. With our current methods I
don't think this can ever work properly.
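
As a toy illustration of how running threads strictly one at a time can get stuck, the Python sketch below models threads as generators and switches only when the running thread blocks or finishes. This is an assumed, simplified mechanism (a thread busy-waiting on a flag that another thread must set); it is not a description of DetTrace's scheduler or of what actually goes wrong in the JVM, and all names are hypothetical.

```python
def spinner(shared):
    # Busy-waits on a flag and never makes a blocking call.
    while not shared["ready"]:
        yield "spin"


def setter(shared):
    # Publishes the flag, then makes one blocking call.
    shared["ready"] = True
    yield "block"


def run_sequential(threads, max_steps=1000):
    """Run one thread at a time; switch only on "block" or completion."""
    queue = list(threads)
    steps = 0
    while queue:
        thread = queue.pop(0)
        while True:
            steps += 1
            if steps > max_steps:
                return "stalled"      # step budget exhausted, no progress
            try:
                event = next(thread)
            except StopIteration:
                break                 # thread finished; do not requeue it
            if event == "block":
                queue.append(thread)  # park it and let another thread run
                break
            # Any other event (e.g. "spin") keeps the same thread running.
    return "finished"


if __name__ == "__main__":
    state = {"ready": False}
    print(run_sequential([spinner(state), setter(state)]))
    state = {"ready": False}
    print(run_sequential([setter(state), spinner(state)]))
```

With the spinner scheduled first, the setter never gets to run and the scheduler gives up; with the opposite order both threads complete. A real scheduler cannot know in advance which order is safe.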

Not all is lost though: I don't expect package builds to be
nondeterministic from thread scheduling, sockets, or signals. So the simple
solution is just to allow these things to happen in DetTrace and call it
good enough. We still get all the other benefits of DetTrace but relax the
paranoia and thus allow a wider set of packages to build. DetTrace could
certainly be modified to support this.

I don't have any immediate plans to improve this, but would certainly not
be against it either. I like to think the biggest contribution of DetTrace
toward the reproducible builds effort is the ideas and methods, rather than
the implementation.

I'll definitely let you know when it is available. The implementation is
not as robust as it could be, so I want to set expectations accordingly!

Omar

On Tue, Feb 11, 2020 at 3:37 PM Julien Lepiller <julien at lepiller.eu> wrote:

> On 11 February 2020 05:54:13 GMT-05:00, Chris Lamb <
> chris at reproducible-builds.org> wrote:
> >Dear all,
> >
> >> Ugh, sorry about that!
> >>
> >> It should work now!
> >
> >Thanks; works for me… as you can see here:
> >
> >   https://i.imgur.com/UEqbiR4.png
> >
> >
> >Best wishes,
> >
> >--
> >      o
> >    ⬋   ⬊      Chris Lamb
> >   o     o     reproducible-builds.org
> >    ⬊   ⬋
> >      o
>
> Hi Omar,
>
> I was able to download and read the paper after all. It was an interesting
> read, thanks for posting it here :). If I understand correctly you and your
> co-authors built a tool, DetTrace, that implements a container technology
> similar to Docker, but whose design allows for the deterministic
> behavior of its content. While I have my doubts about containers in general,
> the introduction resonated strongly with the Guix user inside me :). I
> think your tool could be seen as a way to transform a process with
> side-effects into a pure function!
>
> In Guix (and Nix), treating a build procedure as a pure function is the
> core of the packaging model we use. In fact we already use a similar
> technology to remove most sources of nondeterminism when building
> (user names and IDs, filesystem paths and content, network). I especially
> liked section 5 with its list of nondeterminism sources, since it shows
> where our own tooling is lacking. In fact, I'm pretty sure we could use
> DetTrace in Guix without too much work, and finally get to 100%
> reproducibility.
>
> Your evaluation section, though, suggests that it's not yet practical to use
> DetTrace for building an entire distribution, as ~25% of packages couldn't
> be built using DetTrace. Do you think it would be a lot of work to get to
> 0%? Do you plan to improve it?
>
> Let me know when your tool is available, as I'd like to experiment with it
> :).
>