Research on Reproducible Builds

Omar Navarro Leija omarsa at seas.upenn.edu
Fri Mar 6 19:44:42 UTC 2020


Thank you everyone for the positive feedback!

Yes, you can now find the source code repository for Dettrace at:
https://github.com/dettrace/dettrace
(I have submitted several pull requests to polish some details and update
the README, so check those out as well.)

> However, it's clear that some builds may be difficult to make
> reproducible through other techniques; using DetTrace to resolve many of
> the remaining ones. Does that sound like a reasonable way to use DetTrace?
Yes, that is one of the intended uses for Dettrace. Dettrace aims for a 100%
determinism guarantee, which is overkill in many cases, as Dettrace is
extremely paranoid about sources of nondeterminism. For example, we handle
process and (to some extent) thread scheduling, even though these are
unlikely to be sources of irreproducibility. An ideal system would allow us
to selectively turn off some of the determinism enforcement Dettrace does.
Handling Java or other managed languages fails due to our imperfect or
nonexistent treatment of timers, proper thread sequentialization, and
interprocess signals. I believe turning this enforcement off would still
provide reproducibility 99% of the time, while also supporting Java and
others.

> I would like to see a day where language package managers will flag as
> "dangerous" any packages that cannot be reproducibly built
I'm quite a fan of Rust and its package manager: Cargo. I would love to see
such a system integrated into Cargo one day.

> It seems to me that this could also be used to run minification &
> generate language-level packages - correct?
I'm afraid I'm not familiar with what minification means here, as it
relates to language-level packages and reproducibility. Could you clarify?

> If the "starting date" is arbitrary (like Jan 1, 1970) that would look
> odd. But if the "starting date" were forcibly set to a human-reasonable
> value (like the date-time of the last commit, or of the latest source
> file), then it might be easier to accept the results. Has that been
> considered?
Currently, the date is set to an arbitrary default and, as you pointed out,
"ticks" up on every system call to give the illusion of time passing. A
command line option in Dettrace does allow setting a user-defined date:
      --epoch arg       Set system epoch (start) time. Accepts
                        `yyyy-mm-dd,HH:MM:SS` (utc). The default is
                        `1993-08-08,22:00:00`. Also accepts a `now`
                        value which permits nondeterministically
                        setting the initial system time to the host
                        time.
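To make the "ticking" clock concrete, here is a minimal Python sketch of a logical clock that starts at an `--epoch`-style date and advances on every (simulated) system call. It is not Dettrace's implementation: the class `LogicalClock`, the helper `parse_epoch`, the one-second step, and the choice to return the current time before ticking are all assumptions made for illustration.

```python
from datetime import datetime, timezone

# Default start time, taken from the --epoch help text.
DEFAULT_EPOCH = "1993-08-08,22:00:00"


def parse_epoch(value: str) -> int:
    """Parse a `yyyy-mm-dd,HH:MM:SS` (UTC) string into Unix seconds."""
    dt = datetime.strptime(value, "%Y-%m-%d,%H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


class LogicalClock:
    """Hypothetical deterministic clock: starts at a fixed epoch and
    advances by a fixed step each time a (simulated) system call reads it."""

    def __init__(self, epoch: str = DEFAULT_EPOCH, step: int = 1):
        self.now = parse_epoch(epoch)
        # Assumption: a constant step per observed system call.
        self.step = step

    def on_syscall(self) -> int:
        """Return the current logical time, then tick forward by `step`."""
        t = self.now
        self.now += self.step
        return t
```

Two runs that issue the same sequence of system calls observe identical timestamps, which is the property reproducibility needs; the `now` mode would instead seed `self.now` from the host clock and lose that guarantee. The real step size and which calls count as ticks are Dettrace details this sketch does not model.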

> like the date-time of the last commit, or of the latest source file

I had not considered these, but both would make a lot of sense and are
probably better choices than the default constant we picked. Thanks!

> SOURCE_DATE_EPOCH
Admittedly, I wasn't aware of SOURCE_DATE_EPOCH; it too makes a good
target for the starting date. At the same time, Dettrace was designed to
make arbitrary computation deterministic. So while the implementation works
especially well for Debian packages, the methods are distro- and
package-manager-agnostic. We could still take advantage of distro- or
package-manager-specific features like SOURCE_DATE_EPOCH for an easier user
experience.
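One way a wrapper could combine the options discussed above is sketched below in Python: use SOURCE_DATE_EPOCH if it is set, otherwise the newest file modification time in the source tree, otherwise the fixed default, and format the result for `--epoch`. The fallback order and the helper names (`choose_epoch`, `newest_mtime`, `to_epoch_arg`) are hypothetical, not part of Dettrace; the "last commit" option is left out because it would require calling out to git.

```python
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_EPOCH = "1993-08-08,22:00:00"


def to_epoch_arg(seconds: int) -> str:
    """Format Unix seconds in the `yyyy-mm-dd,HH:MM:SS` (UTC) form --epoch accepts."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d,%H:%M:%S")


def newest_mtime(src_dir: Path) -> Optional[int]:
    """Latest modification time (whole seconds) of any regular file under src_dir."""
    mtimes = [int(p.stat().st_mtime) for p in src_dir.rglob("*") if p.is_file()]
    return max(mtimes) if mtimes else None


def choose_epoch(src_dir: Path, env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical fallback order: SOURCE_DATE_EPOCH, newest source file, default."""
    sde = env.get("SOURCE_DATE_EPOCH")
    if sde is not None:
        # SOURCE_DATE_EPOCH is an integer count of seconds since the Unix epoch.
        return to_epoch_arg(int(sde))
    newest = newest_mtime(src_dir)
    if newest is not None:
        return to_epoch_arg(newest)
    return DEFAULT_EPOCH
```

A caveat on the middle option: file modification times are only stable if the sources are unpacked in a way that preserves them (a tarball usually does, a fresh git checkout does not), so SOURCE_DATE_EPOCH is the more reliable choice when it is available.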

Omar

On Thu, Mar 5, 2020 at 8:57 PM Chris Lamb <lamby at debian.org> wrote:

> Hi Vagrant,
>
> > > This was curious to me too — to wit, the paper describes that the
> > > Debian «wheezy» distribution was being built so it was interesting to me that
> > > the first timestamp in the debian/changelog was not chosen, à la
> > > SOURCE_DATE_EPOCH.
> >
> > If I recall correctly, wheezy was chosen precisely because it had *fewer*
> > reproducibility fixes ... so that they could more easily tell how much
> > dettrace solved...
>
> Apologies if my previous email was in any way ambiguously phrased as it
> appears you have accidentally misparsed it.
>
> My query was not regarding why an *older* Debian distribution was
> chosen (you are correct in your analysis) but rather that given that
> *any* Debian release was chosen, the canonical timestamp was not taken
> from debian/changelog.
>
>
> Regards,
>
> --
>       ,''`.
>      : :'  :     Chris Lamb
>      `. `'`      lamby at debian.org 🍥 chris-lamb.co.uk
>        `-
>

More information about the rb-general mailing list