Research on Reproducible Builds

Omar Navarro Leija omarsa at seas.upenn.edu
Tue Mar 17 13:55:27 UTC 2020


To follow up.

Here is the 20-minute presentation of our Reproducible Containers project.
It is a lot more bite-size than the paper :3

https://youtu.be/NuDIZqBdM08

On Fri, Mar 6, 2020 at 2:44 PM Omar Navarro Leija <omarsa at seas.upenn.edu>
wrote:

> Thank you everyone for the positive feedback!
>
> Yes, you can now find the source code repository for Dettrace under:
> https://github.com/dettrace/dettrace
> (I have submitted several pull requests to polish some details and update
> the README so check those out as well).
>
> > However, it's clear that some builds may be difficult to make
> > reproducible through other techniques; using DetTrace to resolve many of
> > the remaining ones. Does that sound like a reasonable way to use DetTrace?
> Yes, that is one of the intended uses for Dettrace. Dettrace aims for a
> 100% determinism guarantee, which is overkill in many cases, as Dettrace is
> extremely paranoid about sources of nondeterminism. For example, we handle
> process and (somewhat) thread scheduling, even though these are unlikely to
> be sources of irreproducibility. An ideal system would allow us to
> selectively turn off some of the determinism enforcement Dettrace does.
> Handling Java or other managed languages fails due to our imperfect or
> nonexistent treatment of timers, proper thread sequentialization, and
> interprocess signals. I believe turning these off would still provide
> reproducibility 99% of the time while still supporting Java and others.
>
> > I would like to see a day where language package managers will flag as
> > "dangerous" any packages that cannot be reproducibly built
> I'm quite a fan of Rust and its package manager: Cargo. I would love to
> see such a system integrated into Cargo one day.
>
> > It seems to me that this could also be used to run minification &
> > generate language-level packages - correct?
> I'm afraid I'm not familiar with minification as it relates to
> language-level packages and reproducibility.
>
> > If the "starting date" is arbitrary (like Jan 1, 1970) that would look
> > odd. But if the "starting date" were forcibly set to a human-reasonable
> > value (like the date-time of the last commit, or of the latest source
> > file), then it might be easier to accept the results. Has that been
> > considered?
> Currently the date is arbitrarily set to a default and, as you pointed out,
> "ticks" up on every system call to give the illusion of increasing time. A
> command line option in Dettrace does allow setting a user-defined date:
>       --epoch arg       Set system epoch (start) time. Accepts
>                         `yyyy-mm-dd,HH:MM:SS` (utc). The default is
>                         `1993-08-08,22:00:00`.
>                         Also accepts a `now` value which permits
>                         nondeterministically setting the initial system
>                         time to the host time.
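
The "ticking" clock described above can be illustrated with a minimal
sketch. This is not Dettrace's implementation: the class name `LogicalClock`
and the one-second tick size are assumptions made for illustration. Only the
default start value and the `yyyy-mm-dd,HH:MM:SS` format come from the help
text quoted above.

```python
from datetime import datetime, timedelta, timezone

EPOCH_FORMAT = "%Y-%m-%d,%H:%M:%S"  # format quoted in the --epoch help text


class LogicalClock:
    """Hypothetical deterministic clock: fixed start, one tick per query."""

    def __init__(self, epoch="1993-08-08,22:00:00", tick_seconds=1):
        # tick_seconds is an assumed value; the thread does not state the step.
        start = datetime.strptime(epoch, EPOCH_FORMAT)
        self._now = start.replace(tzinfo=timezone.utc)
        self._tick = timedelta(seconds=tick_seconds)

    def time(self):
        """Return the current logical time, then advance it by one tick."""
        current = self._now
        self._now += self._tick
        return current


clock = LogicalClock()
first = clock.time()
second = clock.time()
```

Because the reported time depends only on the start value and the number of
queries made so far, two runs that issue the same sequence of queries observe
identical timestamps.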
>
> > like the date-time of the last commit, or of the latest source file
>
> I had not considered these personally but both would make a lot of sense
> and are probably better choices than the default constant we picked, thanks!
>
> > SOURCE_DATE_EPOCH
> Admittedly, I wasn't aware of SOURCE_DATE_EPOCH; this too makes a good
> target for the starting date. At the same time, Dettrace was designed to
> make arbitrary computation deterministic. So while the implementation works
> especially well for Debian packages, the methods are distro and package
> manager agnostic. We could take advantage of distro/package-manager
> specific features like SOURCE_DATE_EPOCH for an easier user experience.
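
One way the SOURCE_DATE_EPOCH idea could be wired up is sketched below,
assuming the variable holds Unix seconds, as it conventionally does. The helper
name `epoch_arg_from_env` is hypothetical and not part of Dettrace. It only
converts the value into the `yyyy-mm-dd,HH:MM:SS` string that the `--epoch`
help text describes, falling back to the documented default when the variable
is unset.

```python
import os
from datetime import datetime, timezone


def epoch_arg_from_env(env=os.environ, default="1993-08-08,22:00:00"):
    """Hypothetical helper: turn SOURCE_DATE_EPOCH into an --epoch style string."""
    raw = env.get("SOURCE_DATE_EPOCH")
    if raw is None:
        # No value provided: keep the default quoted in the help text.
        return default
    # SOURCE_DATE_EPOCH is assumed to be an integer count of seconds since
    # 1970-01-01 00:00:00 UTC.
    moment = datetime.fromtimestamp(int(raw), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d,%H:%M:%S")


example = epoch_arg_from_env({"SOURCE_DATE_EPOCH": "86400"})
```

Passing a mapping explicitly, as in the example, keeps the conversion itself
independent of the host environment.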
>
> Omar
>
> On Thu, Mar 5, 2020 at 8:57 PM Chris Lamb <lamby at debian.org> wrote:
>
>> Hi Vagrant,
>>
>> > > This was curious to me too — to wit, the paper describes that the Debian
>> > > «wheezy» distribution was being built so it was interesting to me that
>> > > the first timestamp in the debian/changelog was not chosen, á la
>> > > SOURCE_DATE_EPOCH.
>> >
>> > If I recall correctly, wheezy was chosen precisely because it had *less*
>> > reproducibility fixes ... so that they could more easily tell how much
>> > dettrace solved...
>>
>> Apologies if my previous email was in any way ambiguously phrased as it
>> appears you have accidentally misparsed it.
>>
>> My query was not regarding why an *older* Debian distribution was
>> chosen (you are correct in your analysis) but rather that given that
>> *any* Debian release was chosen, the canonical timestamp was not taken
>> from debian/changelog.
>>
>>
>> Regards,
>>
>> --
>>       ,''`.
>>      : :'  :     Chris Lamb
>>      `. `'`      lamby at debian.org 🍥 chris-lamb.co.uk
>>        `-
>>
>