Nearly reproducible Bookworm 12.6 live images

James Addison jay at jp-hosting.net
Sun Aug 11 10:37:28 UTC 2024


Thanks Roland - I've found some time to read the relevant rebuild.sh script
from the Debian live-build.git repository, to reassure myself about the
implementation you've described.

Generally it does seem reliable to me - I'll add one or two comments inline
below:

On Tue, 16 Jul 2024 at 08:58, Roland Clobus <rclobus at rclobus.nl> wrote:
> [ ... snip ... ]
> On 16/07/2024 00:21, James Addison wrote:
> > This sounds great - I have one specific question / concern:
>
> > Limiting the timestamp to a maximum of SOURCE_DATE_EPOCH makes sense so that
> > the modification-time is consistent for equivalent rebuilds.
> >
> > However: if the contents of the txt file / other content in the git repository
> > change, would that prevent an independent rebuilder from recreating the
> > identical CD image as output?
>
> First the rebuild script determines the latest modification date of the
> archive (as found in InRelease) and sets SOURCE_DATE_EPOCH accordingly.
> Then the rebuild script uses the git repository of live-build and finds
> the commit that matches SOURCE_DATE_EPOCH. Thus it ensures that for the
> Debian point releases you get exactly what was configured at that time.
> During this 'git checkout HASH' command, the timestamps of the modified
> files will be updated to 'now', which is the intended behaviour, as the
> final ISO image will have all dates truncated to SOURCE_DATE_EPOCH.
>
> So an independent rebuilder should call the rebuild.sh script without
> setting S_D_E and will get identical output.

From the script contents: yes, this is accurate.  A rebuilder that builds from
the same InRelease version of the archive (deb.debian.org), or, subsequently,
from a snapshot of the archive, will (unless configured to use a local copy)
retrieve live-build.git and check out the single most recent commit, reachable
from the HEAD ref, whose commit date is _not after_ the release timestamp.
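
To make those two steps concrete, here is a rough Python sketch (not the actual
rebuild.sh logic): it reads the Date field from InRelease text, converts it to
a SOURCE_DATE_EPOCH value, and builds a `git rev-list` invocation with
`--min-age`, which limits output to commits at least that old.  The sample
Date value and both helper names are made up for illustration, and the exact
git invocation used by rebuild.sh is an assumption here.

```python
import re
from email.utils import parsedate_to_datetime


def inrelease_epoch(inrelease_text: str) -> int:
    """Return the archive's Date field as seconds since the Unix epoch."""
    match = re.search(r"^Date:\s*(.+)$", inrelease_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Date field found in InRelease text")
    return int(parsedate_to_datetime(match.group(1).strip()).timestamp())


def rev_list_command(epoch: int, ref: str = "HEAD") -> list[str]:
    """Build a git command that prints the newest commit reachable from
    `ref` whose commit date is not later than `epoch`."""
    return ["git", "rev-list", "-n", "1", f"--min-age={epoch}", ref]


# Hypothetical InRelease excerpt; the Date value is illustrative only.
SAMPLE = "Origin: Debian\nSuite: stable\nDate: Sat, 29 Jun 2024 09:05:21 UTC\n"

epoch = inrelease_epoch(SAMPLE)
command = rev_list_command(epoch)
print(epoch, " ".join(command))
```

Passing `command` to `subprocess.run(..., cwd=<live-build checkout>)` would
print the commit ID to check out; that step is left out so the sketch runs
without a repository.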

A nitpick: should we clone a specific named branch to retrieve the HEAD
reference?  Rationale: the default branch name for a repo can change.


Initially I was worried about the possibility that subsequent git commits with
amended dates could break reproducibility.  On inspection, though: a git
commit's _author date_ can be set and later adjusted, but the min-age filter
used by the rebuild.sh script filters by _commit date_, which cannot be
amended or overridden by subsequently pushed commits.
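
A toy model of that distinction, in Python: the selection only looks at the
commit date, so a commit with an early-looking author date is still skipped
if it was committed after the release timestamp.  This assumes a simplified
linear history with made-up commit IDs and timestamps; real git walks the
commit graph, and `Commit` and `select_commit` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Commit(NamedTuple):
    commit_id: str
    author_date: int  # chosen by the author; may be backdated
    commit_date: int  # recorded when the commit object is created


def select_commit(history: list[Commit], epoch: int) -> Optional[Commit]:
    """Pick the newest commit whose commit date is not later than epoch."""
    eligible = [c for c in history if c.commit_date <= epoch]
    return max(eligible, key=lambda c: c.commit_date, default=None)


history = [
    Commit("aaa111", author_date=100, commit_date=100),
    Commit("bbb222", author_date=200, commit_date=200),
    # Author date predates the release, but it was committed afterwards.
    Commit("ccc333", author_date=150, commit_date=400),
]

chosen = select_commit(history, epoch=300)
print(chosen.commit_id if chosen else "no eligible commit")
```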


The general angle that I'm considering here is: what resources should rebuilder
sites download in advance if they want to perform a rebuild, and how can they
be reasonably sure that they are using the same sources as others, and that
those will not diverge at a later date?
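
One possible way for rebuilders to compare notes, sketched in Python: derive a
single fingerprint from the InRelease bytes and the live-build commit ID that
were actually used, and publish it alongside the image checksum.  This is only
an illustration of the idea, not something rebuild.sh is known to do;
`source_fingerprint` and the sample inputs are hypothetical.

```python
import hashlib


def source_fingerprint(inrelease_bytes: bytes, commit_id: str) -> str:
    """Combine the archive metadata and the live-build commit into one
    hex digest that two rebuilders can compare."""
    digest = hashlib.sha256()
    # Hash the InRelease content first so its length cannot blur the
    # boundary between the two inputs.
    digest.update(hashlib.sha256(inrelease_bytes).digest())
    digest.update(commit_id.encode("ascii"))
    return digest.hexdigest()


# Made-up inputs: two sites with identical sources get the same value.
site_a = source_fingerprint(b"Date: example\n", "bbb222")
site_b = source_fingerprint(b"Date: example\n", "bbb222")
site_c = source_fingerprint(b"Date: example\n", "ccc333")
print(site_a == site_b, site_a == site_c)
```

Matching fingerprints would only show that the inputs agree; they say nothing
about whether those inputs are the ones the official image was built from.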

> > (note: given that the package archive is regularly updated, I think that there
> > are other reasons why an identical point-in-time rebuild may require sharing
> > of some build-time/archive metadata -- the reason I'm asking is that I'd like
> > to check whether this git repo could require similar co-ordination during
> > rebuilds.  fixed commit ID / signed tag, or other mechanism for example)
>
> I'm using the timestamp from InRelease, which was confirmed to be 'the'
> timestamp of the archive [1]. Therefore I don't need tags.

Provided that sites receive the same live-build.git checkout (and they should),
I agree with this.  It may seem like I'm over-challenging the design but I
would like to determine whether we could provide any additional guarantees that
the install media and live-build correspond to each other (perhaps it is not
possible; there may be a circular dependency problem).

(that may be steering back towards a Debian-specific conversation, but some
of this source-trust-consensus is no doubt relevant to other distros also)

Thanks again,
James


More information about the rb-general mailing list