Making reproducible builds & GitBOM work together in spite of low-level component variation

David A. Wheeler dwheeler at dwheeler.com
Fri Jun 24 20:46:55 UTC 2022



> On Jun 22, 2022, at 2:28 PM, Vagrant Cascadian <vagrant at reproducible-builds.org> wrote:
> 
> In my previous reply, I somehow glazed over the fact that the ADG and
> GitBOM identifier are embedded in the artifacts at build time...
> 
> I can see the value in embedding provenence information in the build
> artifacts, but that makes reproducible builds considerably harder to
> achieve if it is recording *everything* about the build environment.

Correct, that's the fundamental cause of my concern. They don't currently
plan to record command line parameters, so it's not quite "everything", but
they do intend to record hashes of every input file. So it's quite a lot.

If the GitBOM information were only stored in *external* objects, and later
GitBOM IDs only referred to contents and did *NOT* include GitBOM information,
then I don't think there's an issue. Indeed, that is one solution to
enabling both GitBOM & reproducible builds:
Store & process GitBOMs *outside* the generated executable files like ELF
files (just like .buildinfo & other metadata files).
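Here is a minimal sketch of that external-storage option. It assumes GitBOM identifiers are computed the way git computes blob IDs (SHA-1 over "blob <size>", a NUL byte, then the content, as `git hash-object` does). The `build` function and its JSON sidecar layout are hypothetical illustrations, not GitBOM's actual document format. Because nothing about the inputs is embedded in the artifact, two builds whose header files differ only in a comment still yield byte-identical artifacts, and only their external BOMs differ.

```python
import hashlib
import json


def gitoid(data: bytes) -> str:
    # Git-style blob ID: SHA-1 over b"blob <len>\0" followed by the content
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def build(code: bytes, inputs: dict[str, bytes]) -> tuple[bytes, str]:
    # Hypothetical build step: the artifact holds only the generated code;
    # the input hashes go into a separate (sidecar) BOM document.
    artifact = code  # nothing about the inputs is embedded here
    bom = json.dumps(
        {"artifact": gitoid(artifact),
         "inputs": sorted(gitoid(body) for body in inputs.values())},
        sort_keys=True,
    )
    return artifact, bom


# Same generated code, but the header differs only in a comment
code = b"\x7fELF...machine code..."
a1, bom1 = build(code, {"foo.h": b"/* v1 */ int f(void);"})
a2, bom2 = build(code, {"foo.h": b"/* v2 */ int f(void);"})
print(gitoid(a1) == gitoid(a2))  # True: the artifacts still reproduce
print(bom1 == bom2)              # False: only the external BOMs differ
```

A reproducibility checker in this arrangement would compare the artifacts themselves and treat the sidecar BOMs as metadata, much as `.buildinfo` files are handled today.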

Your point about "making reproducible builds considerably harder to achieve" is
fair and, though I didn't emphasize it in my earlier post, it's a related concern.
Currently, if builds generate different intermediate files, but end up with the same
final results, the results are still considered reproducible. With GitBOM this is NOT true.
If there's an intermediate file that doesn't generate the same contents, then the
GitBOM for it will be different, and all later GitBOMs transitively will be different,
making the produced final executable different. You could argue that might be better
long-term, because it reveals where reproducibility doesn't occur on intermediate steps.
But it will probably make many reproducible builds harder to achieve.
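The transitive effect can be illustrated with a toy model. This is a sketch under stated assumptions: `link_step` and `pipeline` are hypothetical, the BOM document format is simplified, and appending the BOM ID to the output stands in for embedding it in something like an ELF section. The intermediate object differs between two runs (say, by a timestamp the final link discards). Without embedding, the final executables match; with embedding, the difference propagates into the final output.

```python
import hashlib


def gitoid(data: bytes) -> str:
    # Git-style blob ID: SHA-1 over b"blob <len>\0" followed by the content
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def bom_id(input_ids: list[str]) -> str:
    # Simplified, hypothetical BOM document: one sorted line per input ID
    doc = "".join(f"blob {i}\n" for i in sorted(input_ids)).encode()
    return gitoid(doc)


def link_step(code: bytes, inputs: list[bytes], embed: bool) -> bytes:
    # Emit `code`; if `embed`, append the BOM ID of the inputs to the output
    if not embed:
        return code
    return code + b"\0bom:" + bom_id([gitoid(i) for i in inputs]).encode()


def pipeline(intermediate_extra: bytes, embed: bool) -> bytes:
    # Step 1: compile; the object differs only by `intermediate_extra`
    source = b"int main(void) { return 0; }"
    obj = link_step(b"object-code", [source], embed) + intermediate_extra
    # Step 2: link; the executable *code* is identical in every run
    return link_step(b"executable-code", [obj], embed)


print(pipeline(b"", embed=False) == pipeline(b"ts=1", embed=False))  # True
print(pipeline(b"", embed=True) == pipeline(b"ts=1", embed=True))    # False
```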


> On Jun 22, 2022, at 4:09 PM, Marek Marczykowski-Górecki <marmarek at invisiblethingslab.com> wrote:
> 
> IIUC, it isn't really about running kernel, but about kernel headers
> installed in the system as a definition of kernel ABI. IOW
> linux-libc-dev vs linux-headers-* package in Debian.
> It is still not ideal, but much less of an issue than really "running
> kernel".

Fair enough. Let's use Debian as an example. The "typical"
way I've seen Linux kernel headers installed would be by running:

    sudo apt install linux-headers-$(uname -r)

This command would *NOT* work anymore with reproducible builds if GitBOM is used
and the kernel is updated. Even if the new headers don't change the resulting
*executable* code, their GitBOM hashes would be recorded in the resulting
compiled objects (e.g., ELF files), and those hashes would be *different*. What's more,
since GitBOMs are transitive, all the generated executables would be transitively different.

The solution is either to run on the same old kernel (e.g., in a VM), or to install
the linux-headers-VERSION for the build being reproduced (NOT for the actual running kernel).
The latter *does* work fine for a container (as I noted earlier).
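The second option means taking the header package version from the original build's records rather than from `uname -r`. A minimal sketch, assuming the header package appears with a pinned version in the `Installed-Build-Depends` field of the build's Debian `.buildinfo` file; the function name and the example versions are hypothetical. It turns matching entries into `name=version` arguments for `apt install`.

```python
import re


def pinned_packages(buildinfo: str, prefixes: tuple[str, ...]) -> list[str]:
    # Collect the Installed-Build-Depends value: first line plus the
    # indented continuation lines that follow it
    value: list[str] = []
    in_field = False
    for line in buildinfo.splitlines():
        if line.startswith("Installed-Build-Depends:"):
            in_field = True
            value.append(line.split(":", 1)[1])
        elif in_field and line[:1] in (" ", "\t"):
            value.append(line)
        elif in_field:
            break  # next field reached
    # Entries look like "name (= version)", separated by commas
    pinned = []
    for entry in " ".join(value).split(","):
        m = re.match(r"\s*([a-z0-9][a-z0-9.+-]*)\s*\(=\s*([^)\s]+)\s*\)", entry)
        if m and m.group(1).startswith(prefixes):
            pinned.append(f"{m.group(1)}={m.group(2)}")
    return pinned


example = """Format: 1.0
Source: hello
Installed-Build-Depends:
 base-files (= 12.2),
 linux-libc-dev (= 5.18.5-1),
 make (= 4.3-4.1)
Environment:
 LANG="C.UTF-8"
"""
args = pinned_packages(example, ("linux-libc-dev", "linux-headers-"))
print("sudo apt install " + " ".join(args))
# sudo apt install linux-libc-dev=5.18.5-1
```

Installing `package=version` this way only works if that exact version is still available from a configured archive (for example, a snapshot mirror), which is one more thing a rebuilder would need to arrange.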

Again, this is informed speculation on my part. I'm trying to anticipate a potential
problem so that GitBOM & reproducible builds can work together. If anyone has better ideas,
or can show that my concerns aren't real, I'd love to know.

Thanks!

--- David A. Wheeler


