Making reproducible builds & GitBOM work together in spite of low-level component variation

Marek Marczykowski-Górecki marmarek at invisiblethingslab.com
Wed Jun 22 21:09:02 UTC 2022


On Wed, Jun 22, 2022 at 11:43:49AM -0700, Vagrant Cascadian wrote:
> On 2022-06-22, David A. Wheeler wrote:
> > The challenge is that I believe that there will be subtle variations in inputs caused by
> > very low-level components, particularly kernels, but potentially also low-level
> > runtimes like the C runtime. This could result in irreproducibility of anything with GitBOMs
> > if the whole process is applied without some corrective factor.
> >
> > I'm going to use the Linux kernel as an example here. That said,
> > I suspect the issue is broader (it would at least apply to any kernel).
> >
> > Programs running on a Linux kernel eventually must call the kernel.
> > To support this, the Linux kernel provides a mechanism to export its API. See
> > "exporting kernel headers" here:
> > https://docs.kernel.org/kbuild/headers_install.html#:~:text=The%20linux%20kernel's%20exported%20header,used%20with%20these%20system%20calls.
> >
> > These header files are either used directly by programs to call into the kernel,
> > or are processed & converted into other files that end up getting embedded in
> > intermediate runtimes (typically the C runtime).
> >
> > But here's the thing: kernel header files change on basically every release,
> > e.g., to add new system calls or new flags. In practically all cases these changes
> > don't change the result of executing a build, and thus don't currently interfere
> > with reproducible builds. If GitBOM data is added, however, this variance will
> > cause different hashes to be included, causing all build results (transitively) to be
> > different when you use an even *slightly* different kernel version.
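To make the transitive effect concrete, here is a minimal Python sketch. It assumes GitBOM identifies each artifact by its git blob ID (SHA-1 over `blob <length>\0` followed by the content) and that the BOM document is a sorted list of input IDs; the document layout is loosely modelled on GitBOM rather than copied from the spec, and `build` is a hypothetical stand-in for a compiler. Changing a header that the generated code never depends on leaves the code identical but changes the embedded BOM ID, and therefore the output file.

```python
import hashlib
from typing import Dict, List


def gitoid(data: bytes) -> str:
    """Git blob object ID: SHA-1 over b"blob <len>\\0" followed by the content."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def bom_document(input_ids: List[str]) -> bytes:
    """Simplified GitBOM-style document (assumption): a header line, then one
    sorted 'blob <id>' line per input artifact."""
    lines = [b"gitoid:blob:sha1\n"]
    lines += [b"blob %s\n" % i.encode() for i in sorted(input_ids)]
    return b"".join(lines)


def build(sources: Dict[str, bytes]) -> bytes:
    """Hypothetical compiler: the generated code depends only on main.c, but the
    output also embeds the ID of a BOM listing *every* file that was read."""
    code = hashlib.sha256(sources["main.c"]).hexdigest().encode()
    bom_id = gitoid(bom_document([gitoid(s) for s in sources.values()]))
    return b"CODE:" + code + b"\nGITBOM:" + bom_id.encode()


old = {"main.c": b"int main(void) { return 0; }\n",
       "linux/fcntl.h": b"#define O_FOO 1\n"}
# A newer kernel's headers add one flag the program never uses.
new = {**old, "linux/fcntl.h": b"#define O_FOO 1\n#define O_BAR 2\n"}

a, b = build(old), build(new)
print(a.split(b"\nGITBOM:")[0] == b.split(b"\nGITBOM:")[0])  # True: same code
print(a == b)                                                  # False: different file
```

Any artifact that in turn records this output's ID changes as well, which is how the difference propagates up the dependency chain.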
> 
> > POTENTIAL SOLUTIONS
> >
> > Here are some potential solutions I can see:
> >
> > 1. For reproducible builds, rebuild on *EXACTLY* the same kernel version, C library, etc.
> >   This means that you can't just use containers to control rebuilds, since typically containers are
> >   designed to be able to run on arbitrary kernels & people normally upgrade containers.
> >   You'll need to build on whole new VMs with specifically-configured kernels,
> >   *NOT* just embed this in containers. You also need to record exactly which
> >   kernel was used to compile it.
> 
> This seems more relevant for the way GitBOM records provenance
> information than it does for achieving a reproducible build.
> 
> Kernel version differences are tested on Debian's 31k+ packages:
> 
>   https://tests.reproducible-builds.org/debian/index_variations.html
> 
> Most of the reproducibility issues I've encountered seem to come from
> embedding the kernel version, not header data.
> 
> I don't recall off the top of my head how many packages have been
> manually fixed, but the remaining packages in debian that are affected
> by kernel version differences amount to about 30 packages out of 31k+
> total:
> 
>   https://tests.reproducible-builds.org/debian/issues/bookworm/captures_kernel_version_issue.html
>   https://tests.reproducible-builds.org/debian/issues/bookworm/captures_kernel_version_via_CMAKE_SYSTEM_issue.html
> 
> So from a Reproducible Builds perspective, the running kernel should not
> really matter... and that is a good thing!
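The pattern behind those captures_kernel_version issues usually looks something like the following hypothetical build step, sketched in Python: it stamps the running kernel release (the string `uname -r` prints, obtained here via `platform.release()`) into the artifact, so the same source built on two different kernels yields different bytes. The function names are illustrative only.

```python
import platform
from typing import Optional


def build_info(pkg_version: str, kernel_release: Optional[str] = None) -> bytes:
    """Hypothetical build step that bakes the kernel release into its output.

    By default it uses the *running* kernel (the same string `uname -r`
    prints), so the result depends on whichever host did the build.
    """
    release = platform.release() if kernel_release is None else kernel_release
    return f"version={pkg_version} kernel={release}\n".encode()


def build_info_fixed(pkg_version: str) -> bytes:
    """Same step without the host detail: output depends only on declared inputs."""
    return f"version={pkg_version}\n".encode()


# Simulate the same source built on two builders with different kernels.
print(build_info("1.0", "5.10.0-15-amd64") == build_info("1.0", "5.18.0-2-amd64"))  # False
```

The usual fix is the one in `build_info_fixed`: drop the host detail, or take it from an explicit, recorded input instead of the environment.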

IIUC, it isn't really about the running kernel, but about the kernel
headers installed on the system as the definition of the kernel ABI.
IOW, linux-libc-dev vs the linux-headers-* packages in Debian.
It is still not ideal, but much less of an issue than the actual
running kernel.
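The difference between the installed headers and the running kernel can be checked directly. This sketch decodes `LINUX_VERSION_CODE` from the text of `<linux/version.h>` (on Debian, shipped by linux-libc-dev under /usr/include/linux/), assuming the usual packing `(major << 16) + (minor << 8) + patch`, and prints it next to `platform.release()`. On a given build host the two need not match.

```python
import platform
import re
from pathlib import Path
from typing import Optional, Tuple


def header_kernel_version(version_h: str) -> Optional[Tuple[int, int, int]]:
    """Decode LINUX_VERSION_CODE from the text of <linux/version.h>.

    Assumes the packing (major << 16) + (minor << 8) + patch; recent kernels
    clamp the patch level to 255 so it fits in the low byte.
    """
    match = re.search(r"#define\s+LINUX_VERSION_CODE\s+(\d+)", version_h)
    if match is None:
        return None
    code = int(match.group(1))
    return (code >> 16, (code >> 8) & 0xFF, code & 0xFF)


if __name__ == "__main__":
    header = Path("/usr/include/linux/version.h")  # linux-libc-dev on Debian
    if header.exists():
        print("headers describe:", header_kernel_version(header.read_text()))
    print("running kernel:  ", platform.release())
```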

> > 2. For reproducible builds, redirect header file content requests so they use the same
> >    header files, etc., as the original build. GitBOM doesn't care what the underlying kernel
> >    version is really, it's just recording the inputs *used*. This means containers can
> >    once again be used, even when the kernel changes, but it does complicate
> >    performing reproducible builds.
> 
> Feels a bit unclean... either you record what you care about honestly,
> or you decide you don't care about it and don't record it. I think the
> key is the transparency about the process.

There are cases where those headers actually do influence the build
output, and not only in very low-level software. Some time ago I had a
case where an application should preferably use O_TMPFILE, but also
needed a fallback path for when it isn't available (Debian jessie?).
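A Python analogue of that O_TMPFILE situation, offered as a sketch: as far as I know, CPython only defines `os.O_TMPFILE` when the C headers it was compiled against define `O_TMPFILE`, so whether the fast path exists at all is decided when the interpreter is built, while a kernel or filesystem without support shows up at run time as an `OSError`. The helper name `anonymous_tempfile` is made up for illustration.

```python
import os
import tempfile


def anonymous_tempfile(directory: str) -> int:
    """Hypothetical helper: return an fd for an unnamed temporary file.

    os.O_TMPFILE is assumed to exist only if the headers CPython was built
    against defined it (a build-time decision, like `#ifdef O_TMPFILE` in C).
    """
    flag = getattr(os, "O_TMPFILE", None)
    if flag is not None:
        try:
            # O_TMPFILE must be combined with O_WRONLY or O_RDWR.
            return os.open(directory, flag | os.O_RDWR, 0o600)
        except OSError:
            pass  # e.g. running kernel or filesystem lacks O_TMPFILE support
    # Fallback: create a named file, then unlink it so it has no name.
    fd, path = tempfile.mkstemp(dir=directory)
    os.unlink(path)
    return fd


fd = anonymous_tempfile(tempfile.gettempdir())
os.write(fd, b"scratch data")
os.close(fd)
```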

I think redirecting kernel headers may be useful not only for the
GitBOM case, but more generally for reproducibility in some cases too.

That said, the OP suggested it may not be just about kernel headers
but about other components too, and I'm not sure the same reasoning will apply.

> > 3. Have compiler flags/configurations to *omit* certain files from the GitBOM results.
> >   After all, you're not actually *including* the kernel in the generated results, so it makes
> >   sense to omit those files from the point of view of "what is being included in this application"?
> >   Ed Warnicke hates this idea, because it creates a "blind spot" in GitBOM.
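A sketch of what such an exclusion amounts to, assuming a hypothetical list of path prefixes rather than any real compiler flag: excluded headers are simply never hashed, so the recorded inputs stay the same when they change. That is the blind spot, since a tampered excluded file would leave exactly the same record as a legitimate update.

```python
import hashlib
from typing import Dict, Iterable, List

# Hypothetical exclusion list; no real compiler option is implied here.
EXCLUDED_PREFIXES = ("/usr/include/linux/", "/usr/include/asm-generic/")


def gitoid(data: bytes) -> str:
    """Git blob object ID: SHA-1 over b"blob <len>\\0" followed by the content."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def recorded_inputs(inputs: Dict[str, bytes],
                    excluded: Iterable[str] = EXCLUDED_PREFIXES) -> List[str]:
    """Sorted IDs of the inputs that would be written to the BOM.

    Files under an excluded prefix are never hashed, so nothing about their
    content (legitimate update or tampering) reaches the record.
    """
    prefixes = tuple(excluded)
    return sorted(gitoid(data) for path, data in inputs.items()
                  if not path.startswith(prefixes))


base = {"/src/main.c": b"int main(void) { return 0; }\n",
        "/usr/include/stdio.h": b"/* libc */\n",
        "/usr/include/linux/fcntl.h": b"#define O_FOO 1\n"}
kernel_bump = {**base, "/usr/include/linux/fcntl.h": b"#define O_FOO 1\n#define O_BAR 2\n"}
libc_bump = {**base, "/usr/include/stdio.h": b"/* libc, patched */\n"}

print(recorded_inputs(base) == recorded_inputs(kernel_bump))  # True: change is invisible
print(recorded_inputs(base) == recorded_inputs(libc_bump))    # False: change is recorded
```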
> 
> Yeah, I can see why someone would not like this approach.
> 
> 
> > 4. Tweak the definition of reproducible builds so that it's a bit-by-bit identical
> > copy of a specified artifact, but the artifact can be *part* of a file.
> 
> In practice, there are a few cases where this is done, e.g. .apk and
> .rpm files embed signatures which need to be stripped out for
> reproducibility comparison.
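The simplest form of this, sketched in Python under the assumption that the excluded region sits at the same known offsets in both files and the files have equal length, is to blank those ranges before comparing. Real formats such as APK or RPM need a format-aware parser to locate the signature, which is where the extra complexity (and room for bugs) comes from.

```python
from typing import Iterable, Tuple


def equal_outside(a: bytes, b: bytes, excluded: Iterable[Tuple[int, int]]) -> bool:
    """Compare two artifacts byte-for-byte, ignoring half-open ranges [start, end).

    Simplifying assumption: the excluded ranges are at the same offsets in
    both files and the files have the same length.
    """
    if len(a) != len(b):
        return False
    ma, mb = bytearray(a), bytearray(b)
    for start, end in excluded:
        if not 0 <= start <= end <= len(a):
            raise ValueError(f"bad range {start}-{end}")
        # Overwrite the ignored region with zeros in both copies.
        ma[start:end] = bytes(end - start)
        mb[start:end] = bytes(end - start)
    return ma == mb


old = b"HEADER" + b"sig1" + b"BODY"
new = b"HEADER" + b"sig2" + b"BODY"
print(equal_outside(old, new, [(6, 10)]))  # True: only the "signature" differs
print(equal_outside(old, new, []))         # False: raw bytes differ
```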
> 
> Excluding some bits and verifying the rest adds complication to the
> verification process, and thus opportunities for errors, and I believe
> at least once resulted in incorrect results due to bugs in the
> verification process...

Another issue with this approach is embedding one artifact in another.
If, for example, an ELF binary (with a GitBOM note included) is then
included in some container (an archive or a filesystem image such as a
live ISO image), the comparison process gets _much_ more complex.
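To illustrate why nesting hurts, here is a sketch that uses a tar archive as a stand-in for the outer container and a hypothetical `GITBOM:` trailer as a stand-in for the ELF note. The raw archive bytes can no longer be compared, so the exclusion has to be applied to every member after unpacking. A real ISO image would add filesystem layers, and real ELF files would need a parser to locate the note section.

```python
import io
import tarfile
from typing import Callable, Dict

# A normalizer maps (member name, content) to the bytes that should be compared.
Normalizer = Callable[[str, bytes], bytes]


def make_tar(files: Dict[str, bytes]) -> bytes:
    """Build an uncompressed tar archive in memory (default mtime/owner of 0)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def members(archive: bytes) -> Dict[str, bytes]:
    """Map member name -> content for every regular file in a tar archive."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(archive)) as tf:
        for info in tf.getmembers():
            if info.isfile():
                out[info.name] = tf.extractfile(info).read()
    return out


def equal_normalized(a: bytes, b: bytes, normalize: Normalizer) -> bool:
    """Compare two archives member by member after normalizing each member."""
    ma, mb = members(a), members(b)
    if ma.keys() != mb.keys():
        return False
    return all(normalize(n, ma[n]) == normalize(n, mb[n]) for n in ma)


def drop_bom_trailer(name: str, data: bytes) -> bytes:
    """Hypothetical stand-in for removing a GitBOM ELF note: cut at b"GITBOM:"."""
    return data.split(b"GITBOM:", 1)[0]


old = make_tar({"bin/app": b"CODE\nGITBOM:1111"})
new = make_tar({"bin/app": b"CODE\nGITBOM:2222"})
print(old == new)                                    # False: raw bytes differ
print(equal_normalized(old, new, drop_bom_trailer))  # True: same after normalizing
```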


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


More information about the rb-general mailing list