Arguing about source inputs

Simon Josefsson simon at josefsson.org
Tue Apr 1 10:09:59 UTC 2025


+1 on building from git/VC sources instead of 'make dist' tarballs with
generated/vendored content.

fosslinux via rb-general <rb-general at lists.reproducible-builds.org>
writes:

> But I think it is now necessary for general supply chain/software
> security to look beyond simply binary blobs, as has been hinted, to
> generated code, as generated code is even easier, in my opinion, to
> hide malicious code in. Particularly, it is very rare that an outside
> contributor will have any kind of generated code in a patch
> reviewed. For instance, this is present in Gentoo's flex;
> https://dev.gentoo.org/~sam/distfiles/sys-devel/flex/flex-2.6.4-autotools-regenerate.patch.xz;
> a 276KB compressed patch created by an autotools regeneration. Now, in
> this case, this was created by a trusted Gentoo developer. But the
> risk is obvious here! Suppose some "Jia Tan"-like came along and
> provided a similar patch - but there's a malicious line sitting in the
> middle of the file. This could be really bad.

Indeed.  I think people will be surprised how often generated/vendored
code is modified/patched in practice.  I've done this in many upstream
projects for years, both intentionally (e.g., to fix libtool.m4 bugs)
and unintentionally (e.g., by using Debian's autotools packages, which
produce different scripts than the official upstream autotools releases), and
I think this habit is common.

> Even when using a VCS to build a package from, it is not a given that
> it will contain no generated code, unfortunately. While 90% of
> generated code/binary blobs in the tarball don't exist in the VCS, it
> often doesn't remove all of them.
>
> E.g. GCC *commits their autotools files*
> https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=configure;h=036142a8d06127659de5e840aaaebab9f4cc37aa;hb=HEAD

OpenSSH does this too: on release branches they commit the generated
./configure and related files.

I don't think we can convince everyone to stop doing this.  I do it too
in several upstream projects.

> My conclusion is: not all "source", not even all human-auditable
> source code, is equal in trustworthiness.
>
> The first usual step to eliminating the least trustworthy bits is
> moving to building from VCS.

Yes.  We need more, though: there needs to be a pruning step after
cloning a repository to remove known-generated/vendored files before
use.  I don't think the Debian approach of including these files but
re-generating the generated content is sustainable: it is too easy to
make mistakes and end up using the pre-generated content instead.
Pruning the files, and auditing the pruned copy, would help.
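
As a rough sketch of what such a pruning step could look like, assuming
a plain checkout on disk: the pattern list and the function names below
are hypothetical illustrations, not taken from any existing tool, and a
real list would have to be maintained per project.

```python
import fnmatch
import os

# Hypothetical, non-exhaustive patterns for common autotools outputs.
# A real deployment would need a reviewed, per-project list.
GENERATED_PATTERNS = [
    "configure",
    "Makefile.in",
    "aclocal.m4",
    "config.h.in",
    "ltmain.sh",
    "config.guess",
    "config.sub",
    "*.gmo",
]


def find_generated(root, patterns=GENERATED_PATTERNS):
    """Return sorted relative paths under root whose basename matches a pattern."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Leave VCS metadata alone; only the working tree is pruned.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if any(fnmatch.fnmatchcase(name, p) for p in patterns):
                full = os.path.join(dirpath, name)
                hits.append(os.path.relpath(full, root))
    return sorted(hits)


def prune(root, patterns=GENERATED_PATTERNS):
    """Delete matching files and return what was removed, for auditing."""
    removed = find_generated(root, patterns)
    for rel in removed:
        os.remove(os.path.join(root, rel))
    return removed
```

Calling prune() on a fresh clone returns the list of deleted paths, so
the list itself can be reviewed.  A build from the pruned tree then
fails loudly if regeneration is skipped, rather than silently falling
back to the pre-generated files.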

/Simon