Arguing about source inputs
fosslinux
fosslinux at aussies.space
Tue Apr 1 09:53:21 UTC 2025
Hi,
On 3/31/25 12:05, Andrius Štikonas via rb-general wrote:
> In general that is good advice but it comes at huge packaging and maintenance
> cost.
In agreement with Andrius here from experience in live-bootstrap.
> But this often results in huge cleanup functions like:
> https://github.com/fosslinux/live-bootstrap/blob/
> 2057d551e0d072f85dd3c8b046e90e6be81a3604/steps/gcc-13.3.0/pass1.sh#L5
But I note that a lot of this cleanup could be avoided if one builds directly from VCS. Currently in the live-bootstrap project,
we are building from tarballs for a number of reasons, effectively "emulating" a VCS build.
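
As a rough sketch of what such a cleanup step does (this is not the live-bootstrap code linked above; the function name and the choice of files are assumptions for illustration), one can delete autotools output only where its hand-written input sits beside it, so that hand-written configure scripts and Makefile.in files survive:

```shell
# Hypothetical helper: remove autotools output from an unpacked tarball so
# that it has to be regenerated. Deliberately incomplete: it only handles
# configure and Makefile.in, and only when their inputs are present.
remove_generated_autotools() {
    src_dir="$1"

    # A configure script is treated as generated only if configure.ac
    # (or the older configure.in) exists in the same directory.
    find "$src_dir" -type f \( -name configure.ac -o -name configure.in \) |
    while IFS= read -r ac; do
        rm -f "$(dirname "$ac")/configure"
    done

    # A Makefile.in is treated as automake output only if a Makefile.am
    # exists beside it; other Makefile.in files are kept.
    find "$src_dir" -type f -name Makefile.am |
    while IFS= read -r am; do
        rm -f "${am%.am}.in"
    done
}
```

Calling `remove_generated_autotools ./some-unpacked-tarball` before regenerating leaves the inputs in place. Paths containing newlines are not handled.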
A few further notes:
I introduced this policy into live-bootstrap from the beginning (only building from human-auditable, and wherever
possible, human-written source code), even before the xz attack. The original motivation was that I found the status quo to be
unacceptable in the Bootstrappable Builds world.
But I think general supply chain/software security now needs to look beyond binary blobs, as has
been hinted, to generated code, which is in my opinion even easier to hide malicious code in.
In particular, it is very rare for generated code in an outside contributor's patch to be reviewed at all. For
instance, Gentoo's flex carries this patch:
https://dev.gentoo.org/~sam/distfiles/sys-devel/flex/flex-2.6.4-autotools-regenerate.patch.xz
It is a 276KB compressed patch created by an autotools regeneration. Now, in this case, it was created by a trusted
Gentoo developer. But the risk is obvious here! Suppose some "Jia Tan"-like contributor came along and provided a
similar patch, but with a malicious line sitting in the middle of the file. This could be really bad.
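
One cheap first check on such a patch is to list which of the files it touches look like autotools output, so a reviewer at least knows what cannot realistically be read line by line. A minimal sketch, assuming a unified diff on stdin; the function name and the list of file names are illustrative assumptions and not exhaustive:

```shell
# Hypothetical helper: read a unified diff on stdin and print the target
# paths whose names look like autotools output. Matching is by file name
# only, so a hand-written file with one of these names is also reported.
flag_generated_in_patch() {
    grep '^+++ ' |
    sed -e 's/^+++ //' -e 's/[[:space:]].*$//' |
    grep -E '(^|/)(configure|Makefile\.in|aclocal\.m4|config\.h\.in|ltmain\.sh)$'
}
```

For a compressed patch this could be run as `xz -dc some.patch.xz | flag_generated_in_patch`. It says nothing about whether the generated content is benign, and paths containing whitespace are truncated.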
Even when using a VCS to build a package from, it is not a given that it will contain no generated code, unfortunately.
While 90% of the generated code/binary blobs in a tarball don't exist in the VCS, building from VCS often doesn't remove
all of them.
E.g. GCC *commits its generated autotools files*:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=configure;h=036142a8d06127659de5e840aaaebab9f4cc37aa;hb=HEAD
Even worse, by extrapolation, is vendored code within the VCS:
E.g. an import of zlib into GCC: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=0f314c78c2f6802622820248641e4f4bdf97e816
I find it difficult to overstate the potential effects of an attack where malicious code is added alongside a
regeneration of generated code.
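
To see how much generated code a given checkout still carries, one can search for the header comments that autoconf and automake write into their output ("Generated by GNU Autoconf" in configure scripts, "generated by automake" in Makefile.in). A minimal sketch, with a hypothetical function name; it only finds files that still carry these markers, so it is a lower bound and not an audit:

```shell
# Hypothetical helper: list files under a checkout that contain the marker
# comments emitted by autoconf or automake. The .git directory is skipped.
find_marked_generated() {
    find "$1" -name .git -prune -o -type f \
        -exec grep -lE 'Generated by GNU Autoconf|generated by automake' {} +
}
```

Running `find_marked_generated ./some-checkout` on a fresh clone gives a starting list of files to regenerate or review. Other generators (bison, flex, gperf and so on) would need their own markers, and an attacker can of course strip a marker.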
My conclusion is: not all "source", not even all human-auditable source code, is equal in trustworthiness.
The usual first step to eliminating the least trustworthy bits is moving to building from VCS.
Samuel (fosslinux)
> On Sunday, 30 March 2025 22:51:23 British Summer Time, Simon
> Josefsson via rb-general wrote:
>> "David A. Wheeler via rb-general"
>>
>> <rb-general at lists.reproducible-builds.org> writes:
>>> Based on the xz experience, the OpenSSF
>>> "Concise Guide for Developing More Secure Software"
>>> <https://best.openssf.org/Concise-Guide-for-Developing-More-Secure-Software>
>>> added the following point:
>>> "If a source code (unbuilt) package is released, it should only
>>> include content from the version control system (VCS), and source
>>> package users should rebuild, if needed, to create production (built)
>>> package(s). E.g., if autotools is used, if a source package is
>>> released it should not include a generated configure file, while
>>> recipients should ignore pre-generated files like configure and
>>> instead rebuild from source (e.g., with autoreconf). This eliminates a
>>> malware-hiding mechanism, as illustrated by an attack on xz utils."
>> That is great advice! I wish that this was more widely adopted.
>>
>> I have suggested that maintainers essentially do the following when they
>> release an autotools-based project:
>>
>> git archive --prefix=libntlm-v1.8/ -o libntlm-1.8-src.tar.gz HEAD
>> gpg -b libntlm-1.8-src.tar.gz
>>
>> With the git repository only holding configure.ac and not configure etc.
>>
>> Many distributions, Debian included, work with source tarballs that were
>> generated by "make dist" and contain generated vendored files like
>> ./configure, ./Makefile.in etc. This used to be acceptable because
>> there were only a few files, and easy to audit. However xz showed us
>> that this is a bad idea. With wide use of Gnulib and other vendor
>> libraries, this no longer scales.
>>
>> Distributions should stop building from "make dist" tarballs!
>>
>> Most normal users are helped by the vendored files included in "make
>> dist" tarballs, and they are useful for bootstrapping purposes. However
>> I believe they are a bad idea for most distributions. Over the years,
>> people slowly realized this and started to introduce broken workarounds
>> like running 'autoreconf -fi' within the tarballs, something that was
>> never intended for what it is being used for. And doesn't even do what
>> people expect it to do.
>>
>> I've recently released GNU libtasn1, libidn, libidn2, inetutils, and
>> gsasl which all were done using the recipe above. When converting the
>> Debian packaging for these packages from "make dist" to "git-archive"
>> style tarballs, several forgotten build dependencies were discovered
>> (including gperf, gengetopt) indicating that the packages never really
>> built things from source before. I think this is a widespread problem.
>>
>> I've written/talked a bit about this:
>>
>> https://blog.josefsson.org/2024/04/01/towards-reproducible-minimal-source-code-tarballs-please-welcome-src-tar-gz/
>>
>> https://blog.josefsson.org/2024/04/13/reproducible-and-minimal-source-only-tarballs/
>>
>> https://debconf24.debconf.org/talks/126-de-vendor-origtargz-gnulib-and-more/
>>
>> One advantage with "git archive" releases is that they are easy to
>> reproduce by one, which avoids the boring work involved to turn "make
>> dist" tarballs reproducible:
>>
>> https://blog.josefsson.org/2025/03/24/reproducible-software-releases/
>>
>> What possibly could be missing is guidelines on what to store in source
>> control repository: if you put some generated vendored file in git (or
>> some non-free firmware blob), we are back where we started.
>>
>> /Simon