New supply-chain security tool: backseat-signed

Simon McVittie smcv at debian.org
Sat Apr 6 15:30:44 UTC 2024


On Sat, 06 Apr 2024 at 15:54:51 +0200, kpcyrd wrote:
> On 4/6/24 1:42 PM, Adrian Bunk wrote:
> > You cannot simply proclaim that some git tree is the preferred form of
> > modification without shipping said git tree in our ftp archive.
> > 
> > If your claim was true, then Debian and downstreams would be violating
> > licences like the GPL by not providing the preferred form of modification
> > in the archive.
> 
> I'm obviously not a lawyer, but I do think this is the case. Quoting from
> GPL-3.0:
> 
> > The “source code” for a work means the preferred form of the work for
> > making modifications to it. “Object code” means any non-source form of a
> > work.
> 
> autotools pre-processed source code is clearly not "the preferred form of
> the work for making modifications", which is specifically what I'm saying
> Debian shouldn't consider a "source code input" either, to eliminate this
> vector for underhanded tampering that Jia Tan has used.
> 
> If we can force a future Jia Tan to commit their backdoor into git (for
> everybody to see) I consider this a win.

I think maybe different people in this thread are talking about different
things, and talking past each other as a result. There are two questions
about what is the preferred form for modification, and I think perhaps not
everyone agrees on which question they think they're answering.

Which files are part of the source tree?
----------------------------------------

One question is: say you hand-write a file of one format (Autotools
configure.ac and *.m4) and preprocess it into another format that, while
technically editable, is not what you would genuinely edit unless you
had no alternative (the Autotools ./configure script). What is acceptable
source code for this file?

Obviously if you don't have configure.ac, then you don't have the complete
corresponding source code in the form you would want to use to make
changes; so I think the answer has to include at least configure.ac, and
there is an (IMO valid) argument that if configure.ac is missing, then what
you have does not constitute source code.

But, it is conventional for Autotools projects to ship the generated
./configure script *as well* (for example this is what `make dist`
outputs), to allow the project to be compiled on systems that do not
have the complete Autotools system installed. What we have traditionally
said is that it's legitimate for the source code of a Debian package to
include ./configure, as long as it *also* includes configure.ac.
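
As a rough illustration of that rule, the check "every generated file is
accompanied by its source" could be sketched as below. The mapping is an
assumption made for the example and is deliberately incomplete (it only
knows that `configure` comes from `configure.ac` or the older
`configure.in`, and `Makefile.in` from `Makefile.am`); the function name
`generated_without_source` is hypothetical, not part of any existing tool.

```python
from pathlib import PurePosixPath

# Illustrative mapping only: generated file name -> acceptable source names.
# A real check would need a much longer, project-specific list.
GENERATED_FROM = {
    "configure": ("configure.ac", "configure.in"),
    "Makefile.in": ("Makefile.am",),
}


def generated_without_source(paths):
    """Return generated files whose source is absent from the same directory."""
    present = {PurePosixPath(p) for p in paths}
    missing = []
    for path in sorted(present):
        sources = GENERATED_FROM.get(path.name)
        if sources is None:
            continue  # not a file this sketch knows to be generated
        if not any(path.with_name(src) in present for src in sources):
            missing.append(str(path))
    return missing
```

An empty result only says that the named sources are present; it says
nothing about whether the generated file was really produced from them.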

Indeed, if upstream does ship generated files in addition to the actual
source code, we have traditionally said that Debian package maintainers
"should, except where impossible for legal reasons, preserve the entire
building and portability infrastructure provided by the upstream author"
(<https://www.debian.org/doc/manuals/developers-reference/best-pkging-practices.en.html#repackaged-upstream-source>).
It is legitimate to ask whether that rule's value exceeds its cost, or
whether the value of deleting generated files and forcing them to be
regenerated, as a "nothing up my sleeve" mechanism to make it harder
for a future Jia Tan to sneak malicious things in via the
`make dist` tarball, would be higher - but right now, we normally do
ship both the source and the generated file, and I'm not aware of anyone
claiming that that makes the result non-GPL-compliant.

It's also relatively common for Autotools projects' `make dist` tarballs
to omit some files that are part of the upstream git tree, such as
VCS files like .gitignore, and ancillary/non-essential files like the
configuration for GitHub Actions, GitLab CI or equivalent. I think that's
a valid thing to do (as long as they are not the source code for something
in the dist tarball!) - and in fact omitting them reduces the number of
files that a packager needs to review, therefore improving our chances of
detecting the next backdoored module.
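
To make the difference between the two file sets concrete, here is a
minimal sketch that compares a `git archive` tarball with a `make dist`
tarball and reports files present on only one side, plus files present in
both whose contents differ. The function names and the `strip_components`
parameter are hypothetical, chosen for this example; it assumes both
tarballs are already in memory as bytes and that each has one leading
directory to drop (pass `strip_components=0` for a `git archive` tarball
created without `--prefix`).

```python
import hashlib
import io
import tarfile


def member_digests(tar_bytes, strip_components=1):
    """Map each regular file in a tarball to the SHA-256 of its contents."""
    digests = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks and so on
            parts = member.name.split("/")[strip_components:]
            if not parts:
                continue
            data = tar.extractfile(member).read()
            digests["/".join(parts)] = hashlib.sha256(data).hexdigest()
    return digests


def compare_trees(git, dist):
    """Classify paths by where they appear and whether contents match."""
    shared = git.keys() & dist.keys()
    return {
        "only_in_git": sorted(git.keys() - dist.keys()),
        "only_in_dist": sorted(dist.keys() - git.keys()),
        "differing": sorted(n for n in shared if git[n] != dist[n]),
    }
```

In this sketch, `only_in_git` would typically hold things like .gitignore
and CI configuration, while `only_in_dist` and `differing` are the entries
a reviewer would want explained, for example as generated files.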

So I think you're both partly right: we should insist on having the
source code for every file we distribute as source, and in some ways it
would make review easier if we deleted all files that are not source code
(or even all files that are not required for our distro), but I don't
agree that it is *necessarily* required for our source code archive to
be identical to the upstream git tree.

Note that I'm using "tree" as the git jargon term here: approximately
"something that you could pack into a `git archive` tarball, losslessly".
To go beyond that, we move on to the other question I can see here:

Which commits are part of the source code?
------------------------------------------

Another question about the source code is whether it is sufficient to take
a snapshot of the current state of the git tree (again, tree as jargon term)
and say that it is the preferred form for modification, or whether complete
corresponding source code should be understood to mean its complete git
history going back to the beginning of the project (in git jargon, a series
of commits going back to one without a parent, rather than a tree).
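
The distinction between the two readings can be shown with a toy model.
This is a sketch for illustration only, not git's actual object format:
the `Commit` class, `snapshot` and `history` are hypothetical names, a
tree is reduced to a plain dict of path to content, and merge commits
(which have more than one parent) are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    tree: dict                          # path -> file content at this commit
    parent: Optional["Commit"] = None   # None marks the parentless root


def snapshot(commit):
    """The 'tree' reading: only the files as they are at one commit."""
    return commit.tree


def history(commit):
    """The 'commits' reading: every tree back to the parentless root."""
    trees = []
    while commit is not None:
        trees.append(commit.tree)
        commit = commit.parent
    return trees
```

For example, with `root = Commit({"a": "1", "blob.bin": "..."})` and
`head = Commit({"a": "2"}, parent=root)`, `snapshot(head)` contains only
`a`, while `history(head)` still includes the tree that held `blob.bin`.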

I think that Guillem, and maybe Adrian too, whether rightly or wrongly,
understood you to be claiming that a single snapshot (git tree or `git
archive` output) is not enough, and the history is also required - and
it's that assertion, which you might not have intended to be making,
that they are pushing back most strongly against? (Or perhaps I'm
misunderstanding.)

If that's what is happening, then I agree with them.

Demanding that we ship the full history surely can't be what the authors
of the GPL intended, because at the time it was written, public VCSs were rare, and
the GNU system was developed via a "cathedral" approach with a small
number of authors writing software privately and releasing it to the
world as a series of tarballs. It seems obvious to me that they wouldn't
have written the license to require a more comprehensive version of
"what is source?" than what they themselves were releasing.

Demanding the full history is also not really practical for a Free
Software distribution, because a non-trivial project's history is
inconveniently large, and over a long enough timescale it's relatively
likely that someone has committed (and perhaps subsequently deleted)
something that does not qualify as Free Software - either accidentally, or
because they were assuming that it's OK to include non-Free documentation,
artwork, test data or whatever, as long as it isn't executable code
(which, rightly or wrongly, is not the position taken by Debian).

Another practical concern is that Debian already has a legal review
bottleneck: the time and effort needed for maintainers and the archive
administrators to check that the entire source release contains only Free
Software under an acceptable license is significant, and it's a major
limiting factor on how much software we can ship. If we expanded the
source release from "the source code as of today" to "all versions of the
source code up to and including today", in projects with a non-trivial
history that would dramatically increase the amount of time and effort
that needs to be spent on review. As a result of this concern, the
archive administrators have specifically disallowed the use of source
package formats that contain history: only a moment-in-time snapshot
(the equivalent of a git tree, not a series of git commits) is allowed.

    smcv


More information about the rb-general mailing list