New supply-chain security tool: backseat-signed

kpcyrd kpcyrd at archlinux.org
Thu Apr 4 19:39:51 UTC 2024


On 4/3/24 4:21 AM, Adrian Bunk wrote:
> On Wed, Apr 03, 2024 at 02:31:11AM +0200, kpcyrd wrote:
>> ...
>> I figured out a somewhat straight-forward way to check if a given `git
>> archive` output is cryptographically claimed to be the source input of a
>> given binary package in either Arch Linux or Debian (or both).
> 
> For Debian the proper approach would be to copy Checksums-Sha256 for the
> source package to the buildinfo file, and there is nothing where it would
> matter whether the tarball was generated from git or otherwise.
> 
>> I believe this to be the "reproducible source tarball" thing some people
>> have been asking about.
>> ...
> 
> The lack of a reliably reproducible checksum when using "git archive" is
> the problem, and git cannot realistically provide that.
> 
> Even when called with the same parameters, "git archive" executed in
> different environments might produce different archives for the same
> commit ID.
> 
> It is documented that auto-generated Github tarballs for the same tag
> and with the same commit ID downloaded at different times might have
> different checksums.

Granted, it takes some skill to take snapshots that match what GitHub is
generating (and there are occasional issues), but generally speaking it
works quite well. The required command is in the README, and I encourage
you to give it a try.

If you want something that's explicitly designed for taking reproducible
VCS snapshots, you could also consider the "Nix Archive" format[0].
However, I think more people would be in favor of agreeing on how to
canonically derive a `.tar.gz` (or at least a .tar) from a given git tree
instead of switching Debian to the .nar file format.

[0]: https://github.com/ebkalderon/libnar
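
To illustrate what "canonically deriving" a .tar from a tree could mean, here is a minimal Python sketch. It is not what `git archive` or the .nar format does; the function name `canonical_tar` and the chosen rules (sorted entries, zeroed mtime and ownership, fixed mode) are assumptions made purely for illustration.

```python
import io
import tarfile


def canonical_tar(files: dict[str, bytes], prefix: str) -> bytes:
    """Serialize {relative path: content} into a .tar with fixed metadata.

    Hypothetical canonicalization rules: entries sorted by name, and
    mtime/uid/gid/owner names/mode pinned to constants, so the output
    depends only on the file names and their contents.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=prefix + name)
            info.size = len(data)
            info.mtime = 0
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# The same tree, supplied in a different order, yields identical bytes.
first = canonical_tar({"b.txt": b"2\n", "a.txt": b"1\n"}, "demo-1.0/")
second = canonical_tar({"a.txt": b"1\n", "b.txt": b"2\n"}, "demo-1.0/")
print(first == second)
```

Any real agreement would also have to cover symlinks, executable bits, directories and long path names, which this sketch ignores.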

I think regular `git archive` is already pretty good. Complaining that
it may only work in 98% of cases is, I'd say, a luxury problem considering
the current state of things. The next paragraph is the bigger headache:

>> This tool highlights the concept of "canonical sources", which is supposed
>> to give guidance on what to code review.
>> ...
> 
> How does it tell the git commit ID the tarball was generated from?
> 
> Doing a code review of git sources as tarball would be stupid,
> you really want the git metadata that usually shows when, why and by
> whom something was changed.

It doesn't. It works like a one-way function: it can verify that a given
VCS snapshot is definitely the source code that was ingested into Debian,
but it can't locate the source code on its own.

I don't know if Debian has this kind of provenance information
available. To my knowledge, Debian operates on "our maintainers upload
.tar.xz files into our archive and we take them at face value". That
does make sense, considering not every software project uses git: some
develop their own VCS, and some do not have any VCS at all, with just
one person applying patches to a folder on their local computer and
uploading .tar snapshots to a webserver every other month.

There are some packages that have some kind of system behind them. For
example, rust-toml_0.5.11.orig.tar.gz in the Debian archive can be
expected to match <https://crates.io/api/v1/crates/toml/0.5.11/download>
(although sometimes files get excluded from the tar upload). I'd like to
explicitly encourage people to point me in the right direction if
there's any existing effort to map Debian .orig.tar.gz files to git
tags (not necessarily bit-for-bit, but at least which commit we expect
each one to come from).

>> https://github.com/kpcyrd/backseat-signed
>>
>> The README
>> ...
> 
> "This requires some squinting since in Debian the source tarball is
>   commonly recompressed so only the inner .tar is compared"
> 
> This doesn't sound true.

I've updated the wording and intend to investigate this further. By 
default the relevant command even expects an exact match. For example 
this works:

```
% backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --name cmatrix cmatrix_2.0.orig.tar.gz
[2024-04-04T18:45:09Z INFO  backseat_signed::plumbing] Loading sources index from "Sources.xz"
[2024-04-04T18:45:10Z INFO  backseat_signed::plumbing] Loading file from "cmatrix_2.0.orig.tar.gz"
[2024-04-04T18:45:10Z INFO  backseat_signed::plumbing] Searching in index...
[2024-04-04T18:45:10Z INFO  backseat_signed::plumbing] File verified successfully
```

But if I repack the .tar.gz into .tar.xz it's going to get rejected:

```
% backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --name cmatrix cmatrix_2.0.orig.tar.xz
[2024-04-04T18:48:32Z INFO  backseat_signed::plumbing] Loading sources index from "Sources.xz"
[2024-04-04T18:48:33Z INFO  backseat_signed::plumbing] Loading file from "cmatrix_2.0.orig.tar.xz"
[2024-04-04T18:48:33Z INFO  backseat_signed::plumbing] Searching in index...
Error: Could not find source tarball with matching hash in source index
```
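
For readers who want to see the idea behind an exact-match lookup, here is a rough Python sketch. It is not the tool's actual code: it assumes an already-decompressed Sources index whose stanzas are separated by blank lines and carry a `Checksums-Sha256:` field with lines of the form ` <sha256> <size> <filename>`. The package name `demo` and the file contents are made up, and `parse_sources` and `is_listed` are hypothetical helper names.

```python
import hashlib


def parse_sources(text: str) -> dict[str, set[str]]:
    """Map each source package name to the SHA-256 digests listed for it."""
    index: dict[str, set[str]] = {}
    for stanza in text.strip().split("\n\n"):
        package = None
        digests = set()
        in_sha256 = False
        for line in stanza.splitlines():
            if line.startswith((" ", "\t")):
                # Continuation line of the current field.
                parts = line.split()
                if in_sha256 and parts:
                    digests.add(parts[0])
                continue
            field, _, value = line.partition(":")
            in_sha256 = field == "Checksums-Sha256"
            if field == "Package":
                package = value.strip()
        if package is not None:
            index.setdefault(package, set()).update(digests)
    return index


def is_listed(index: dict[str, set[str]], package: str, data: bytes) -> bool:
    """True if the SHA-256 of `data` is listed for `package`."""
    return hashlib.sha256(data).hexdigest() in index.get(package, set())


# Made-up stand-ins for an .orig.tar.gz and the same content repacked as .tar.xz.
orig_gz = b"pretend these are the bytes of demo_1.0.orig.tar.gz"
orig_xz = b"pretend these are the bytes of a repacked demo_1.0.orig.tar.xz"

sources_text = (
    "Package: demo\n"
    "Checksums-Sha256:\n"
    f" {hashlib.sha256(orig_gz).hexdigest()} {len(orig_gz)} demo_1.0.orig.tar.gz\n"
)

index = parse_sources(sources_text)
print(is_listed(index, "demo", orig_gz))  # True: exact bytes are in the index
print(is_listed(index, "demo", orig_xz))  # False: repacked bytes hash differently
```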

Being able to disregard the compression layer is still necessary,
however, because Debian (as far as I know) never records the hash of the
inner .tar file, only that of the compressed one. Because of this, you
may still need to provide `--orig <path>` if you want to compare against
an uncompressed tar.
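
The effect of the compression layer can be reproduced with a small, self-contained Python sketch. It builds a tiny made-up tarball in memory, compresses it once with gzip and once with xz, and compares hashes. The helper names are hypothetical and this only illustrates the principle, not how backseat-signed implements it.

```python
import gzip
import hashlib
import io
import lzma
import tarfile


def make_tar() -> bytes:
    """Build a tiny in-memory .tar with one file (made-up content)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"hello\n"
        info = tarfile.TarInfo(name="demo-1.0/README")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def inner_sha256(blob: bytes) -> str:
    """Hash the decompressed stream, detecting gzip or xz by magic bytes."""
    if blob.startswith(b"\x1f\x8b"):
        return sha256(gzip.decompress(blob))
    if blob.startswith(b"\xfd7zXZ\x00"):
        return sha256(lzma.decompress(blob))
    return sha256(blob)  # assume it is already an uncompressed .tar


tar_bytes = make_tar()
as_gz = gzip.compress(tar_bytes, mtime=0)
as_xz = lzma.compress(tar_bytes)

print(sha256(as_gz) != sha256(as_xz))              # True: outer hashes differ
print(inner_sha256(as_gz) == inner_sha256(as_xz))  # True: inner .tar is identical
```

Only the outer hash appears in the index, which is why the compressed original has to be supplied alongside the plain .tar for the comparison to be anchored.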

Here's an example of how you'd verify vim_9.1.0199.orig.tar.xz in Debian 
was taken from `https://github.com/vim/vim#tag=v9.1.0199`:

```
% git clone --branch v9.1.0199 https://github.com/vim/vim
% git -C vim rev-parse HEAD
ad38769030b5fa86aa0e8f1f0b4266690dfad4c9
% git -C vim archive --prefix="vim-9.1.0199/" -o vim-9.1.0199.tar v9.1.0199
% sha256sum vim-9.1.0199.tar
166f319a31a4eada3d181d80780f8581b11cf6fac61e57e73ef26a1e183eaed0  vim-9.1.0199.tar
```

Take Sources.xz from here:

https://snapshot.debian.org/archive/debian/20240324T210425Z/dists/sid/main/source/Sources.xz

sha256:ba14ca35563ace9dc1e81446f6d72979cdc5aa7ea5c558cb0fe5071736c602b2

And vim_9.1.0199.orig.tar.xz from here:

https://snapshot.debian.org/archive/debian/20240324T210425Z/pool/main/v/vim/vim_9.1.0199.orig.tar.xz

sha256:a3284e44b55a7877f3b0bbb1b0a349748e3b48f9d1e1c9d0f93856f7be417dda

You can verify it all checks out like this:

```
% backseat-signed plumbing debian-tarball-from-sources --sources Sources.xz --orig vim_9.1.0199.orig.tar.xz --name vim vim-9.1.0199.tar
[2024-04-04T19:09:40Z INFO  backseat_signed::plumbing] Loading sources index from "Sources.xz"
[2024-04-04T19:09:41Z INFO  backseat_signed::plumbing] Loading file from "vim-9.1.0199.tar"
[2024-04-04T19:09:41Z INFO  backseat_signed::plumbing] Loading Debian .orig.tar from "vim_9.1.0199.orig.tar.xz"
[2024-04-04T19:09:42Z INFO  backseat_signed::plumbing] Searching in index...
[2024-04-04T19:09:42Z INFO  backseat_signed::plumbing] File verified successfully
```

Tada.

Of course there's also a subcommand to check that a given Sources.xz
belongs to a given Release/Release.gpg combination. There's no support
for InRelease yet.
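
As a rough illustration of the hash half of such a check (the signature on Release via Release.gpg is a separate step and is not shown), here is a Python sketch. It assumes a Release file with a `SHA256:` field whose lines look like ` <sha256> <size> <path>`. The sample Release text, the index bytes and the helper names are made up for illustration and do not reflect the tool's code.

```python
import hashlib


def release_sha256_entries(release_text: str) -> dict[str, tuple[str, int]]:
    """Return {path: (sha256, size)} from the SHA256 field of a Release file."""
    entries: dict[str, tuple[str, int]] = {}
    in_sha256 = False
    for line in release_text.splitlines():
        if not line.startswith(" "):
            # A new top-level field starts; track whether it is SHA256.
            in_sha256 = line.strip() == "SHA256:"
            continue
        if in_sha256:
            digest, size, path = line.split()
            entries[path] = (digest, int(size))
    return entries


def index_matches_release(release_text: str, path: str, data: bytes) -> bool:
    """True if `data` has the size and SHA-256 that Release lists for `path`."""
    expected = release_sha256_entries(release_text).get(path)
    return expected == (hashlib.sha256(data).hexdigest(), len(data))


# Made-up stand-in for the bytes of a Sources.xz file.
sources_xz = b"pretend these are the bytes of Sources.xz"

release_text = (
    "Suite: demo\n"
    "SHA256:\n"
    f" {hashlib.sha256(sources_xz).hexdigest()} {len(sources_xz)} main/source/Sources.xz\n"
)

print(index_matches_release(release_text, "main/source/Sources.xz", sources_xz))
print(index_matches_release(release_text, "main/source/Sources.xz", sources_xz + b"x"))
```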

The tool wasn't able to take .tar directly before. I just built this.
Just for you. 🖤

I've checked both upstream's GitHub release page and their website[1],
but couldn't find any mention of .tar.xz, so I think my claim of Debian
doing the compression is fair.

[1]: https://www.vim.org/download.php

cheers,
kpcyrd

