Tracking source code: whatsrc.org

kpcyrd kpcyrd at archlinux.org
Tue Apr 23 18:35:57 UTC 2024


hello list,

I built a website and imported source code inputs from:

- Arch Linux
- Debian sid and stable-security
- Fedora rawhide
- Alpine edge

into a common database. I keep track of the tarball content, and the 
checksums both before and after compression.
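
A minimal sketch of why both checksums are worth keeping, assuming gzip-compressed tarballs and SHA-256. The function names are made up for illustration and this is not the code whatsrc.org actually runs. Two downloads can differ in their compressed bytes and still carry the identical tar inside:

```python
import gzip
import hashlib
import io
import tarfile


def outer_and_inner_digests(compressed: bytes) -> tuple[str, str]:
    """Return (digest of the .tar.gz as downloaded, digest of the inner .tar)."""
    outer = hashlib.sha256(compressed).hexdigest()
    inner = hashlib.sha256(gzip.decompress(compressed)).hexdigest()
    return f"sha256:{outer}", f"sha256:{inner}"


def make_demo_tarball(mtime: int) -> bytes:
    """Build a tiny .tar.gz in memory; only the gzip header timestamp varies."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        data = b"hello\n"
        info = tarfile.TarInfo("demo/README")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return gzip.compress(raw.getvalue(), mtime=mtime)


# Same tar content, compressed twice with different gzip timestamps.
outer_a, inner_a = outer_and_inner_digests(make_demo_tarball(mtime=1))
outer_b, inner_b = outer_and_inner_digests(make_demo_tarball(mtime=2))
print(outer_a == outer_b)  # the compressed files differ
print(inner_a == inner_b)  # the tar inside is the same
```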

This allows lookups like:

https://whatsrc.org/artifact/sha256:981a75f8291020d9f6632c6160ee3651f376bdf354373bea00506a220e355134

```
Build input of:

- Alpine: cmatrix 2.0-r2
  (https://github.com/abishekvashok/cmatrix/archive/v2.0.tar.gz)
  sha512:1aeecd8e8abb6f87fc54f88a8c25478f69d42d450af782e73c0fca7f051669a415c0505ca61c904f960b46bbddf98cfb3dd1f9b18917b0b39e95d8c899889530
- Arch Linux: cmatrix 2.0-3
  (https://github.com/abishekvashok/cmatrix/archive/v2.0.tar.gz)
  sha256:ad93ba39acd383696ab6a9ebbed1259ecf2d3cf9f49d6b97038c66f80749e99a
- Debian: cmatrix 2.0-6 (cmatrix_2.0.orig.tar.gz)
  sha256:ad93ba39acd383696ab6a9ebbed1259ecf2d3cf9f49d6b97038c66f80749e99a
- Fedora: cmatrix 2.0-9.fc40 (cmatrix-2.0.tar.gz)
  sha256:ad93ba39acd383696ab6a9ebbed1259ecf2d3cf9f49d6b97038c66f80749e99a
```

In this case, the 4 distributions agree on what the source code of
cmatrix 2.0 is. They may still use different build instructions or apply
patches (and definitely have different build environments), but they
seem to agree on what the upstream source code is, even though upstream
doesn't sign its code (maybe this hill was never worth dying on in the
first place).

There's a search feature you can use (prefix-based with a btree index):

https://whatsrc.org/search?q=htop
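
To illustrate why a btree index suits prefix search: in a sorted index, all names sharing a prefix sit next to each other, so a lookup is one seek plus a short scan. The sketch below uses a sorted Python list as a stand-in for the index. The package list is made-up sample data and the site's real query code may look quite different:

```python
import bisect


def prefix_search(sorted_names: list[str], prefix: str) -> list[str]:
    """Return all names starting with `prefix` from an already sorted list."""
    # Seek to the first entry that is >= prefix ...
    start = bisect.bisect_left(sorted_names, prefix)
    matches = []
    # ... then scan forward until the prefix no longer matches.
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break
        matches.append(name)
    return matches


names = sorted(["cmatrix", "htop", "htop-vim", "hwloc", "ncmpcpp"])
print(prefix_search(names, "htop"))  # ['htop', 'htop-vim']
```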

In this case we find two different tarballs for htop 3.3.0:

- https://whatsrc.org/artifact/sha256:5971ba79fcb5e5effe182362f1dc29edfe4cfccb8389a8160e161b061e7db473
- https://whatsrc.org/artifact/sha256:487fbce5bc6f92a3fa9283ea1eb5f70f85bf31fe0bbee92a692f9c3f0f96f7d4

You can diff the two using:

https://whatsrc.org/diff-sorted/sha256:5971ba79fcb5e5effe182362f1dc29edfe4cfccb8389a8160e161b061e7db473/sha256:487fbce5bc6f92a3fa9283ea1eb5f70f85bf31fe0bbee92a692f9c3f0f96f7d4

In the diff, --- is Debian and Fedora, +++ is Alpine.

From the diff (but also from the infobox of the second link), we can
tell this is before/after autotools pre-processing. The `-sorted`
variant of the diff is necessary because the pre-processed dist tarball
also has ordering issues, which make a plain diff very hard to read.
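
A rough sketch of the idea behind a sorted diff, assuming it boils down to "list every file with its hash, sort by path, then compare". That is my reading of the endpoint name, not a description of the site's implementation, and the helper names are hypothetical:

```python
import difflib
import hashlib
import io
import tarfile


def sorted_listing(tar_bytes: bytes) -> list[str]:
    """List regular files as 'sha256  path' lines, sorted by path."""
    entries = []
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
        for member in tar:
            if member.isfile():
                content = tar.extractfile(member).read()
                entries.append((member.name, hashlib.sha256(content).hexdigest()))
    return [f"{digest}  {name}" for name, digest in sorted(entries)]


def diff_sorted(old: bytes, new: bytes) -> list[str]:
    """Unified diff of the two sorted listings; member order no longer matters."""
    return list(difflib.unified_diff(
        sorted_listing(old), sorted_listing(new), "old", "new", lineterm=""))


def make_tar(files: list[tuple[str, bytes]]) -> bytes:
    """Build an uncompressed tar in memory, keeping the given member order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Made-up sample data: same files in a different order, plus a generated file.
git_style = make_tar([("Makefile.am", b"bin_PROGRAMS = demo\n"),
                      ("demo.c", b"int main(void) { return 0; }\n")])
dist_style = make_tar([("demo.c", b"int main(void) { return 0; }\n"),
                       ("configure", b"#!/bin/sh\n"),
                       ("Makefile.am", b"bin_PROGRAMS = demo\n")])
diff = diff_sorted(git_style, dist_style)
print("\n".join(diff))
```

With the listings sorted, the reordered members drop out of the diff and only the added `configure` entry remains.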

It's importing code from git repositories too, as seen on this page:

https://whatsrc.org/artifact/sha256:494fa0b23697967ab99faa8eb07f4e24e9f431ac7ab771cfd8f3dda068590b7b

It's using `git archive` (without compression) to generate a 
deterministic tar representation for a given git tree object. These are 
always free of ordering issues.
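
A sketch of hashing the output of `git archive`, with hypothetical helper names. One assumption to flag loudly: the sketch passes a commit rather than a bare tree ID, because the git-archive documentation says a commit's timestamp is used as the file modification time, while a bare tree ID gets the current time. How whatsrc.org itself deals with timestamps is not covered in this mail:

```python
import hashlib
import subprocess


def git_archive_command(commit: str) -> list[str]:
    """Command line for an uncompressed tar of the tree behind `commit`."""
    return ["git", "archive", "--format=tar", commit]


def hash_git_archive(repo_dir: str, commit: str) -> str:
    """Run `git archive` inside repo_dir and hash the resulting tar stream."""
    result = subprocess.run(
        git_archive_command(commit),
        cwd=repo_dir,
        check=True,           # raise if git exits with a non-zero status
        capture_output=True,  # collect the tar bytes from stdout
    )
    return "sha256:" + hashlib.sha256(result.stdout).hexdigest()
```

Calling `hash_git_archive("/path/to/checkout", "<commit id>")` twice on the same commit should give the same digest, since no compressor or filesystem ordering is involved.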

When looking at the PKGBUILD for this package:

https://gitlab.archlinux.org/archlinux/packaging/packages/ncmpcpp/-/blob/816dbe564554c1c4f772e84a49faf3708fa62a29/PKGBUILD

You can find this line:
```
b2sums=('babc1506eca6dc5bd48e58fabfd42502d33b506b2e600b7aa98126a6deb0d68e14dc692abb0ef5079e3ccf710648f0b82fe1b404303d932f2156104c479442ec'
```

Since both Arch Linux PKGBUILDs and whatsrc are content-addressed, you
can convert it into this link:

https://whatsrc.org/artifact/blake2b:babc1506eca6dc5bd48e58fabfd42502d33b506b2e600b7aa98126a6deb0d68e14dc692abb0ef5079e3ccf710648f0b82fe1b404303d932f2156104c479442ec
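
The conversion is mechanical enough to script. A small sketch, assuming the URL pattern is simply the one visible in the links in this mail; the helper names and the mapping table are hypothetical, and only the two prefixes that appear in URLs above are covered:

```python
import hashlib

# PKGBUILD checksum array name -> hash prefix as seen in whatsrc.org URLs.
PREFIX_FOR_ARRAY = {"sha256sums": "sha256", "b2sums": "blake2b"}


def artifact_url(array_name: str, hex_digest: str) -> str:
    """Turn one PKGBUILD checksum entry into a whatsrc.org artifact link."""
    prefix = PREFIX_FOR_ARRAY[array_name]
    return f"https://whatsrc.org/artifact/{prefix}:{hex_digest}"


def b2sum(data: bytes) -> str:
    """BLAKE2b with a 64-byte digest, the same default as the b2sum tool."""
    return hashlib.blake2b(data).hexdigest()


# Demo input only; this is not a real source tarball.
demo_digest = b2sum(b"not a real source tarball\n")
print(artifact_url("b2sums", demo_digest))
```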

I'm interested in adding NixOS as a 5th distribution, but I'm not sure
how to get the relevant data. Help is welcome in
https://github.com/kpcyrd/what-the-src/issues/12. The existing rpm
tooling may also work for openSUSE, but I haven't tried that yet.

The site runs fairly CO2-efficiently (thanks to my Rust proficiency); I
showed a friend what kind of specs I run this on and they were ✨stunned✨.

The purpose of this site is to give a better understanding of which line
we need to defend with regard to source code (hint: it's the source code
we ingest into reproducible builds, for the binaries we then put into
our computers).

Hopefully this helps people with reasoning about said source code.

cheers,
kpcyrd

