Tracking source code: whatsrc.org
kpcyrd
kpcyrd at archlinux.org
Tue Apr 23 18:35:57 UTC 2024
hello list,
I built a website and imported source code inputs from:
- Arch Linux
- Debian sid and stable-security
- Fedora rawhide
- Alpine edge
into a common database. I keep track of the tarball content, and the
checksums both before and after compression.
This allows lookups like:
https://whatsrc.org/artifact/sha256:981a75f8291020d9f6632c6160ee3651f376bdf354373bea00506a220e355134
```
Build input of:
- Alpine: cmatrix 2.0-r2
(https://github.com/abishekvashok/cmatrix/archive/v2.0.tar.gz)
sha512:1aeecd8e8abb6f87fc54f88a8c25478f69d42d450af782e73c0fca7f051669a415c0505ca61c904f960b46bbddf98cfb3dd1f9b18917b0b39e95d8c899889530
- Arch Linux: cmatrix 2.0-3
(https://github.com/abishekvashok/cmatrix/archive/v2.0.tar.gz)
sha256:ad93ba39acd383696ab6a9ebbed1259ecf2d3cf9f49d6b97038c66f80749e99a
- Debian: cmatrix 2.0-6 (cmatrix_2.0.orig.tar.gz)
sha256:ad93ba39acd383696ab6a9ebbed1259ecf2d3cf9f49d6b97038c66f80749e99a
- Fedora: cmatrix 2.0-9.fc40 (cmatrix-2.0.tar.gz)
sha256:ad93ba39acd383696ab6a9ebbed1259ecf2d3cf9f49d6b97038c66f80749e99a
```
In this case, there's consensus between the 4 distributions about what
the source code of cmatrix 2.0 is. They may still use different build
instructions or apply patches (and definitely have different build
environments), but they seem to agree on what the upstream source code is
(despite the absence of code signing by upstream... maybe this hill was
never worth dying on in the first place).
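To illustrate why tracking checksums both before and after compression is useful, here is a minimal Python sketch. It assumes a gzip-compressed tarball; the function name `checksums` is made up for this example and is not taken from the whatsrc code. The same content compressed in two different ways produces different outer checksums, but the checksum of the decompressed stream stays the same.

```python
import gzip
import hashlib


def checksums(compressed: bytes) -> dict:
    """Hash a gzip file as-is and again after decompressing it."""
    outer = hashlib.sha256(compressed).hexdigest()
    inner = hashlib.sha256(gzip.decompress(compressed)).hexdigest()
    return {
        "compressed": f"sha256:{outer}",
        "decompressed": f"sha256:{inner}",
    }


# Stand-in for a tar stream; any bytes work for the demonstration.
payload = b"pretend this is a tar stream\n" * 100

# Same content, different compression level and gzip header timestamp.
first = checksums(gzip.compress(payload, compresslevel=1, mtime=0))
second = checksums(gzip.compress(payload, compresslevel=9, mtime=1))

print(first["compressed"] == second["compressed"])
print(first["decompressed"] == second["decompressed"])
```

Two distributions that repack or recompress the same upstream tarball would disagree on the outer hash but could still be matched on the inner one.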
There's a search feature you can use (prefix-based with a btree index):
https://whatsrc.org/search?q=htop
In this case we find two different tarballs for htop 3.3.0:
- https://whatsrc.org/artifact/sha256:5971ba79fcb5e5effe182362f1dc29edfe4cfccb8389a8160e161b061e7db473
- https://whatsrc.org/artifact/sha256:487fbce5bc6f92a3fa9283ea1eb5f70f85bf31fe0bbee92a692f9c3f0f96f7d4
You can diff the two using:
https://whatsrc.org/diff-sorted/sha256:5971ba79fcb5e5effe182362f1dc29edfe4cfccb8389a8160e161b061e7db473/sha256:487fbce5bc6f92a3fa9283ea1eb5f70f85bf31fe0bbee92a692f9c3f0f96f7d4
--- is Debian and Fedora, +++ is Alpine
From the diff (but also from the infobox of the second link), we can
tell this is before/after autotools pre-processing. The -sorted is
necessary because the pre-processed dist-tarball also has ordering
issues, making the diff very hard to read without it.
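The idea behind a sorted diff can be sketched roughly like this in Python. This is not how whatsrc implements it, just an illustration under my own assumptions: each tarball is reduced to a path-to-hash mapping, and the paths are compared in sorted order so that member ordering inside the archive no longer matters. The helper names (`member_hashes`, `sorted_diff`, `make_tar`) are hypothetical.

```python
import hashlib
import io
import tarfile


def member_hashes(tar_bytes: bytes) -> dict:
    """Map each regular file in a tar archive to the sha256 of its content."""
    hashes = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            if member.isfile():
                data = tar.extractfile(member).read()
                hashes[member.name] = hashlib.sha256(data).hexdigest()
    return hashes


def sorted_diff(old: bytes, new: bytes) -> list:
    """List removed (---) and added (+++) paths, ignoring member order."""
    a, b = member_hashes(old), member_hashes(new)
    lines = []
    for path in sorted(a.keys() | b.keys()):
        if path not in b:
            lines.append(f"--- {path}")
        elif path not in a:
            lines.append(f"+++ {path}")
        elif a[path] != b[path]:
            lines.append(f"--- {path}")
            lines.append(f"+++ {path}")
    return lines


def make_tar(files: list) -> bytes:
    """Build an in-memory tar from (name, content) pairs, in the given order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, content in files:
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


git_snapshot = make_tar([("b.c", b"int b;"), ("a.c", b"int a;")])
dist_tarball = make_tar([("a.c", b"int a;"), ("configure", b"#!/bin/sh"), ("b.c", b"int b;")])
print(sorted_diff(git_snapshot, dist_tarball))
```

With the ordering noise gone, only the generated `configure` file shows up as a difference in this toy example.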
It's importing code from git repositories too, as seen on this page:
https://whatsrc.org/artifact/sha256:494fa0b23697967ab99faa8eb07f4e24e9f431ac7ab771cfd8f3dda068590b7b
It's using `git archive` (without compression) to generate a
deterministic tar representation for a given git tree object. These are
always free of ordering issues.
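A rough Python sketch of hashing uncompressed `git archive` output follows. The function names and the `repo_dir`/`rev` parameters are assumptions for illustration, not whatsrc's actual code. The sketch passes a commit or tag rather than a bare tree ID, since git's documentation says a bare tree ID makes `git archive` use the current time as the file modification time.

```python
import hashlib
import subprocess


def git_archive_cmd(rev: str) -> list:
    """Command line for an uncompressed tar of the given commit or tag."""
    return ["git", "archive", "--format=tar", rev]


def hash_git_archive(repo_dir: str, rev: str) -> str:
    """Run git archive inside repo_dir and return the sha256 of the tar stream."""
    result = subprocess.run(
        git_archive_cmd(rev),
        cwd=repo_dir,
        check=True,
        capture_output=True,
    )
    return "sha256:" + hashlib.sha256(result.stdout).hexdigest()


print(git_archive_cmd("v2.0"))
```

Calling `hash_git_archive("/path/to/clone", "v2.0")` on a local clone (path and tag are placeholders) should give the same value on repeated runs, which is what makes the result usable as a content address.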
When looking at the PKGBUILD for this package:
https://gitlab.archlinux.org/archlinux/packaging/packages/ncmpcpp/-/blob/816dbe564554c1c4f772e84a49faf3708fa62a29/PKGBUILD
You can find this line:
```
b2sums=('babc1506eca6dc5bd48e58fabfd42502d33b506b2e600b7aa98126a6deb0d68e14dc692abb0ef5079e3ccf710648f0b82fe1b404303d932f2156104c479442ec'
```
Since both Arch Linux PKGBUILDs and whatsrc are content-addressed, you
can convert it into this link:
https://whatsrc.org/artifact/blake2b:babc1506eca6dc5bd48e58fabfd42502d33b506b2e600b7aa98126a6deb0d68e14dc692abb0ef5079e3ccf710648f0b82fe1b404303d932f2156104c479442ec
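The conversion is simple enough to script. A minimal Python sketch, assuming the `/artifact/<algorithm>:<hex digest>` URL shape seen in the links in this mail (the helper names are mine):

```python
import hashlib

BASE = "https://whatsrc.org/artifact"


def artifact_url(algo: str, hexdigest: str) -> str:
    """Build a whatsrc artifact link from an algorithm name and hex digest."""
    return f"{BASE}/{algo}:{hexdigest}"


def b2sum(data: bytes) -> str:
    """BLAKE2b-512 hex digest, the same size the b2sum tool uses by default."""
    return hashlib.blake2b(data).hexdigest()


pkgbuild_b2sum = (
    "babc1506eca6dc5bd48e58fabfd42502d33b506b2e600b7aa98126a6deb0d68e"
    "14dc692abb0ef5079e3ccf710648f0b82fe1b404303d932f2156104c479442ec"
)
print(artifact_url("blake2b", pkgbuild_b2sum))
```

For a file you have locally, `artifact_url("blake2b", b2sum(open(path, "rb").read()))` would build the corresponding link; whether whatsrc knows that artifact is a separate question.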
I'm interested in adding NixOS as a 5th distribution, but I'm not sure
how to get the relevant data. Help is welcome in
https://github.com/kpcyrd/what-the-src/issues/12. The existing rpm
tooling may also work for openSUSE, but I haven't tried yet.
The site operates fairly CO2-efficiently (due to my Rust proficiency). I
showed a friend what kind of specs I run this on and they were ✨stunned✨.
The purpose of this site is to give a better understanding of which line
we need to defend with regard to source code (hint: it's the source code
we ingest into reproducible builds, for the binaries we then put into
our computers).
Hopefully this helps people with reasoning about said source code.
cheers,
kpcyrd
More information about the rb-general mailing list