Tracking source code: whatsrc.org

kpcyrd kpcyrd at archlinux.org
Tue Apr 30 15:38:47 UTC 2024


On 4/30/24 3:12 PM, Ludovic Courtès wrote:
> If you’re interested, you can incorporate data from
> <https://guix.gnu.org/sources.json>, a file that’s updated several times
> a day and which contains cryptographic hashes of tarballs and VCS
> checkouts as used in Guix (there’s a similar one for Nixpkgs but I can’t
> find the URL).

I'm indeed very interested. I looked at it briefly and have two questions:

1) In the `sources` list, how do I figure out which package a source 
object belongs to?

This is important so I can display it like:

 > Guix: htop 3.2.2 (https://github.com/htop-dev/htop#tag=3.2.2)

instead of:

 > Guix: (https://github.com/htop-dev/htop#tag=3.2.2)

2) For `type: "git"`, how is `integrity` calculated?

It's referencing `https://github.com/htop-dev/htop#tag=3.2.2` with 
`sha256-OrlNE1A71q4XAauYNfumV1Ev1wBpFIBxPiw7aF++yjM=` and I couldn't 
figure out how this is calculated.
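
One thing that can be said independently of what was hashed: the `sha256-` prefix followed by base64 looks like the SRI (Subresource Integrity) string style, while whatsrc.org URLs use `sha256:` followed by hex. A minimal Python sketch of converting between the two, assuming the `integrity` field really follows that convention (the function name `parse_integrity` is made up for illustration):

```python
import base64


def parse_integrity(value: str) -> tuple[str, str]:
    """Split an SRI-style 'algo-<base64>' string into (algorithm, hex digest).

    Assumption: the field follows the SRI convention. This only changes the
    encoding; it says nothing about which bytes were hashed.
    """
    algorithm, _, encoded = value.partition("-")
    digest = base64.b64decode(encoded, validate=True)
    return algorithm, digest.hex()


algo, hex_digest = parse_integrity(
    "sha256-OrlNE1A71q4XAauYNfumV1Ev1wBpFIBxPiw7aF++yjM="
)
print(f"{algo}:{hex_digest}")
```

The printed form has the same shape as the identifier in a whatsrc.org artifact URL, which would make the two directly comparable once the hashed representation is known.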

Somebody on IRC pointed out this could be nar (the archive format also 
used by NixOS). If that's the case, I'd like to record its hash too when 
processing a git repository and register it as an "alternative 
representation". Since Arch Linux is also importing htop source code 
from git (but from the 3.3.0 version tag), if they were on the same 
version they could be documented as "using the same source code".
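
If the nar guess is right, the hash would be a sha256 over a nar serialization of the checked-out tree rather than over `git archive` output. The following is only a sketch of the nar encoding as I understand it from the Nix documentation (length-prefixed, 8-byte padded strings; directory entries sorted by name). Whether Guix hashes exactly this, and whether it drops the top-level `.git` directory as assumed here, is not confirmed in this thread. All function names are made up for illustration.

```python
import hashlib
import os
import stat
import struct


def nar_str(data: bytes) -> bytes:
    """64-bit little-endian length, the bytes, then zero padding to a multiple of 8."""
    return struct.pack("<Q", len(data)) + data + b"\0" * (-len(data) % 8)


def nar_node(path: bytes, skip: tuple = ()):
    """Yield the encoding of one file system object, recursing into directories."""
    st = os.lstat(path)
    yield nar_str(b"(") + nar_str(b"type")
    if stat.S_ISREG(st.st_mode):
        yield nar_str(b"regular")
        if st.st_mode & stat.S_IXUSR:
            yield nar_str(b"executable") + nar_str(b"")
        with open(path, "rb") as f:
            # Reads the whole file into memory; fine for a sketch.
            yield nar_str(b"contents") + nar_str(f.read())
    elif stat.S_ISLNK(st.st_mode):
        yield nar_str(b"symlink") + nar_str(b"target") + nar_str(os.readlink(path))
    elif stat.S_ISDIR(st.st_mode):
        yield nar_str(b"directory")
        for name in sorted(os.listdir(path)):  # byte-wise order
            if name in skip:
                continue
            yield nar_str(b"entry") + nar_str(b"(")
            yield nar_str(b"name") + nar_str(name) + nar_str(b"node")
            yield from nar_node(os.path.join(path, name))
            yield nar_str(b")")
    else:
        raise ValueError(f"unsupported file type: {path!r}")
    yield nar_str(b")")


def nar_sha256(root: str, skip_top_level: tuple = (b".git",)) -> str:
    """sha256 over the nar stream of `root`.

    Assumption: top-level `.git` is excluded, as one would expect for a
    VCS checkout hash.
    """
    h = hashlib.sha256(nar_str(b"nix-archive-1"))
    for chunk in nar_node(os.fsencode(root), skip_top_level):
        h.update(chunk)
    return h.hexdigest()
```

Running `nar_sha256` on a checkout of the `3.2.2` tag and comparing the result against the decoded `integrity` value would be a quick way to check the IRC hint.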

There's similar logic already built into the code and database to 
register the sha256 of a .tar.gz as an alternative representation of the 
sha256 of its .tar. This can be seen on the following page (Debian is 
using a .tar.gz, Alpine, Fedora and openSUSE are using a .tar.xz, but the 
inner .tar has the same sha256 99032a43..., so the compression difference 
doesn't matter):

https://whatsrc.org/artifact/sha256:99032a437fbcb38a37f35f23712bbeb6489be112450e02443e16786d3d745b31
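
The idea of hashing both the compressed file and the stream inside it can be sketched in a few lines of Python. This is an illustration of the concept only, not the code what-the-src uses (that is written in Rust and stream-processes its input); the function names are made up and everything is held in memory for brevity.

```python
import gzip
import hashlib
import lzma


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def representations(blob: bytes) -> dict[str, str]:
    """Hash the blob as downloaded and, if it is gzip or xz, the stream inside it."""
    hashes = {"outer": sha256_hex(blob)}
    if blob.startswith(b"\x1f\x8b"):  # gzip magic bytes
        hashes["inner"] = sha256_hex(gzip.decompress(blob))
    elif blob.startswith(b"\xfd7zXZ\x00"):  # xz magic bytes
        hashes["inner"] = sha256_hex(lzma.decompress(blob))
    return hashes


# Stand-in bytes instead of a real .tar, compressed two different ways.
tar_bytes = b"stand-in for the bytes of a .tar file"
from_gz = representations(gzip.compress(tar_bytes))
from_xz = representations(lzma.compress(tar_bytes))

print(from_gz["outer"] == from_xz["outer"])
print(from_gz["inner"] == from_xz["inner"])
```

The outer hashes differ because the compressed bytes differ, while the inner hashes agree, which is what lets distributions shipping a .tar.gz and a .tar.xz be linked to the same artifact.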

A pointer like "instead of `git archive`, run: ..." would already be 
helpful. If somebody wants to write a patch, the relevant code location 
would be:

https://github.com/kpcyrd/what-the-src/blob/8d9b18f54770a2c2830986af89af15b39c49c70c/src/git.rs#L110-L126

>> The site operates fairly co2 efficient (due to my Rust proficiency), I
>> showed a friend what kind of specs I run this on and they were
>> ✨stunned✨.
> 
> It’s behind CloudFlare though.  :-)

Fair, I'm using a caching CDN in case an artifact that I indexed gets 
hackernews'd. :)

The web code is fairly boring, but running your own instance including 
the worker to index all mentioned distros can be done on a <5€/mo VPS:

- 1 core
- 2GB ram
- 6GB swap
- 10GB temporary disk to deal with git
- disk for the database (currently 12GB, but you should obviously plan 
for more)

With these specs you can run multiple workers (e.g. while one worker is 
resolving git deltas the other worker can still make use of the network 
to stream-process a .tar).

The container image is available at:

ghcr.io/kpcyrd/what-the-src:edge

It is about 40MB in size.

cheers,
kpcyrd

PS: Since the original email I also imported openSUSE (bmwiedemann sent 
a patch, thanks!) and Kali Linux (a friend of mine works there and asked).

