Tracking source code: whatsrc.org
kpcyrd
kpcyrd at archlinux.org
Tue Apr 30 15:38:47 UTC 2024
On 4/30/24 3:12 PM, Ludovic Courtès wrote:
> If you’re interested, you can incorporate data from
> <https://guix.gnu.org/sources.json>, a file that’s updated several times
> a day and which contains cryptographic hashes of tarballs and VCS
> checkouts as used in Guix (there’s a similar one for Nixpkgs but I can’t
> find the URL).
I'm indeed very interested. I looked at it briefly and have two questions:
1) In the `sources` list, how do I figure out which package a source
object belongs to?
This is important so I can display it like:
> Guix: htop 3.2.2 (https://github.com/htop-dev/htop#tag=3.2.2)
instead of:
> Guix: (https://github.com/htop-dev/htop#tag=3.2.2)
2) For `type: "git"`, how is `integrity` calculated?
It's referencing `https://github.com/htop-dev/htop#tag=3.2.2` with
`sha256-OrlNE1A71q4XAauYNfumV1Ev1wBpFIBxPiw7aF++yjM=` and I couldn't
figure out how this is calculated.
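The `integrity` value appears to use the Subresource Integrity (SRI) encoding: an algorithm name, a dash, and the base64-encoded digest. Below is a minimal Python sketch that converts it to the hex form used in whatsrc's `sha256:...` URLs. The helper names are hypothetical and not part of whatsrc or Guix. Note that this only changes the encoding. It says nothing about which bytes were hashed.

```python
import base64
import hashlib


def sri_to_hex(integrity: str) -> tuple[str, str]:
    """Split an SRI-style value like 'sha256-<base64>' into (algorithm, hex digest)."""
    algo, sep, b64 = integrity.partition("-")
    if not sep:
        raise ValueError(f"not an SRI-style value: {integrity!r}")
    # validate=True rejects characters outside the base64 alphabet
    digest = base64.b64decode(b64, validate=True)
    # the decoded length must match the named algorithm (32 bytes for sha256)
    expected = hashlib.new(algo).digest_size
    if len(digest) != expected:
        raise ValueError(f"{algo} digest should be {expected} bytes, got {len(digest)}")
    return algo, digest.hex()


def hex_to_sri(algo: str, hex_digest: str) -> str:
    """Inverse of sri_to_hex: turn a hex digest back into 'algo-<base64>'."""
    return f"{algo}-{base64.b64encode(bytes.fromhex(hex_digest)).decode('ascii')}"


algo, hex_digest = sri_to_hex("sha256-OrlNE1A71q4XAauYNfumV1Ev1wBpFIBxPiw7aF++yjM=")
print(f"{algo}:{hex_digest}")
```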
Somebody on IRC pointed out this could be nar (the archive format also
used by NixOS). If that's the case, I'd like to record its hash too when
processing a git repository and register it as an "alternative
representation". Arch Linux also imports the htop source code from git
(but from the 3.3.0 version tag), so if both were on the same version
they could be documented as "using the same source code".
There's similar logic already built into the code and database to
register the sha256 of a .tar.gz as an alternative representation of the
sha256 of its .tar, as seen on this page (Debian is using a .tar.gz;
Alpine, Fedora and openSUSE are using a .tar.xz; but the inner .tar has
the same sha256 99032a43..., so the compression difference doesn't matter):
https://whatsrc.org/artifact/sha256:99032a437fbcb38a37f35f23712bbeb6489be112450e02443e16786d3d745b31
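Both hashes can be computed in a single streaming pass: hash the compressed bytes as they are read and hash the decompressed output at the same time. Below is a minimal Python sketch. It assumes the artifact arrives as a binary stream, such as an HTTP response body, and that only gzip and xz need handling. `HashingReader` and `outer_and_inner_sha256` are hypothetical names, not the what-the-src Rust code.

```python
import gzip
import hashlib
import lzma
from typing import BinaryIO


class HashingReader:
    """Read-only file-like wrapper that hashes every byte passing through it."""

    def __init__(self, raw: BinaryIO) -> None:
        self.raw = raw
        self.sha256 = hashlib.sha256()

    def read(self, size: int = -1) -> bytes:
        data = self.raw.read(size)
        self.sha256.update(data)
        return data


def outer_and_inner_sha256(raw: BinaryIO, compression: str, chunk_size: int = 64 * 1024) -> tuple[str, str]:
    """Return (sha256 of the compressed bytes, sha256 of the decompressed .tar) in one pass."""
    outer = HashingReader(raw)
    if compression == "gz":
        inner_stream = gzip.GzipFile(fileobj=outer, mode="rb")
    elif compression == "xz":
        inner_stream = lzma.LZMAFile(outer, mode="rb")
    else:
        raise ValueError(f"unsupported compression: {compression!r}")
    inner = hashlib.sha256()
    with inner_stream:
        while chunk := inner_stream.read(chunk_size):
            inner.update(chunk)
    # make sure the outer hash covers the whole file, even bytes the decompressor skipped
    while outer.read(chunk_size):
        pass
    return outer.sha256.hexdigest(), inner.hexdigest()
```

Running this on the Debian .tar.gz and on the Alpine .tar.xz should give two different outer hashes but the same inner hash. That shared inner hash is what makes the compression difference irrelevant.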
A pointer like "instead of `git archive`, run: ..." would already be
helpful. If somebody wants to write a patch, the relevant code location
would be:
https://github.com/kpcyrd/what-the-src/blob/8d9b18f54770a2c2830986af89af15b39c49c70c/src/git.rs#L110-L126
>> The site operates fairly co2 efficient (due to my Rust proficiency), I
>> showed a friend what kind of specs I run this on and they were
>> ✨stunned✨.
>
> It’s behind CloudFlare though. :-)
Fair, I'm using a caching CDN in case an artifact that I indexed gets
hackernews'd. :)
The web code is fairly boring, but running your own instance, including
the worker that indexes all the mentioned distros, can be done on a
<5€/mo VPS with:
- 1 core
- 2GB RAM
- 6GB swap
- 10GB temporary disk to deal with git
- disk for the database (currently 12GB, but obviously plan for more)
With these specs you can run multiple workers (e.g. while one worker is
resolving git deltas the other worker can still make use of the network
to stream-process a .tar).
The container image is available at:
ghcr.io/kpcyrd/what-the-src:edge
It's about 40MB in size.
cheers,
kpcyrd
PS: Since the original email I also imported openSUSE (bmwiedemann sent
a patch, thanks!) and Kali Linux (a friend of mine works there and asked).
More information about the rb-general mailing list