Version 2 of SWHID, ISO/IEC 18670:2025?
Stefano Zacchiroli
zack at upsilon.cc
Mon Jan 19 15:00:24 UTC 2026
Hello all, co-author of the SWHID spec here. Thanks to kpcyrd for
raising this and to Pol for all the valid points he raised. In addition
(and to answer the original question as well):
On Mon, Jan 19, 2026 at 02:42:00PM +0000, Pol Dellaiera wrote:
> Indeed, SWHIDs rely internally on the SHA-1 algorithm. However, the hash is
> not computed over raw file contents alone. Instead, it is computed over a
> structured byte sequence that includes the object’s type and length,
> followed by its content. This domain separation significantly reduces the
> applicability of known SHA-1 collision attacks.
The SHA-1 algorithm used is also not "vanilla" SHA-1, but the so-called
"sha1collisiondetection" variant, which will refuse to give you a
checksum for inputs it detects as collision attacks. (Yes: this does not
fully address the problem, because it means that in the nasty cases you
simply cannot compute a SWHIDv1.)
See also https://www.swhid.org/faq/#why-does-swhid-use-sha-1 and
subsequent points.
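
As a rough illustration of the structured byte sequence Pol describes,
here is a minimal Python sketch for content objects only. It assumes the
Git-compatible header ("blob", a space, the decimal length, a NUL byte)
and uses plain hashlib SHA-1 rather than sha1collisiondetection, so it
will not refuse colliding inputs. The function name is made up for the
example.

```python
import hashlib


def swhid_v1_content(data: bytes) -> str:
    """Sketch of a SWHIDv1 for a content object (hypothetical helper).

    Assumption: the hashed byte sequence is the Git blob framing,
    i.e. b"blob <length>\\0" followed by the raw content.
    Plain SHA-1 is used here, NOT sha1collisiondetection.
    """
    # Object type and length go in front of the content, so the hash
    # is never computed over the raw file bytes alone.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return "swh:1:cnt:" + digest


if __name__ == "__main__":
    # The empty content hashes to the same value as Git's empty blob.
    print(swhid_v1_content(b""))
    # swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Directories, revisions, releases and snapshots use other object types
and serializations, which this sketch does not cover.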
But kpcyrd is fully right: while we wanted to standardize SWHIDv1
because it was already (de facto) used out there, SWHIDv2 with stronger
hashes is needed and we are already working on it. Tentatively we want
to simply switch to SHA-2, with SHA-256 hashes, which would be a
relatively easy standard upgrade. But at the same time it will also make
textual hashes much longer, so we would *also* like to offer some more
compact representations of hashes than hex (possibly as an optional
alternative to hex encoding).
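
To give a feel for the length issue, here is a small Python sketch that
compares the textual length of a SHA-256 digest under a few encodings.
The encodings shown (unpadded base32 and URL-safe base64) are only
examples picked for illustration; they are not a statement of what
SWHIDv2 will use, and the function name is hypothetical.

```python
import base64
import hashlib


def sha256_text_lengths(data: bytes) -> dict:
    """Length in characters of a SHA-256 digest under some encodings.

    The candidate encodings are illustrative assumptions only.
    """
    digest = hashlib.sha256(data).digest()  # 32 raw bytes
    return {
        "hex": len(digest.hex()),
        # Padding characters ("=") are stripped before counting.
        "base32": len(base64.b32encode(digest).rstrip(b"=")),
        "base64url": len(base64.urlsafe_b64encode(digest).rstrip(b"=")),
    }


if __name__ == "__main__":
    # For comparison, a SHA-1 digest in hex is 40 characters long.
    print(len(hashlib.sha1(b"").hexdigest()))
    print(sha256_text_lengths(b""))
```

With these choices a SHA-256 digest takes 64 characters in hex, 52 in
unpadded base32 and 43 in unpadded base64, against 40 for SHA-1 in hex.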
We don't have a timeline yet, but r-b people who are interested in this
are more than welcome to join the SWHID working group.
Cheers
--
Stefano Zacchiroli - https://upsilon.cc/zack
Full professor of Computer Science, Polytechnic Institute of Paris
Co-founder & CSO Software Heritage