# [rb-general] transitive collision resistance [was: rb formalism]

Orians, Jeremiah (DTMB) OriansJ at michigan.gov
Fri Dec 21 13:27:33 CET 2018

```> While I agree that you can certainly find collisions when you do
> crc16(H(a),H(b))
> or
> H(crc16(a),crc16(b))

> I fail to see how that would be possible with cryptographic hash functions like SHA-256, so
> H(H(a),H(b))

> especially since the hash functions internally usually work in rounds and you have to complete a sufficient number of rounds to get resistance and adding another full hash around it is > like doubling the number of rounds and the concatenation in the middle does not weaken that.

> One can even construct a general proof:
> Given a H where it is not possible to collide H(a) = H(b) with a ≠ b Then it is also not possible to collide
> H(H(a),H(a2)) = H(H(b),H(b2)) with a ≠ b  or  a2 ≠ b2

> because of the premiss, we have
> H(a) ≠ H(b)  or  H(a2) ≠ H(b2)
> so that H(a),H(a2) ≠ H(b),H(b2) - assuming hash output of fixed length and then the premiss applies again to the outer H to prove the conclusion.

Let us stop right here, having the same hash *DOES NOT* mean that the files are the same, rather that the probability that they are different is equal to the probability of finding a collision given certain resource constraints.

Hashes are the cheap version of approximating that files are the same. Much like floating point approximates numbers (cheaply and wrong but within a margin of good enough)

If I wanted a secure version of hashes for verification; it would require no less than 2 different types of hashes
And then I would formally prove that the only way for hashes to all be true is if all bytes in the file are exactly identical.

One can even do trivial hashes like countbytes(f), sumalleven(f), sumallodd(f), sum(f) to give us hard arbitrary constraints for our formal proof

```