Tag Archives: probability

The probability of SHA-1 collisions

Git uses SHA-1 hashes to identify its objects (commits, refs, blobs, etc). It assumes that if the SHA-1 hashes of two objects are equal, then the two objects themselves are equal, hence there’s no need to store both in the repo. In other words, if the hashes of two different objects turned out to be equal at some point, Git wouldn’t be able to handle that!

But what are the chance of such a SHA-1 collision? This was discussed in the Git mailing list today and a recent analysis on this problem was shed into light. This analysis has used the famous Birthday Paradox and come up with a formula to determine the probability. It quotes,

“Applying the formula for 160bit SHA-1 you need 1.7e23 objects to get a 1% chance of collision. The current Linus kernel repository has 2.7 million objects. So to get a collision you’d need a repository that’s 6e16 times larger. That should be plenty.

For some wacky perspective that’s 10 million kernel sized contributions for every man woman and child on earth together in a single repository. It would seem git will reach plenty of other bottlenecks before SHA-1 becomes a problem…”

The probability is, of course, still non-zero. But don’t ever expect to witness such a collision before you die.