How is knowing a specific collision of a hashing algorithm useful?

Question

Not too long ago the first collision for SHA-1 was found. If I get this right, that means that someone found two different inputs that give the same output. That this is even possible is trivial, since the output always has a fixed length. From what I understand and hear, finding a collision is a major problem for a hashing algorithm and signals that the algorithm is more or less dead.

While I think I can see that there might be different types of collisions, my general questions are: How is finding a collision a problem? How can this be exploited?

It is not clear to me how just finding two random messages with the same hash would allow someone to "easily" sign messages, for example, or break hashed password files (OK, so maybe storing hashed passwords isn't a good idea anyway). If I, for example (maybe oversimplified?), sign a message and you want to sign another message with the same hash, how would the knowledge of a specific collision help you?

EDIT: I see the question/answer here: What are the implications of a SHA-1 collision being found? but I don't think it answers my question. I understand that "It would be possible, in theory, for an attacker to generate two executable files which have the same SHA-1 hash, but perform different things when run." for example. But how likely is that? How does knowing a specific hash make this possible? (I updated the title of the question.)

Hashes can be used in a variety of crypto applications; collisions only affect some of those. If you read about the amount of work/money that Google spent, it removes a lot of concern; this is not a script-kiddie hash-cat level exploit, so someone better have a darn good reason to invest the effort, and even then, it takes years; certainly much less time than it takes to find/report/resolve a banking error... — dandavis, Apr 11 '17 at 19:27
After your edit it still isn't clear to me what you're seeking to learn that the other question didn't answer. It sounds like you may now be trying to figure out the technical intricacies of how generating these collisions is possible? — PwdRsch, Apr 11 '17 at 21:42

Mike Ounsworth · Answer 1 · 2017-04-11T17:48:02.400

The question that @PwdRsch linked to does cover your question, but I'll give a more targeted answer:

Imagine you and I are negotiating a contract. Let's say I come up with two documents that have the same hash value, say

I agree to pay Mike Ounsworth $1,000 for these services.

And

I agree to pay Mike Ounsworth $1,000,000 for these services.

Then I get you to sign the first one, but since the hashes are the same, you have also signed the second one.

I would create these two documents by playing with different wordings, punctuation, non-printing characters, etc., until I get a collision. For an unbroken hash function with a 256-bit output, I would have to try on the order of 2¹²⁸ variations before finding a collision (the birthday bound; matching one specific, pre-chosen hash value would instead take about 2²⁵⁵ tries on average). This would cost me way more in electricity than the crime is worth. Any shortcut that allows me to do this in less time would be considered a vulnerability in the hash function, and it would be retired.
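To make that search concrete, here is a minimal Python sketch of the brute-force approach. It is only an illustration under stated assumptions: the "broken" hash is simulated by truncating SHA-256 to 24 bits, and the names `weak_hash`, `variant`, and `find_collision` are made up for this example rather than taken from any real attack tool. The variations are invisible trailing spaces and tabs.

```python
import hashlib
from itertools import count

HASH_BITS = 24  # toy size, an assumption for the demo; real hashes use 160+ bits


def weak_hash(data: bytes) -> int:
    """SHA-256 truncated to HASH_BITS bits, standing in for a broken hash."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - HASH_BITS)


def variant(text: str, i: int) -> bytes:
    """Append an invisible suffix of spaces and tabs that encodes i in binary."""
    suffix = "".join(" " if bit == "0" else "\t" for bit in format(i, "b"))
    return (text + suffix).encode()


def find_collision(good: str, bad: str) -> tuple[bytes, bytes]:
    """Return a (good_variant, bad_variant) pair with equal weak_hash values."""
    # Precompute about 2^(HASH_BITS/2) variants of the harmless document.
    table = {}
    for i in range(2 ** (HASH_BITS // 2)):
        doc = variant(good, i)
        table[weak_hash(doc)] = doc
    # Try variants of the harmful document until one lands in the table.
    for j in count():
        doc = variant(bad, j)
        match = table.get(weak_hash(doc))
        if match is not None:
            return match, doc


good_doc, bad_doc = find_collision(
    "I agree to pay Mike Ounsworth $1,000 for these services.",
    "I agree to pay Mike Ounsworth $1,000,000 for these services.",
)
print(weak_hash(good_doc) == weak_hash(bad_doc))  # True
print(good_doc != bad_doc)  # True
```

With only 24 bits the search finishes quickly on a laptop. Each extra bit of output makes the same search more expensive, which is why a full-length, unbroken hash is out of reach for this kind of brute force.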

By a similar argument, storing hashed passwords is fine as long as the hash function is not broken.

What you're describing here is finding a preimage, not a collision. — Luis Casillas, Apr 11 '17 at 17:37
I should note that Google spent far more than $990,000 to find a "duplicate"... — dandavis, Apr 11 '17 at 19:23

score 1 · Answer 2 · answered Apr 11 '17 at 17:44

The way collision attacks are generally exploited is this:

Mallory (the attacker) generates two documents, which we'll label good and bad, whose hashes collide.
Mallory gives good to Alice for her to sign.
Alice signs good, yielding the signature that we'll label good_signature, and gives that to Mallory.
Mallory now presents the bad document and the good_signature to Bob.
Bob verifies that Alice's good_signature is valid for the bad document.
Now Bob mistakenly believes that Alice signed the bad document, and acts on it.
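The steps above can be sketched in Python. This is a toy model under loud assumptions: `weak_hash` keeps only the first 2 bytes of SHA-256 so that collisions are cheap, HMAC stands in for a real public-key signature (so Bob shares Alice's key here, which would not be the case in practice), and all function names, the key, and the document texts are hypothetical.

```python
import hashlib
import hmac
from itertools import count


def weak_hash(data: bytes) -> bytes:
    """First 2 bytes of SHA-256: a toy stand-in for a hash with cheap collisions."""
    return hashlib.sha256(data).digest()[:2]


def sign(key: bytes, document: bytes) -> bytes:
    """Hash-then-sign. HMAC is only a stand-in for a public-key signature."""
    return hmac.new(key, weak_hash(document), hashlib.sha256).digest()


def verify(key: bytes, document: bytes, signature: bytes) -> bool:
    """Recompute the signature over the document's hash and compare."""
    return hmac.compare_digest(sign(key, document), signature)


def colliding_pair(good_text: bytes, bad_text: bytes) -> tuple[bytes, bytes]:
    """Mallory's step: pad both texts with spaces until their hashes collide."""
    seen = {}
    for n in count():
        good = good_text + b" " * n
        seen.setdefault(weak_hash(good), good)
        bad = bad_text + b" " * n
        if weak_hash(bad) in seen:
            return seen[weak_hash(bad)], bad


alice_key = b"alice-signing-key"  # hypothetical key, for illustration only

# Mallory prepares the colliding pair before Alice is involved.
good, bad = colliding_pair(b"Pay Mallory $10.", b"Pay Mallory $10,000.")

# Alice sees and signs only the good document.
good_signature = sign(alice_key, good)

# Bob checks the bad document against Alice's signature on the good one.
print(verify(alice_key, bad, good_signature))  # True
```

The signature only ever covers the hash, so any two documents with the same hash are interchangeable as far as verification is concerned. Note that Mallory has to construct both documents; the attack does not let her forge a signature on a document Alice wrote herself.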

One concrete example of this is PKI (e.g., SSL certificates). If I can create two certificate signing requests that collide, and submit the "good" one to a CA to get their signature on it, I might be able to extract that signature to forge a "bad" certificate. This is exactly how the demonstrated MD5 certificate forgery attack from a few years ago worked: the researchers were able to transfer a CA's signature from a legitimate certificate they had requested to a malicious certificate that allowed them to issue certificates of their own.

Just for attribution, an almost identical version of this appears at https://en.wikipedia.org/wiki/Collision_attack#Digital_signatures — Mike Ounsworth, Apr 11 '17 at 17:54
Well, it's general knowledge. The MD5 certificate forgery link I provided illustrates the pattern as well. — Luis Casillas, Apr 11 '17 at 18:00

score 0 · Answer 3 · answered Apr 11 '17 at 15:56

At a high level, hashing is used to determine uniqueness: the output is assumed to be unique to a specific input. Collisions are scenarios where that assumption is violated, because more than one input leads to the same output.

A potential attack would look like this: Alice wants to send Bob a message, and Bob needs to make sure the message he received is truly the one Alice sent. If the hash of the message that Alice sent matches the hash of the message that Bob received, Bob can confirm that he has received the message that Alice sent. Now, if Mallory can manipulate Alice's message in such a way that it has the same hash as Alice's original message (a collision), Bob will trust Mallory's message, which he believes is Alice's original message.

score -1 · Answer 4 · answered Apr 11 '17 at 17:51

In this post Linus Torvalds talks about this: https://plus.google.com/+LinusTorvalds/posts/7tp2gYWQugL

gonisimchuk

please do not post link-only answers - include the relevant parts of the link in your answer – schroeder Apr 11 '17 at 17:51
