SHA-1: how and when is it really a problem?

Question

First, here’s my interpretation of the certificate validation process (if mistaken, please feel more than free to correct me).

When connecting to a website using HTTPS, the website presents an SSL certificate to the browser. It is used to authenticate the website. The browser has to have a way to decide whether to trust that certificate. It does this by checking whether the website’s certificate was issued and signed by a trusted Certificate Authority. When a certificate is issued, the Certificate Authority includes this proof by cryptographically "signing" the certificate with a private key, in a way only the real CA could do and that browsers can verify. But the CA doesn't actually sign the raw certificate. It runs the certificate through a one-way hash algorithm like SHA-1 and signs the result with the CA’s private key. When the browser is presented with the certificate, one of the first things it does is check the signature. Because the certificate was signed with the CA’s private key (and we assume that key is only in the CA’s possession), we can verify the signature (and thus authenticate the server), because the CA’s corresponding public key is wrapped in an X.509 certificate that is pre-loaded in the browser. Next the browser calculates the SHA-1 for that certificate and compares it with the value presented in the certificate sent by the website to check its integrity, i.e. that it was not altered in transit.
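To make the hash-then-sign idea concrete, here is a minimal sketch in Python. It assumes textbook RSA with a toy key built from the Mersenne primes 2**107 - 1 and 2**127 - 1 and no padding, which is far weaker than what real CAs use (padded RSA or ECDSA with much larger keys); the function names are hypothetical. In this sketch the verifier recomputes the digest from the bytes it received and compares it with the value recovered from the signature using the public key.

```python
import hashlib
from math import gcd

# Toy textbook-RSA key from two Mersenne primes: unpadded, illustration only.
P = 2**107 - 1
Q = 2**127 - 1
N = P * Q
E = 65537
PHI = (P - 1) * (Q - 1)
assert gcd(E, PHI) == 1
D = pow(E, -1, PHI)  # private exponent (modular inverse needs Python 3.8+)


def digest(data: bytes) -> int:
    """SHA-1 digest of the data as an integer (160 bits, smaller than N)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def ca_sign(tbs: bytes) -> int:
    """The 'CA' signs the hash of the certificate body, not the body itself."""
    return pow(digest(tbs), D, N)


def browser_verify(tbs: bytes, signature: int) -> bool:
    """The 'browser' hashes what it received and checks that the public-key
    operation on the signature yields exactly that hash."""
    return pow(signature, E, N) == digest(tbs)


cert_body = b"subject=example.com;issuer=Toy Root CA;notAfter=2018-01-01"
sig = ca_sign(cert_body)
print(browser_verify(cert_body, sig))                            # True
print(browser_verify(cert_body.replace(b"2018", b"2099"), sig))  # False
```

Changing any byte of the signed body changes the digest, so the old signature no longer verifies.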

If the above is true, then I have the following question. Researchers at Google recently showed that two different .pdf files produce the same SHA-1 value, i.e. a collision occurred, and I’m wondering how this can be exploited. Say example.com got a certificate whose SHA-1 value equals X and it’s signed by Verisign. An attacker creates a certificate for example.com whose SHA-1 value also equals X, so again we have a collision, but this time it is not issued by any trustworthy CA! So if the attacker now tries to impersonate example.com, how can this be exploited if the browser will not be able to verify that it was signed by a trusted Root CA? Wouldn’t this cause issues only IF the Root CA gets compromised?

Added: Or why is it even a problem with document signing? Say a business partner is about to send me a contract. It's signed by him, I can verify it, and I send it back after signing it with my private key. So why should I be concerned that he was able to compute a similar-looking document with the same hash value? If I never received it, how can he prove that I signed it? I guess he can't. And even if I get the "fake" document, just because it is signed, shouldn't I still verify what's in the document before signing and sending it back?

EDIT:

I think I'm starting to understand. So, if for certificate A the message digest equals X and a Root CA signs it with its private key, everyone can verify its authenticity by decrypting the signature with the public key that is pre-loaded into the browser in the form of an X.509 certificate, correct? So given the hash value X (it is not a secret anyway), and even without knowing the private key that signed the hash, I know that the final result (the signature) = Y. So if I know that certificate A’s hash = X, and I’m able to compute another certificate (with the Issuer field set to the Root CA that signed the real certificate) whose hash value is also X (a collision), then even without knowing the signer’s private key I can supply Y as the signature (because the hash of the certificate plus the encryption with the private key would produce a signature value of Y). The browser will accept and validate the certificate, because it will be able to decrypt the signature with the corresponding public key of the Root CA, compute the hash again and match it against what was sent by the fake server in the fake certificate, correct? Thank you
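The reasoning in the EDIT can be checked with a toy. This is a minimal sketch, assuming a deliberately weakened hash (the hypothetical `weak_hash`, SHA-1 truncated to 24 bits) so that a collision can be found by a birthday search in about a second, plus an unpadded textbook-RSA key; real SHA-1 collisions require heavy cryptanalysis instead. As in real attacks, the attacker crafts both colliding inputs and only the harmless one is shown to the signer.

```python
import hashlib
import itertools

# Toy textbook-RSA key from two Mersenne primes: unpadded, illustration only.
P, Q, E = 2**107 - 1, 2**127 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))


def weak_hash(data: bytes) -> int:
    """Hypothetical weak hash: the first 3 bytes (24 bits) of SHA-1."""
    return int.from_bytes(hashlib.sha1(data).digest()[:3], "big")


def sign(data: bytes) -> int:
    """The signer (the 'CA') signs only the hash value."""
    return pow(weak_hash(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """The verifier (the 'browser') only checks the signature against the hash."""
    return pow(signature, E, N) == weak_hash(data)


def find_collision(template_a: bytes, template_b: bytes):
    """Birthday search: append a counter to two different certificate bodies
    until a pair shares a 24-bit hash (on the order of 2**12 tries each)."""
    seen_a, seen_b = {}, {}
    for i in itertools.count():
        a, b = template_a + b"%d" % i, template_b + b"%d" % i
        ha, hb = weak_hash(a), weak_hash(b)
        seen_a[ha], seen_b[hb] = a, b
        if ha in seen_b:
            return a, seen_b[ha]
        if hb in seen_a:
            return seen_a[hb], b


legit, fake = find_collision(b"subject=attacker.example;issuer=Toy CA;serial=",
                             b"subject=bank.example;issuer=Toy CA;serial=")
sig = sign(legit)         # the CA only ever sees and signs the legitimate body
print(verify(fake, sig))  # True: the same signature is valid for the fake body
```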

score 2 · Accepted Answer · answered Mar 18 '17 at 10:16

An attacker creates a certificate for example.com and the SHA-1 value also equals X, so again, we have a collision, but this time it is not by any trustworthy CA!

The CA for a certificate is specified by the issuer field inside the certificate. The certificate is verified by looking for a trusted certificate which has this issuer as its subject and then using that certificate to validate the signature.

This means that if an attacker has a valid certificate which was signed with SHA-1 by a trusted CA and wants to reuse its signature for his own fake certificate, the fake certificate needs to have both the same issuer information as the original certificate and the same SHA-1 hash. If both conditions hold, the fake certificate is trusted just like the original one. Since the attacker has full control over the contents of the fake certificate, he also controls the issuer field and can set it as needed.
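The lookup-then-verify logic described above can be sketched as follows. Everything here is hypothetical: two toy root CAs with unpadded textbook-RSA keys, a made-up text encoding of the signed part instead of DER, and SHA-1 over that encoding. It shows that the issuer field only selects which public key to try, while the signature (over data that includes the issuer) is what proves the CA signed it.

```python
import hashlib


# Toy textbook-RSA keys for two hypothetical root CAs (Mersenne primes,
# no padding): illustration only.
def make_key(p: int, q: int, e: int = 65537) -> dict:
    return {"n": p * q, "e": e, "d": pow(e, -1, (p - 1) * (q - 1))}


CA_KEYS = {
    "Toy Root A": make_key(2**61 - 1, 2**127 - 1),
    "Toy Root B": make_key(2**89 - 1, 2**107 - 1),
}
# What the browser ships with: public halves only, indexed by subject name.
TRUST_STORE = {name: (k["n"], k["e"]) for name, k in CA_KEYS.items()}


def tbs_bytes(cert: dict) -> bytes:
    """Hypothetical encoding of the signed part; the issuer is inside it."""
    return f"subject={cert['subject']};issuer={cert['issuer']}".encode()


def sha1_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def issue(ca_name: str, subject: str) -> dict:
    cert = {"subject": subject, "issuer": ca_name}
    key = CA_KEYS[ca_name]
    cert["signature"] = pow(sha1_int(tbs_bytes(cert)), key["d"], key["n"])
    return cert


def validate(cert: dict) -> bool:
    # Step 1: the issuer field only tells us WHICH trusted key to use...
    if cert["issuer"] not in TRUST_STORE:
        return False
    n, e = TRUST_STORE[cert["issuer"]]
    # Step 2: ...the signature over the hash proves that this CA signed it.
    return pow(cert["signature"], e, n) == sha1_int(tbs_bytes(cert))


good = issue("Toy Root A", "example.com")
print(validate(good))                                # True
print(validate(dict(good, subject="bank.example")))  # False: hash differs
print(validate(dict(good, issuer="Toy Root B")))     # False: issuer is signed too
print(validate(dict(good, issuer="Nobody CA")))      # False: not in trust store
```

Without a SHA-1 collision, copying the signature onto a certificate with different contents fails, which is why the attack needs both the same issuer and the same hash.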

Steffen Ullrich

So does a CA signing a certificate ONLY amount to putting its name in the issuer field? – cyzczy Mar 18 '17 at 10:22
@adam86: Creating a certificate by a CA based on a certificate request means both: putting the CA name in the issuer field and then creating the [digital signature](https://en.wikipedia.org/wiki/Digital_signature) of everything and including it in the certificate. The first is needed to know which CA signed it (i.e. which certificate to use for validation) and the second is needed to prove that the given issuer has signed it. – Steffen Ullrich Mar 18 '17 at 11:23
But isn't the message digest or hash value then encrypted with the issuer's (issuing Root CA or intermediate) private key for authenticity? – cyzczy Mar 18 '17 at 11:28
@adam86: Kind of, although a digital signature is not necessarily encryption (it depends on the algorithm). But the issuer information is used so that one can find the CA which should be used for validation, instead of trying every CA to see whether it can validate the signature. – Steffen Ullrich Mar 18 '17 at 12:00
I'm still confused. So how can the browser authenticate the website when presented with a fake certificate? Normally you should get a warning. On the other hand, if I use a fake cert with Issuer set to Verisign and a hash value equal to that of some other cert issued by Verisign, what part of the fake certificate will allow the browser to authenticate the website? In the end the hash is equal but derived from two different data inputs. Verisign didn't touch the fake cert at all. – cyzczy Mar 18 '17 at 12:21
@adam86: if the faked certificate has both the issuer set to Verisign and a valid signature by Verisign (stolen from another certificate with the same hash), then it is considered trusted. Of course all the other checks must still pass, i.e. the subject must match the target site, the certificate must not be expired... The browser does not know what the original input to the hash was; it just sees that the signature is valid. – Steffen Ullrich Mar 18 '17 at 13:27
But would he be able to steal a valid signature from another valid certificate? Do you have any paper in mind that describes the signing process in detail? Everywhere I look, it says that the hash is signed with the issuer's private key and that this is equivalent to encrypting it. – cyzczy Mar 18 '17 at 15:22
@adam86: RSA signing shares some steps with RSA encryption, but there are signing algorithms like DSA where encryption does not even exist. And the signature is just part of the certificate, i.e. "stealing" it is just copying the bytes. For more details on the signing process try [How do the processes for digital certificates, signatures and ssl work?](https://security.stackexchange.com/questions/7421/how-do-the-processes-for-digital-certificates-signatures-and-ssl-work). – Steffen Ullrich Mar 18 '17 at 17:06
Could you please look at the EDIT part of my question and tell me if my understanding is now correct? Thank you in advance. I appreciate your help. – cyzczy Mar 18 '17 at 20:31
@adam86: mostly correct, except that signing and signature verification do not necessarily mean encryption and decryption (for example in the case of a DSA signature). But exactly how the creation and validation of the signature work does not matter for this question: the main point is that if the hash is the same, then a signature made over the original will also be verified as valid for the fake. – Steffen Ullrich Mar 19 '17 at 04:55
I see. Thank you very much. One last question, about "stealing" (copying) the signature: how could I see the signature? I guess I would see it in Wireshark? And the copying part: how would I add the signature bytes to my fake cert? Can this be done with OpenSSL or any form of packet/certificate crafter? Or programmatically? – cyzczy Mar 19 '17 at 06:13
@adam86: the signature is part of the certificate, which you can see in Wireshark but also with `openssl s_client ... | openssl x509 -text ..` or similar. As for copying the signature: just replace the bytes making up the signature in your fake certificate with the ones from the original certificate. – Steffen Ullrich Mar 19 '17 at 06:57
Yes, in fact I can see the signature, thank you. But I still have trouble visualizing how this byte replacement would work. Can I do this with OpenSSL? – cyzczy Mar 19 '17 at 07:39
@adam86: I don't really understand how you could have the knowledge to produce a hash collision but not be able to extract a byte range from one file and put it into another file. Such a problem is not related to information security and can be solved with a few lines of code in most programming languages. – Steffen Ullrich Mar 19 '17 at 08:09
Ok, this makes sense. Thank you very much again, you helped me a lot. – cyzczy Mar 19 '17 at 08:11
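For the byte-replacement step, here is a minimal Python sketch using only the standard library. It relies on the X.509 structure from RFC 5280 (Certificate ::= SEQUENCE { tbsCertificate, signatureAlgorithm, signatureValue }) and assumes the certificates are already in DER form, for example via `openssl x509 -in cert.pem -outform DER -out cert.der`. The DER parser is deliberately minimal (single-byte tags, definite lengths) and the helper names are hypothetical. The signature is computed over the encoded tbsCertificate, so the result only verifies if the fake tbsCertificate has the same hash as the original one.

```python
def read_tlv(buf: bytes, pos: int):
    """Read one DER element at `pos`; return (tag, value_start, value_end).
    Minimal: single-byte tags and definite lengths only."""
    tag, length = buf[pos], buf[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        num = length & 0x7F
        length = int.from_bytes(buf[pos:pos + num], "big")
        pos += num
    return tag, pos, pos + length


def encode_len(n: int) -> bytes:
    """DER length octets (short form below 128, long form otherwise)."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def split_certificate(der: bytes):
    """Return tbsCertificate, signatureAlgorithm and signatureValue
    as raw TLV bytes (tag + length + value)."""
    tag, start, end = read_tlv(der, 0)
    assert tag == 0x30, "a certificate is a DER SEQUENCE"
    parts, pos = [], start
    while pos < end:
        _, _, elem_end = read_tlv(der, pos)
        parts.append(der[pos:elem_end])
        pos = elem_end
    assert len(parts) == 3, "expected tbsCertificate, algorithm, signature"
    return parts


def transplant_signature(fake_der: bytes, original_der: bytes) -> bytes:
    """Keep the fake tbsCertificate; copy signatureAlgorithm and signatureValue
    from the original; re-wrap everything in a new outer SEQUENCE."""
    fake_tbs, _, _ = split_certificate(fake_der)
    _, algorithm, signature = split_certificate(original_der)
    body = fake_tbs + algorithm + signature
    return b"\x30" + encode_len(len(body)) + body
```

Usage would look like `transplant_signature(open("fake.der", "rb").read(), open("original.der", "rb").read())`, with hypothetical file names.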

score 0 · Answer 2 · answered May 08 '17 at 17:37

There is a lot of alarmism about SHA1 going about.

Understand that there are different types of attack on a hash function. The current collision attack on SHA1 lets the attacker do the following:

Choose a common prefix for the two colliding files before generating the "collision blocks".
Choose a common suffix for the two colliding files after generating the "collision blocks".

This is typical of the initial collision attacks found on Merkle-Damgård hash functions.

Typically exploits for this type of collision attack take the following form.

The attacker chooses a file format in which garbage can be easily hidden and in which reasonably complex logic can be implemented (PDF is the poster child).
The attacker constructs a common prefix containing the PDF header and some logic to skip over the collision blocks.
The attacker generates the collision blocks.
The attacker generates a common suffix which looks at the collision blocks and changes the apparent content of the document based on which of the collision blocks is present.
The attacker convinces the victim to sign the "good" document.
The attacker transplants the signature to the "evil" document.
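The steps above can be illustrated with a toy. This is a minimal sketch, assuming a made-up document format instead of PDF and a deliberately weak Merkle-Damgård hash with a 16-bit state (the hypothetical `toy_md_hash`), so the collision blocks can be brute-forced rather than found by cryptanalysis. Because the states after the two collision blocks are equal and everything after them is identical, any signature over the "good" document also verifies for the "evil" one.

```python
import hashlib

BLOCK = 8          # toy block size in bytes
IV = b"\x00\x00"   # toy 16-bit initial chaining value


def compress(state: bytes, block: bytes) -> bytes:
    """Deliberately weak compression function: 16-bit chaining value."""
    return hashlib.sha256(state + block).digest()[:2]


def chain(state: bytes, data: bytes) -> bytes:
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state


def toy_md_hash(msg: bytes) -> bytes:
    """Merkle-Damgard: pad with 0x80, zeros and an 8-byte length, then chain."""
    padded = msg + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % BLOCK)
    padded += len(msg).to_bytes(8, "big")
    return chain(IV, padded)


def collide_after(prefix: bytes):
    """Two different blocks giving the same chaining value after `prefix`."""
    assert len(prefix) % BLOCK == 0
    state = chain(IV, prefix)
    seen = {}
    for i in range(2**16 + 1):  # 65537 blocks, 65536 outputs: a repeat is certain
        block = i.to_bytes(BLOCK, "big")
        out = compress(state, block)
        if out in seen:
            return seen[out], block
        seen[out] = block


def build_pair():
    prefix = b"%TOYDOC1"  # hypothetical 8-byte header (the common prefix)
    good_blk, evil_blk = collide_after(prefix)
    # Common suffix, written after the collision search: it stores the "good"
    # block followed by two texts; the viewer picks one by comparing blocks.
    suffix = good_blk + b"|I agree to pay $10|I agree to pay $10,000"
    return prefix + good_blk + suffix, prefix + evil_blk + suffix


def render(doc: bytes) -> str:
    """Toy viewer: show text 1 if the block at offset 8 matches the stored one."""
    block, suffix = doc[8:16], doc[16:]
    good_text, evil_text = suffix[9:].split(b"|")
    return (good_text if block == suffix[:8] else evil_text).decode()


good, evil = build_pair()
print(toy_md_hash(good) == toy_md_hash(evil))  # True
print(render(good), "/", render(evil))  # I agree to pay $10 / I agree to pay $10,000
```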

The attacker now has a copy of the "evil" document that was apparently signed by the victim. He can use that document to convince people that the victim agreed to something he did not actually agree to.

Now how does this relate to CAs?

The attacker aims to transplant a signature from a legitimate certificate signed by a CA to an "evil" certificate. The legitimate certificate will be for a domain the attacker legitimately owns; the evil certificate may be for a domain the attacker wants to impersonate, or it may be an "intermediate certificate" allowing the attacker to sign certificates at will for the domains he wants to impersonate.

This is much harder than the attack described above for two reasons.

CAs only give the customer limited control over the content of the certificates.
AFAICT certificate formats don't have room for the level of tricky logic possible in something like a PDF.

Because of this, I do not believe it would be possible to exploit a "simple" collision attack against a CA.

More dangerous is a "distinct chosen-prefix" collision attack. No distinct chosen-prefix collision attack is currently known for SHA1. For MD5, one was published about three years after the first basic collision was found (2007 vs. 2004).

In this case the attacker can:

Choose two distinct chosen prefixes before generating the collision blocks.
Choose a single common chosen suffix after generating the collision blocks.
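The difference from the identical-prefix case can be sketched with the same kind of toy hash (a hypothetical 16-bit Merkle-Damgård construction, brute-forced rather than attacked cryptanalytically). Here the two prefixes are different and only need equal, block-aligned lengths; the search finds one block for each prefix so that both reach the same internal state, after which any common suffix keeps the hashes equal.

```python
import hashlib
import itertools

BLOCK = 8
IV = b"\x00\x00"


def compress(state: bytes, block: bytes) -> bytes:
    """Deliberately weak toy compression function (16-bit chaining value)."""
    return hashlib.sha256(state + block).digest()[:2]


def chain(state: bytes, data: bytes) -> bytes:
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state


def toy_md_hash(msg: bytes) -> bytes:
    padded = msg + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % BLOCK)
    padded += len(msg).to_bytes(8, "big")
    return chain(IV, padded)


def chosen_prefix_collision(p1: bytes, p2: bytes):
    """Find blocks b1, b2 so that p1 + b1 and p2 + b2 reach the same state.
    The prefixes differ; they only need equal, block-aligned lengths."""
    assert len(p1) == len(p2) and len(p1) % BLOCK == 0
    s1, s2 = chain(IV, p1), chain(IV, p2)
    table = {}
    for i in range(2**12):                # candidate blocks after prefix 1
        blk = i.to_bytes(BLOCK, "big")
        table.setdefault(compress(s1, blk), blk)
    for i in itertools.count():           # search blocks after prefix 2
        blk = i.to_bytes(BLOCK, "big")
        match = table.get(compress(s2, blk))
        if match is not None:
            return match, blk


# Hypothetical certificate-like prefixes: everything the CA writes before the
# collision blocks (including the serial number) must be known in advance.
honest = b"serial=1234;CN=attacker.test;CA=FALSE;".ljust(48)
rogue = b"serial=1234;CN=Rogue Intermediate;CA=TRUE;".ljust(48)
b1, b2 = chosen_prefix_collision(honest, rogue)
suffix = b";common suffix, e.g. the rest of the certificate"
print(toy_md_hash(honest + b1 + suffix) == toy_md_hash(rogue + b2 + suffix))  # True
```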

A distinct chosen prefix collision attack is much more dangerous because the content of the "evil" file does not have to be hidden inside the "good" file. Only a small block of apparently random garbage has to be hidden. It also means that as long as the attacker can predict most of the content of the file they only need to control a small part of it.

However, there is still a problem: certificates have a "serial number". Since the serial number is one of the first fields in the certificate, it will inevitably come before the collision blocks, and therefore the attacker must be able to predict it in advance.

The CA/Browser Forum requires CAs to put at least 64 bits of randomness in their serial numbers. If CAs follow the rules, it should be virtually impossible for the attacker to predict the serial number.

OTOH, if a CA is sloppy and does not use random serial numbers, then exploiting distinct chosen-prefix collision attacks becomes more feasible. Such an attack was successfully demonstrated with MD5 by a group of academics. Once the attack was demonstrated, the CA in question started randomising its serial numbers.
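A minimal sketch of the difference between the two serial-number policies, assuming Python's `secrets` module as the CSPRNG. Real CAs generate serials inside their issuance software; RFC 5280 additionally requires serials to be positive and at most 20 octets long, which a 65-bit value satisfies. The function names are hypothetical.

```python
import secrets


def sequential_serial(last_issued: int) -> int:
    """A sloppy CA: the next serial is simply the previous one plus one."""
    return last_issued + 1


def random_serial() -> int:
    """64 bits from a CSPRNG, with an extra leading 1 bit so the value is
    always positive and exactly 65 bits long (9 octets as a DER INTEGER)."""
    return (1 << 64) | secrets.randbits(64)


print(sequential_serial(41))   # 42: an attacker can guess the next value
s = random_serial()
print(hex(s), s.bit_length())  # value differs on every run; bit length is 65
```

With sequential serials the attacker only has to predict the counter at the moment the request is processed; with 64 random bits each blind guess succeeds with probability 2^-64.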

If a chosen-prefix collision attack on SHA1 is found and the attacker can find a sufficiently sloppy CA, they may be able to create a forged SHA1 certificate.

That is apparently sufficient for the browser vendors to decide to eliminate SHA1.

The ultimate attack on a hash function would be a preimage attack (strictly, a second-preimage attack). With such an attack, the attacker could transplant a signature from any existing certificate to their new certificate. However, preimage attacks are much harder than collision attacks. AFAICT, even for MD5, preimage attacks are still computationally infeasible.
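The gap between collision and preimage resistance can be seen on a deliberately tiny hash. This minimal sketch assumes SHA-1 truncated to 16 bits (the hypothetical `toy_hash`): finding any collision takes on the order of 2^8 tries (the birthday bound), while matching one given hash, as transplanting a signature from an arbitrary existing certificate would require, takes on the order of 2^16.

```python
import hashlib
import itertools

BITS = 16  # hypothetical toy hash size; SHA-1 itself has 160 bits


def toy_hash(data: bytes) -> int:
    """SHA-1 truncated to its top BITS bits (deliberately weak)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") >> (160 - BITS)


def find_collision():
    """Any two messages with the same hash: about 2**(BITS/2) tries expected."""
    seen = {}
    for i in itertools.count(1):
        msg = b"msg-%d" % i
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, i
        seen[h] = msg


def find_second_preimage(target: bytes):
    """A new message matching one GIVEN hash: about 2**BITS tries expected."""
    goal = toy_hash(target)
    for i in itertools.count(1):
        msg = b"forgery-%d" % i
        if toy_hash(msg) == goal:
            return msg, i


a, b, collision_tries = find_collision()
forged, preimage_tries = find_second_preimage(b"existing certificate")
print(collision_tries, preimage_tries)
```

Exact counts depend on the inputs, but the collision search typically finishes after a few hundred tries and the second-preimage search after tens of thousands. For the full 160-bit SHA-1 the generic brute-force costs are about 2^80 and 2^160 respectively.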

Answer 3 · answered May 08 '17 at 19:36 · Luis Casillas

A practical collision attack against the web PKI was already demonstrated in 2008, exploiting CAs that still used MD5. The general structure of the attack was:

Craft two certificate signing requests designed to collide: an honest one for a legitimate website, and a dishonest one that impersonates an intermediate CA and is thus authorized to issue certificates.
Submit the honest CSR to a CA. Since it's a legitimate CSR, the CA will issue an honest certificate for it.
Combine the signature on the honest certificate and the dishonest CSR to craft a forged intermediate CA certificate.
Use the forged certificate to issue any certificates you want, for any site you want.

The page has the nitty-gritty details, but this does illustrate the general shape of collision attacks against digital signatures: the problem is that if Eve can get Alice to sign a legitimate document that was crafted to collide with a forgery, then Eve can use the legitimate signature to convince Bob that Alice signed the forged document. In the web PKI attack, Alice is a CA and Bob is a web browser.

The collision attacks against MD5 that enabled this attack were more powerful than today's attack on SHA1, so we are not able to carry out such an attack today against SHA1 certificates. But a key principle here is that we should be anticipating attacks before they happen, not just reacting to them after they're upon us. The fact that a collision has been found for SHA1 means that it's likely more powerful attacks will come in the near future, so we should abandon SHA1.

To use history as an example, the first MD5 collision was found in 2004—four years before the practical web PKI attack I linked above. We don't know when similarly improved attacks against SHA1 will be found (or have been already found!), nor how big of a window we will have between a secret discovery and public knowledge.

In addition to relying on a distinct chosen prefix collision attack the CA attack in question also relied on finding a CA that did not randomise serial numbers. — Peter Green, May 09 '17 at 13:16
