Are there any security issues if you wrap your cryptographic hash with MD5 for storage purposes?

For example:

$hash = md5(hash($password,$salt,$rounds));

Note: hash() uses scrypt or bcrypt or pbkdf2 internally.

The only purpose of MD5 here is storage, since its output takes only 32 bytes (as a hex string) instead of a very long raw hash.
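Here is a minimal Python sketch of the proposed scheme. It assumes PBKDF2-HMAC-SHA512 from `hashlib` stands in for `hash()` and reads "very long raw hash" as a 64-byte output. The iteration count is illustrative only.

```python
import hashlib
import os

salt = os.urandom(16)
# Stand-in for hash($password, $salt, $rounds): PBKDF2-HMAC-SHA512 (assumed),
# producing a 64-byte "long raw hash".
raw = hashlib.pbkdf2_hmac("sha512", b"my password", salt, 100_000, dklen=64)
wrapped = hashlib.md5(raw).hexdigest()  # the proposed md5(hash(...))

print(len(raw.hex()))  # 128 hex characters
print(len(wrapped))    # 32 hex characters
```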

EDIT: Judging from the comments below, I agree MD5 is not a good idea since it is collision-prone. But what if I use a better hash function like SHA-512? The comments below still argue that this may actually weaken the scheme. Can somebody please explain how?

How can this: $hash = SHA512(bcrypt($password,$salt,$rounds));

be weaker than this: $hash = bcrypt($password,$salt,$rounds); ?

It appears to me that the former is stronger, since you have to "crack" SHA-512 before you can even begin working on cracking bcrypt. Why do others say otherwise?

IMB

6 Answers

You don't lose anything by applying MD5.

Since you use md5(hash()) as your combined hashing scheme, presumably the attacker doesn't have the ability to submit an input that is only evaluated by md5(); it's either evaluated by the whole scheme or nothing at all. So given a hash h in your database, the attacker's goal is to find m such that md5(hash(m)) = h. (m may or may not be the original password, since a collision password is good enough to allow them to log in fraudulently.)

Suppose the attacker obtained h and used a weakness in MD5 to find a pre-image h' such that MD5(h') = h. That doesn't help them, because they still have to find a pre-image of h' under hash. If hash is sufficiently strong in its own right, this is no weaker than using hash by itself.

The gist of the idea here is that MD5 acts like a random map of inputs to outputs. MD5's mapping is well distributed, and unrelated random inputs do not collide any more often than chance predicts. Security-wise, MD5 makes it too easy to intelligently craft collisions. In this case, though, we don't care about the attacker intelligently crafting the input to MD5. We only care that adding a mapping to the output of hash doesn't create more collisions with MD5(hash()) than expected of hash by itself. In other words, you aren't relying on MD5's cryptographic collision-resistance properties here. You could add any sort of random-ish mapping to the output of a good hash function and, as long as that mapping isn't prone to accidental collisions, you should lose nothing.
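The key point is that the attacker can only query the composed function, never MD5 alone. Below is a minimal Python sketch of that. It assumes PBKDF2-HMAC-SHA256 from `hashlib` stands in for `hash()`. The function names and iteration count are hypothetical.

```python
import hashlib
import hmac

ITERATIONS = 100_000  # assumed work factor for the stand-in slow hash


def inner_hash(password: bytes, salt: bytes) -> bytes:
    # Stand-in for the question's hash(); PBKDF2-HMAC-SHA256 is an assumption.
    return hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS)


def composed(password: bytes, salt: bytes) -> bytes:
    # md5(hash(m)): the whole scheme, always applied as one unit.
    return hashlib.md5(inner_hash(password, salt)).digest()


def verify(candidate: bytes, salt: bytes, stored: bytes) -> bool:
    # The login path accepts only passwords. Knowing some h2 with
    # md5(h2) == stored does not help unless the attacker can also find
    # a password m with inner_hash(m, salt) == h2.
    return hmac.compare_digest(composed(candidate, salt), stored)


salt = b"per-user-salt-16"
stored = composed(b"hunter2", salt)
print(verify(b"hunter2", salt, stored))  # True
print(verify(b"hunter3", salt, stored))  # False
```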

Addressing the updated part of the OP: you can use SHA-512 and truncate the output. The recent FIPS 180-4 standard permits truncating a digest to a shorter length (section 7). It also defines two truncated variants of SHA-512, SHA-512/224 and SHA-512/256. This is a fully endorsed method.
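Here is a short Python sketch of plain truncation. The helper name is hypothetical, and the input bytes are a placeholder for whatever the slow hash produced.

```python
import hashlib


def sha512_truncated(data: bytes, out_bytes: int) -> bytes:
    """Plain truncation: keep the leftmost out_bytes of a SHA-512 digest."""
    if not 1 <= out_bytes <= 64:
        raise ValueError("SHA-512 yields 64 bytes; out_bytes must be 1..64")
    return hashlib.sha512(data).digest()[:out_bytes]


tag = sha512_truncated(b"bcrypt output goes here", 16)
print(tag.hex())  # 16 bytes, i.e. 32 hex characters
```

One caveat: the named variants SHA-512/224 and SHA-512/256 use their own initial hash values. Their outputs are therefore not prefixes of plain SHA-512, and the slice above is the generic truncation rather than either named function.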

B-Con
  • Thank you for the thorough explanation. Just to confirm in layman's terms: do you agree that there is no problem with `$hash = md5(hash($password,$salt,$rounds));` as long as `hash()` internally uses a strong algo like `bcrypt`? – IMB Aug 09 '12 at 16:22
  • If the attacker can't submit input directly to `md5()` (such as if the `hash()` part were computed on the host and then sent to the server) and `md5(hash())` is always treated as an autonomous unit, then it should not be weaker than just `hash()` by itself. – B-Con Aug 09 '12 at 16:37
  • Yes, all input will have to go through `hash()` first before `md5()`. On another note, I guess it goes without saying that if the attacker obtains the database of hashes, the `md5()` wrapping actually makes his job one step harder than `hash()` alone, right? – IMB Aug 09 '12 at 16:44
  • Presumably, yes it is harder. But when we reason about the security of a scheme, we usually like to "round down to zero" any of the steps that we think have questionable security. (We tend to be paranoid that way.) MD5 should add some more security, but you shouldn't be relying on it for that. – B-Con Aug 09 '12 at 16:48
  • Agree, the only reason I chose md5 is it returns 32 bytes vs sha1 which is 40 or sha256 which is 64. But I think I will use sha256 instead then truncate. – IMB Aug 09 '12 at 16:58

It is better to just truncate the original hash output.

First, mind the salt storage: in order to verify a password, you will need to recompute the password hash, using the same salt and iteration count. Some hash functions (namely bcrypt) traditionally encode the salt and iteration count in the output. If you just keep a hash of the bcrypt output but not the salt itself somewhere, you will not be able to recompute the hash from a given password, breaking functionality.

Then, password hashing relies on preimage resistance. If the hash function you use is any good, its resistance to preimages will be on the order of 2^n for an output of n bits. Thus, you can truncate the output down to, say, 80 bits. This means the actual output bits, not counting the encoding of salt and iteration count, and disregarding any transformation of the bits into hexadecimal or Base64 characters. If the hash function is any good, this will be robust enough. There is no need to rehash with MD5 or any other function; a simple truncation will do the trick. And if the hash function is not good and introduces weaknesses when thus truncated, then... why would you use it in the first place?
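Here is a minimal Python sketch of both points, using PBKDF2-HMAC-SHA256 from `hashlib` as the slow hash (an assumption). The salt and iteration count are stored next to the digest so the hash can be recomputed. The raw output is simply cut to 80 bits. The `$`-separated record layout and the function names are hypothetical, not a standard format.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed cost; tune for your hardware
OUT_BYTES = 10        # 80 bits, the truncation length suggested above
SALT_BYTES = 16


def make_record(password: str) -> str:
    """Return 'iterations$salt_hex$hash_hex' (a hypothetical layout)."""
    salt = os.urandom(SALT_BYTES)
    full = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Truncate the raw output bits; no second hash function is needed.
    return f"{ITERATIONS}${salt.hex()}${full[:OUT_BYTES].hex()}"


def check_record(password: str, record: str) -> bool:
    """Recompute with the stored salt and iteration count, truncate, compare."""
    iters, salt_hex, hash_hex = record.split("$")
    expected = bytes.fromhex(hash_hex)
    full = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iters)
    )
    return hmac.compare_digest(full[: len(expected)], expected)


rec = make_record("correct horse")
print(check_record("correct horse", rec))  # True
print(check_record("wrong horse", rec))    # False
```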

(Collisions have nothing to do with the problem. Collisions are irrelevant to password hashing. Applying MD5 on the output would not induce any significant weakness, despite the non-resistance of MD5 to collisions, since password hashing is about preimages. But truncation is simpler, and "simpler" means "good" in security; it is also more flexible since you can choose the length as you see fit.)

Thomas Pornin

Using MD5 in this case is fine, but you don't need MD5 to reduce the length of the hash string. Just perform a simple operation on the hash string to reduce its length. For example, if your full-length hash string is:

e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

You can trim it to whatever length is acceptable to you for storage, like:

e3b0c44298fc1c149afbf4c8996fb9242
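The example string above is the SHA-256 digest of the empty string, so the trim is easy to reproduce. This short Python sketch slices the hex text, as in the example:

```python
import hashlib

full = hashlib.sha256(b"").hexdigest()  # SHA-256 of the empty string
trimmed = full[:33]                     # keep the first 33 hex characters

print(full)     # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
print(trimmed)  # e3b0c44298fc1c149afbf4c8996fb9242
```

Each hex character carries 4 bits, so 33 characters keep 132 bits. Choosing an even count (for example 32 characters, 128 bits) keeps whole bytes if you ever convert back to raw bytes.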

Matrix
  • PBKDF2 and similar KDFs don't require you to do this at all. You can just explicitly set the derived key length. – Polynomial Aug 09 '12 at 13:15
  • Isn't this method quite collision prone? – IMB Aug 09 '12 at 13:29
  • @IMB: If the truncation length is sufficiently long, it should not be a problem. For example, the SHA-512 hash function actually has truncated versions endorsed in the NIST spec. Truncating hashes is generally not a problem, unless the hash in question is known to have a specific weakness. – B-Con Aug 09 '12 at 15:03

I think that MD5(hash(password)) has no known practical weakness, if you need to truncate the hash to 16 bytes. Read B-Con's answer for the explanation. Note that this does rely on some properties of the MD5 function, not just on properties of the hash function. For example, suppose that it was discovered that the MD5 is actually not surjective (as far as I know, this is not known), and there are only N different MD5 digest values. Then finding another hash with the same MD5 would be a matter of computing the hash of N different passwords, which might be workable if N is not too large.
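The hypothetical in the previous paragraph can be illustrated with a toy Python model. This is not a claim about the real MD5. The outer function is deliberately given only N = 2^16 outputs by keeping 2 bytes of MD5, and a fast SHA-256 stands in for `hash()`. Any password whose composed output matches the stored value would log in, and finding one takes roughly N evaluations of the inner hash. The function names are hypothetical.

```python
import hashlib
from itertools import count
from typing import Tuple


def inner(password: bytes) -> bytes:
    # Fast stand-in for hash(); a real password hash would be deliberately slow.
    return hashlib.sha256(password).digest()


def toy_outer(data: bytes) -> bytes:
    # Hypothetical non-surjective "MD5" with only N = 2**16 possible outputs,
    # modelled by keeping 2 bytes of the real MD5 digest.
    return hashlib.md5(data).digest()[:2]


def find_working_password(target: bytes) -> Tuple[bytes, int]:
    """Hash candidate passwords until one maps to the stored value."""
    for attempts in count(1):
        candidate = b"guess-%d" % attempts
        if toy_outer(inner(candidate)) == target:
            return candidate, attempts


stored = toy_outer(inner(b"the real password"))
password, attempts = find_working_password(stored)
print(password, attempts)  # attempts is typically on the order of N = 65536
```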

However, if you do that, you will be guilty of rolling your own crypto. Do not roll your own crypto.

Fortunately, there's a better way to obtain a 16-byte value from a hash function that's still a hash function. With a good hash function, each bit is as independent as possible from the other bits and doesn't reveal any information about the hash. Therefore, if you take 128 bits from a hash, you get a 128-bit hash. For the SHA family, this is officially accepted (FIPS 180-4 §7): if you want an N-bit hash, you can take an approved hash function that produces more than N bits and truncate it to the first N bits. The NIST recommendations for using hashes (§5.1) discuss the strength you can expect from an N-bit truncation.
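The "first N bits" rule can be sketched in Python as follows. The helper name is hypothetical. When N is not a multiple of 8, this sketch zeroes the spare bits of the last byte; that is one way to store a partial byte, not something the standard prescribes. For N = 128 it reduces to taking the first 16 bytes.

```python
import hashlib


def leftmost_bits(digest: bytes, n_bits: int) -> bytes:
    """Keep the leftmost n_bits of digest; unused low bits of the last byte are zeroed."""
    if not 0 < n_bits <= 8 * len(digest):
        raise ValueError("n_bits must be between 1 and the digest length in bits")
    n_bytes = (n_bits + 7) // 8
    out = bytearray(digest[:n_bytes])
    spare = 8 * n_bytes - n_bits
    if spare:
        out[-1] &= (0xFF << spare) & 0xFF
    return bytes(out)


h128 = leftmost_bits(hashlib.sha512(b"some input").digest(), 128)
print(len(h128))  # 16
```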

If you're doing this because you have a fixed-size database field that was intended for MD5 digests, keep in mind that while it is acceptable to truncate the digest, the salt is non-negotiable. You must have a unique salt, and it has to be stored in the database somehow. Each user's salt should be distinct both from every other user's salt and from any salt used in any other database. (Salt collisions can be tolerated if they're infrequent enough, but if two users happen to have the same password and the same salt, that will be obviously visible.) The salt can be generated from other fields, but it isn't a good idea, because it's hard to guarantee uniqueness; see "Primary Key as Salt?". Other background on salt: "What should be used as a salt?", "How to store salt?", "Why is using salt more secure?"
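Here is a minimal Python sketch of that advice. It assumes PBKDF2-HMAC-SHA256 as the password hash and a per-user random salt from `secrets.token_bytes`, rather than a salt derived from other fields. The constants and the function name are illustrative.

```python
import hashlib
import secrets
from typing import Tuple

ITERATIONS = 100_000  # assumed cost
DIGEST_BYTES = 16     # fits a column sized for a raw 16-byte MD5 digest
SALT_BYTES = 16


def new_user_entry(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); both must be stored for the user."""
    salt = secrets.token_bytes(SALT_BYTES)  # random, so repeats are vanishingly unlikely
    full = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, full[:DIGEST_BYTES]


alice = new_user_entry("letmein")
bob = new_user_entry("letmein")
# Same password, distinct salts, so the stored digests differ.
print(alice[1] != bob[1])  # True
```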

Gilles 'SO- stop being evil'

Whilst there aren't any explicit attacks I'm aware of against such a thing, I don't recommend it. MD5 is broken, and has many known collisions. As such, an attacker might be able to generate inputs such that the output of the KDF is a collision with another password.

Instead, why not use a KDF such as PBKDF2 with a tweakable output size?
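With Python's `hashlib.pbkdf2_hmac`, for instance, the output size is just the `dklen` argument. A minimal sketch; the salt length and iteration count are assumptions:

```python
import hashlib
import os

salt = os.urandom(16)
iterations = 100_000  # assumed cost

# Ask the KDF for exactly the length you want to store; no wrapper hash.
key16 = hashlib.pbkdf2_hmac("sha256", b"password", salt, iterations, dklen=16)
key64 = hashlib.pbkdf2_hmac("sha256", b"password", salt, iterations, dklen=64)

print(len(key16), len(key64))   # 16 64
print(key64.startswith(key16))  # True: the shorter output is a prefix
```

Because PBKDF2 builds its output block by block, asking for fewer bytes yields a prefix of the longer output. Setting `dklen` is therefore equivalent to truncation.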

Polynomial
  • So I guess the only problem of this is collision. What if I use SHA256 instead for a wrapper? For example, which is stronger in this case: A PBKDF2 hash with 64 bytes output... or a PBKDF2 hash with 256 bytes output then wrapped with SHA256 for a 64 bytes final output? – IMB Aug 09 '12 at 13:26
  • The first option is stronger, since you're not relying on an intermediate hash. Also, SHA256 would be 32 bytes final output, not 64 bytes. – Polynomial Aug 09 '12 at 13:53
  • @IMB: The shorter the output, the higher the risk of collision. Is storage in such short demand that you're willing to sacrifice security for it? – Piskvor left the building Aug 09 '12 at 13:54
  • @Polynomial I guess it's safe to say wrapping `scrypt` or `bcrypt` or `pbkdf2` to a fixed length hash weakens them? – IMB Aug 09 '12 at 13:56
  • @Piskvor It's not really about shortage of space, it's just that if it actually helps or doesn't affect security then why not, besides, they look clean in short hashed form. – IMB Aug 09 '12 at 13:58
  • @IMB: It definitely doesn't help, and it most likely weakens the security. Also, is "looking clean in storage" one of your priorities? – Piskvor left the building Aug 09 '12 at 14:01
  • @Piskvor I already told you that. – IMB Aug 09 '12 at 14:07

Yes, there is an issue.

The resulting hash is just that: an MD5 hash. It would be prone to collision attacks.

MD5 is a hash function with known collision vulnerabilities. Avoid using it if at all possible.

  • If they crack the md5 hash what they get is another hash made with `bcrypt` for example. – IMB Aug 09 '12 at 13:07
  • Your second point doesn't have much merit. Making rainbow tables for the entire output space of a KDF is completely infeasible. – Polynomial Aug 09 '12 at 13:08
  • @IMB You don't actually crack the hash. You find x where MD5(x) gives the same hash as your input. –  Aug 09 '12 at 13:11
  • @TerryChia Yes that's what I meant. – IMB Aug 09 '12 at 13:13