27

I'm working with some middleware that requires username/password authentication. The middleware uses MD5 hash for the password. The MD5 hash, of course, is not fit for the purpose of storing passwords. We need to address this.

We tried modifying the middleware to use a newer hash but it is a crap system we can't really change easily. However, we can control the web site that sits on top of it, and it's easy to change its code. So one of our developers had this idea:

  1. When the user registers, the web site generates its own salt, then hashes the password with SHA-256 before passing it to the middleware. The middleware will then hash the password again using MD5 and its own salt.

  2. When the user signs on, the web site retrieves its own salt then attempts to recreate the SHA-256 hash from the password that the user typed in. The web site then passes the SHA-256 hash to the middleware for validation. The middleware retrieves its own salt and attempts to recreate the MD5 hash from the salt and the SHA-256 that was passed in. If they match, the signon attempt is successful.

By combining the hashes in this manner, will my site be as secure as if were using the SHA-256 hash alone? Or does double hashing create some kind of vulnerability?

John Wu
  • 9,181
  • 1
  • 29
  • 39
  • No, the security is defined by the weakest link. –  Feb 08 '22 at 00:32
  • 8
    I believe it would be an improvement: MD5 with 256 bits of entropy is better than MD5 with whatever entropy the user password have. – ThoriumBR Feb 08 '22 at 00:55
  • 1
    MD5 is still resistant to pre-image attacks, so an attacker would not be able to produce a string that results on the salt+hash stored on the database. – ThoriumBR Feb 08 '22 at 00:59
  • 3
    @MechMK1 that’s not always true. – Tim Feb 08 '22 at 09:33
  • 2
    Using (middleware that enforces) a weak password hash may be a security compliance failure. Depending on the situation, that may expose the company to liability. If you can bring this to the attention of management from a legal liability standpoint, perhaps that could convince them to assign resources to actually fix the problem? – marcelm Feb 08 '22 at 12:51
  • 4
    Are you considering the possibility of an attacker hacking into your system in a way that allows them to log into the middleware directly, bypassing the SHA256 step? If you've thought about it and concluded that it would be difficult enough to deter the kind of attacker you need to protect against, that's fine. I'm just wondering if that attack path is within the scope of the question. If it is, that changes your security considerations significantly, because it removes the attacker's need to find a SHA256 preimage. – David Z Feb 08 '22 at 17:43
  • 1
    @MechMK1, please explain to us why you think that. If the middleware used ROT-13 for hashing, an attacker could recover the SHA-256 hash immediately. And then they would be in the same situation as attacking a system where the middleware uses SHA-256, and the client does nothing. – gnasher729 Feb 09 '22 at 00:34
  • If you can eliminate the chance of someone bypassing the website and directly going to the middleware, consider eliminating the weak middleware authentication (for simplicity, set it to "password" or such), and just do proper authentication in the website. – Deduplicator Feb 09 '22 at 17:37
  • 2
    If by "web site" you mean client-side hashing (e.g. JavaScript running in the user's browser), then this doesn't help improve security; this is addressed in other questions e.g. https://security.stackexchange.com/a/53606/37019 – gengkev Feb 10 '22 at 02:59
  • Looks like it would resist better to raindow attacks (if the attacker only has access to the final hash), but would be weaker to birthday attacks. – Vulpo Feb 10 '22 at 14:05
  • Does it mean you share salt between web site and middleware? That could be a weak spot. – akostadinov Feb 10 '22 at 17:18

6 Answers6

33

Yes, combining hashes in this fashion will increase the overall security of password storage on the website, mostly by the use of salted SHA-256 hashing. The final MD5 digest has little effect but does nominally improve security. However, the described scheme is not very strong.

The main problem is that if an attacker gains access to the hashed passwords and learns this scheme of password protection they can easily start searching for common passwords and brute forcing the entire set. Because both SHA-256 and MD5 are so cheap to compute it may be feasible to recover several passwords.

SCrypt would be a much better choice than a SHA digest because of relative cost of computation. (BCrypt would be a good choice too but is infeasible due to the final MD5 preventing verification.) Another simple improvement above what is described would be to perform many (thousands) of SHA-256 (or larger) digests to make it cost prohibitive to search the space but not unduly expensive (say, a few hundred milliseconds) to verify a single password.

maerics
  • 446
  • 2
  • 5
  • 2
    +1. Neither SHA-256 nor MD5 (nor any other fast hash) are at all suitable for password hashing by themselves, *with or without salting*. This is because modern graphics cards can compute tens of billions of such hashes per second, trivially brute-forcing all but the best passwords in moments even if the salt requires you to brute-force each in independently. You can use fast hashes in a construction that iterates them many times with a feedback loop (e.g. PBKDF2), but it's better to use a purpose-built modern password hashing algorithm (argon2 or scrypt) designed to resist brute forcing. – CBHacking Feb 10 '22 at 23:36
12

Adding the website layer might be an improvement, depending on the relative size of the middleware's salt and what else you can do to increase the strength of the website's hashing algorithm - but it's still not great without additional improvement:

  • If the middleware salt is too small, even just a larger salt at the website SHA256+salt step could be an improvement.

  • If the sha256+salt hashing method has additional work factors (like 100,000 rounds of SHA256 instead of a single round, etc.), then the website SHA256+salt combination has the potential to be an improvement.

  • But even if the website salt and the middleware salt are both reasonably large, if both methods fail to include stretching or other work factors ... then both methods are quite bad - and adding the website layer provides no material improved resistance to attack (but does not make it worse).

Note also that your question implies that MD5 is somehow inherently worse than SHA256 for storing hashes, perhaps because "MD5 is broken". If so, that "breakage" is in the context of finding collisions, and is unrelated to cracking resistance. MD5 is a terrible choice for password storage, but it's not because MD5 is broken - it's because it's a fast hash that was never intended for password storage (and neither was SHA256).

Royce Williams
  • 9,318
  • 1
  • 32
  • 55
4

I think it would improve the overall security. The bruteforce throughput of the attacker is now mostly determined by the first stage (salt + SHA-256), not by the fast MD5.

The other factor increasing the security is that if the attacker gets the MD5 database, the inputs are all 256-bit random passwords from SHA-256. No wordlist will have all inputs, and isn't feasible to build a rainbow table for it.

I would not store the salt + SHA-256 anywhere, only the salt. With this, if the database leaks the attacker will have to bruteforce the final result without having the SHA hashes. And I would not use SHA for the first stage, but Argon2 or bcrypt. Save the salt and the difficulty on the database, but not the resulting hash. When the user tries to authenticate, get his password, re-hash it and send the hash to the second stage.

ThoriumBR
  • 51,983
  • 13
  • 131
  • 149
4

TLDR:

Yes, this approach can provide good protection.

Details

The weakness of MD5 is weak collision resistance. In your case this means that an attacker will be able to easily find some SHA-256 hash, different from the SHA-256 hash of the real password, that gives the same MD5 hash.

But knowing SHA-256 hash is not sufficient. For SHA-256 hash it is impossible to find neither the preimage (the real password) nor the second preimage (some other password that produces the same hash).

Thus the only way to attack is brute-forcing. The space of SHA-256 hashes is much bigger than the space of MD5 hashes. Computation of SHA-256 has the same order of magnitude as MD5, so it is not crucial and we can think of MD5 only. That's why finding preimage with brute-forcing of MD5(SHA-256()) may be comparable with brute-forcing of MD5, which needs abut ~2123 operations, which impossible to break now days.

Thus your approach is good for practical purposes.

An important assumption: This works for passwords that have high entropy. Passwords with low entropy like 40 bits (e.g. 8 chars that are low and upper case plus digits) can be brute-forced relatively fast. If a single GPU computes 108 hashes per second, it will brute-force 40-bit password just within a day.

mentallurg
  • 10,256
  • 5
  • 28
  • 44
1

What is it you are shielding against? password leaks, or collisions?

The insecurity of MD5 cannot downgrade the security of SHA256, in the sense that you cannot figure out what the password is better, just because you are using the MD5 hash. If this were the case, any attacker could attack any password protected by SHA256 by first putting the hash through MD5, then attack the new hash. This is obviously ridiculous.

But what about collisions? Can a SHA256 followed by MD5 increase the chance of collisions? I believe so, though I can't definitively prove it.

Suppose we have 4 input messages, and 4 possible digests. { 00, 01, 10, 11 }. Suppose our attacker only gets 1 guess, and we consider a system "secure" if we have a chance of not being cracked greater than 60% or something. Suppose we have an insecure hashing function f(x), which adds 1 if the input is odd. Even though the cardinality of the output space is 4, we only map to 2 of the options. Suppose we have a secure hashing function g(x) that always adds 1. (I understand neither of these are actually secure since the math is reversible, but bare with me).

g(x) maps to all of the output space. So if an attacker were to guess an input, given an output of g(x), they have a 25% chance of being right. but it's 50% for f(x). Now consider the two compositions f(g(x)) and g(f(x)).

  • f(g(x)) maps to two outputs, since f(x) is on the outside, so it's insecure.
  • g(f(x)) also only maps to two outputs, since f(x) can only pass two options to g(x) in the first place!

Either way, the composition of the two functions is not as secure against collisions as g(x) alone.

Now, this is contrived, and it might be the case that MD5 and SHA256 do not interact this way, but I doubt it. If you can find a collision for MD5, does this not indicate it isn't using it's output space as efficiently? So even though I can't prove the composition is insecure, I wouldn't trust it.

  • 2
    I'm sorry, but cryptography just isn't that simple. First of all, collision attacks have no impact whatsoever on password cracking. Preimage attacks do, but there is no practical publicly known preimage attack against MD5. Secondly, being vulnerable to collision attacks says nothing about how uniformly/efficiently the output space is used. (Not trusting it is fine though. You should never trust any crypto you do on your own, always check with someone knowledgeable first.) – nobody Feb 08 '22 at 22:22
  • @nobody, I didn't claim collisions proved the output space was inefficiently used, I claimed it indicated it. I do recognize this isn't a hard and fast proof, mainly because I don't know how MD5 was broken, just that it was. It might be the case that the outputs are very evenly distributed, but with a bit of math you can eliminate some portion of the pre-images to find a collision. Or it might be the case MD5 was broken simply by guess and check. I don't know. But if it's the latter, it does mean it's effective output space is smaller. Hence, it's indicative, but not a proof. – Joseph Summerhays Feb 09 '22 at 14:15
  • 1
    @JosephSummerhays The hash function `f(x) = sum(x) + 1` uses the output space perfectly, but it's fairly easy to perform a preimage attack on it. The hash function `g(x) = concat(SHA256(x), "cafed00d")` doesn't use its output space perfectly, but it's hard to crack. To my knowledge, nobody's shown that _either_ are true for MD5. (All hash functions have collisions.) – wizzwizz4 Feb 09 '22 at 14:36
  • @nobody, you are correct, and I did make a mistake in implying collision attacks could be used against passwords. They cannot. But being weak to collision attacks does imply a smaller output space, (if it is weak to random guesses, not weak to some other algorithm). – Joseph Summerhays Feb 09 '22 at 14:47
  • @wizzwizz4, yes, I recognize that using the output space efficiently is not the only requirement for a secure hash function, and I mention as much in my answer. Yes, all hash functions have collisions. But if it's easy to find a collision with just random guessing, this does imply the effective output space is smaller. I.E. if the birthday problem on an alien world reaches practical certainty at 500 guesses instead of our 40, how many days are in their year? more than in our year. So MD5 has a smaller effective output space than SHA256, but that doesn't imply you can do a preimage attack. – Joseph Summerhays Feb 09 '22 at 14:54
  • 1
    Except an MD5 collision _isn't_ just random guessing. – wizzwizz4 Feb 09 '22 at 15:01
  • @wizzwizz4, ok. Then I was wrong about it using it's output space less efficiently. Thanks for pointing that out, I've learned something new. But it does still have a smaller output space (128 bits instead of 256 bits), so the composition of the two still has a smaller effective output space, so both collisions and pre-images will be easier to compute (in the case of pre-images, still not easy enough to actually do, but easier all the same). – Joseph Summerhays Feb 09 '22 at 15:15
0

There is indeed a problem, if the number of possible hash values of the less secure hash is low. Assume the hash $h_1(x)$ is extremely secure, but $h_2(x)$ has only say $2^{32}$ possible values.

If the correct password is p, and I try about $2^{32}$ random passwords q, then I can expect to find one where $h_2(h_1(q)) = h_2(h_1(p))$. That doesn't mean I found the correct password, just that I found one that I can enter instead of the correct password, and which will work.

It takes the same amount of guesses to find q such that $h_1(h_2(q)) = h_1(h_2(p))$. However, for this case I would just find q such that $h_2(q) = h_2(p)$, and then $h_1(h_2(q)) = h_1(h_2(p))$ is automatically true. So a super secure hash either before or after a hash with few possible values is not secure.

Now that is the case for brute force attacks. If one hash can be cracked not by brute force but by some clever mathematics, then the second, more secure hash may be in the way.

PS. I wrote in some comment "If the middleware used ROT-13 for hashing, an attacker could recover the SHA-256 hash immediately". If we ignore the confusion about encryption vs. hashing, I can easily build a 256 bit hash roughly based on ROT-13, which would be awfully bad, but would have 2^256 possible outputs values so the argument above doesn't affect things badly. The worst hash which just splits the input into 256 bit groups and calculates an xor of those groups would be perfectly safe in this situation.

gnasher729
  • 2,107
  • 11
  • 16