6

openssl_digest vs hash vs hash_hmac

I want to use SHA512 for storing password. Which of the above methods are best to use? Why?


What is the difference between SALT & HMAC?


I just read that HMAC is built on top of hash function.

So is SHA512+SALT+HMAC really necessary or SHA512+SALT or SHA512+HMAC?


ThinkingMonkey
  • 163
  • 1
  • 5

2 Answers2

8

Short answer: use bcrypt.


For storing passwords, you actually do not want to store the password itself, only "something" which is sufficient to verify a password. This "something" should be the result of a deterministic function applied to the password. Preferably, the function shall not be reversible (because that's the point of not storing the password itself: preventing an attacker, who managed to obtain a read-only access to the database of a server, from immediately computing all the users' passwords). By the same reasoning, the password processing function should strive to make password guessing uneasy. "Password guessing" is trying potential passwords until a match is found (it's called a dictionary attack).

To make dictionary attacks difficult, the password processing function should have three characteristics:

  • It should be slow.
  • It should be distinct for each password instance.
  • The hardware which optimally computes the function should be exactly the hardware which will run it under normal conditions.

Slowness is about making each "guess" as expensive as possible for the attacker. This is usually done by iterating a one-way function; the number of iterations should be counted in thousands or even millions. Standard hash functions are very fast; a plain PC can evaluate SHA-256 or SHA-512 millions of times per second. We do not want an attacker to be able to try millions of passwords per second on a plain PC. Hence, we use many iterations. Of course, this makes password verification slower for us too -- but we do not have millions of passwords to verify per second, so we can tolerate a quite high iteration count.

Making the function distinct for each instance is what the salt is about. The salt is a parameter which alters the processing of the password (let's say that it is "hashed together with the password"). The salt is stored with the hashed password. The salt is chosen randomly when the user chooses his password; a new salt is generated when the user changes his password. The role of the salt is to make password processing different each time. This prevents parallel attacks. An attacker who got a copy of the server database may want to attack 1000 accounts simultaneously, by hashing a potential password and see if the result matches any of the 1000 stored passwords; this is a parallel attack and the salt defeats that. A precomputed table of hashed password, where a given stored hash password could be simply looked up, is also a kind of parallel attack, and the salt is still effective against that. Salting is good. Salts work optimally when no salt value is ever repeated; a big enough random salt ensures that with high probability.

Optimality of hardware is about making the attack expensive for the attacker. We have a server (i.e., a PC) which spends as much CPU as is tolerable on password hashing (the number of iterations is configured that way). This scheme would be weakened if the attacker could run his dictionary attack in a more cost-effective way by using another kind of hardware, in particular a GPU. A 500$ GPU will usually be able to try 10 or 20 times as many passwords than a 500$ CPU, if the password processing function maps well to the kind of operations that GPU handle well. Usual hash functions such as SHA-256 map uncannily well to GPU abilities. So, by using such a function as the basis for a password processing function, you are basically giving a 20x advantage to the attacker.

Choosing a one-way function which does not run well on a GPU, iterating it many times without compromising the one-wayness, and adding a salt, is very difficult to do correctly: security is a subtle thing, and cannot be tested, save by hiring a bunch of cryptographers and letting them work a few months on the subject. Even then, the cryptographers will not give you any answer more decisive than "we are the best of the best at what we do, and yet we did not find any weakness... for the time being". You are strongly advised not to invent your own scheme; that's the first and foremost lesson in such matters: handmade is bad.

Instead, choose a function which has been around, widely used, and not broken yet, for a decade. Fortunately, such a function exists, it is called bcrypt. If you work in a field which is inordinately concerned with administrative approval, you may want to use PBKDF2 instead; but PBKDF2 runs with a standard hash function, which means that a GPU-wielding attacker gets a bonus, which he will not have with bcrypt, which is deliberately hostile to GPU (see this previous answer for details).


HMAC has nothing to do with all that. It is a Message Authentication Code, aka a kind of "keyed checksum" used to verify integrity of a piece of data. HMAC uses a key, so it involves key management, which has never been a simple thing. Some people have advocated storing such keyed checksums as password verification tokens, the idea being that the attacker will not be able to "try passwords" even if he has a copy of the tokens, because they are useless without the key. But this assumes that the attacker could not get a copy of the key, even though he could access a dump of the server database. Therefore, practical applicability of this model is questionable. Key management is a can of worm; avoid it if you can.

HMAC is defined as operating with a hash function (any hash function); it invokes that function twice in order to be a proper, secure MAC, even with existing hash functions which have a few shortcomings.

Thomas Pornin
  • 322,884
  • 58
  • 787
  • 955
1

I suspect that your question will be best answered by a definition of terms:

  • A hash, in this context, is a one-way function - i.e. a function that makes it very easy to find the result from the argument (the password) but difficult (or impossible) to find any argument that generates a given result.
  • A salt is some auxiliary data that augments the argument to a hash function. This is useful as it prevents accidental discovery of passwords through observation that two hashed passwords have identical values. With a salt, the stored/transmitted value will only be identical if both the salt and the password match.
  • An HMAC refers to the application of a hash (and optional salt) to a "message authentication code" - which, depending upon context might be a password... or, at least, there's nothing stopping you passing a password into the HMAC as if it were the message authentication code.

I think it's a very interesting question whether there's any point in applying multiple hashes... it may even be an open research problem. I posted an enquiry about this idea here. The short summary is there is probably little advantage in applying multiple hash functions to the same data - though, assuming the hash function is robust, likely no ill consequence (save an overhead of processing.)

Personally I'd use a salt and a hash function... and I wouldn't be paranoid about the strength of the hash function as its unlikely to be the weak link in any practical system....

aSteve
  • 111
  • 1
  • Thanks. This was a very nice explanation. I am thinking of settling with SHA512 + a random salt. –  Jan 21 '12 at 19:22
  • I heard doing a multiple hash sometimes was good against dictionary attacks on the hashes, should the hashes be found. Although a salt might do that as well. – Lucas Kauffman Jan 21 '12 at 22:52
  • I think it is very debatable if hashing multiple times is ever good idea. Multiple hashes with independent salts provide a sequence of values that are hard/impossible to compute without the salts... which is useful in some situations. Applying two different hashes successively is a bad idea as the 'size' of the resulting combined function is limited to be <= the minimum of the two originals. Deciding how much more simple (to reverse-engineer) the combined function is than the simplest original is very hard to prove... for the best hash functions, the answer should be "not much." – aSteve Feb 04 '12 at 16:28