10

Let's say that my password is a single character: "a".

Couldn't I chain hash it 1000 (or more) times and make it nearly invulnerable to rainbow table attacks and brute force?

Why isn't this preferred to salting and what are the problems with this technique, if any?

EDIT: Clarification: By chain hash I mean hashing the hash 1000 times.

dee-see
  • 203
  • 2
  • 8
  • 1
    Why hash something 1000 times or more when you can salt it? –  Feb 01 '12 at 01:46
  • 3
    @acidzombie24 Because a salt is complementary to iterating hashing (or chaining). You shouldn't do either one or the other, you should do both. Or best of all, you don't mess around with security but use something proven like bcrypt or PBKDF2. – Luc Oct 24 '12 at 17:56

5 Answers5

19

To strengthen a password hash, you need to do two things to the hashing process:

  1. to make it unique;
  2. to make it slow.

"Unique" means that each password should be hashed with its own hash function; which hash function is used can be some public information (i.e. stored along the hash value), but you want to make it different for each password you hash. This is what a salt does: the salt defines the hash procedure to use, among a wide family. Uniqueness defeats rainbow tables: rainbow tables, like all other kinds of precomputed tables, relies on the idea that it is worthwhile to spend some time hashing a lot of passwords and storing the hash values (possibly in a smart way which allows for some extreme compression, like rainbow tables), because the resulting table can be used to attack several passwords, with a very small per-password marginal cost. With salts, each password has its own function, so the table would be good for at most one password only, which destroys its advantages.

Hashing "1000 (or more) times" the password does not include a salt, and, as such, is vulnerable to precomputed tables. For instance, assuming SHA-256 as hash function, here is your hashed password:

f3f19029aa4ef4bde723f49b4e90a7ad51473c54a03589af6fef706bf50d7894

That's the SHA-256 hash of 1000 'a' characters. I could precompute that value because once you have said "1000" you have said it all; no salt, hence no surprise for the attacker.

Slowness is about making each password guess as expensive as possible for the attacker. Even with good salting, an individual hashed password can be vulnerable to brute force, aka "dictionary attack" (trying potential passwords), because humans are not nearly as imaginative as computers are powerful. A PC with a GPU can compute a hash function a billion times per second or so. We want a hashing procedure which takes more time to compute -- not too much, because our honest server will also hash passwords when a user logs in, and we do not have infinite CPU either; but we just need to hash, at most, a dozen passwords per second, so we can tolerate a substantially slow hash function.

Usually, slowness is obtained by mandating nested hashing: we hash the password, then we hash the resulting hash value, which we hash again, and so on. There are a few tricky details about how and where the salt is inserted. Hashing the concatenation of 1000 times (actually, with the figures above, 1 million times would be better) the password (or the concatenation of the password and the salt) could serve the same purpose, but it is a bit delicate in practice: indeed, we want to configure the number of repetitions of the password so that the procedure is tolerably slow. But with such a system, hashing a 40-character password takes 40 times the time of hashing a 1-character password; if the latter must be slow, the former will be 40 times as slow, which soon becomes intolerable. With nested hashing, it is easier to get constant hashing time, which makes configuration easier.

And, of course, homemade is Bad. Inserting a password and a salt in a hash function, iterated and/or nested, is subtle; there are pitfalls, and, worst of all, you cannot know whether you did it wrong or not. Security cannot be reliably tested. So, you should stick to published and widely deployed standards of good repute, such as bcrypt and PBKDF2. If only because it means that any mishap will not be your fault.

Tom Leek
  • 170,038
  • 29
  • 342
  • 480
  • 2
    Interesting post! But nested hashing is exactly what I meant by "chain hashing" (got the term from [wiki](http://en.wikipedia.org/wiki/Hash_chain) !), I didn't mean concatenating (English isn't my first language, might have expressed myself poorly). I felt that since the number of times I nest my hash is hard coded somewhere, it would be really hard to compromise assuming "only" my database has been leaked and not my code. – dee-see Jan 31 '12 at 22:25
  • @domsterr: your code has leaked. I mean: secrecy of code is very hard to achieve; you have copies of it in several places (servers, development machines, backups...). Also, it is hard to _quantify_ the entropy inherent to a piece of code (i.e. how many distinct code sequences you could have used). It is best to assume that the attacker knows the code. – Tom Leek Feb 01 '12 at 12:39
5

Key strengthening/stretching is a good technique (though a 1000 doesn't buy you much when simple hashes - md5/sha1 can be calculated at a rate of ~ 1 billion per second per gpu).

For the sake of simplicity, I'm going to give you a concrete example. You have a web app and I'm going to assume the malicious adversary managed to get access to your server. This could be an admin who's secretly evil, someone who broke in to your server room, a janitor who has keys, or a random hacker who managed to use some exploit to get in.

The adversary first manages to export the stored usernames and their hashes of passwords from your database. Then they look at the running code of your web server for how it checks passwords. It sees something in the authentication section of your web server code that looks like:

def check_password(username, password_to_check):
    if is_username_in_db(username):
        already_failed = False
        hash_from_db = get_password_hash_from_db(username)
    else:
        already_failed = True
        hash_from_db = get_password_hash_from_db('some_known_user')
    i = 0
    computed_hash = password_to_check
    while i < 1000:
        computed_hash = compute_sha1_hash(computed_hash)
        i += 1
    if strcmp(computed_hash, hash_from_db) and not already_failed:
        return True
    else:
        return False 
 # As an aside note a successful case takes the same amount of cpu time as a failing case, 
 # so timing attacks do not reveal that users exist; 
 # you should also make sure that `strcmp` (string comparison) doesn't fail prematurely.

The attackers say aha! the passwords in the db are hashed 1000 times and unsalted. They then generate hashes at a rate of a billion per second (so can check a million passwords in a second) with their modern gpu. So in about 1 day of gpu computational time, they have gotten hashes for 86.4 billion passwords (36 bits of entropy) and constructed a rainbow table on their terabyte hard disk.

They then look up all the hashes in the rainbow table and find many matches; now with the plaintext password and the user name, they try breaking into other services of those users where they may have reused passwords. If the passwords were uniquely salted, they would have to generate a new rainbow table for each password making the brute-force attack much less successful (e.g., one day or so of GPU time to brute force each simple password).

dr jimbob
  • 38,936
  • 8
  • 92
  • 162
1

Because it's not invulnerable to brute forcing. To find the password along the lines of 'a' hashed a thousand times all I have to do is try 52000 hashes (or considerably less if the password is actually 'a' because I just start at the beginning of the alphabet), which is terribly easy to do.

Steve
  • 15,215
  • 3
  • 38
  • 66
1

If it's a simple dictionary word, the function would have to be really slow in order to stop any non-trivial attack. Every decent program is going to try the popular passwords/short passwords/simple patterns first. Then you have all the pre-computed problems that others have mentioned above. Take a look at the FreeBSD MD5 scheme as a decent example of how to take a crappy old MD5 and make it quite resilient to cracking.

Marcin
  • 2,528
  • 1
  • 16
  • 14
1

Chain hashing doesn't protect you from the use of rainbow tables.

Arjan Einbu
  • 342
  • 4
  • 13