10

I am a newbie to hashing, and I understand that MD5 (I know it's broken) and SHA-1 are all fixed hashing algorithms, but given that most passwords are dictionary words or other similar passwords, what's the point of storing it in a hash if an attacker can use Google to backtrack the original password?

I mean, isn't SHA-1 or SHA-2 or any of those algorithms rendered useless?

msamogh
  • 109
  • 1
  • 1
  • 3
  • 15
    The short answer is that those algorithms have perfectly sound cryptographic uses, but obscuring passwords is not one of them. – Stephen Touset Jan 20 '15 at 22:47
  • 3
    Because consistent output is necessary in confirming passwords when logging in... relying only on a hash is the weakness - salts, and prohibited word lists will help a lot - minimum word length, too. – HorusKol Jan 21 '15 at 02:23
  • 3
    A hashing function **should** return the same hash for the same string. Password salts solve the rainbow tables issue, which @Xander has explained well. – AKS Jan 21 '15 at 16:57
  • Hashing is used for many things besides passwords. When used to obscure passwords additional precautions must be taken to have a secure scheme. – Hot Licks Jan 21 '15 at 19:26
  • 1
    this is why your password should not be from a small search space (like a dictionary). it's only "rendered useless" if an attacker can be expected to be able to have computed all the hashes from your search space. – Dave Cousineau Jan 21 '15 at 23:11

5 Answers5

33

This is why you need to use a salt.

Adding a globally unique salt to each password ensures that even if every password is the same, they will still each generate unique hash values.

That said, a single iteration of SHA-1 or SHA-2 is still not a good way to hash passwords, because they are too fast. Even with a unique salt, modern hardware is capable of hashing billions of candidate passwords each second to find the inputs. A good slow algorithm such as bcrypt with an appropriate work factor should be used instead.

Xander
  • 35,616
  • 27
  • 114
  • 141
11

Password hashing is but one, very small use for a hash. Hashes can be very valuable for other uses by themselves, in constructs that pair hashes with keys, and in other constructs.

For password hashing in specific, as always, you should read Thomas Pornin's canonical answer to How to securely hash passwords.

For this answer, the summary is: In the event of a password database leak, sufficiently bad password storage (plaintext and/or reversible encryption where the key is known to the attacker) renders the best password worthless. Sufficiently bad passwords render the best password storage worthless. At this level, nothing is safe, because one or both parties failed to reach the minimum required bar for any kind of safety.

Then we get to "really horribly awful password storage" like unsalted single iterations of hashes. This is just like the below, but each try applies to every single password at once. At this level, only unique, insane passwords are safe, as two users with the same password are obvious - they have the same hash.

Then we get to "really awful password storage" like unique per-row salted single iterations of hashes. Of any hashes, though obviously slower (to the attacker) hashes have a smaller range of passwords that are worthless after a given amount of time spent trying guesses.

  • All the per-row unique salt does is force each candidate password to be re-hashed for each password hash
    • And prevent precomputed hashes from being worth much.
  • Assuming January 2015 oclHashcat speeds for a single machine with 8 R9 290X's
    • 2.4E17, or about 2^57.7 single MD5 tries per 30 days
      • More or less 4 words chosen cryptographically randomly from all words of length 7 or less in Ubuntu's American English Small dictionary, i.e. roughly 21000^4 or 1.9E17, or 2^57.4, is this keyspace.
      • Yes, "correcthorsebatterystaple" is in this set. See my answer to Should I reject obviously poor passwords for further math.
    • 1.2E16, or about 2^53.4 single SHA-512 tries per 30 days
      • Nine character cryptographically random passwords with Upper, Lower, and number characters are about this keyspace (62^9, or 1.35E16, or 2^53.6).
  • Insanely strong passwords are safe here.
    • 35 character cryptographically random passwords of upper, lower, and numbers are roughly as safe as 192 bit encryption, at 62^35, or 5.4E62 or 2^208.4 possibilities.
  • Strong passwords are probably ok
    • 20 character cryptographically random passwords of upper, lower, number, and US keyboard symbols are roughly as safe as 128 bit encryption, at 94^20, or 2.9E39, or 2^131.1 possibilities.
    • 14 character cryptographically random passwords of upper, lower, and number are about the minimum I'd call "maybe safeish for a few years, but not for decades" at this level, with 62^14, or 1.2E25, or 2^83.4 possibilities.

After this is using good, solid algorithms, i.e. PBKDF2, BCrypt, or SCrypt, with low iteration counts. This is where hashing is safe for reasonably complex passwords!

  • Remember, PBKDF2/RFC2898 is merely a construct for iterating an HMAC many times.
  • Remember, HMAC/RFC2104 is merely a construct for using a hash (like SHA-1, or MD5, or SHA-512, or Whirlpool, or ...) with both a key and some data
  • Assuming January 2015 oclHashcat speeds for a single machine with 8 R9 290X's
    • 2.9E12, or about 2^41, WPA/WPA2 tries per 30 days
      • 8 character cryptographically random passwords of just a letter and number (36^8, or 2.8E12) are close to this.
      • 3 cryptographically randomly chosen words of length 7 or less in Ubuntu's American English Small exceed this, 21000^3, or 9.2E12
      • Cryptographically random patterns of one uppercase, 7 lowercase, and one number are in this keyspace, 26*26^7*10, or 2E12

After this is using PBKDF2, BCrypt, or SCrypt with a sufficiently high number of iterations/work factor. At this level, hashing is safe for slightly less complex passwords, and/or for a slightly longer time.

Note that, amusingly, PBKDF2-HMAC-MD5 is, with a sufficiently high iteration count, actually not broken, though you'd either have to be an idiot to use it, or you'd have to be working with something like an FPGA with enough gates for MD5 but not for SHA-1 or SHA-2.

Note that attackers always gain more power... and that password leaks stay in the wild forever. All passwords that get found should never be considered safe again - the largest wordlist I've seen contains gigabytes of previously cracked or found in plaintext passwords, and some are horrifically complex... but they were stored badly, and are now in wordlists.

Rules based attacks using dictionaries are a topic for another question :).

Anti-weakpasswords
  • 9,850
  • 2
  • 24
  • 52
6

SHA-1 etc. are not designed for passwords.

They are designed for signing documents (with known text) against modifications such as adding an extra 000 to your paycheck. When it is hard to find a random document that has the same hash, then it will be much more harder to find one that makes sense and achieves the desired effect!

One of the important properties of cryptographic hashes (as opposed to hashes used e.g. for data structures) is that a single bit changed in the input should generate an entirely different hash, in a way that you can't predict how to undo this.

For the purpose of encrypting passwords this would not be this necessary (you could use very long salts to hide too-similar passwords in the digests; but it is of course nice to have this property, so that say passw0rd and password generate totally different results)

Think outside the password box.

These hashes can be used for passwords if you add in a salt (always use salts) and do enough iterations to make brute force costly. But that is not what their design goal is. The design goal is an unpredictable mixing of the bits, so that every bit changes about 50% of the output.

  • 2
    Hashes are not designed for the purposes of signing, as they can trivially be forged. Constructs like [HMAC](https://en.wikipedia.org/wiki/HMAC) *use* hashes to generate an [authenticator](https://en.wikipedia.org/wiki/Message_authentication_code) which proves that a message hasn't been tampered with by a party not possessing the shared secret. [Digital signatures](https://en.wikipedia.org/wiki/Digital_signature) are the asymmetric equivalent of this, and produce authenticators that prove that a message must have been signed by an entity possessing a particular private key. – Stephen Touset Jan 21 '15 at 01:33
  • 6
    Hashes aren't generally designed for _any_ particular high-level use-case: not signing, not password storage, not message authentication, not anything. They're building blocks, and you use them to create other, higher-level protocols that achieve higher-level goals. – cpast Jan 21 '15 at 03:24
  • @StephenTouset A hash on its own is of course not a signature, but a hash is the most important building block to constructing a signature. In fact it is possible to construct a public key signature scheme using no other building block than a cryptographic hash. However if you combine a hash and an asymmetric primitive such as RSA, much more efficient signatures are possible. – kasperd Jan 21 '15 at 08:20
2

It's NOT the hashing's fault, it's the users'

most passwords are dictionary words or other similar passwords

My password is

iL0v3$t4ckExch4n9everymuch<3 xoxoxox

Now use Google to find the hash for my password under MD5 or SHA-1, I highly doubt that it's pre-calculated in any rainbow table. In fact I was a member of a website that got hacked and the hacker has posted the unsalted MD5 hash of my password online. Till today it's available and I used the same password and email on an other website deliberately to see if someone will ever bother to create a rainbow table that reveals my 26-letters password.

Ulkoma
  • 8,793
  • 16
  • 66
  • 95
  • 10
    If it wasn't in any rainbow tables before, it definitely will be now. – Stephen Touset Jan 20 '15 at 22:54
  • It is absolutely true that the choice of password matters. Whether the user choose a strong or a weak password means more to security than which hash function is used. In other words, a strong password hashed with a single invocation of MD5 with no salt is more secure than a weak password hashed with the most secure password hashing in existence. One should still use a salt though, since the use of a salt is practically free and provides a major security improvement. – kasperd Jan 21 '15 at 00:06
  • 4
    I'll assume that's not your actual password. – user253751 Jan 21 '15 at 00:09
  • 1
    The user should be the one concerned about his security, many high profile websites turned out to use unsalted hashes. A salt is a must now-days but bad developers are everywhere. – Ulkoma Jan 21 '15 at 00:09
  • @immibis give it a try – Ulkoma Jan 21 '15 at 00:11
  • 1
    This doesn't really answer the question, where you're right about the password not (yet) being in any rainbow tables, uniqueness of hashes shouldn't rely on the password's complexity but on the fact that a salt is used. –  Jan 21 '15 at 02:59
0

You have to differ two things:

A) The hash function as a function to transform a given set of information input to a well defined set of output. If you look at this aspect of a hashing function, it has to solve two problems:

1) It should be designed to avoid collisions - it would be a mess, if you have a high collision rate because it would be easier to "guess" the password. Think of a function which sums up every ASCII-Position of every char input: the outcome for »ab« is the same as for »ba«. So this would be really a very bad choice.

2) It should be deterministic so that you could reproduce the result. Hashes are used for comparison. So it should be clear, that a function, which has no deterministic output is useless.

B) The use of a hashing-algorithm in the context of storing passwords in a database.

How is hashing safe if a given string always generates the same hash?

From (A) it is clear, that, since the output is deterministic a possible attack vector would be to use premade/-calculated hashes aka Rainbow-Tables.

simple example:

Say, your password is "hello world". The MD5-Sum is: 6f5902ac237024bdd0c176cb93063dc4. Googling (our »Rainbow table« for the sake of the example) for that checksum leads you to this location, where you find a file, whose content is "hello world". So simply hashing isn't enough. A way to improve security is to use a salt (which could be stored alongside the hash). In our case, let's add 6f5902ac237024bdd0c176cb93063dc4 (the MD5-sum of the previous example) to the passphrase => »hello world6f5902ac237024bdd0c176cb93063dc4«. The new MD5 is f59c64e092edb4fbc2904a3b28ce88c5. If you google that, you have no results.

Beware: I am not(!) suggesting, using MD5 in this way is in any way "secure" - but I used it to exemplify how a salt affects the outcome of a hashing function. Although the hash is calculated deterministically, the output for one password and two different salts is different enough to make the "guessing"-part a bit harder.

Thomas Junk
  • 188
  • 7