76

I have been reading about password management lately (very interesting stuff!) and was wondering how different the hashes would be for similar strings.

Is it possible to know if a password guess was close by comparing the resulting hash to the real hash?

For example, if the real password is "password123" and a hacker tries "Password123", "password1234", "password124", etc., would the generated hashes be similar enough to the real hash that either the hacker or their computer could tell they were on the right track?

Let's assume that the hacker knows any salt, pepper, cayenne powder, adobo, whatever... If they try the right password they will generate a matching hash.

(I think this might vary depending on the hash function used, but I don't know this for sure.)

elmer007
  • 849
  • 1
  • 6
  • 8
  • 52
    short answer: no - else rainbow tables would end up being *a lot* smaller – schroeder Sep 15 '16 at 20:52
  • 3
    The objective of the hash is to be as random as possible. So I hope no. – paparazzo Sep 17 '16 at 11:43
  • 4
    Has this question really never been asked before??? Just try and MD5 hash of a short string, change one letter, and find out for yourself! http://www.miraclesalad.com/webtools/md5.php – Buttle Butkus Sep 18 '16 at 01:55
  • @schroder are you confusing "should be not" with "no"? – emory Sep 19 '16 at 19:23
  • 2
    @emory are you confusing hash functions with *quote unquote hash functions*? – Dmitry Grigoryev Sep 20 '16 at 10:37
  • @DmitryGrigoryev I don't think so. The definition I use "A hash function is any function that can be used to map data of arbitrary size to data of fixed size" Some hash functions have properties that make them better for the password application than others, but modulus is a hash function. Do you have a different, better definition? – emory Sep 20 '16 at 14:32
  • @emory It's obvious from the context of the question that crypto hash functions are being discussed. – Dmitry Grigoryev Sep 20 '16 at 14:48

9 Answers9

161

No, you cannot determine how close you guessed looking at the hash. A hash function is designed with this in mind: one single changed bit on the input must change a lot of bits on the output. Its called Avalanche Effect.

Bellow are SHA1 hashes for some of your example passwords:

cbfdac6008f9cab4083784cbd1874f76618d2a97 - password123

b2e98ad6f6eb8508dd6a14cfa704bad7f05f6fb1 - Password123

2b4bfcc447c3c8726d26c22927a68f511d5e01cc - password124

115b55dcc1cd9a0dfdd60c120e83eaf658c45fc6 - right horse battery staple

abf7aad6438836dbe526aa231abde2d0eef74d42 - correct horse battery staple

A single bit change will completely change the hashing result. In fact, in an ideal case for every bit of input change every bit of output will be changed with a 50% probability.

ThoriumBR
  • 51,983
  • 13
  • 131
  • 149
  • 112
    It's _[correct horse battery staple](https://xkcd.com/936/)_ :) (SHA1 `abf7aad6438836dbe526aa231abde2d0eef74d42`) – Reeno Sep 16 '16 at 07:11
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackexchange.com/rooms/45501/discussion-on-answer-by-thoriumbr-can-one-tell-if-a-password-guess-was-close-by). – Rory Alsop Sep 17 '16 at 10:20
  • 44
    downvoted for trying to be clever with XKCD but then misquoting it. – theonlygusti Sep 17 '16 at 14:32
  • 32
    Well, right is pretty *close* to correct, the result cannot be that different – gcali Sep 17 '16 at 16:46
  • 16
    You should note this is true for *cryptographic* hashes. This is not true if you use, for example, Java's String.hashCode() method, or other general purpose hashing functions – Robert Fraser Sep 18 '16 at 05:45
  • 6
    @theonlygusti: Actually, changing one word to a synonym _is_ quite clever; certainly more clever than using a password to which everybody now knows the SHA1 hash. – Lightness Races in Orbit Sep 19 '16 at 10:27
  • @theonlygusti whom are you downvoting? Reeno? How do you downvote a comment? Just curious – Mawg says reinstate Monica Sep 19 '16 at 12:33
  • 1
    @Mawg The link Reeno posted shows the popular "correct horse battey staple" example. ThoriumBR made a reference to it in his answer but misquoted it as "right horse battery staple". So theonlygusti is downvoting, presumably jokingly, ThoriumBR for misquoting such a popular XKCD. – DasBeasto Sep 19 '16 at 12:41
  • OIC. Sorry for being so dumb. Thank you for explaining (+1) – Mawg says reinstate Monica Sep 19 '16 at 12:44
  • 2
    The SHA1 hashes given in this answer are those of the passwords plus a line change (LF) at the end. Thus they are completely different than hashes of the password strings without the extra line change. – JiK Sep 19 '16 at 15:09
  • 2
    @JiK It does not really matter. The point is to show that one single bit of the input changes the output completely. – ThoriumBR Sep 19 '16 at 15:10
  • Plus, good password hashes are salted. – Medinoc Sep 20 '16 at 12:35
18

No, thanks to avalanche effect, even a single bit change in input should a create a significant large difference in the output.

Also talking about hashing functions like md5 or others... Given any input size the output size will not change. In md5 if your input size is 2 bits or 2000 bits, your output will always be 32 digit in alpha numeric ( hexadecimal format). So it makes almost impossible for the normal user to guess the input size even if he has hashed md5 output, guessing password is way beyond the scope for normal user, still there are ways with high end parallel computing by applying algorithms and lots of permutations and combinations to figure out one of the possible input if you have the hashed code...!

md5 Output:
password123  = 482c811da5d5b4bc6d497ffa98491e38
Password123  = 42f749ade7f9e195bf475f37a44cafcb
password1234 = bdc87b9c894da5168059e00ebffb9077

Above results shows the avalanche effect.

Surendra Patil
  • 420
  • 2
  • 9
15

To add one additional element to the answers above - the whole point of a hash function is to not shed the sort of information in the question - if you could determine anything about the similarity of the inputs form comparing the outputs, that represents a failure of the hash function.

It's not just that hash functions perform some job and, incidentally, you can't figure out anything about the inputs, the job of the hash function is to give you zero information about the inputs, and if you could find out something about the inputs by comparing two outputs, then that would violate that principle.

Jason
  • 1,907
  • 2
  • 10
  • 15
7

It depends on which hash function you use. Since you specified a password hash, then unequivocally no, you would never want to use a hash function for passwords that leaked any sort of information. The hash should simply match or not match and that's it.

That being said, hash functions that provide similarity information do exist; more information is provided in this answer.

TTT
  • 9,132
  • 4
  • 19
  • 32
  • 1
    There is only one hash function that does not "leak any sort of information", and it maps all inputs to the same output. – N.I. Sep 17 '16 at 09:42
  • 6
    @NajibIdrissi That leaks information that there was an input. – user Sep 17 '16 at 13:47
  • @MichaelKjörling Does it? This isn't actually new information. My comment wasn't meant as a joke, FYI... – N.I. Sep 19 '16 at 13:53
  • 1
    @NajibIdrissi - I agree with you. Pedantically it may not be possible for a hash function to not leak *any* information AND also be useful for checking passwords. Though we can get close enough to make it practically equivalent. – TTT Sep 19 '16 at 14:21
  • @NajibIdrissi that particular function performs poorly when it comes to collisions though... (j/k) – Jeff Meden Sep 20 '16 at 14:29
  • @TTT "Pedantically"? Cryptography is an exact science. "Close enough" isn't "equal". – N.I. Sep 20 '16 at 15:27
  • @NajibIdrissi - I'm not sure what you're asking? As for close enough, I didn't say it was equal, only *practically* equal. As in it *practically* doesn't leak any information, even though *any* is an exaggeration (as you pointed out.) If that weren't true we wouldn't be able to safely use any hash function for passwords. – TTT Sep 20 '16 at 18:30
4

No. There shouldn't be any relation to the data that went into the hash function and what comes out.

For example, here is 'hello' in MD5:

5d41402abc4b2a76b9719d911017c592

And here is 'Hello':

8b1a9953c4611296a827abf8c47804d7
unor
  • 1,769
  • 1
  • 19
  • 39
Joshua Hunter
  • 557
  • 3
  • 4
  • 4
    Well, there _is_ necessarily a relation; the point is that it's not transparent both ways. – Anton Sherwood Sep 18 '16 at 05:08
  • @AntonSherwood, you are of course correct. There is a relationship between what goes in and what comes out. There has to be if you put the same thing in and expect the same thing to come out again every time. But as long as the hash function hasn't been compromised it shouldn't be possible to take what came out and determine what went in. So there is no **apparent** relation between the in and the out. – Joshua Hunter Sep 19 '16 at 15:10
2

This is a bit far fetched, but generally no - there is no way to determine "how correct" a password guess is from a hash value. The closest scenario I can think of would be when segments of the passphrase are being divided up and hashed separately, as with LMhash. This situation would allow the attacker to glean information from correct sections of the password, but would not allow the attacker to infer "how close" each segment was to being correct.

Other users have given a great overview of hashing and the avalanche effect, so I won't go into it!

Flutter
  • 21
  • 1
0

Depending on the algorithms used it may be possible to compute how close the password is and this leads to collision attacks that allow to crack the password way faster than brute force.

As I understand is the case of MD4 that can be cracked in milliseconds because of flaws like that. I did not analysed the MD4 exploit, but I bet that there is a proximity equation embedded on in it.

Lets take an idiotic example: a hash function that is the sum of the ASCII codes of the password. The password AB and BA will have the same hash 131. If the real password is AB the system will accept BA as the real password also. A little reverse engineering will show a relation: a) changing any letter to next one will add one in the hash; b) adding a new A to the password adds 65 to the hash. And then the crack is ready. The password returned by the crack may not be the same used by the user, but the system will not know the difference.

Yes the goal of a cryptographic hash algorithm is to prevent that. It supposed to be unpredictable an thus require brute force. But not always work as expected. Math is unforgiven.

EDIT: MD4, MD5 and SHA1 are obsolete by now because of these kinds of attacks. New algorithms was born to avoid those vulnerabilities. And password checking do a bit more than only hashing. The adoption of new algorithms are not as fast as we want. For constantly updated Linux, the base passwords should be using modern algorithms. But I bet there is a lot of systems in the world running more than 10 year old software because the new version of the operating system may not be compatible with older applications or simple by the licence costs if paid software is used. I bet many software that have its own password database separated of the system has weak security.

Lucas
  • 439
  • 3
  • 5
  • But nobody is using MD4, MD5 or even salted SHA1 these days, right? We're all using pbkdf2, bcrypt or script? As other posters have explained, even ancient unfavored MD5 produces wildly different outputs for inputs that differ by only 1 character, meaning that no, similar password attempts do not result in similar hashes. – Craig Tullis Sep 19 '16 at 07:53
  • Added a comment about that – Lucas Sep 19 '16 at 12:20
  • MD4 is a lot older than 10 years, as is MD5. MD5 was obsolete as of 1993, when SHA0 was published, and a component of MD5 was compromised in 1995. SHA1 was published in 1995, but a theoretical weakness was found in SHA1 in 2005, and as of 11 years ago SHA1 is no longer considered secure against well-funded opponents, so you want to be using SHA256 or SHA512, but not alone. Thus HMAC functions, Pbkdf2, bcrypt, scrypt, re all, which utilize hash functions, with entropy, as part of their algorithms, but are not JUST hash algorithms. – Craig Tullis Sep 19 '16 at 14:52
  • Craig, Elmer question was about hashes "assuming someone knows all salts, [...]" and another techiniques used. – Lucas Sep 19 '16 at 16:36
  • Elmer's question was about password management – Craig Tullis Sep 19 '16 at 16:38
0

Yes you can, but you have to store hashes for the near misses as well as associate them, so work and resources and possible security issue here and there. It is possible for instance even now to see the plaintext password "attempts" on a Wordpress site, if any of these where to match a known high entropy password from somewhere else we can tell where it may have been leaked from. There are of course algorithms for generating mis-keyed data, and this is widely known due to URL squatting on similar mispellings, however you would need to assign scores to determine how close the match might be and how close it needs to be before you are alerted. Essentially if these were dictionary words you might rank them by Soundex or similar.

mckenzm
  • 487
  • 2
  • 6
  • Can you explain this a bit more? I'm not getting how you would go about determining that two hashes have similar input. – Xiong Chiamiov Sep 19 '16 at 20:22
  • Similarity is determined prior to hashing. Given "disneyland" is a low entropy password, you may generate hashes for "disnyland" and "Disneyland" (among others). Barring collisions, you will have a list of hashes linked to the password, if one is hit, you can detect it. Your criteria for similarity is up to you, but you might store 100 near misses, by repeating or omitting characters from the original. You can't do this from the hash, you need the password at the time of capture. At enterprise level this can build a blacklist to prevent serial values such as "Summer01","Summer02" etc. – mckenzm Sep 20 '16 at 01:20
  • Credit for blacklisting idea but better to check for "enough different" when specifying a new password. Requiring user to (re)enter old password then gives you plaintext of both (besides, good practice to revalidate during change anyway). Also see Harper's answer and comments. The answer is still "no" for cryptographic hashes, the original question, since you are actually comparing against the _other_ hash (plus you also created the link between the hashes based on the unhashed/unencrypted values). – BillR Oct 01 '16 at 20:22
-1

Not by looking at the hash, no.

But they could brute-force a few thousand passwords that are near the entered password.

Presuming, of course, that they can bypass whatever defense a site has against someone making numerous attempts. Obviously, the site itself has this ability, say, if it wants to check whether a password was merely fat-fingered and not count it against their allowed number of tries.)

How many tries would it take for a password of n characters?

  • n to test for the user fatfingering an extra character.

  • (n+1)*95 to test for a missing character in any position.

  • n to test for the user making 1 character lowercase instead of upper or shifted (#) instead of unshifted (3).

  • n(n-1) to test for 2 characters lowercased/unshifted or vice versa.

  • <=6n to test for the user inadvertently typing an adjacent character instead of this character. (I.e. if it's say an F, substitute D R T G V C).

  • 12 to test for the one of the user's hands being offset one place from where it should be, plus a few more for characters in the middle of the keyboard (Y G H B) that might be hit by either hand.

As you can see, these numbers are not prohibitively large, and such analysis is feasible.

  • 3
    I miss the point of this answer. You aren't comparing a hash to the real hash to see how close it is, you are constructing a set of hashes based on similar inputs. This set might be useful for validating a user (close doesn't count against max tries before lockout) but would also help an attacker if the set leaked ("tried wrong value but your close!" or "only need to try one form of password but not variant). Example: 'password' also tests 'Password', 'password1", 'password2', 'passwords', 'passwordz' -- but not the classic caps lock 'PASSWORD' and 'pASSWORD'. – BillR Sep 19 '16 at 06:17
  • I see no reason whatsoever for the host to tell the hacker/user that he's close, it's useful internal intelligence only. If the hacker has the hashes and the salt, this also allows him to easily test passwords in the neighborhood of passwords he may already have, e.g. From hacking *other* systems. For the user this means minor alterations of password are no protection at all. – Harper - Reinstate Monica Sep 21 '16 at 00:56
  • If the cracker has the set of hashes (actual + close), then testing 1 possible hashed password eliminates all passwords that are in the set (if 1000 close, then compute 1 hash and compare it against 1001 hashes _without_ _recomputing_). I am making the assumption that the pwd db -- with all the extra "close" hashes -- leaked (but see next comment). – BillR Sep 24 '16 at 22:36
  • Alternate cracking scenario: assume cracker (me) tries to log in to a web page login that locks for 5 minutes after 3 wrong tries, for 1 day after 12, and requires reset after 24 but doesn't count "close enough". If I actually try any "close" password, that's almost like trying 1000 if I keep track of lockouts. If "opensesame" is close, then I only need test close-to-opensesame passwords. Even worse, **this is a godsend to anyone trying to extend a breach of one site to others via standard login procedure** given the rampant password reuse (exact or mostly similar). **I can try forever!** – BillR Sep 24 '16 at 23:01