So me and my friend were pondering why passwords have a max length (and found the answer here) and I had an odd thought.. could a password be the same as its hash?
I realize that salted hashes are based off RNG but is it possible? If so, what are the chances?
-
1Theoretically yes. depending on what characters are in the password and how it is encoded, it is more or less likely. Practical chances extremely low. – marstato Jan 16 '16 at 08:58
-
1That might depend on the hash algorithm you are using. – Philipp Jan 16 '16 at 11:07
-
I had a computer running for years trying to find one for MD5 - but with no luck. I never turned the machine on again after I moved. – Dog eat cat world Jan 18 '16 at 11:13
-
Hash algorithms produce fixed-length outputs. Therefore, hash algorithms produce a finite number of outputs. Ergo, given an infinite number of inputs of arbitrary length there will be some input that matches its own output. Ipso fatso. As a matter of implementation, this could be avoided by simply writing the hash function to return an error if the input matches the output. But the likelihood of this happening is on the scale of "number of atoms in universe" kind of things, so no one bothers. – Ryan Ries Jan 24 '16 at 00:55
3 Answers
In theory, yes. In order to prove this, all you need to do is hash every possible output hash, and see if any of them result in the same value as was input.
This might take a while.
In practice, very few people use passwords which are themselves valid outputs for hash functions, since this would rewrite remembering a 30+ character string comprised of digits and the letters A to F (assuming a hexadecimal representation). Furthermore, the use of salt in modern hashing algorithms would further confuse the issue - the password would have to be part of it's own hash, and the salt be the remaining part.
- 27,263
- 7
- 89
- 101
In the case of passwords that are hashed without the use of a salt it is likely but far from certain that there exist a password which will hash to itself. A formal proof whether it exists or not is not feasible, but one can do the calculations assuming a random oracle instead of a hash function. In that case the probability of such a password existing can be computed and will turn out to be approximately 63%. The full calculations can be seen in an answer on crypto SE.
Best practice however is to use a salt when hashing a password. That changes the calculations but also makes the question slightly ambiguous. There are at least two ways to interpret the question, each of which can be answered separately.
Does there exist a password that will always hash to itself regardless of which salt is chosen?
No.
The reason is that the salt is included in the password hash. And when we are asking for password and password hash to be identical, then the implication is that the password itself would have to contain the salt. And for any other choice of salt the password will not hash to itself.
Does there exist a password that for at least one choice of salt will hash to itself?
Most likely yes. The analysis is similar to the case of unsalted passwords. But due to the input having more entropy than the output, the numbers will come out very differently.
Let's look at the calculation using some concrete numbers that are widely used. Salt: 48 bits, Hash output: 512 bits.
This means there are 2⁵⁶⁰ possible inputs each of which have 1/2⁵¹² probability of matching. The probability that none of them matches will thus only be (1 - 1/2⁵¹²)^(2⁵⁶⁰) using the approximation from before saying that (1 - 1/2⁵¹²)^(2⁵¹²) ≃ 27% the probability that none of them matches thus comes out as 27%^(2⁴⁸) ≃ 0.
From time to time it comes up, how interesting would it be if the output of a hash, or salted hash was be the same as the input.
Hashing algorithms don't typically have any protection against this as part of the design criteria. In fact, it would be very easy to add one more step, stating that if the output matches (part of) the input, toggle the first bit in the matching part, thus preventing any such find.
Finding a matching input/output would certainly be fascinating to me, however, what makes it not all that interesting, is when you consider, which representation of the hash should match the input?
The input is typically a series of 8-bit bytes, either as text in various encodings, or raw binary.
The output is a fixed number of raw, 8-bit bytes, represented often as the big-endian hex representation of the n*8-bit values as a number.
It surely would be a curiosity if the output of the MD5 of a decimal (or binary) number in string format (i.e. "12345....") would be a hex number matching the exact same number, right?
But, if the output, interpreted as a large number in a binary format would be the exact same number, (when printed in the decimal format), most of us wouldn't even realise the equivalence. But it would be just as theoretically interesting.
Another example, it might be very suspicious, but would you notice that this (fake) MD5 output is actually in ASCII text?
48454c5021496d207472617070656421
If it was output as a Base64 encoding, it would be very difficult for most of us to see that fact.
- 771
- 6
- 10
-
Enigma was cracked because it prevented the input to have any byte in common with the output. There's no need to make the same error. – Benoit Esnard Jan 17 '16 at 00:04