11

I have read some times that hashing is a one way function, that is you can make the hash of a message, but you can't recover the original message from the hash, just check its integrity.

However, if this were true, why can we decrypt MD5 hashes and get the original data?

The Illusive Man
  • 10,587
  • 16
  • 58
  • 89
  • 13
    Since you posit MD5 is "decryptable", please recover the original data for this hash: `0cca9b3eeae7b8747eaf61f8d282156d`. I'll even give you a hint, it's a real md5 of 42 character ASCII string. PS, using the best public attacks on MD5 this should only take work 2**123 ~ 10,633,823,966,279,326,983,230,456,482,242,756,608 or at a billion hashes per second on a billion computers, it should only take about 170 billion years on average. – dr jimbob Jun 28 '13 at 18:22
  • 1
    @drjimbob: Assuming 42 *printable* ASCII characters, there are probably about 3.4e+44 such strings with that same MD5 hash. (95**42 strings divided by 2**128 possible MD5 hashes.) So your 170 billion years of work will give you a few hundred [tredecillion](http://en.wiktionary.org/wiki/tredecillion) possible answers, and no way to know which one is correct. If I were you, I'd almost consider not bothering. – Keith Thompson Jun 29 '13 at 01:00
  • @drjimbob I'm sure I can find a collision before that with rainbow tables. Used to be a time when Google had indexed at least one collision of most md5 hashes until the modified their search input to ignore hex strings of certain lengths. – ewanm89 Jun 29 '13 at 05:48
  • @ewanm89 - Want to bet? I'm willing to bet a $200 donation to any legitimate charity of loser's choice (say Cancer Research Institute - cancerresearch.org) if you can prove me wrong in the next month (I'd agree for longer terms if you need it, but risk forgetting about the wager). Full disclosure, I only vaguely remember the original string. So if you can generate an input (not necessarily my string) md5sums to that hash or this one `0113fd21d9ec4e367abb761b26ef6010` (also 42 ascii chars but I saved this string to disk). Or if you want we could do a straight-up bet via bitcoin. – dr jimbob Jun 29 '13 at 06:51
  • @ewanm89 - I take you couldn't find a collision with your MD5 rainbow tables. A rainbow table is just a time-memory tradeoff, every hash still has to be computed (plus apply a reduction function). Imagine a powerful adversary built a rainbow table with 2**80 MD5s. (If one MD5 takes 1 cpu-nanosecond this would require 38 billion CPU years, 10 billion computers for ~4 years, to construct). With this and a 128-bit MD5 of a high-entropy passphrase, the chance of one of my MD5s being broken is ~1 in 2**48 (281,474,976,710,656). Granted a stronger unpublished preimage attack on MD5 could exist. – dr jimbob Jul 01 '13 at 16:04
  • @KeithThomspon - I'd be satisfied with any input that matches (like a password that matches the md5 for login). Note, 96^42 ~ 10^83 ~ 2^277 >> 2^123 ~ 10^37. Brute-forcing all printable ASCII 42-char phrases would not take 170 billion years and give 10^44 possible answers. It would take 10^44 times longer than 170 billion years. Or it would take 7700 trillion trillion years using a trillion trillion computers that each test a trillion trillion MD5 hashes per second. A trillion trillion (10^24 aka septillion) is a very large number; e.g., Earth's mass is 6 trillion trillion kilograms. – dr jimbob Jul 18 '13 at 19:40

2 Answers2

46

Hashing is not encryption (it is hashing), so we do not "decrypt" MD5 hashes, since they were not "encrypted" in the first place.

Hashing is one-way, but deterministic: hash twice the same value, and you get twice the same output. So cracking a MD5 hash is about trying potential inputs (passwords) until a match is found. It works well when the input is "a password which a human user came up with" because human users are awfully unimaginative when it comes to choosing passwords.

Tom Leek
  • 170,038
  • 29
  • 342
  • 480
  • 2
    You might want to rewrite "hash twice the same value" which can be interpreted as hashing the hash of a value. – mowwwalker Jun 28 '13 at 23:10
  • 3
    @Walkerneo - Tom said _"hash twice the same value"_ not _"hash twice a value"_. It is not ambiguous by any stretch of imagination, unless you can find a hash of a certain input that is exactly the same to its input. Needless to say, if you have `h(v) = v` your `h` doesn't do its job correctly as a hash function. – TildalWave Jun 29 '13 at 13:21
  • @TildalWave: _technically_, a good hash function _h_ has probability about 63.2% of actually having a fixed point _v_, i.e. a value _v_ such that _h(v) = v_. We do expect, however, that it is not feasible to demonstrate the existence or inexistence of such a fixed point for any particular hash function, let alone compute it. (If we can compute the fixed point, then the function is definitely not a random oracle, which does not mathematically imply that it is broken as a hash function, but is still bad news in practice.) – Tom Leek Jul 02 '13 at 15:41
  • I suggest to clarify that cryptographic hashing can be believed to be a one way function, but this is not proven. No one-way function it is proven to exist (https://en.wikipedia.org/wiki/One-way_function). In addition, MD5 for sure is not a cryptographic hashing function because it's broken (https://en.wikipedia.org/wiki/MD5#Security): it is easy to find a collision so it's vulnerable to second pre-image attacks (at least) – infosec-guy Mar 03 '19 at 17:08
16

You cannot "decrypt" MD5.

What you can do is try to match a large number of possible inputs in the hopes of stumbling upon the input that matches your hash. There are several attacks against the MD5 algorithm that makes this significantly easier.