2

I understand that for a hash function there are an infinite amount of possible inputs that could produce a specific output. So if you're trying to find a specific input that created an output that would be unfeasible.

However take SHA256, if you are given an output (digest), is it quite easy to calculate any one of the possible inputs if you don't mind which one?

Luke
  • 23
  • 2
  • I'm confused. If you don't care what the input is... then what are you trying to do? Can you give a specific example? – RoraΖ Feb 11 '15 at 17:54
  • @raz: I'm curious about the use of hash functions as a form of proof of work. Specifically hashcash (Adam Back). I can only see this being effective if you can't backward compute and find any random input. So I was trying to get an intuitive sense of what would prevent backwards computation, as I couldn't see **compression** as providing any resistance. – Luke Feb 12 '15 at 07:58

1 Answers1

6

No. Given some output x, any input m such that h(m) = x is called a preimage. Though there are a lot of possible preimage, finding any of them is unfeasible -- that is, if the hash function is indeed cryptographically strong.

The three classical properties of cryptographic hash functions are:

  • Resistance to preimages: given x, it is not feasible to find any value m such that h(m) = x.
  • Resistance to second preimages: given m, it is not feasible to find any value m' such that mm' and h(m') = h(m).
  • Resistance to collisions: it is not feasible to find m and m' such that mm' and h(m') = h(m).

In other words:

  • If I give you a hash output, you cannot find a matching input -- not only the one I used in the first place, but you cannot find any other as well.
  • If I give you a hash output and a matching input, you cannot find another, distinct input that yields the same output.
  • Even if I let you choose messages as you wish, without any constraint, you cannot find two distinct inputs that yield the same output.

Of course all these resistances are up to some (very high) computational effort. For instance, finding a preimage can be done with a method called "luck", i.e. trying random inputs until a match is found. If the hash function offers a 256-bit output, then luck will find a matching input in an average of 2256 tries, which is totally out of reach of existing and future technology (by a very very long margin). A function is said to be "cryptographically strong" if the hash function output size is such that luck is nowhere near practical, and there is no known better method.

RoraΖ
  • 12,347
  • 4
  • 51
  • 83
Thomas Pornin
  • 322,884
  • 58
  • 787
  • 955
  • You beat me to it, plus your answer is much better than mine. +1. – Craine Feb 11 '15 at 18:03
  • If you know that a hash was generated out of a specific set of possible messages then you could look for the input using a (prioritized) *dictionary attack*. For passwords and such *rainbow tables* may already exist (large, ordered 1:1 DB with input/hash values). – Maarten Bodewes Feb 11 '15 at 18:04
  • Thanks. So what is it that prevents backwards computation like that? I understand how compression like XOR can prevent you finding the specific input. But what prevents you calculating any random input? – Luke Feb 11 '15 at 18:18
  • @Luke: What actually makes hash functions secure is a rather deep question. You can try to read a lengthy answer there: http://security.stackexchange.com/questions/11717/why-are-hash-functions-one-way-if-i-know-the-algorithm-why-cant-i-calculate-t/19658#19658 – Thomas Pornin Feb 11 '15 at 19:02
  • @ThomasPornin: Wow thanks! "What makes MD5 hard to invert" really helped with grasping the intuition behind the toughness of backwards computation. So is SHA256 based on a similar process? Where **each message word enters the processing multiple times**? – Luke Feb 11 '15 at 22:26
  • MD5, SHA-1, SHA-256... all share the same basic structure. A nice pedagogical exercise is to get the specifications for these functions (freely accessible -- RFC 1321 for MD5, FIPS 180-4 for all SHA-*) and then implement them with some programming language (I would recommend Java or C# for that). This will teach you a lot on how these functions. – Thomas Pornin Feb 12 '15 at 00:44