14

Let's say we have a separate hashing algorithm called s2 and it would convert Hello into dug84nd8.

If we could take the algorithm and just reverse engineer it to generate a string like 8GN492MD that would also output dug84nd8, wouldn't it say that (s2("Hello") = s2("8GN492MD")) == true), and let a hacker in?

I feel like that I am missing something, but I don't know what it is.

Peter Mortensen
  • 885
  • 5
  • 10
xyper
  • 173
  • 1
  • 4
  • 5
    Note that this is not referred to as "decryption" because you don't get back the original string - just another string that hashes to the same value. Unlike encryption, hashes can't be reversed – Conor Mancone Aug 11 '20 at 21:36
  • "I feel like I am missing something'... Watch this video, and you'll see what you're missing: https://www.youtube.com/watch?v=y3dqhixzGVo – mti2935 Aug 12 '20 at 03:13
  • 13
    Does this answer your question? [Why are hash functions one way? If I know the algorithm, why can't I calculate the input from it?](https://security.stackexchange.com/questions/11717/why-are-hash-functions-one-way-if-i-know-the-algorithm-why-cant-i-calculate-t) – Gilles 'SO- stop being evil' Aug 12 '20 at 09:14
  • 2
    Note that for weaker password hashing algorithms it's a frequent occurrence that a password hash cracking tool bruteforces a hash and finds a password which was not the original password, but since it has the same hash it will be accepted. However with stronger/longer hashes this is not seen as you cannot run a exhaustive search and the probability of bruteforcing a second pre-image is very low. – eckes Aug 12 '20 at 09:36
  • @Gilles'SO-stopbeingevil' not a dupe of that one. The OP is not trying to calculate the original input. – schroeder Aug 12 '20 at 10:29
  • @schroeder The OP is trying to calculate a preimage, which is exactly the same as the earlier question. – Gilles 'SO- stop being evil' Aug 12 '20 at 10:49
  • @Gilles'SO-stopbeingevil' From the proposed dupe: "can't it just reverse the process to calculate the password from the hash?" That appears to be a different attack and a different question. This question is about 2 inputs with the same hash, not calculating `Hello` from `dug84nd8` – schroeder Aug 12 '20 at 11:47
  • 3
    Yes, you can do that. Have you tried? If you try, you might find some weird obstacles in your way. Now consider that nobody has been able to find a way around those obstacles yet. They did for older hash functions like MD5, which is why we don't use old hash functions any more. – user253751 Aug 12 '20 at 14:08
  • Of course you could. – user91988 Aug 12 '20 at 20:25
  • 2
    Wouldn't the answer (yes, which is why you need to pick a good algorithm) be on the first page of any tutorial on hashing? – Jason Goemaat Aug 13 '20 at 01:19
  • Yes. This is called a hash collision. It's intrinsic to hashing. Good hash algorithms will make it extremely difficult to cause a collision. – Layne Bernardo Aug 15 '20 at 01:22

13 Answers13

73

Your premise has a flaw. You say you want to 'reverse engineer' the hash function. There's no need to reverse engineer it - its implementation is public.

What you can't do is invert it (perhaps that's what you meant), because it's not invertible. You can easily tell it's not an invertible function because the size of the domain (possible number of inputs) is greater than the size of the range (possible number of outputs). The range is 2^256 (possible output states) and the size of the input space is infinite (technically 2^(2^64) apparently, but much larger than 2^256). And that's precisely what permits the collisions (by the pigeon hole principle, there must be more than one possible input for each output - at least for one of the inputs).

The hash function's whole design makes it computationally hard to find those collisions. There are three properties of hashes (first pre-image resistance, second pre-image resistance and collision resistance) which describe that property more exactly.

So the answer to your question is that the design of the function makes it purposely hard to achieve that even if you know exactly how the function works.

For details (in a slightly different context) of how functions can perform surprisingly (why for instance it is impossible to "step backwards through them" to invert them), see the answers here.

abligh
  • 2,036
  • 12
  • 12
  • 4
    _The range is 2^256 and the size of the input space is infinite_ If the input space is infinite there is an infinite number of collisions. ∞/2²⁵⁶ = ∞... – d-b Aug 12 '20 at 08:18
  • 7
    @d-b yes, but technically there need not be an infinite (or even more than one) collision for each output value; it could be the case that a subset of output values have only one possible input values or none at all. I suspect it is not possible to prove whether there is an infinite number of collisions for each output value for SHA-256 though I suspect it is the case. – abligh Aug 12 '20 at 10:32
  • 4
    The size of the input space is not quite infinite, it is 2^(2^64) since the size of the input is appended before taking the hash. Not that you are likely to ever have a bigger input than that. – rtpax Aug 12 '20 at 12:42
  • @d-b and yet, [collisions don't really happen](https://crypto.stackexchange.com/a/47810/62225). – Captain Man Aug 12 '20 at 14:03
  • 2
    @CaptainMan That is because 2^256 ≈ the number of atoms in the visible universe. – d-b Aug 12 '20 at 17:46
  • 19
    _"it's not an invertible function because the size of the domain is greater than the size of the range"_ - You're talking about [the **mathematical** definition of one way functions](https://en.wikipedia.org/wiki/Injective_function), but that's not what's relevant here. The [**cryptographic** definition](https://en.wikipedia.org/wiki/One-way_function) is _"a function which is computationally infeasible to invert"_. The fact that the inverse is a multivalued function irrelevant. – BlueRaja - Danny Pflughoeft Aug 12 '20 at 20:24
  • @rtpax IIRC, it is the _bit_ count up to 2^(2^64). So the input space is _only_ 2^(2^61) _bytes_. Whew, that should help a _bit_. – chux - Reinstate Monica Aug 13 '20 at 03:34
  • First, the SHA256 input space is limited due to the padding standard of NIST. One can hash at most 2^64 bits therefore there are at most 2^(2^64) different messages. 2, You are incorrect about the pigeonhole principle. It only says that there is at least one pigeonhole that has more than one pigeon. It doesn't talk about others that can be empty or not. With this, we can say that at least there must be one collision. Of course, ideally, we expect that SHA256 is a uniform random function. – kelalaka Aug 13 '20 at 21:43
  • @kelalaka fixed. – abligh Aug 14 '20 at 06:18
  • I don't see that changes are made different. Also, even we don't know that SHA256 attends all the values if we trim the output of SHA256 into 64 bits. What we know its output seem uniform random. – kelalaka Aug 14 '20 at 07:17
20

You are mixing the attacks on the hash functions. The formal definition of generic attacks on cryptographic hash functions can be found at Cryptographic Hash-Function Basics: Definitions, Implications, and Separations for Preimage Resistance, Second-Preimage Resistance, and Collision Resistance by P. Rogaway and T. Shrimpton. Simply can be given as;

  • The Pre-image attack: given a hash value h, find a message m such that h=Hash(m). Consider storing the hashes of passwords on the server. E.g. an attacker will try to find a valid password to your account.

  • The Second Pre-image attack (also called weak-collision): given a message m1, find another message m2 such that m1≠m2 and Hash(m1)=Hash(m2). An example is producing a forgery of a given message.

  • The Collision attack (also called strong-collision): Find two inputs that hash to the same output: a and b such that H(a)=H(b), a≠b.

SHA-256 is not broken for any of these generic attacks, yet. See: What makes SHA-256 secure? on crypto.stackexchange.

if we could take the algorithm and just reverse engineer it to generate a string like 8GN492MD that would also output dug84nd8

  • If we only consider this it is pre-image attack and the cost is O(2^256) for SHA-256. Note that the aim of the pre-image attack is not finding the original input, it is rather finding an input that has the same hash value. If there is no external test for the input, one cannot decide that the found pre-image is the pre-image. And, the actual search space for the pre-image can be much larger than the pre-image attack handles.

    There is a variant of pre-image attack, that occurs when the message space is short, like hashing the phone numbers. In this case, it may very easy to find the pre-image.

If we could take the algorithm and just reverse engineer it to generate a string like 8GN492MD that would also output dug84nd8, wouldn't it say that (s2("Hello") = s2("8GN492MD")) == true), and let a hacker in?

  • If we consider above in more simple terms; given Hello and dug84nd8=SHA256(Hello) and find another message with the same hash value as you requested then this is the Second Pre-image attack and the cost is O(2^256) for SHA256.

The Second Pre-image is what you are looking for. That is infeasible and not only SHA-256 but also none of the cryptographic hash functions are broken in this way.

  • The collision attack cost of SHA-256 is O(2^128) due to the birthday-attack with a 50% probability. In the collision attack, the attackers are free two choose the a and b. This is not your case since you start with a fixed message.

The conclusion, any of these attacks are not feasible for SHA-256 as of 2020.


The errors in the topmost answer

The range is 2^256 (possible output states) and the size of the input space is infinite (technically 2^(2^64) apparently, but much larger than 2^256). And that's precisely what permits the collisions (by the pigeon hole principle, there must be more than one possible input for each output - at least for one of the inputs).

SHA256 input space is limited due to the padding standard of NIST. One can hash at most 2^64 bits therefore there are at most 2^(2^64) different messages since the size of the message is encoded at the end of the padding in 64 bits.

The pigeonhole principle only says that there is at least one pigeonhole that has more than one pigeon. It doesn't talk about others that can be empty or not. With this, we can say that at least there must be one collision. Interestingly, we don't know that a SHA256 restricted to 64 bits attains all 64-bit values. What we expect from SHA256 is it is indistinguishable from uniformly random.

You can easily tell it's not an invertible function because the size of the domain (possible number of inputs) is greater than the size of the range (possible number of outputs).

That is not correct, too. Take modulo 2^256 as hash then it is non-invertible. One, however, given a hash value can calculate a pre-images easily, given x, a;; x+k 2^265 are pre-images. In other words, we inverted the map. The correct definition is a function which is computationally infeasible to invert

kelalaka
  • 5,474
  • 4
  • 24
  • 47
  • I remember finding a website where someone took the sha-256 of "Hello, world!" and made a string out of it – xyper Aug 12 '20 at 00:26
  • Also i dont choose the strings randomly, i reverse engineer all the not and xor gates and whatnot to make a feasible string. – xyper Aug 12 '20 at 01:09
  • 9
    @BOI doing that for modern hashing algorithms is literally impossible. Trying random strings until you find a match is the only option – Conor Mancone Aug 12 '20 at 02:12
  • 24
    @BOI: If you are indeed able to do that, two things will happen: 1) You will become very famous, because you managed to do something that the entire crypto-community has been trying and failing for 20 years. 2) Someone will analyze your attack and design a new hash function that fixes the flaw that enables your attack. The current best known attacks on SHA-256 are only on reduced-round variants. – Jörg W Mittag Aug 12 '20 at 06:12
  • 4
    @BOI I encourage you to try it and see what happens! – user253751 Aug 12 '20 at 14:09
  • SHA256 has gained lots of attention thanks to Bitcoin. – kelalaka Aug 12 '20 at 14:11
  • 5
    @BOI Essentially, even _if_ you managed to find a value that hashed to the same value as SHA256("Hello, world!"), this would only give access to accounts that used "Hello, world!" as their password - which if you knew which accounts those were, you could just use "Hello, world!" and avoid all the fussing about with hash collisions. And that also assumes the protected service hasn't taken other basic precautions like salting. And that they're using straight SHA256, not iterative, other algorithms, etc. – GalacticCowboy Aug 13 '20 at 02:30
  • 2
    @JörgWMittag "Someone will analyze your attack and design a new hash function that fixes the flaw" is an optimistic belief rather than a fact. It is not necessarily true that we can always do that. – freakish Aug 13 '20 at 08:10
10

and let a hacker in?

Sure. But the thing is that we actually don't know how to find another string that produces the same hash in an efficient way. At least for SHA-256 and other widely used hashing algorithms. Note that these algorithms are public, no reverse engineering is needed, which doesn't change anything. This is simply too hard, and in fact those algorithms are deliberately designed in such a way.

The whole problem boils down to solving f(x)=y equation for some function f and some y. One possibility is to scan all x, assuming the domain is enumerable. But that's inefficient and works only if we already know that a solution exists (which I'm not sure whether all values of SHA are attained multiple times). Other possibilities are often not known.

Perhaps this is an educational issue. In school we are often told to solve equations. Linear, polynomial, logarithmic, sine, etc. What they don't tell you is that they choose these equations in such a way that they are solvable and in a relatively easy way. But in fact even now the most briliant minds don't know how to solve majority of equations out there. And here you've stumbled upon one such (extremely important) example.

Note that the situation may (and already did for other hash functions) change in the future.

freakish
  • 211
  • 1
  • 4
  • 4
    We *do* know how to find another string which produces the same hash. At least, you have brute force. We just don't know how to do that in a feasible time/energy constraint. – Lawnmower Man Aug 12 '20 at 19:10
  • @LawnmowerMan I've updated the answer. However it is unclear to me whether brute forcing is guaranteed to stop. I.e. whether every output has more than 1 preimage. – freakish Aug 12 '20 at 20:46
  • 1
    Maybe not, but brute force can prove that, too. ;) – Lawnmower Man Aug 13 '20 at 05:46
  • 1
    @LawnmowerMan only under the assumption that the universe is finite. :) – freakish Aug 13 '20 at 15:09
  • @freakish Well, do we actually care for finding a _different_ input? We are looking for an input and just assume that we will find an alternative (colliding) input faster than the original input (also, supposedly we don't even have a way to verify it). Finding the original input is just as good, as far as the hacking goes. – Frax Aug 14 '20 at 15:27
  • 1
    @freakish the question whether it's guaranteed to stop is normally not a major one. If you set it to stop after e.g. $2^260$ tries, you have an almost 100% chance of finding at least one (second) preimage. But anyways, that will be after the lifetime of the universe, so it's of no use. – Paŭlo Ebermann Aug 15 '20 at 01:04
5

I believe @kelalaka's answer is the most accurate, but I wanted to add an example that may hopefully shed some light on the issue.

First off, you are completely accurate that you could follow back all the logic in the circuit and eventually get a collision. However, one of the characteristics of a good cryptographic hash function is that this exercise is substantially as difficult as just randomly guessing.

Consider the following circuit. M1-M3 are the bits of the message. Given a message 101 and a seed of 1, we get an output of 1.

Three chained XORs in a circuit.

Now, let's try to find a different message that collides with 101 by tracing back the circuit. From the output, we know that M3 could be 1 or 0. Let's pick 0; that means the other leg must be 1 (1 XOR 0 is 1). Now we come to M2. We'll also pick 0 again. Now we look at M1. We'll pick 1 for M1. But, uh-oh. The seed would now have to be 0. 100 only works as a message if the seed is 0.

Obviously, in this very simplistic example, we could have just trivially assigned M1 to be 0, and then our seed would have been 1 as we expected. But the point of this example is to highlight the feedback and chaining elements that make this simple "just retrace the circuit" approach much more complicated in a real cryptographic hashing algorithm. The "circuit" required to implement these algorithms is extremely complicated, because it consists of multiplication, exponentiation, modular arithmetic, etc. And the recursive nature of some of these calculations make tracing the circuit backwards a massive branching exercise. Again, it's not impossible; rather it's just as hard as randomly guessing.

Nick2253
  • 171
  • 4
  • why is having the seed 0 a problem? – xyper Aug 12 '20 at 14:19
  • @BOI Because the seed being `1` is part of the hashing algorithm as I described it. If you wanted the seed to be `0`, that would be fine, but it would be a different hash function, producing different hashes. – Nick2253 Aug 12 '20 at 15:36
4

"Trying randomly until you find any input that produces the correct hash." Yes, but it is still a brute force attack. This is the premise of a rainbow table attack. Pre-compute values so that you have one input for every possible output. Then instead of trying all possible inputs, you can only try a subset of inputs that produce unique hashes. It doesn't matter if you get the exact input that was the original password because the system can't tell the difference.

Here are the problems:

  1. You have to find the input for each hash. As other answers have said this is "quite slow" with modern hashing functions since it is just brute force. (Unless you have a quantum computer sitting around.) The main benefit of a rainbow table is that for functions where you CAN calculate this then you can calculate all of them once and reuse the results over and over without recalculating (trading the space of storing all those values for the time savings of not having to recalculate them).
  2. You don't get the actual password out. In many cases, people are attacking less secure systems not because they want to break into those systems but to take advantage of people who reuse passwords/logins on other, more interesting systems.
  3. If the attacker doesn't know the target hash and has to check against the target system, then it can be blocked by restricting the number of incorrect attempts. (The attacker knowing the target hash can be blocked by other basic security practices.)
  4. Salt. Adding another string that is unique to the user to the password before hashing it. Just for example, say we add the username to the end of the password before hashing it. Now we have to find a string that, when the username is added to the end, computes to some hash. Since the username is unique to each user, we can't precompute this in bulk, we are back to doing it one by one...very inefficient, aka takes impossibly long.
user3067860
  • 160
  • 4
  • I guess it was way easier to just hash a password like hash("hello123"); and then brute force email-adresses or usernames until you find somebody stupid enough to use "hello123" as a password – clockw0rk Aug 13 '20 at 09:15
  • Your first paragraph is the most readable description of rainbow tables I've seen. Thanks to you, I finally understand them. I've gone through Wikipedia and several long "explainer" articles, and none of them were as helpful as the two middle sentences from ¶1. – Michael Aug 13 '20 at 21:40
  • @Michael Glad I was able to help! – user3067860 Aug 13 '20 at 22:11
  • 3
    @Michael Although everything else is valid, this answer doesn't explain rainbow tables correctly. They do not store one input for every possible output (that would be more terabytes than stars in the universe, even for MD5.) Rainbow tables involve precomputing (let's say) a million hashes and storing (say) 1000 of them. Using a clever algorithm, those million can be reconstructed using a lot of work—but much *less* work than pure brute force. Rainbow tables are simply leverage to attack somewhat longer passwords when it's infeasible to store precomputed tables of that size. – Artelius Aug 14 '20 at 05:53
  • @Artelius Phooey. Thanks for clarifying. – Michael Aug 14 '20 at 13:26
3

You probably mix "reverse engineering" and finding an "inverse function". These are different concepts.

Reverse engineering is deducing the algorithm from its (closed) implementation. SHA-256 algorithm is public and you can skip the reverse-engineering part altogether. Just look in Wikipedia, or the source code of some open source crypto library.

In order to break the crypto (in your case, to find a "hashing collision"), you need to find an an "inverse function" - in the same math sense that the square root is inverse function of the square.

Hashing algorithms that are used for cryptography are specifically designed to resist finding collisions and inverse functions. If someone finds an easy way of finding collisions, the corresponding hash function is considered compromised and people stop using it. That's what happened to MD5 or SHA-1 functions.

There are other hashing functions (e.g. made for use in the database hash tables) that are NOT made to be that much resistant against collisions, but are computationaly cheaper and/or have other advantages. They still called hashes, but are used in their respective fields and not in cryptography.

fraxinus
  • 3,458
  • 6
  • 20
  • By an inverse function, i would try and find each function that sha-256 has, and for example a NOT gate, i would choose a random input, which would go back and back until there is a working string – xyper Aug 12 '20 at 11:41
  • @BOI good luck. If you succeed, I have a few ideas how to become very rich before you publish your findings. – fraxinus Aug 12 '20 at 12:47
  • 2
    @BOI there are a lot of gates in SHA-256. If you try random inputs for all of them, you will try a lot of random inputs before you find ones that work. Remember that several gates all have the same input, so an input that works for one gate might not work for a different gate. – user253751 Aug 12 '20 at 14:10
  • @BOI the reason why this doesn't work it's because you can't do it bit by bit, as in a cryptographic function generally every output bit is/can be influenced by *every* input bit. This reduces a "choose a random input and backtrack if it's wrong" to a search of all the possible keyspace, as the "random input" has to be the *whole* input and getting one right by random chance is 1/2^256 = effectively zero. – Peteris Aug 13 '20 at 14:12
2

Destroying information

When computing a hash function, information is destroyed at certain stages of the algorithm. This is the key to why you can't "run the algorithm in reverse."

If you start with the hash ccb92793f8a87a695fa3f2e805779da8, working backwards, there may be billions of possibilities of how the previous stage got you to that value. No problem—pick one and go through the next stage; same deal. After several stages you reach a point where you get stuck and can't go further; you have reached an impossible intermediate state. So you have to go back and make a different choice and your billions start multiplying. If there are enough stages, this becomes harder than brute forcing the inputs, so you might as well just do that instead.

Artelius
  • 588
  • 2
  • 4
  • Destroying information doesn't prevent the attacker from finding the pre-images faster than the brute force. For example take [MD5](https://security.stackexchange.com/a/225227/86735) – kelalaka Aug 15 '20 at 22:06
1

All the other answers are all correct and cover certain aspects, nevertheless I want to show another approach.

What you - more or less accidentially - found out is the inevitable connection between hash functions and the pigeonhole principle.

If you look at the definitions, it is quite obvious:

  • A hash function is a mapping from a domain containing data of arbitrary size onto data (a bit array) of fixed size.
  • The pigeonhole principle says: If there are more pigeons than holes, then there has to be at least one hole with at least two pigeons within.

For the input domain, you may have longer passwords and of varying length, and the symbol set is typically larger (small and capital letters, digits, some symbols). These are the pigeons.

The number of possible hash values is easy to calculate: If the length is n out of b symbols, then there are b^n possible hashes (you may set b=2 to count bits). These are the pigeon holes.

So, you have more pigeons (possible input data) than holes (possible hash values), and so there has to be at least one pigeon (possible input data) placed into (mapped onto) a hole (a hash value).

Of course, in this case the function is never injective, and hence not bijective, and hence not invertible.

rexkogitans
  • 201
  • 1
  • 6
  • "Not invertible" doesn't mean you can't find *an* input with a given output. – user253751 Aug 14 '20 at 12:16
  • @user253751 By "not invertible" I mean the mathematical, strictly formal sense, as in g is the [inverse function](https://en.wikipedia.org/wiki/Inverse_function) of f if and only if g(f(x))=x for all x of the input domain. Such a function g does not exist if f is not bijective. Of course, brute force is always possible. – rexkogitans Aug 14 '20 at 19:38
1

It's too much for a comment, so I'll add it as a answer:

A small trick that might help: Say you algorithm is $a * $b, if you have 3 and 4, you get 3 * 4 = 12. You now have the outcome which you can not reverse (was it 1&12, 2&6, 3&4, 4&3, 6&2 or 12&1?), but it has multiple collisions. In this case 6 different inputs would result in the same result, so we have 6 collisions.

One 'trick' to minimize that chance (which will never be zero if you have a finite of charaters of the resulting hash) is adding more bits. That means that the outcome will be eg 1862534 for 3&4 as input, and 2&6 could become 6793439.

Martijn
  • 359
  • 1
  • 2
  • 9
0

Let's say we have a separate hashing algorithm called s2 and it would convert Hello into dug84nd8.

.. a string like 8GN492MD that would also output dug84nd8 ...

What you describe is a "hash collision".

And yes: In this case both Hello and 8GN492MD would be accepted as valid passwords.

I feel like that I am missing something, but I don't know what it is.

First:

You didn't write that the attacker knows the hash value (dug84nd8). However, it seems to be obvious that you wanted to write it.

Second:

Hypothetically it would always be possible to find some string like 8GN492MD that has dug84nd8 as output if you had enough computation power (maybe a large quantum computer).

However, the functions that are used to calculate the string 8GN492MD from the string dug84nd8 are so-called "one-way functions".

These are functions that can be calculated quite easily on a "normal" computer; however, it is not known if it is possible at all to calculate the reverse function (finding the string 8GN492MD when the string dug84nd8 is known) within a realistic time (for example less than 10 years).

And of course it is also not known how this can be done if it should be possible.

Indeed, it sometimes happens that some mathematician finds a way how to find a collision. This means that the mathematician finds a way how to find the string Hello or 8GN492MD when the string dug84nd8 is given.

If this happens, you cannot use the hash function (the function calculating the value dug84nd8 from the value Hello) any longer and you have to replace the hash function by another one. Otherwise you'll have a security problem.

Martin Rosenau
  • 339
  • 1
  • 5
-1

Others mentioned the use of the terms "reverse engineering" and "attacking" etc., so I won't respond to this. But for your question:

Consider hash(a) = x;

Now you know x. Let's say you have access to immense computation power, like a botnet, or maybe a giant server farm. Now, in theory, you could start producing hashes, i.e. hash(b) = y; hash(c) = z and so forth. With enough time, you could probably find a function hash(something) = x. So, you would kind of compare the outputs against each other until you found a matching pair.

I personally don't know how much time this would take, but I think it would be possible. But, I guess there are better ways to steal a password, like a social engineering attempt, or breaking in on the client's machine rather than on the server.

schroeder
  • 125,553
  • 55
  • 289
  • 326
clockw0rk
  • 119
  • 7
-1

There are many great technical answers. I'll try to provide something more approchable: an example.

Imagine, my secret algorithm is this:

  • I take an input number,
  • add all the digits,
  • multiply the sum with each digit,
  • then repeat the process.

So 234 would come down to (2+3+4)+2*3*4 = 216. (2+1+6)*2*1*6 = 108. That 108 is the hash. With your understanding of my algorithm, can you calculate a number that ends up being 108? If you can't, you have to guess. That looks easy here, but imagine these numbers being thousands of digits long...

Disclaimer: I suck at math, you probably can actually crack that one. SHA is another beast.

-3

Most hashing algorithms are one-way because if you reverse the operation of the hashing algorithms it will also act as a hashing algorithm (maybe not a good one).

Say for example you have an algorithm called SHA256(). And you then implement a function with exactly the reverse operation of it called REVERSE_SHA256().

What will happen is that the output of REVERSE_SHA256(SHA256(INPUT)) will not match the INPUT:

INPUT = "some string"
HASH = SHA256(INPUT);
OUTPUT = REVERSE_SHA256(HASH);

OUTPUT != INPUT

Not only that, but for a cryptographically secure hashing function the OUTPUT cannot be rehashable to the same value:

HASH2 = SHA256(OUTPUT);

HASH2 != HASH

That such an algorithm can exist is why cryptographic hashing functions exist.

slebetman
  • 231
  • 2
  • 4
  • 1
    I don't think this answers the question, because OP explicitly asked about generating a string that results in the same hash, so `HASH2 != HASH` contradicts the assumption of the OP. OP also didn't mention simply reverting the sequence of operations of the hashing algorithm. – Raimund Krämer Aug 12 '20 at 09:01
  • @RaimundKrämer It answers the part where the OP wants to generate the string by simply reversing the operations of the hashing function. You and I understand that this is not possible - that the only way to generate such a string is by trying all possible strings until you find one with the same hash. The OP doesn't understand that there can be algorithms that are not reversible even when you reverse the operations of the algorithm. – slebetman Aug 12 '20 at 09:06
  • 3
    SHA256(REVERSE_SHA256("Some hash")) has to be "Some hash" by definition. You can't say that SHA256(REVERSE_SHA256(HASH)) != HASH because then your REVERSE_SHA256 function is ***broken*** (as in, not working). – user253751 Aug 12 '20 at 14:11
  • What would it possibly mean for a function to be "one-way"? Non-injective? Non-injective functions don't fix OP's question. – gspr Aug 13 '20 at 09:40
  • @user253751 That's the whole point of my answer. That constructing a reverse function as proposed by the OP will not work - IE the reverse function is necessarily broken. The OP assumes that there is a way to reverse the hash by reverse engineering the mechanics behind generating the hash. I'm attempting to show that for some algorithms - especially algorithms we consider cryptographic hash functions, such reverse engineering leads to nonsense results. – slebetman Aug 14 '20 at 06:10
  • @slebetman You first say that `REVERSE_SHA256(SHA256(INPUT)) != INPUT` which is fair enough. (You don't show it, actually, but you do *say* it). Then you say that `SHA256(REVERSE_SHA256(HASH)) != HASH` which is plainly incorrect. A string doesn't have more than one SHA256 hash. – user253751 Aug 14 '20 at 12:09
  • First one: "If I find a string whose hash is the same hash as hello, it might not be hello" - true. Second one: "If I find a string whose hash is abcdef012345, then its hash isn't abcdef012345." - false. – user253751 Aug 14 '20 at 18:25