25

I may be misunderstanding, but if I want to use a hashing algorithm such as argon2, what's stopping from someone seeing how it works and reversing what it does?

Jan Trindal
  • 401
  • 4
  • 4
  • 99
    Kerckhoffs's principle: A cryptosystem should be secure even if everything about the system, except the key, is public knowledge. – multithr3at3d Mar 06 '20 at 01:25
  • 5
    Also, noted by some answer but not all, a hash is actually not technically "reversible" (meaning it's not a one to one function, i.e. for some of the steps, 2 different inputs might give the same output, so given the output, you can't be sure of what was before) – Pacopaco Mar 06 '20 at 05:53
  • 3
    You know how password checks work. You give a password and the system checks if it's the correct one. Would you be able to guess the password by "reversing what it does" ? – ChatterOne Mar 06 '20 at 09:26
  • 2
    Nothing stops them.But you have to consider that thousands of the best people on Earth try that for decades and fail... therefore we assume that tomorrow nobody will be able to too. Moreover most modern cryptography relies on well-known mathematical problems. So if somebody came along and solved them they would also indirectly solve those problems. A lot of these problems are unsolved since the 50s or before and so that would be a huge discovery with a lot of consequences, not limited to reversing that specific hash/encryption algorithm. – Giacomo Alzetta Mar 06 '20 at 10:45
  • 87
    The algorithm for turning a cow into ground beef is publicly available. What’s to stop someone from reversing what it does and turning ground beef into a cow? – Mike Scott Mar 06 '20 at 18:02
  • 18
    Start with a simpler question: what is stopping *you* from doing that? – Eric Lippert Mar 06 '20 at 21:33
  • 1
    The design of a gun-type nuclear weapon is common knowledge. What's stopping people from making backyard nukes is a completely different constraint: the availability of U-235. Just because you can say "do X" doesn't mean that doing X is actually feasible. – chrylis -cautiouslyoptimistic- Mar 07 '20 at 00:42
  • 1
    here's my hash function: add up all the bytes. i have a message whose hash is 2437. what's the message? – Eevee Mar 08 '20 at 03:12
  • If I may speculate a bit, OP, I suspect that you ask this question because, intuitively, hashing a password seems like it should be a simpler and less destructive operation than, say, turning a cow into ground beef. Still too complex for the common layman (such as yourself), but something that someone with a Ph. D and sufficient willpower could figure out. – TheHans255 Mar 08 '20 at 07:27
  • Does this answer your question? [Why are hash functions one way? If I know the algorithm, why can't I calculate the input from it?](https://security.stackexchange.com/questions/11717/why-are-hash-functions-one-way-if-i-know-the-algorithm-why-cant-i-calculate-t) – Mike Scott Mar 08 '20 at 09:59
  • 1
    A simple question: I have a random number, and I divide it by 7. The remainder is 2. What is my original number? You know everything I did, so try to reverse it. –  Mar 08 '20 at 15:24
  • @TheHansinator I don't think you need a Ph.D to turn a cow into ground beef. – Josef Mar 09 '20 at 10:55
  • @JosefsaysReinstateMonica Sorry, I wrote that wrong - I meant to say how easy each algorithm was to reverse. To the uninitiated, it seems like reversing a password hashing algorithm is something you could do with a Ph. D and lots of patience, because they don't know that it would be like turning ground beef back into a cow. – TheHans255 Mar 09 '20 at 14:30

13 Answers13

54

Being public is exact the point: you show everyone how it's done and how difficult it is to reverse it. It's like showing you a ginormous jigsaw puzzle with a trillion pieces, but with every piece in its place, and shuffling everything down. You know all the pieces form the puzzle (you just saw it), and you know it's very, very difficult to put everything back. A public hash shows you how it's done (the result) and how difficult is to do everything in reverse.

A public hash function is just a set of mathematical operations. Anyone can (but only a few will) do the operations by hand and prove that the algorithm works as expected. Anyone can reverse it too, but it takes so much time (trillions of years with all computing power of our planet combined) that the most cost-effective way to reverse it is a bruteforce.

Unless it's a pretty basic insecure hash function.

ThoriumBR
  • 51,983
  • 13
  • 131
  • 149
  • 56
    "Anyone can reverse it too, but it takes so much time" - a hashing function *cannot* be reversed, no matter how much time you spend at it (assuming it is one in which the range of input values is larger than the range of output values). The best you can hope for is to find an input that produces the same output, but you can never identify the original input. – Jon Bentley Mar 06 '20 at 15:02
  • 1
    @JonBentley: if there are no collisions, your "input that creates the same output" is the initial input. – WoJ Mar 06 '20 at 15:35
  • 31
    @WoJ the pigeonhole principle: all hash functions have collisions – Jacob Krall Mar 06 '20 at 15:38
  • 1
    @JacobKrall: you are right of course, I was not clear. Good hash functions will minimize the risk of a collision, so when you find an input which is hashed to a given output, then the input can be assumed to be the same as the original one. But yes, this is indeed not a certainty. I stand corrected. – WoJ Mar 06 '20 at 15:46
  • 9
    @WoJ Considering that a typical commonly used hash algorithm will accept an arbitrarily sized input, there are an infinite number of inputs for a given output. So it cannot possibly be the case that the input "can be assumed to be the same as the original one" or even that it is *likely* that they are the same. Minimising the risk of collisions doesn't mean that the *quantity* of possible collisions is low, but rather than the probability of *finding* one is low. – Jon Bentley Mar 06 '20 at 17:49
  • 3
    @WoJ Unless you know the size of the original input and that matches as well, you actually can't safely assume at all that an input producing the correct output is the original input. And even then, if the input size is significantly (multiple orders of magnitude) larger than the hash output size, you still can't make that assumption unless you know more about the original input (such as what the expected internal structure for that file type is). – Austin Hemmelgarn Mar 06 '20 at 18:07
  • Although when you have additional knowledge like "The input was a well formed string of UTF bytes between 8 and 24 characters with at least one upper one number and one special character salted with X and peppered with Y" you can be pretty confident a match you find with those characteristics is the original ;) – Affe Mar 06 '20 at 20:02
  • 1
    *Anyone can (but only a few will) do the operations by hand and prove that the algorithm works as expected.* -- I am pretty sure that none of the commonly used hash functions are actually mathematically proven to be collision resistant. – trognanders Mar 06 '20 at 20:24
  • @Affe Around two years ago or so someone got the first (and only?) known instance of collision of SHA1 by trying to make a git commit. So somewhere out there there is a project which has source code at two different points in time that are different that generates the same hash value – slebetman Mar 08 '20 at 02:05
  • 4
    @WoJ - I think what Jon means is that as it stands this answer is a bit confusing because it says: `Anyone can reverse it too, but it takes so much time .. that the most cost-effective way to reverse it is a bruteforce`. Implying that hashes have a non-brute-force way to reverse but brute force is more effective. This answer perhaps does not understand that brute force and reversing a hash is exactly the same thing. This makes the last sentence in the answer silly: `you can try to reverse the hash by brute-force but it takes so long that you should brute-force it instead` – slebetman Mar 08 '20 at 02:08
  • ... If there is a hashing algorithm that can be reversed without brute-forcing then it is not a hash function, it is a map (or in mathematics we say it is a *function*) – slebetman Mar 08 '20 at 02:10
  • @JonBentley while hash functions usually have a larger input than output size, the output size (at least for cryptographic ones) is usually still large enough that you can't try enough inputs to get a collision. The point is that it should be impossible to even find a single pre-image (no matter wether it's the original one or a different one) or collision. – Paŭlo Ebermann Mar 09 '20 at 00:29
  • Saying that a hash can't be reversed, even by brute-force, seems a little over-simplified. I mean, in the general case, a hash could be reversed to generate a family of messages that, when hashed, would yield the same hash. Then if this were practical, an attacker might be able to reasonably guess particular messages. – Nat Mar 10 '20 at 05:35
54

Probably not the answer you're looking for, but consider this.

Take a 10-digit number, something like 3,481,031,813, and then now with only pen and paper find it's square (i.e. multiply it by itself). While tedious, this is relatively straightforward and can be accomplished after some time.

Now with the same pen and paper, try calculating the square root of a 20-digit number. This is a much much harder task -- even though it's effectively the reverse of the first task.

Mathematical functions can be made, so that inverse function is much harder to solve. One way hashes take this to their logical conclusion -- the function is so hard to solve as to be rendered practically unsolvable.

Add to that, the fact that information is lost along the way. The square of 2 is 4, but the square root of 4 is both +2 and -2. Information was lost during the square function, as to what the sign of the original number was. Hash functions effectively do this as well, information is lost when you take a 10GB file and shrink it down to a 256-bit hash, there is simply no way to reconstruct the original message anymore.

keithRozario
  • 3,631
  • 2
  • 12
  • 25
  • 22
    -1 While I understand the point you're trying to make it doesn't really work. Taking square roots over the real numbers (or over Z where you just point out that for most numbers there are no square roots) is easy. Even with pen and paper a 20 digit number is doable in reasonable time. You really want to use something like factoring as your example which is actually slow. Also you are confusing two things which is the fact that a hash function is not injective and that it's hard to find a preimage. These are not necessarily related. – DRF Mar 06 '20 at 10:14
  • 9
    @DRF - Agreed. Keith, I think you'll need to change your example. The difference between hashing and 'theoretically reversing a hash' is 'Trivial' and 'Basically Impossible'. The difference in your example is 'Meh' and 'Ugh, fine.' – Kevin Mar 06 '20 at 12:19
  • 6
    @DRF you're right on both accounts, but still I think this example is not bad: the square function is indeed both _easier to compute than finding a preimage_ (though that too isn't really _difficult_), and non-injective (albeit in a very harmless way). So it has the essential characteristics of a hash function, just both of them way too weak to be actually usable. – leftaroundabout Mar 06 '20 at 13:15
  • 42
    His example is a very simple way to illustrate that something can be harder to revert than to compute. That it doesn't scale well to the difference between hashing and un-hashing doesn't matter to what he's trying to say. – Echox Mar 06 '20 at 13:28
  • @leftaroundabout Well the example doesn't fail utterly but it has some very specific problems. I might actually have a bigger issue with implying non-injective implies hard to find preimage - I realize that's not what the poster says, but it looks a little bit like it - then with the fact that square root is easy to invert. You can have functions where it's harder to compute the function than find the pre-image, for example absolute value. – DRF Mar 06 '20 at 14:47
  • But I think the most important thing is this would be the best answer in terms of explaining the necessary concepts if just a few things were changed to not be misleading. (also you have no idea how many times I've had to explain to someone that being unable to invert a function is not the same as being unable to find a pre-image) – DRF Mar 06 '20 at 14:48
  • 1
    There also exists many methods to expedite square roots by hand (initial estimate, Babylonian method) https://en.wikipedia.org/wiki/Methods_of_computing_square_roots – nabulator Mar 06 '20 at 14:59
  • 31
    @DRF I think you've overanalysed the point of what this answer was trying to achieve with the square root example. It was merely meant to show, in a simple way that someone without advanced maths knowledge can grasp, that you can have a function which is more difficult to solve in one direction than in the other. The *degree* of the difference in difficulty is not what was being demonstrated, nor any other concepts related to hashing. – Jon Bentley Mar 06 '20 at 15:10
  • 1
    A hash function is impossible to compute in reverse - not merely difficult. This is a reasonable analogy for asymmetric encryption, but not for hash functions. – Tim Mar 07 '20 at 10:08
  • 1
    I know a method of calculating the square root of a number on paper that has the same complexity as long division. – CJ Dennis Mar 07 '20 at 23:41
  • Standard pencil and paper multiplication of an n-bit integer takes n^2 time, while square root takes n^2 log(n) time using the "guess and check" method taught in elementary school. – bjb568 Mar 08 '20 at 01:17
9

I don't think I will be able to give an answer that fully satisfies you, but the short answer is that for something to be called a "cryptographic hash function" it has to be a complex enough function that this kind of reverse engineering is not easy. That's not to say that it is impossible, but as soon as someone makes even a little bit of progress reverse-engineering a cryptographic hash function, we will consider it to be broken, and move to something stronger. You can read more about the properties of cryptographic hash functions here (wikipedia).

As an example let's look at SHA-1, the properties of a cryptographic hash function are:

  • Pre-image resistance
  • Second pre-image resistance
  • Collision resistance

In 2005 an attack was invented that can find collisions in about 260 operations. That's still millions of $ USD to perform that attack, and as far as I know there are still no attacks on the other two cryptographic properties (pre-image and second pre-image), but that is enough for us to consider SHA-1 completely broken.

Mike Ounsworth
  • 58,107
  • 21
  • 154
  • 209
8

Because you can't reverse them.

Basically, if it's so easy, why don't you do it? Well, there used to be simpler hash functions, and people figured out how to reverse them, and then other people made it so those ways didn't work. By now, we have hash functions that nobody knows how to reverse.

It might be enlightening to actually try and reverse something like MD4 and see where you get stuck. Then find out how MD4 was reversed. For this last part, you'll need to find and read academic papers - this is easier if you're a university student and your university pays to give you access to the places where papers are uploaded, but often, you can find them on the Internet elsewhere.

user253751
  • 4,610
  • 3
  • 21
  • 17
  • 6
    Let's make an example out of this irreversability: Let the hash function be to split the number in two pieces and add them together. Simple stuff. Start with for example 8734, split it into 87 and 34, add them together to 121. Then you come across the hash 675. What was the original number? You can come up with a lot of candidates, but you can't be sure which one it was. A cryptographic hash algorithm makes even coming up with the candidates ridiculously difficult. – Suppen Mar 06 '20 at 13:54
  • 3
    @Suppen That's not the kind of irreversibility that usually matters with hash functions. 5566 also hashes to 121. So after I get the hash somehow, I can also log in with the passcode 5566. It doesn't matter that it's not the intended passcode. – user253751 Mar 06 '20 at 14:00
4

To make the analogy - knowing the "how hashing function works" is just like "knowing a recipe for pancakes". It is simple: you take flour, water, egg, pinch of salt and sugar and mix them together, then put on pan with hot oil, and afterwards fill it with jam or whatever you like.

Simple, fast, and easy to know how to do, and knowledge of it (just like knowledge of how hashing function works) is public (so there is even no need to reverse engineer it)

Now, you want to "reverse the hash". Apply the same pancake analogy - you have a nice finished delicious hot jam pancake, and want to extract the "original raw unscrambled egg" from it.

Good luck with that - no amount of "reverse engineering process of making pancake" will help you in accomplishing that.

The same way mathematics used in cryptographic hash functions works - it is extremely simple to do in one way, but no way to do it backwards.

Matija Nalis
  • 2,265
  • 13
  • 19
  • 1
    Or just try extracting the recipe by eating a cake. That's at least theoretically possible and you can bruteforce it to something close if you have enough ingredients. – Džuris Mar 06 '20 at 23:52
4

TL;DR; Cryptographic Hash functions are designed to be one-way regardless they are openly designed or not.


First of all, Argon2 is a password hashing algorithm for those we want from them

  • being slow: so that password searching will require more cost
  • large memory requirements: so that parallel search with ASIC/FPGA/GPU is prohibited
  • and data independence in order to prevent the side-channel attacks (Argon2i in this case).

For password hashing, the collision resistance is not required, the pre-image resistances are required.

For Cryptographic hash functions like SHA2, SHA3, Blake series, the first requirement is the collision resistance. Once you have a collision resistance, you can have the pre-image resistances ( proven second implies first is tricky that requires large input ).

what's stopping from someone seeing how it works and reversing what it does?

In modern cryptography, we work with the Kerckhoffs's principles. In short, except for the key, everything is public. Not all Cryptographic hash functions are keyless, there are keyed hash functions like HMAC and NMAC.

Hash functions are designed to work with arbitrary size length and fixed output size. This implies;

  • By the Pigeon-Hole Principle, the collisions are inevitable.
  • If one somehow finds a pre-image of a hash function they cannot decide that it is the pre-image or not without additional information.

Therefore, being invertible, though we don't want that and there is no such attack on well-designed hash functions, may not really helpful.

Why cannot be reversible;

The exact answer depends on the design of the hash function. For example, lets look at SHA256 series. They use compression function and that is designed by a highly iterated block cipher where the message is the key. The compression function that takes the previous 256-bit as plaintext and current 512-bit message as a key and produces 256-bit output. In the internal, the block cipher's round function uses AND operation. AND operation loses information and that prevents the reversibility. So, even you have only hashed 256-bit message (that requires padding) you cannot return back since the compression function is not reversible.

This doesn't mean that one cannot attack on cryptographic hash functions. MD5 has a collision attack, SHA-1 has a collision attack and recently this turned into a message forgery (a list of the attacks on SHA-1).

kelalaka
  • 5,474
  • 4
  • 24
  • 47
1

Let's have a look at the problem from a mathematical way. For the sake of the argument lets assume that a hashing function is just any function, say f(x) that maps some input set X to some output set Y.

So you are asking: if I know f and I know y why can't I simply find x such that f(x)=y? The beauty (and sole reason it makes sense) of cryptography is that there are functions designed in such a way that solving f(x)=y is insanely hard, even when you know exactly what f and y are.

This is just maths, some equations are hard. In fact, for some functions (like SHA family) no efficient method of solving these equations is known. This is also known as the preimage resistance, one of the fundamental characteristics of cryptographically secure functions.

freakish
  • 211
  • 1
  • 4
1

There are mathematical operations that are easily reversed. For example “add 312,579” is easily reversed by doing “subtract 312,579”. If Argon only used easily reversible operations, you might be able to reverse it. It doesn’t.

A quite simple operation that cannot be reversed is calculating x^3 modulo p, where p is some large number. If I give you a number y, and tell you that y = x^3 modulo p, then there is no known way for you to find x in a reasonable time unless I give you some additional information about p. (That’s roughly the basis for RSA).

For hashing, which hashes arbitrary large amounts of data to fixed size data, there is also the problem that many different input values will produce the same hashed output. So hashing functions cannot be reversible. (However, for hashed passwords this wouldn’t stop a hacker because if they find the “wrong” passwywith the right hash, that “wrong” password would also work. Your chances of finding such a “wrong” password are zero).

gnasher729
  • 2,107
  • 11
  • 16
  • That's not why RSA is not reversible without the private key. *I think* inverse of cubing mod a prime is possible and inexpensive, but I'm not sure. – Future Security Mar 10 '20 at 05:08
1

An example of a hashing function which isn't a particularly good or bad hashing function, but can be seen to be very hard to reverse without deep mathematics:

Take a 64 bit integer x. To calculate the hash h(x), calculate sin(x) with 100 digits precision, then take digits 81 to 100 of sin(x) as the hash code. Ok, it's not particularly easy to calculate, it takes a bit of time, but it isn't particularly difficult either.

Now if I give you digits 81 to 100 of sin (x), how would you go about finding x? The first 20 digits would give you some good information. But you don't have any of the first 80 digits. You know x is an integer, which makes the problem solvable in theory, but there seems to be no better algorithm than calculating sin x for x = 1, 2, 3 etc. until you find the right one. Worst case you have to check sin (x) for 2^64 values x.

gnasher729
  • 2,107
  • 11
  • 16
0

Do you already accept that (good) encryption algorithms can be made public as long as the key is kept private? If so, consider this analogy:

When you encrypt something, you can't reverse the encryption without the key, except by trying zillions of keys by brute force.

For the purposes of this analogy, hashing is similar, except the original message is also used as the key. So if you don't have the message, you don't have the key. You can't reverse* the hash without the key. Except by trying zillions of keys by brute force.

(You can technically create hash-like algorithms from encryption algorithms in this way, but you shouldn't. They may not have the necessary properties of cryptographic hash functions.)

* The correct word here is verify but it breaks the analogy.

Artelius
  • 588
  • 2
  • 4
0

Hashing cannot be reversed because the process of hashing something loses most of the information. For every single result of a hash algorithm there are an infinite number of different inputs that will give the same result.

Consider one of the simplest hashing algorithms, the simple checksum. Imagine you have selected one page out of a random book. For each letter on the page, convert the letter to a number, A=1, B=2, etc, and add the numbers up. For the most simple checksum, that's it.

If a friend does this and gives you the result of 28543, how are you going to figure out what book and what page they were looking at? Now a checksum normally isn't actually considered a hashing function because it's just too simple. It's extremely easy to find or make up inputs that give the same checksum, which is called finding a collision. Here's one way: take the checksum of 28543, divide by 26, to give 1097 Zs, with 21 left over, which is a U. Checksums are also easy to manipulate. Suppose you found a page in your own book which added up to 28540, well you can just add a C to the end to get the same checksum.

Cryptographic hashes have to be carefully designed to make it very hard to find collisions. They ensure that similar inputs give completely different outputs. Ideally changing just one bit of the input will cause half of the output bits to flip. But even so, with enough computing power, collisions can still be found. And collisions can be useful. If a computer hashes passwords, then if you can find a collision then you can log in with the one you found even though it might be different from the originally hashed password. If two passwords have the same hash then the computer can't tell them apart. But even if you can find a collision, this is not the same as reversing the hashing algorithm. Finding a collision won't tell you which of the infinite inputs was originally hashed.

curiousdannii
  • 361
  • 3
  • 12
0

Knowing how something is done doesn't always mean you can undo it.

Sure, if I tell you that the only way into my house is by guessing a three-digit code, then you have enough information to give it a good go by brute force.

But if you instead find out that you have to know some secret phrase that only I know, that's not of any benefit to you. It doesn't help you to figure out the phrase. (I suppose it gives you enough information to kidnap and torture me, but let's not get carried away.)

A fundamental principle of all decent cryptographic algorithms is precisely that knowing how it's done shouldn't let you undo it. There should be some other information required to decrypt the information, like a secret key. Otherwise the algorithm would be pretty useless, particularly as a hypothetical undocumented algorithm whose secrecy is the cryptography cannot be shared, and thus cannot be used to encrypt communications between two or more entities.

Finally, it's important to understand that hashing is not encryption. None of the above applies to hashing because hashing is one-way by its very nature. It's lossy. It's not there to make something secret: it's there to make a portable digest that shows you whether some information has been corrupted (or manipulated) without having to examine the entire payload. It's a verifier, not a secretiser.

-1

Just to add up to the question:

Your hashed values need to be of certain 'difficulty' because if a malicious user knows your algorithm, he can create some 'rainbow' tables where the hashes in this table were previously made by an attacker and check if "your hash" corresponds to a hash in the 'made' table.... thus knowing what was the original value hashed by you.

You can easily find reversed hashes online for words like: "Hello World","Password123",etc...

schroeder
  • 125,553
  • 55
  • 289
  • 326
  • The term is "rainbow tables". And "difficulty" is not the important thing. Making the input string unique is what you are looking to say (that's why salts are used). The most "difficult" algorithm is as easy to make into a rainbow table as the "easiest" algorithm.\ – schroeder Mar 10 '20 at 09:47