0

I have bunch of cat images whose names are sha1 of website where it was posted.

Here is an example: 3afec3b4765f8f0a07b78f98c07b83f013567a0a.jpg

website: http://www.example.com/image.jpg

If sha1 is simple/dumb function i.e. hash = sha1('text with no random salt attached') and every time it produces the same hash result for the same given input Why can't I just get back original website name from hash? or am I something?

Josh
  • 3
  • 1
  • Thanks for all the answers, they really helped me understand how one way hash functions work under the hood. I wanted to upvote all of them but I can't. – Josh Sep 26 '21 at 03:43

3 Answers3

1

As mentioned, because SHA-1, like other cryptographic hash functions, is deterministic, you can create a mapping of inputs to outputs, and this is called a rainbow table. This is one of the many reasons why using a plain cryptographic hash function as a password hash is not secure.

However, cryptographic hash functions are specifically designed not to be invertible. SHA-1, like MD5 and SHA-2, are Merkle-Dåmgard hash functions, which basically work like this:

H_n = H_(n-1) + E(M, H_(n-1)) where M is the message block, H_0 is the initial value, H_n is the final hash, and E is the block cipher that's built into the hash function. That is, the message is used as a key in a special-purpose block cipher to encrypt the previous hash value, and the result is added to the previous hash value. This final addition makes it such that the function can't be inverted, since you can't know the output of the block cipher to invert it due to the addition, and because the block cipher is a pseudorandom permutation, it can't be guessed.

Other cryptographic hash functions, such as SHA-3, are based on different designs. The output you see from SHA-3 is only part of a very large (1600-bit) state, so in order to invert it, you'd need to be able to guess a large number of bits (usually 512 or more), which is computationally infeasible.

But, as the rainbow tables show, guessing values is a legitimate option. If the input values are from a set whose possible values are small or easy to guess, then you might want to use HMAC (e.g., HMAC-SHA-256) with a secret key to produce hashes, or, for passwords, a dedicated password hash like Argon2 with a strong salt.

bk2204
  • 8,695
  • 20
  • 19
0

Yes, what you are describing is possible - if you have a lookup table of SHA1 inputs and outputs, and the output you are given is in the table. This type of table is known as a rainbow table.

For example, suppose you have a rainbow table like so:

input                           sha1(input)
https://www.google.com/         595c3cce2409a55c13076f1bac5edee529fc2e58
https://www.ubuntu.com/         11aee28b64583ef73da48d985f6bc114c4f21963
https://www.stackoverflow.com/  9e45c1b93869a19734b63461600265254ccb0415 

and you are given 11aee28b64583ef73da48d985f6bc114c4f21963, and asked to find the input that produces this sha1 hash. You can easily find 11aee28b64583ef73da48d985f6bc114c4f21963 in the rainbow table, and see that

   sha1('https://www.ubuntu.com/')=11aee28b64583ef73da48d985f6bc114c4f21963

But, what if you are given a sha1 hash that is not in the rainbow table, like 8e626aa94c8f93cc161b06e0c4bac8f0373c34c8?

mti2935
  • 21,098
  • 2
  • 47
  • 66
  • You are right about random table but what I can't understand is if it produces same hash number for same input it must follow some algorithm like A=1, B=2 etc and why can't we decrypt the algorithm get original text back. If it randomly generating the output would be different every time. – Josh Sep 25 '21 at 19:58
  • I think you are confusing `deterministic` (produces the same hash number for same input) with `reversible` (decrypt the algorithm get original text back). Consider basic multiplication, e.g. A*B=C, where A and B are large prime numbers. This is deterministic. If given the same A and B, C will always be the same. And, it is easy to calculate C if you know A and B. But, it is very hard to find A and B if given only C. So, basic multiplication (where A and B are large prime numbers) is deterministic but not reversible. This is actually the basis of RSA encryption. – mti2935 Sep 26 '21 at 16:20
0

In addition to what the other answers mentions, the pigeonhole principle is important:

In mathematics, the pigeonhole principle states that if n items are put into m containers, with n > m, then at least one container must contain more than one item.

SHA-1 is 160 bits. Thus, for inputs greater than 160 bits, there has to be collisions, where multiple inputs hashes to the same value, as the number of valid inputs is larger than the number of possible hash sums produced. This is of course true for any hash system.

Thus, it's impossible to deterministically map from hash to data, of data length is larger than hash length.

The collission will of course probably not start with httsp://, so if you know something about the original input it's possible to be fairly certain that you've recovered what was originally hashed.

vidarlo
  • 14,890
  • 2
  • 43
  • 56
  • While you allude to it implicitly in your answer, it might be worth overtly stating that the hash function is **not reversible**. If it were, any piece of information could be reduced to 160 bits. This in turn would mean **all** pieces of information could be reduced to 160 bits and therefore the universe could be described in 160 bits. Clearly that's not right. – user10216038 Sep 26 '21 at 03:40