8

I have moved this question from Stack Overflow to this site. I know it may look like a matter of opinion, but I am not looking for a personal opinion; I am looking for the source of the decision to keep things this way.

I have been taught that nobody tries to open a door they do not know exists. The best defense, then, would be to hide the door. You can see this in old war movies: nobody keeps a hideout in plain sight. It is always covered with something suggesting that 'there is nothing interesting here.'

I would assume that cryptography works the same way. Why, then, does a hash generated by MD5 start with $1$, announcing first that this is a hash and then what kind of hash it is (MD5)?

Now I see that sha512 does exactly the same thing. Isn't that a weakness in itself? Is there a particular reason for doing it this way?

The main question, then, is: should I scramble my hash before storing it, to hide this from a potential enemy? If there is no need for that, why not?

To avoid answers that merely state that obscurity is not security, I propose this picture. It is WWII. You have just received a hint that the SS is coming to your house, suspecting that you are hiding partisans, and this is true. They have no time to escape. You have two choices for where to hide them: in the best safe in the world, or in a hole underneath the floor, hidden so well that even your parents would not suspect it is there. What would you propose? Would you convince yourself that the best safe is the best choice?

If I know there is a treasure hidden on an island, I want to know which island it is, or I will not start searching.

I am still not convinced. Chris Jester-Young so far gave me something to think about when suggesting that there can be more algorithms generating the same hash from different data.

Grzegorz
  • 3
    read the top answer to the linked question. Thomas is dead-on. Secrets are useful, but you want to keep the *right* secret. If you want to prevent attackers from brute-forcing your hash, then make it dependent on a secret key somewhere; *a secret you can keep*. Don't try to make your algorithm secret; it won't work. If you want to add secrecy, then add secrecy. Don't add obscurity. – tylerl Apr 17 '14 at 01:41
  • 2
    A better comparison to the situation with correctly-implemented modern cryptography: The SS knows for *certain* that you and the partisans are inside your home, and your safe has teragrams of food, a teleporter connected to America, and cannot be scratched by all the nuclear bombs in the world. You're darned right you should sit in the safe and stick your tongue out at the attackers. – Matt Nordhoff Apr 17 '14 at 03:53
  • “suggesting that there can be more algorithms generating the same hash from different data” No, this does not happen. Not if we're talking about cryptographic hashes (hashes used in hash tables are a completely different matter). – Gilles 'SO- stop being evil' Apr 17 '14 at 11:13

3 Answers

29

First, there's Kerckhoffs's principle which is always desirable:

A cryptosystem should be secure even if everything about the system, except the key, is public knowledge.

where in this case the password is the key. So it's not a goal to keep the cryptosystem secret.

Second, you are wrong about those being md5 or sha512 hashes; the values stored in your /etc/shadow are md5crypt or sha512crypt hashes, which involve a strengthening procedure (many rounds of an md5 or sha512 hash).

Now if your four choices are md5crypt, sha256crypt, sha512crypt, and bcrypt (the most popular choices on Linux systems), here are four hashes, all generated with $saltsalt$ (or equivalent) as the salt, hashing the password `not my real password`:

>>> import crypt
>>> crypt.crypt('not my real password','$1$saltsalt')
'$1$saltsalt$4iXfpnrgHRXkrDbPymCE4/'

>>> crypt.crypt('not my real password','$5$saltsalt')
'$5$saltsalt$E0bMpsLR71z8LIvd6p2tD4LZ984JxyD7B9lPLhq4vY7'

>>> crypt.crypt('not my real password','$6$saltsalt')
'$6$saltsalt$KnqiStSM0GULvZdkTBbiPUhoHemQ7Q06YnvuJ0PWWZbjzx3m0RCc/hCfq54Ro3fOwaJdEAliX9igT9DD2oN1u/'

>>> import bcrypt
>>> bcrypt.hashpw('not my real password', "$2a$12$saltsaltsaltsaltsalt..")
'$2a$12$saltsaltsaltsaltsalt..FW/kWpMA84AQoIE.Qg1Tk5.FKGpxBNC'

Even without the annotation, it's fairly straightforward to figure out which scheme each uses: md5crypt, sha256crypt, sha512crypt, and bcrypt are 34, 55, 98, and 60 chars long respectively (in base64 encoding, with annotation and salt). So unless you suggest truncating the hash or altering the hash's properties, keeping the annotation for consistency doesn't lose any security. It also gives you a way to gracefully update user passwords. If you decide that md5crypt is no longer secure, you can switch users' hashes to bcrypt on next login (and then, after a period of time, deactivate all accounts left on md5crypt). Or if your algorithm needs to be updated because of a design flaw, like bcrypt when it was $2$, you can readily identify hashes from the flawed scheme, since the fixed scheme moved to $2a$.
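As a rough illustration of the point about lengths, here is a minimal sketch that recognises the scheme both from the `$id$` annotation and from the length of the last `$`-separated field alone. The function names and lookup tables are made up for this example, and the field lengths (22, 43, 86, and 53 characters) are simply those of the four example hashes above, so treat it as a sketch rather than a general-purpose detector.

```python
# Scheme annotations as they appear in the example hashes above.
PREFIX_TO_SCHEME = {
    "$1$": "md5crypt",
    "$5$": "sha256crypt",
    "$6$": "sha512crypt",
    "$2a$": "bcrypt",
}

# Length of the last "$"-separated field in those examples
# (for bcrypt this field holds 22 salt chars followed by 31 digest chars).
FIELD_LENGTH_TO_SCHEME = {
    22: "md5crypt",
    43: "sha256crypt",
    86: "sha512crypt",
    53: "bcrypt",
}

EXAMPLES = [
    "$1$saltsalt$4iXfpnrgHRXkrDbPymCE4/",
    "$5$saltsalt$E0bMpsLR71z8LIvd6p2tD4LZ984JxyD7B9lPLhq4vY7",
    "$6$saltsalt$KnqiStSM0GULvZdkTBbiPUhoHemQ7Q06YnvuJ0PWWZbjzx3m0RCc/hCfq54Ro3fOwaJdEAliX9igT9DD2oN1u/",
    "$2a$12$saltsaltsaltsaltsalt..FW/kWpMA84AQoIE.Qg1Tk5.FKGpxBNC",
]


def scheme_from_prefix(stored):
    """Read the scheme from the annotation, or return None if it is missing."""
    for prefix, scheme in PREFIX_TO_SCHEME.items():
        if stored.startswith(prefix):
            return scheme
    return None


def scheme_from_length(stored):
    """Guess the scheme from the length of the final field only."""
    last_field = stored.rsplit("$", 1)[-1]
    return FIELD_LENGTH_TO_SCHEME.get(len(last_field))


if __name__ == "__main__":
    for stored in EXAMPLES:
        stripped = stored.rsplit("$", 1)[-1]  # annotation and salt removed
        print(scheme_from_prefix(stored), scheme_from_length(stripped))
```

Both columns agree for every example, which is the sense in which stripping the annotation hides very little.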

Even worse, you could try saying, "I'm going to modify sha512 with new constants and round keys. That would make it super hard to break -- right?" No, it just makes it super hard for you to know you didn't accidentally introduce a major vulnerability. If they can get at your /etc/shadow, they can probably also get at the library used to log you in, and with time could reverse-engineer your hashing scheme; this will be MUCH MUCH simpler than breaking a strong password.

Again, the expected time to brute-force a very strong passphrase stored as a sha256crypt hash is on the order of 2^256 attempts; e.g., a billion computers each doing a billion sha256crypts per nanosecond (each involving ~5000 rounds of sha256) would take about 300000000000000000000000000000000 (3 x 10^32) times the age of the universe to break it. And with sha512crypt, if each of the ~10^80 atoms in the observable universe did a billion sha512crypts every nanosecond, it would still take 10^38 times the age of the universe. (This assumes passphrases with at least 256 and 512 bits of entropy, respectively.)
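The arithmetic behind these estimates can be checked with a few lines. This is a back-of-the-envelope sketch: the age of the universe (about 13.8 billion years) is an assumption not stated in the answer, and the helper name `universe_ages` is invented for the example.

```python
# Assumed age of the universe: ~13.8 billion years, converted to seconds.
AGE_OF_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600

NS_PER_SECOND = 10**9


def universe_ages(bits, guesses_per_second):
    """Time to try 2**bits guesses, expressed in ages of the universe."""
    seconds = 2**bits / guesses_per_second
    return seconds / AGE_OF_UNIVERSE_S


# A billion computers, each doing a billion sha256crypts per nanosecond.
sha256crypt_rate = 10**9 * 10**9 * NS_PER_SECOND

# Each of ~10**80 atoms doing a billion sha512crypts per nanosecond.
sha512crypt_rate = 10**80 * 10**9 * NS_PER_SECOND

if __name__ == "__main__":
    print(f"sha256crypt: {universe_ages(256, sha256crypt_rate):.1e} universe ages")
    print(f"sha512crypt: {universe_ages(512, sha512crypt_rate):.1e} universe ages")
```

The exact constants matter little here; changing the assumed guess rate by a factor of a million only shifts the exponent by six.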

dr jimbob
  • I am convinced, especially by the age-of-the-universe figures. Of course, I would not use this approach in the WWII case ;) Thank you very much for your effort and details. – Grzegorz Apr 22 '14 at 20:58
4

Obscuring which hash is used makes it impossible for the system to authenticate the password for a legitimate user.

When password authentication via hashing in Unix was first invented, the password hash function was hard coded to use DES (now badly out of date). If the password hash is derived by any other function, there must be an identifier to allow the system to recognize what algorithm was used to generate the hash.

This is because password hashing is a one-way function. I've heard it said that expecting to run such a one-way function in reverse is like expecting to run a sausage factory backwards and have pigs come out the other end. So when you go to authenticate someone logging in to the system, you can only run the hashing function forward and compare the results with what is stored in /etc/shadow.

So those identifiers have to remain in the clear, otherwise the wrong hashing function will be used, the hashes won't match, and no one can get into the system.
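To make the dispatch concrete, here is a minimal sketch of a verifier that picks its hash function from the identifier stored in the record. It is not how crypt(3) is implemented: the identifiers `demo-a` and `demo-b`, the record layout, and the use of PBKDF2 from `hashlib` with 1000 rounds are all stand-ins chosen so the example is self-contained.

```python
import hashlib
import hmac


def _pbkdf2(digest_name, rounds):
    """Build a one-way password function; stands in for a real crypt scheme."""
    def derive(password, salt):
        raw = hashlib.pbkdf2_hmac(digest_name, password.encode(), salt.encode(), rounds)
        return raw.hex()
    return derive


# Hypothetical identifiers; a real system would use "$1$", "$6$", and so on.
SCHEMES = {
    "demo-a": _pbkdf2("sha256", 1000),
    "demo-b": _pbkdf2("sha512", 1000),
}


def make_record(scheme_id, salt, password):
    """Store identifier, salt, and digest together, all in the clear."""
    return f"${scheme_id}${salt}${SCHEMES[scheme_id](password, salt)}"


def verify(stored, password):
    """Run the function named in the record forward and compare digests."""
    _, scheme_id, salt, expected = stored.split("$")
    derive = SCHEMES.get(scheme_id)
    if derive is None:
        raise ValueError(f"unknown scheme identifier: {scheme_id!r}")
    return hmac.compare_digest(derive(password, salt), expected)


if __name__ == "__main__":
    record = make_record("demo-b", "saltsalt", "not my real password")
    print(verify(record, "not my real password"))
    # Relabel the record so the wrong function is chosen at login time.
    relabelled = record.replace("$demo-b$", "$demo-a$", 1)
    print(verify(relabelled, "not my real password"))
```

With the identifier relabelled, the correct password no longer verifies, which is the lock-out described above.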

Mike McManus
  • Dear Mike. That is the most reasonable answer I have heard on this subject so far. I am upping you for the time being. Thank you. – Grzegorz Apr 16 '14 at 23:21
1

Attempting to obscure the hash implementation is really just moving the problem, from

  • I can see by protocol the hash function is X()

to

  • Looking at the code**, I can tell the hash function is X()

** (or other side channel of information)

If your attacker has any way to probe the system, they will almost certainly be able to figure out the algorithm you're using. The easiest way is to simply look at the code (source code or machine code); more complicated methods involve using timing measurements to differentiate the function being used. To use your "hiding people" analogy -- all it takes is me walking up with some sounding equipment, and I can tell before I even enter the building that there are underground rooms.

I know you didn't want to hear it, but the reality is that "security through obscurity" inevitably fails because you cannot obscure sufficiently against every possible approach, whether known today or discovered in the future.

Shawn C
  • Any source of analysis? What if the attacker got only my `/etc/shadow` file? If he/she did not know the source of the hash, he/she would have to try many algorithms and guess the header/salt. Wouldn't that make it stronger? You would have to bring more equipment with you, and what if you don't have it? – Grzegorz Apr 16 '14 at 22:46
  • 1
    I'm sure we could go in circles for years, until you found something that with current technology would seem really obscure. The core issue is that at some point the algorithm is likely to be found, at which point, if you are relying on the secrecy of your algorithm, your secret is now out. Every secret "hidden" by the algorithm is now out. Chaining algorithms together can sometimes increase security, but oftentimes all it does is propagate weaknesses throughout the system. And finally, if I've compromised your physical security, there are much easier methods. – Shawn C Apr 16 '14 at 23:59