66

In using Argon2 for hashing passwords in my application, I've noticed it generates a string like this (e.g. for password "rabbit"):

$argon2i$v=19$m=65536,t=3,p=1$YOtX2//7NoD/owm8RZ8llw==$fPn4sPgkFAuBJo3M3UzcGss3dJysxLJdPdvojRF20ZE=

My understanding is that everything up to and including p=1 is parameters, the segment after p=1 is the salt, and the last part is the hashed password.
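To make that layout concrete, here is a minimal Python sketch that splits the example string on `$`. The `parse_encoded` and `b64decode_lenient` helpers are hypothetical names used for illustration only; they are not part of any Argon2 library, and application code would normally hand the whole string to the library untouched.

```python
import base64

# The encoded string from the question (password "rabbit").
ENCODED = (
    "$argon2i$v=19$m=65536,t=3,p=1"
    "$YOtX2//7NoD/owm8RZ8llw==$fPn4sPgkFAuBJo3M3UzcGss3dJysxLJdPdvojRF20ZE="
)


def b64decode_lenient(text: str) -> bytes:
    # Some encoders omit the trailing "=" padding; restore it before decoding.
    return base64.b64decode(text + "=" * (-len(text) % 4))


def parse_encoded(encoded: str) -> dict:
    # The leading "$" yields an empty first element, which is discarded.
    _, algorithm, version, params, salt, digest = encoded.split("$")
    return {
        "algorithm": algorithm,
        "version": int(version.split("=")[1]),
        "params": {k: int(v) for k, v in (p.split("=") for p in params.split(","))},
        "salt": b64decode_lenient(salt),
        "hash": b64decode_lenient(digest),
    }


parts = parse_encoded(ENCODED)
print(parts["algorithm"], parts["version"], parts["params"])
print(len(parts["salt"]), "salt bytes,", len(parts["hash"]), "hash bytes")
```

So the fields are: algorithm, version, cost parameters (memory, time, parallelism), salt, and hash, in that order.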

Is it acceptable to store this in one field in a SQL database (varchar(99) in this case), or should the string be separated into its constituent parts? Should I store the salt separately from the hashed password and keep the parameters in code?

Luis Casillas
PenumbraBrah
  • 2
    [very strongly related](https://security.stackexchange.com/questions/17421/how-to-store-salt) – Sefa Apr 06 '18 at 14:31
  • 7
    If you use a [_pepper_](https://security.stackexchange.com/questions/3272/password-hashing-add-salt-pepper-or-is-salt-enough), you can store it outside of the database. – forest Apr 07 '18 at 02:14
  • 2
    PHP [password_hash](http://php.net/manual/en/function.password-hash.php) function stores the salt with the hashed string too (e.g:`$2y$10$.vGA1O9wmRjrwAVXD98HNOgsNpDczlqm3Jq7KnEd1rVAGv3Fykk1a`) – AccountantM Apr 07 '18 at 09:50
  • 4
    "varchar(99)" seems awfully tight. What if the next version of the hash library adds parameters or makes the hash longer? Doesn't your database have "varchar(MAX)" or similar? – Matti Virkkunen Apr 09 '18 at 10:29

7 Answers

122

You should store it in a single field. Do not try to divide it into parts.

I know this might seem a bit unintuitive to people coming from a database background, where the modus operandi is to normalize data and using serialized strings is considered an ugly hack. But from a security perspective, this makes perfect sense.

Why? Because the smallest unit the individual developer needs to operate on is that full string. That is what the app needs to pass to the hash library to check whether a provided password is correct. From the developer's perspective, the hash library can be a black box, and she need not concern herself with what those pesky parts actually mean.

This is a good thing, because most developers are imperfect humans (I know, because I am one myself). If you let them pick that string apart and then try to fit it all together again, they will probably mess things up, for example by giving all users the same salt or no salt at all.
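As a rough illustration of the "black box" idea, the sketch below exposes only two functions, both of which deal in one opaque string. It is stdlib-only so that it runs anywhere, which means it uses PBKDF2-HMAC-SHA256 as a stand-in for Argon2; the `$pbkdf2-sha256$i=...$salt$digest` layout and the function names are assumptions made up for this example. In a real application a maintained Argon2 or bcrypt library would do this job.

```python
import base64
import hashlib
import hmac
import os


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def hash_password(password: str, iterations: int = 200_000) -> str:
    salt = os.urandom(16)  # a fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Hypothetical self-describing layout: $scheme$i=<iterations>$<salt>$<digest>
    return f"$pbkdf2-sha256$i={iterations}${_b64(salt)}${_b64(digest)}"


def verify_password(password: str, stored: str) -> bool:
    _, scheme, params, salt_b64, digest_b64 = stored.split("$")
    if scheme != "pbkdf2-sha256":
        raise ValueError(f"unsupported scheme: {scheme}")
    iterations = int(params.split("=")[1])
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    # Re-hash the candidate with the stored salt and cost, then compare
    # in constant time.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)


stored = hash_password("rabbit")  # the only value the database has to hold
print(verify_password("rabbit", stored))
print(verify_password("hare", stored))
```

The caller never sees the salt or the cost factor; it stores `stored` as-is and passes it back unchanged.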

Storing the parameters only in the code is also a bad idea. As processors get faster, you may want to increase the cost factor in the future. During the migration phase, different passwords will have different cost factors, so you need per-password information about which cost factor was used to hash each one.
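A small sketch of why the cost factor has to travel with each hash: the check below reads the cost out of the stored string and compares it with the current target. The `$pbkdf2-sha256$i=<iterations>$<salt>$<digest>` layout, the constant, and the function names are all hypothetical, and the salt and digest in the sample rows are placeholders. Some libraries offer this check ready-made (PHP's `password_needs_rehash`, mentioned in the comments, is one).

```python
CURRENT_ITERATIONS = 600_000  # hypothetical target cost for newly hashed passwords


def stored_iterations(stored: str) -> int:
    # Expects the hypothetical layout $pbkdf2-sha256$i=<iterations>$<salt>$<digest>
    _, scheme, params, _salt, _digest = stored.split("$")
    if scheme != "pbkdf2-sha256":
        raise ValueError(f"unsupported scheme: {scheme}")
    return int(params.split("=")[1])


def needs_rehash(stored: str) -> bool:
    # True when the record was hashed with a lower cost than the current target.
    return stored_iterations(stored) < CURRENT_ITERATIONS


# Two rows hashed at different times; salt and digest are placeholders.
old_row = "$pbkdf2-sha256$i=100000$c2FsdA==$ZGlnZXN0"
new_row = "$pbkdf2-sha256$i=600000$c2FsdA==$ZGlnZXN0"
print(needs_rehash(old_row), needs_rehash(new_row))
```

If the cost lived only in code, `old_row` could no longer be verified once the constant was raised.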

Anders
  • 10
    Many cryptographic libraries take the secret and the salt as separate parameters. From that standpoint alone, it seems prudent to store the digest and the salt/entropy in separate fields. If you store them separately, you don't even have to use a uniform length for the salt. You can version your algorithms and support more than one at a time, allowing you to smoothly push users to a newer, stronger algorithm without even requiring a password change, and so on. – Craig Tullis Apr 06 '18 at 18:19
  • 19
    @Craig Many older libraries do, but I'd say it's considered bad design nowadays. As for versioning algorithms, you can do that with the full string in one column as well. – Anders Apr 06 '18 at 20:03
  • 8
    This is why I love stack exchange. Every day I learn something I didn't know I didn't know. – MooseBoys Apr 06 '18 at 22:16
  • 3
    It's not just from the security practice perspective that it makes sense. It still makes sense from a normalization perspective given the reasons you lay out. – jpmc26 Apr 07 '18 at 01:20
  • 4
    This also has the benefit of being future-proof; if you decide to change the hashing parameters later, old hashes in the database will continue to work _(as long as you use the library functions to compare hashes, like you're supposed to with Argon2, rather than directly comparing the hash-strings)_ – BlueRaja - Danny Pflughoeft Apr 07 '18 at 05:14
  • @Anders What library are you thinking of that supports versioning of the stored password with a single password field? I can see how to make it work and the approach has quite the charm to it, but I've never seen one in the wild. – Voo Apr 08 '18 at 21:18
  • 2
    @Voo Any library. All the info you need - algorithm, parameters, salt - is in the string that you store in a single column. – Anders Apr 08 '18 at 21:43
  • @Anders It's not really a blackbox any more if I have to define my own format for how to store the separate parts in a single string and have to convert between the tuple (algorithm, hash, salt) and the string myself. If anything that makes it way more likely to somehow introduce errors. I thought the whole idea was to avoid the developer having to deal with the separate parts? – Voo Apr 09 '18 at 06:58
  • 1
    @Voo Yes, the point is to not modify the string. Never do that. I didn't mean to suggest that. I think we are misunderstanding each other. – Anders Apr 09 '18 at 07:25
  • 1
    @Voo spring-security version 5 is going in this direction: https://docs.spring.io/spring-security/site/docs/current/reference/html/core-services.html#pe-dpe . The only functionality I find missing is that the library does not give the application the opportunity to migrate the hashed password field to the default encoding scheme (if it is not already using it) – Thierry Apr 09 '18 at 12:43
  • 1
    For those looking for an example of a library that transparently stores all the parameters, including the algorithm, in one string, and allows smoothly upgrading to stronger settings, check out [PHP's password_* functions](http://php.net/password), including the examples of [password_needs_rehash](http://php.net/manual/en/function.password-needs-rehash.php). – IMSoP Apr 09 '18 at 14:57
  • @Voo This versioning has to be triggered by the developer, but decoding and re-encoding is completely handled by the library. – MauganRa Apr 09 '18 at 16:33
  • @Anders I was interested in what libraries actually have an API that works with a single string instead of the usual triple. You do need some additional configuration in this case after all. The spring API is rather interesting to look through. – Voo Apr 09 '18 at 16:47
  • @Voo Another password encryption implementation that stores all three parts as a single string is [bcrypt](https://en.wikipedia.org/wiki/Bcrypt) with implementations for numerous languages. – Tyler Apr 10 '18 at 15:30
49

What you're missing is that verification works from the stored hash data alone; the original password is never stored. When you want to validate a string against a hash, you take the supplied string plus the stored hash parameters (cost, salt, etc.) and generate a new hash. If the new hash matches the stored one, the string is valid (in other words, nothing is ever decrypted; the input is rehashed).

Having the salt doesn't help you brute force. The salt is an essential part of a hash in the same way tumblers are part of a lock. If you take either out, nobody can use it.

Separating storage is pointless. You'll have to reassemble the completed hash to validate it in most cases. That's why all the components are stored in one handy string by default.

Machavity
  • 11
    You explained it much more elegantly than me, in fewer words. +1 – Anders Apr 06 '18 at 18:37
  • Yes, separating is pointless - you can't use one without the other, you never update one without the other, and they have identical security requirements, so treat salt+password as a single indivisible unit for storage. – Toby Speight Apr 10 '18 at 10:40
9

Yes, you can store it in a single field, and many databases/applications store the salt+hash in a single field/file etc.

The best-known example is Linux (which isn't a DB), which stores the hash in the /etc/shadow file using the format:

"$id$salt$hashed", the printable form of a password hash as produced by crypt (C), where "$id" is the algorithm used. (On GNU/Linux, "$1$" stands for MD5, "$2a$" is Blowfish, "$2y$" is Blowfish (correct handling of 8-bit chars), "$5$" is SHA-256 and "$6$" is SHA-512; other Unix systems, like NetBSD, may have different values.)

(source: https://en.wikipedia.org/wiki/Passwd)

The salt is not meant to be secret (or at least not more secret than the hash). Its primary purpose is to make brute-force attacks much harder, since the attacker has to attack each user's hash separately with that user's salt instead of testing one guess against every account at once.

But your question is more nuanced -- because you're not just asking about salts but about parameters as well: things like the hashing algorithm, the iteration count, and the salt. In any case, don't store these in code; they still belong in the DB.

Imagine you've got a bunch of users, and you've used SHA1 as your hashing algorithm. So your database field would be something like SHA1:SALT:HASH.

If you wanted to upgrade your Database to BCRYPT, how would you do this?

Typically you'd deploy some code so that when a user logs on, you verify the password, and if valid -- you'd re-hash the password with a newer algorithm. Now the field for the user looks like this: BCRYPT:SALT:HASH.

But then some users would be on SHA1, and others on BCRYPT, and since this is at a user level, you need the parameters that tell your code which users are which to be in the Database.
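The login-time migration described above could look roughly like the following sketch. To stay stdlib-only it uses PBKDF2-HMAC-SHA256 as a stand-in for BCRYPT, and it assumes the legacy records were computed as SHA1(salt + password); the record layouts, constant, and function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Tuple

NEW_ITERATIONS = 200_000  # assumption: cost chosen for the stand-in "new" algorithm


def hash_new(password: str) -> str:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, NEW_ITERATIONS)
    # The cost is stored in the record too, so it can be raised later.
    return f"PBKDF2:{NEW_ITERATIONS}:{salt.hex()}:{digest.hex()}"


def check(password: str, stored: str) -> bool:
    fields = stored.split(":")
    if fields[0] == "SHA1":  # legacy layout: SHA1:SALT:HASH
        _, salt_hex, digest_hex = fields
        candidate = hashlib.sha1(bytes.fromhex(salt_hex) + password.encode()).hexdigest()
    elif fields[0] == "PBKDF2":  # new layout: PBKDF2:ITERATIONS:SALT:HASH
        _, iterations, salt_hex, digest_hex = fields
        candidate = hashlib.pbkdf2_hmac(
            "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
        ).hex()
    else:
        raise ValueError(f"unknown algorithm: {fields[0]}")
    return hmac.compare_digest(candidate, digest_hex)


def login(password: str, stored: str) -> Tuple[bool, str]:
    """Return (ok, value to store). A valid legacy record is re-hashed on the spot."""
    if not check(password, stored):
        return False, stored
    if stored.startswith("SHA1:"):
        return True, hash_new(password)
    return True, stored


legacy_salt = os.urandom(8)
legacy = f"SHA1:{legacy_salt.hex()}:{hashlib.sha1(legacy_salt + b'rabbit').hexdigest()}"
ok, upgraded = login("rabbit", legacy)
print(ok, upgraded.split(":")[0])
```

The algorithm tag in each record is what lets `check` handle both kinds of user during the transition.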

In short, storing the parameters and hash in a single field is OK, but splitting them out for whatever reason (efficiency, easier code etc) is also OK. What's not OK is storing this in your code :)

TL:DR

Troy Hunt recently published a podcast suggesting that instead of migrating to BCRYPT in the manner above, it's more effective to simply take all the SHA1 hashes currently in the DB, and hash them using BCRYPT.

Effectively BCRYPT(SHA1(clear_password))

When a user logs on you'd

BCRYPT(SHA1(clear_password)) == <db_field>

This way, everybody on the platform gets upgraded at once, and you don't have a database with multiple hash formats for passwords. Very clean and very nice.

I think this idea makes perfect sense, but even though everyone migrates at once, it's not instantaneous. Unless you're willing to accept some downtime on the app (while you re-hash all the passwords), there will still be a small window of time in which some users are on BCRYPT and some on SHA1. Hence your DB should still store the parameters of the hashing algorithm, and your code should act on them.
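Here is a hedged sketch of that wrapping approach, including the per-record tag needed while the batch job is still running. PBKDF2-HMAC-SHA256 again stands in for BCRYPT so the example needs only the standard library, and the `SHA1:` / `WRAPPED:` tags and function names are made up for illustration. The hex form of the SHA1 digest is fed to the slow hash so that the input contains no NUL bytes, which some bcrypt implementations treat as a terminator.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumption: cost for the stand-in slow hash


def slow_hash(data: bytes, salt: bytes) -> str:
    # Stand-in for BCRYPT in this sketch.
    return hashlib.pbkdf2_hmac("sha256", data, salt, ITERATIONS).hex()


def wrap_legacy(sha1_hex: str) -> str:
    """Upgrade a stored SHA1 hex digest without knowing the password."""
    salt = os.urandom(16)
    return f"WRAPPED:{salt.hex()}:{slow_hash(sha1_hex.encode(), salt)}"


def verify(password: str, stored: str) -> bool:
    sha1_hex = hashlib.sha1(password.encode()).hexdigest()
    scheme, _, rest = stored.partition(":")
    if scheme == "SHA1":  # row the batch job has not reached yet
        return hmac.compare_digest(sha1_hex, rest)
    if scheme == "WRAPPED":  # slow_hash(SHA1(password))
        salt_hex, digest_hex = rest.split(":")
        candidate = slow_hash(sha1_hex.encode(), bytes.fromhex(salt_hex))
        return hmac.compare_digest(candidate, digest_hex)
    raise ValueError(f"unknown scheme: {scheme}")


old_row = "SHA1:" + hashlib.sha1(b"rabbit").hexdigest()
new_row = wrap_legacy(old_row.split(":")[1])
print(verify("rabbit", old_row), verify("rabbit", new_row), verify("hare", new_row))
```

Whether wrapping is as strong as hashing the password directly is debated in the comments below; the sketch only shows the mechanics.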

keithRozario
  • 5
    You know the [Why shouldn't we roll our own?](https://security.stackexchange.com/q/18197/114527) thing? Has `BCRYPT(SHA1())` been *proven* to not have that problem? (You didn't give a link so that I could easily investigate, and "makes perfect sense" is not the same as "it is true that.") – Andrew Morton Apr 07 '18 at 19:23
  • 2
    Well I did caveat the sentence with "I think" it makes perfect sense :). My understanding is that if BCRYPT works by making brute-force more expensive, then regardless of what the input is, whether it's a SHA1 or the cleartext it'll still be more expensive to the attacker to brute force. The BCRYPT(SHA1()) makes sense from a migration perspective. – keithRozario Apr 09 '18 at 01:42
  • 2
    Also I found this topic here, that suggest you some other use-cases for SHA-ing a password before BCRYPT-ing it. General consensus of the thread was that it's an OK thing to do but maybe we should start a separate question if you still disagree https://security.stackexchange.com/questions/61595/is-it-good-practice-to-sha512-passwords-prior-to-passing-them-to-bcrypt – keithRozario Apr 09 '18 at 06:13
  • 1
    @keithRozario "It is intuitively obvious that BCRYPT(SHA1(x)) is at least as secure as BCRYPT(x)" is not the same thing as "there is a security proof that ...". First rule of cryptography: never trust your intuition unless your name is djb or similar. – Martin Bonner supports Monica Apr 09 '18 at 07:24
  • 4
    @MartinBonner is correct. It *IS* intuitive but it's also wrong. Assume SHA1 is broken. Simulate this by having it return 1 for all inputs. BCRYPT(SHA1(x)) is clearly less secure than BCRYPT(x) in this scenario, similarly consider SHA1(BCRYPT(x)). Depending on *how* SHA1 is broken chaining the hashes *could* remain valid. But it'll never be more secure than using just the more secure of the two. Since they ideally are affecting the same parameters in the security equation. *If* they had minimal interaction chaining could be useful, but that's a whole lot of math. Hence *proven* @ AndrewMorton – Black Apr 09 '18 at 14:44
  • 1
    @AndrewMorton Good discussion, but I'm not convinced. Your example of simulating broken SHA1 by having it return 1 for all inputs isn't correct, that's like saying if the user choose 1 for their password, then BCRYPT is broken. But I'm starting another thread because I think this is interesting. – keithRozario Apr 10 '18 at 02:43
  • 2
    the new thread is here : https://security.stackexchange.com/questions/183358/how-secure-is-bcryptsha1password – keithRozario Apr 10 '18 at 02:57
  • @keithRozario It was Black who wrote that. – Andrew Morton Apr 10 '18 at 07:51
4

From a security standpoint, it doesn't matter if the salt and hashed password are stored in a single field or separate fields, though I would lean towards a single field for simplicity. The only reason to separate them is if your application code needs to pass in the salt and password separately to the validation function. If your application code only requires a single combined string then you might as well store it that way.

As an example, the older ASP.NET Membership split the password hash and salt into two DB fields, and the newer ASP.NET Identity has the hash and salt in a single DB field, and in turn the newer functions have been modified to handle the single string as inputs and outputs.

TTT
4

Is it acceptable to store this in one field in a SQL database (varchar(99) in this case)

No. The size of the data might change at some time. A varchar(MAX) field should be used unless the specifications state that the size will always be exactly 99 and there shall be no variation from that ever. Let the database take care of the storage requirements.
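As a minimal illustration of the single, unbounded column, the sketch below uses Python's built-in `sqlite3` module, where a `TEXT` column has no declared length limit. The table and column names are hypothetical, and other databases have their own wide or unbounded string types.

```python
import sqlite3

# The encoded string from the question (password "rabbit").
ENCODED = (
    "$argon2i$v=19$m=65536,t=3,p=1"
    "$YOtX2//7NoD/owm8RZ8llw==$fPn4sPgkFAuBJo3M3UzcGss3dJysxLJdPdvojRF20ZE="
)

conn = sqlite3.connect(":memory:")
# One column holds parameters, salt and hash together; no length is fixed.
conn.execute(
    "CREATE TABLE users (username TEXT PRIMARY KEY, password_hash TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO users (username, password_hash) VALUES (?, ?)",
    ("alice", ENCODED),
)
(fetched,) = conn.execute(
    "SELECT password_hash FROM users WHERE username = ?", ("alice",)
).fetchone()
print(fetched == ENCODED, len(fetched))
```

A longer string from a future library version would fit in the same column without a schema change.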

2

To know if it's more or less secure we must understand the purpose of a salt.

A salt serves a few purposes (that I can think of)

  1. Prevent Dictionary attacks

A dictionary attack exploits the fact that a logical word is often used as part of a password, the archetype being the word password or even P@ssW0rd. People are imperfect, and it's easier for them to remember words than random strings of letters and numbers. A dictionary attack relies on this to shorten the path a brute-force attack takes. Obviously, if we add a random chunk of stuff to the password, this type of attack becomes impractical or pointless, and the attacker is back to brute-forcing the whole thing.

If an attacker knows the salt used, they would have to edit all their dictionary files for each password, and that assumes they know how the salt is added to the original password. One big benefit of a dictionary list is say you take 1000 passwords and 1 million dictionary words. Then you cycle through them trying to get a few matches on weaker passwords. So to have to edit 1 million words 1000 times is just not really practical.

  2. Prevent Hash or Rainbow table attacks.

A Rainbow table is where hashes are pre-computed and stored in some kind of database. For example you could take MD5 and just start hashing random stuff and saving it. Then when you want to crack a password you just look up in the rainbow table.

If an attacker has the salt, they have to recompute the rainbow table for each salt used. The benefit of this attack lies in having the data compiled ahead of time and re-using it over and over, so a per-password salt puts them back to brute-forcing each password. Salting every password differently means the rainbow table has to be recalculated for each salt. This is why it's important to use a different salt for each password.
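Both points can be seen in a toy sketch. It uses plain SHA-256 purely to keep the demonstration fast (that is not how passwords should be stored), and the word list, the salt-then-password concatenation, and the helper name are assumptions made for the example.

```python
import hashlib
import os

COMMON = ["password", "P@ssW0rd", "rabbit", "letmein"]

# Precomputed lookup table: unsalted SHA-256 digest -> word.
table = {hashlib.sha256(w.encode()).hexdigest(): w for w in COMMON}


def salted(password: str, salt: bytes) -> str:
    return hashlib.sha256(salt + password.encode()).hexdigest()


# An unsalted hash falls to a single table lookup.
unsalted_leak = hashlib.sha256(b"rabbit").hexdigest()
print(table.get(unsalted_leak))

# Two users with the same password but different salts get different hashes,
# and neither hash appears in the precomputed table.
salt_a, salt_b = os.urandom(16), os.urandom(16)
leak_a, leak_b = salted("rabbit", salt_a), salted("rabbit", salt_b)
print(table.get(leak_a), leak_a == leak_b)

# Knowing salt_a, the attacker can still guess, but has to redo the
# hashing work for this one salt; nothing carries over to salt_b.
recovered = next((w for w in COMMON if salted(w, salt_a) == leak_a), None)
print(recovered)
```

The salt being known does not bring the table back; it only tells the attacker which per-user work has to be repeated.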

Summary

None of these reasons really rely on the salt being secret. Obviously they are all less strong if an attacker knows the salt. But additional information they get will always help them. So I wouldn't say to go advertising it, but there is a saying about security based on obscurity ....

Disclaimer I am by no means a cryptological expert, but these are a few of the top reasons that come to my mind when I think about salts.

Hope that helps.

0

The salt is just used to block rainbow table attacks. Therefore your salt can be stored with the hashed password, as is done on most recent GNU/Linux systems, where the salt is stored in /etc/shadow.

maggick