
I was reading an answer by Terry Chia to another question about the requirements for a salt, in which he/she specified (among other things) that salts need to be globally unique, but didn't go into any detail on why. This was defined as "not only unique in the context of your database, but unique in the context of every single database out there". For example, I have been using an email address as the salt for my experiment-y sandbox website/database. Those same addresses could well be in use as salts on other websites.

It doesn't make sense to me because, as I understand it, the purpose of the salt is to ensure that "cracking multiple compromised password hashes together should be no easier than cracking each one of them separately" (Ilmari Karonen, further down in the same post). Unless I'm mistaken, this constraint is satisfied as long as my salts are unique within my own database.

I am further confused because generating globally unique salts sounds like something I cannot possibly control. How am I supposed to guarantee that none of my salts have ever been used on someone else's site, given the millions of sites out there? To be fair, I haven't looked into how to generate random salts yet; maybe the idea is that they are drawn from a large enough pool that, even considering the number of sites out there, duplicate salts being generated for separate databases are very unlikely?

I am definitely biased towards using an email address as my salt, since it's one less small step of work, but given the number of people who seem to disagree with this approach, I recognize that I am probably wrong. Besides, I want to know more about why I'm wrong, because it doesn't make sense to me yet.

Char Star
  • It sounds like he misspoke or you misunderstood him. A "pepper" is globally unique, whereas a salt is unique for each hash. As for using an email as a salt... just look at the issue with router default names (since router ESSIDs are used as WPA2 salts). The issue is that someone could precompute a rainbow table just by knowing your email, long before they ever break into your database. – forest Nov 29 '17 at 05:07
  • @Baal-zebub you are interpreting "globally unique" oddly. Peppers are "globally applied" within the scope. The linked answers stated that salts should be random, unguessable, and therefore 'unique' from any other salt on any database, i.e. "unique globally" – schroeder Nov 29 '17 at 10:33
  • That's why I said the OP might have misunderstood him. It could easily be that I got the English wrong. To me I would think of there being globally unique, or there being individually unique. But anyway that confusion from me is why I commented instead of answered. :) – forest Nov 29 '17 at 10:41
  • Salts should be random; you should not use anything easily guessable. – Patrick Mevzek Nov 29 '17 at 19:35

3 Answers


The goal of salting is resistance to precomputation.

While you can't guarantee that a salt is globally unique, you can (just as you said) generate salts of sufficient size to make it infeasible to build a rainbow table in advance (making them "unique enough").
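For illustration, here is a minimal sketch of generating such a salt in Python, using the standard library's CSPRNG (the 16-byte size is a common choice, matching bcrypt's salt length):

```python
import secrets

# 16 random bytes = 128 bits. With salts this large, the chance of any
# two independently generated salts colliding, across every database in
# the world, is negligible (birthday bound: roughly 2^64 salts drawn
# before a 50% chance of any collision).
salt = secrets.token_bytes(16)
print(salt.hex())
```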

More generally, if you're asking about how to create a salt, it sounds like you're rolling your own hashing method, which may be educational but is not generally recommended.

Instead, use an existing hash (such as PBKDF2 or bcrypt) (or even better, as CBHacking rightly suggests in the comments, scrypt or Argon2). Not only will you automatically inherit their existing salting methods (which are quite sufficient), but your experimentation will be aligned with password storage best practices ... which would probably be better practice.
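As a sketch of what that looks like in practice (assuming Python and the third-party `bcrypt` package; the password value is illustrative):

```python
import bcrypt  # third-party: pip install bcrypt

password = b"correct horse battery staple"

# gensalt() draws a random 128-bit salt for you; the rounds parameter
# tunes the work factor. hashpw() embeds the salt and cost in its
# output, so a single database column holds everything needed later.
stored = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))

# checkpw() reads the salt and cost back out of `stored` and re-derives.
assert bcrypt.checkpw(password, stored)
```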

Update 2018-04-03: Steve Thomas (sc00bz) makes a good point here that Argon2 and scrypt are actually worse for the defender when used at speeds generally considered to be compatible with authentication at scale (<0.1 seconds), while being better for the defender for speeds compatible with encryption of individual files (1 to 5 seconds). (This means that the gap from 0.1 seconds to 1 second is probably interesting, and bears testing for your specific environment). In other words, his assertion is that if you try to tune Argon2 or scrypt to <0.1s speeds, the results are less resistant to cracking than bcrypt is at those speeds.

In the same thread, Aaron Toponce also makes a great suggestion for working around bcrypt's length limitation (72 bytes max) and pre-hashing pitfalls, by using bcrypt(base64(sha-256(password))). (See also Thomas Pornin's similar answer here). The reason that the base64 is needed is also interesting - don't leave it out.

So if your use case is scalable, cracking-resistant authentication with a fast authentication experience, bcrypt(base64(sha-256(password))) is probably a good approach. (You'll need to adjust bcrypt's work factor to the highest value that fits within your authentication speed window.) If your users can tolerate waiting a second or more to authenticate, Argon2i or scrypt may be better. And note that relative performance will shift over time as hardware capabilities improve - so increasing the bcrypt work factor every X months for new hashes (as Dropbox does) would allow the approach to adapt to future capabilities over time.

Update 2019-09-01 It turns out that wrapping a faster hash in a slower hash without salting/peppering that faster hash first is dangerous, because fast uncracked unsalted hashes from another source (such as another leak) can be tested in bulk against the wrapped hashes without having to crack those faster hashes first. The faster hashes can then be attacked in a separate job at a much higher rate. So my advice above should be updated to bcrypt(base64(sha256(password.salt))).
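A sketch of that updated construction (assuming Python with the `bcrypt` package; using a site-wide secret pepper on the inner hash is one way to satisfy the "salting/peppering" requirement, and the names here are illustrative, not a canonical implementation):

```python
import base64
import hashlib
import bcrypt  # third-party: pip install bcrypt

PEPPER = b"site-wide secret kept outside the database"  # illustrative

def hash_password(password: bytes) -> bytes:
    # Pepper the inner SHA-256 so its output can't be matched against
    # unsalted SHA-256 hashes leaked elsewhere, then base64 it so no
    # NUL byte in the raw digest truncates bcrypt's input.
    inner = base64.b64encode(hashlib.sha256(password + PEPPER).digest())
    return bcrypt.hashpw(inner, bcrypt.gensalt(rounds=12))

def check_password(password: bytes, stored: bytes) -> bool:
    inner = base64.b64encode(hashlib.sha256(password + PEPPER).digest())
    return bcrypt.checkpw(inner, stored)
```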

Royce Williams
  • scrypt (or the new argon2, which was specifically designed and tested for this purpose and is the winner of a three-year competition) are better choices than PBKDF2 or bcrypt. While bcrypt's memory requirement is higher than PBKDF2's, it's not tunable and is still very low by modern standards; parallelizing it still isn't that hard. The newer algorithms are fully tunable. – CBHacking Nov 29 '17 at 08:09
  • Bcrypt is tunable. It has "cost factors" that let you tune the CPU trade-off. It's not as versatile as argon2 which has separate memory and CPU cost options. – forest Nov 29 '17 at 10:43
  • @CBHacking, while I agree that argon2 and scrypt are superior, I also believe that PBKDF2 and bcrypt are more widespread, and more likely to be accessible to this questioner and their platform. – Royce Williams Nov 29 '17 at 13:57
  • @forest I never said it wasn't, I said the newer ones are *fully* tunable, and that bcrypt's memory usage isn't high enough (implying, correctly, that it isn't tunable in that regard). – CBHacking Nov 29 '17 at 21:25
  • `While bcrypt's memory requirement is higher than PBKDF2's, it's not tunable and is still very low by modern standards` is what I saw. Doesn't matter though, I get your point. – forest Nov 29 '17 at 21:26
  • @RoyceWilliams, I'm sure somebody out there has a counterexample but I can't think of a single platform I've seen (and I've seen many) that had a bcrypt implementation but not either scrypt or argon2; they were very widely ported after release. In any case, it is irresponsible to simply assume the questioner couldn't access the newer algorithms, and only tell them about the outdated ones. – CBHacking Nov 29 '17 at 21:26
  • @forest the antecedent of "it" in that sentence is obviously the memory cost. The rest of the sentence doesn't even make sense otherwise. – CBHacking Nov 29 '17 at 21:28
  • @CBHacking, I appreciate your perspective. I think that "irresponsible" is a bit strong, given the vast gap that exists between "roll your own crypto" and bcrypt/PBKDF2/Argon2/scrypt. bcrypt and PBKDF2 are very well-baked and have been tested in the field for years, which is not an uncommon standard for crypto-related functions. The assertion that Argon2 and scrypt are just as accessible as bcrypt and PBKDF2 is harder to sustain when legacy/older systems are in play. I stand by my original recommendation, while at the same time also welcoming scrypt (of which I am a long-time fan) and Argon2. – Royce Williams Nov 29 '17 at 22:52
  • bcrypt and PBKDF2 are not outdated, they are simply suboptimal. – Stephen Touset Dec 01 '17 at 01:31
  • An interesting March 2018 Purdue economic analysis that demonstrates the superiority of memory-hard functions, and recommends deprecating PBKDF2 and bcrypt: (cc @CBHacking): https://csdl.computer.org/csdl/proceedings/sp/2018/4353/00/435301a035.pdf – Royce Williams Mar 31 '18 at 17:47

There is no advantage in using the e-mail address instead of generating a random value. You have to store it anyway, just like a random salt; otherwise the user could never change their e-mail address.

Global uniqueness is not a hard requirement; you don't need to guarantee it, but the more globally unique the salts are, the better. A 128-bit salt (the size bcrypt uses) has about 3.4E38 possible values. Even if you generated 1000 salts per second, you would expect to wait about 6E8 years before having a 50% chance of a single duplicate.
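The arithmetic behind that estimate, as a quick back-of-the-envelope check using the standard birthday-bound approximation:

```python
import math

n = 2 ** 128                  # possible 128-bit salt values, ~3.4e38
rate = 1000                   # salts generated per second

# Birthday bound: ~sqrt(2 * ln 2 * n) draws for a 50% collision chance.
draws = math.sqrt(2 * math.log(2) * n)        # ~2.2e19 salts
years = draws / rate / (365.25 * 24 * 3600)
print(f"{years:.1e}")         # ~6.9e8 years, the ballpark quoted above
```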

As Royce already mentioned, today's algorithms generate the salt for you and store it in plaintext as part of the resulting hash string, so there is no need for special handling in the database (just one field for the hash).
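For example, a bcrypt hash string bundles the algorithm version, cost, and salt together with the hash in one field (the value below is illustrative, not a real hash of any particular password):

```python
stored = "$2b$12$R9h/cIPz0gi.URNNX3kh2OPST9/PgBkqquzi.Ss7KIUgO2t0jWMUW"

version, cost, rest = stored[1:].split("$")   # '2b', '12', salt + hash
salt_b64, hash_b64 = rest[:22], rest[22:]     # 22-char salt, 31-char hash
```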

When using email addresses, an attacker could precompute rainbow tables for certain emails of interest.
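A toy sketch of why that matters (assuming, for illustration only, a naive sha256(password + salt) construction; the email and wordlist are made up):

```python
import hashlib

email = b"victim@example.com"  # public, so known before any breach

# The attacker builds a table for this one user ahead of time...
table = {
    hashlib.sha256(guess + email).hexdigest(): guess
    for guess in [b"123456", b"password", b"letmein"]  # huge list IRL
}

# ...and after the breach, recovery is a single dictionary lookup.
leaked_hash = hashlib.sha256(b"letmein" + email).hexdigest()
print(table.get(leaked_hash))  # b'letmein'
```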

While the disadvantages are not fatal in practice, there are simply no advantages, so why would you choose a more complicated and less safe method of generating the salt? Use a modern password hash function and you are good.

martinstoeckli

This topic becomes much clearer if instead of saying that salts "need" to be globally unique, we observe instead that optimal security is achieved if they are.

Consider the following simplistic attack model: an attacker obtains n password entries, and they have the resources to make m tests in total. Now consider the extreme cases:

  • If the passwords are unsalted (or all share the same salt), then each of the m computations can be checked against all n entries, which means the attacker gets to test m * n user/password hypotheses.
  • If all the passwords are salted with unique salts, then each of the m computations can only be checked against 1 entry, which means they only get to test m user/password hypotheses.

If some but not all salts are duplicates, the attacker gets something in between: with k distinct salts spread across the n entries, each computation can be checked against roughly n/k entries, for about m * n/k hypotheses. This doesn't mean that salting fails catastrophically on salt duplication; rather, its effectiveness degrades linearly (see the sketch below).
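A toy numeric instance of this model (all numbers illustrative):

```python
m = 10 ** 10        # hash computations the attacker can afford
n = 10 ** 6         # compromised password entries

print(m * n)        # no salt (or one shared salt): 1e16 hypotheses
print(m)            # all salts unique:             1e10 hypotheses

k = 10 ** 3         # only k distinct salts among the n entries
print(m * n // k)   # each guess checks ~n/k entries: 1e13 hypotheses
```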

Since sufficiently large random salts are easy to implement and achieve global uniqueness with all but negligible probability, there's really very little reason not to go that way.

Random salts have another important property, which martinstoeckli's answer to the question you link points out: the attacker cannot predict ahead of time what salts you will use. Your proposal to use email addresses fails that criterion.

Luis Casillas