TL;DR
- If your data is a long message, or has at least 72 bits of entropy, use SHA-256.
- If your data is a password, use BCrypt, adjusting the work factor so that one hash takes about 100ms.
- If the input data has too little entropy, hashing (even with BCrypt) will not provide significant security. Examples:
  - weak passwords
  - all-digit PINs
  - banking account numbers
While it is hard to list all the hash routines, it is easy to list the most common routines, and even easier to recommend the one you should be using.
The SHA-2 family of hashes (SHA-256 through SHA-512) are considered strong General Purpose hash routines. SHA-256 works best for most purposes.
MD5 is quite weak, and SHA-1 is only marginally better: practical collision attacks exist against both, so avoid them in new designs. While some folks may want MD5 because it is shorter (128 bits instead of 256), you are actually better off truncating a modern hash. Of course, it is better still to use the full hash length of 256 bits.
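As a rough illustration of truncating a modern hash, here is a minimal Python sketch that keeps only the first 128 bits of a SHA-256 digest, giving the same length as MD5. The helper name `short_digest` and its `n_bytes` parameter are made up for this example.

```python
import hashlib

def short_digest(data: bytes, n_bytes: int = 16) -> str:
    """Return the first n_bytes of SHA-256(data) as a hex string.

    16 bytes = 128 bits, the same size as an MD5 digest.
    (Hypothetical helper, not part of any library.)
    """
    if not 1 <= n_bytes <= 32:
        raise ValueError("n_bytes must be between 1 and 32")
    return hashlib.sha256(data).digest()[:n_bytes].hex()

# 32 hex characters, exactly like an MD5 hex digest
print(short_digest(b"hello world"))
```

Truncation lowers collision resistance in proportion to the bits you drop, so only do this when the shorter length is a hard requirement.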
Keep in mind, General Purpose hash routines such as MD/SHA are designed to be fast. For most computer programs, fast is considered good.
However, if the original input value has limited entropy (for example, a 4 digit PIN), then it is very easy to brute-force the hash: try all 10,000 possible input values, hash each one, and compare the result to the stored hash, thereby recovering the otherwise secret data.
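To make that concrete, here is a small Python sketch of the brute-force attack on an unsalted SHA-256 hash of a 4 digit PIN. The function names and the example PIN are invented for illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Hex SHA-256 of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def crack_pin(target_hash: str):
    """Try every 4 digit PIN (0000-9999); return the match, or None."""
    for n in range(10_000):
        candidate = f"{n:04d}"
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None

stolen_hash = sha256_hex("4821")  # pretend this came from a leaked database
print(crack_pin(stolen_hash))     # prints 4821
```

On ordinary hardware the whole search finishes in a small fraction of a second, which is why a fast hash gives no real protection here.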
- 4 digit PIN = 13 bits of entropy (cannot be made secure enough by hash)
- 18 character truly random Hex string = 72 bits entropy
- 12 character truly random Base64 string = 72 bits entropy
- 8 character password from a lazy user = almost no entropy
- 11 character password with some common substitution tricks (Tr0ub4dor&3) = roughly 28 bits of entropy
- credit card number = not enough entropy so don't store even its hash
- email message or any large text file = lots of entropy
Every bit of entropy means it takes twice as long to brute-force that data.
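The figures for the truly random strings above come from a simple formula: length times log2 of the alphabet size. A minimal Python sketch follows (the function name `entropy_bits` is my own). It only applies to uniformly random input and will badly overestimate human-chosen passwords.

```python
import math

def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a string whose characters are chosen
    uniformly at random from an alphabet of the given size."""
    return length * math.log2(alphabet_size)

print(round(entropy_bits(10, 4), 1))  # 4 digit PIN        -> 13.3
print(entropy_bits(16, 18))           # 18 hex chars       -> 72.0
print(entropy_bits(64, 12))           # 12 Base64 chars    -> 72.0
```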
How long it takes to brute-force depends on the speed of the attacker's hardware and how fast the hash is. So:
- If your input data has 72 bits of entropy or better, just use SHA-256.
- If your input data has less entropy, or unreliable entropy (user-provided passwords), then you should use a Slow Hash.
Slow Hash routines are adjustable: instead of completing the hash operation in a few microseconds, they can be tuned to take many milliseconds (I recommend about 100ms) on your production hardware. Note that attacker hardware will probably be much faster.
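One way to find a work factor that takes about 100ms is simply to measure it on the target machine. The sketch below does this for PBKDF2 from Python's standard library, doubling the iteration count until a single hash is slow enough. The function name, starting value, and benchmark password are assumptions for the example; the same idea applies to BCrypt's cost parameter.

```python
import hashlib
import os
import time

def calibrate_iterations(target_seconds: float = 0.1, start: int = 10_000) -> int:
    """Double the PBKDF2 iteration count until one hash takes
    at least target_seconds on this machine."""
    salt = os.urandom(16)
    iterations = start
    while True:
        begin = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", b"benchmark-password", salt, iterations)
        elapsed = time.perf_counter() - begin
        if elapsed >= target_seconds:
            return iterations
        iterations *= 2

# The result depends on the hardware, so run this on production machines.
print(calibrate_iterations())
```

Because the count only doubles, the result can overshoot the target by up to a factor of two; a finer search is easy to add if that matters.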
Here are some good Slow Hash choices. Each has a means to adjust the processing time (the strength of the hash).
- Repeat your SHA-256 (or SHA-512) hash many, many times. This is straightforward to implement and considered a reasonable technique, despite the fact that SHA was designed to be fast.
- BCrypt (the commonly recommended slow hash)
- SCrypt (newer and less proven; designed to be GPU-resistant due to its RAM requirements)
- PBKDF2 (older; a good alternative to BCrypt)
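For comparison, here is a hedged Python sketch of the first option (repeated SHA-256) next to PBKDF2, using only the standard library. `repeated_sha256` is an illustrative hand-rolled loop, not a vetted construction; in practice a standardized function such as PBKDF2 is the safer way to get the same adjustable cost. BCrypt is not in Python's standard library and is left out here.

```python
import hashlib

def repeated_sha256(password: bytes, salt: bytes, rounds: int) -> bytes:
    """Naive stretching: hash salt+password, then re-hash the digest
    until `rounds` hashes have been computed. Illustrative only."""
    digest = hashlib.sha256(salt + password).digest()
    for _ in range(rounds - 1):
        digest = hashlib.sha256(digest).digest()
    return digest

def pbkdf2_sha256(password: bytes, salt: bytes, rounds: int) -> bytes:
    """Standardized equivalent: PBKDF2 with HMAC-SHA-256."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, rounds)

salt = b"example-salt"  # fixed only to keep the demo short; use a random salt in practice
print(repeated_sha256(b"hunter2", salt, 100_000).hex())
print(pbkdf2_sha256(b"hunter2", salt, 100_000).hex())
```

In both functions `rounds` is the knob that trades login latency for attacker cost.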
Note: Optimization of your hash function is important. Use a natively compiled version of BCrypt, not one written in a high-level language (e.g. jBCrypt, which is written in Java). A natively compiled version (written in C/C++, with proper linking to your high-level language) will be more efficient, allowing you to use a higher (stronger) work factor in the same amount of time.
It is common to add a Salt to a hash. The salt is unique per password, but not secret, and is combined with the password before the hash is generated. This way, if an attacker steals your database and runs brute-force on all the hashes, they have to run a separate brute-force job for each Salt used, which takes quite a bit longer than brute-forcing all the passwords in a single job.
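A minimal Python sketch of salted storage and verification follows, assuming PBKDF2 from the standard library as the slow hash. The `iterations$salt$hash` record format, the function names, and the iteration count are choices made for this example, not a standard.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # hypothetical value; calibrate on your own hardware

def hash_password(password: str) -> str:
    """Return 'iterations$salt_hex$hash_hex' with a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{ITERATIONS}${salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash using the stored salt and iteration count."""
    iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))

a = hash_password("correct horse")
b = hash_password("correct horse")
print(a != b)                               # True: same password, different salts
print(verify_password("correct horse", a))  # True
print(verify_password("wrong guess", a))    # False
```

Because the salt and iteration count are stored alongside the hash, the work factor can be raised later without invalidating existing records.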