6

I know this is a little bit broad, but I'd like to understand the function of this algorithm by explaining like I'm 5. And I'd like to know the difference between BCrypt and blowfish, is it because BCrypt is using a salt?

I've been reading but I see information in a low level, and I don't want go that deep, I'd like to get a simpler explanation (knowing the basics like salt, bits, etc).

For example, I know that the first part of a password is :

$2$: Blowfish-based crypt ('bcrypt')

And then second one is the rounds. For example, $10$ indicates 2^10 key expansion rounds, but what does it exactly mean?

An example answer would be:

  1. First you make the salt with algorithm X
  2. Then get a random number of X
  3. Then with the round you do X

Something like this, hope it's clear.

schroeder
  • 125,553
  • 55
  • 289
  • 326

2 Answers2

9

Hashing versus encryption

The biggest different between bcrypt and blowfish is that blowfish is a symmetric encryption algorithm and bcrypt is a hashing algorithm. (It is a special kind of hashing algorithm, but we will get to that later.)

Encryption (what Blowfish does)

A symmetric encryption algorithm takes a secret key and some plaintext (the term "message" is regularly used and the variable m is typically used to refer to it) and outputs ciphertext, c. The idea is that the ciphertext can only be transformed back into the original message with same[^same] secret key that was used when encrypting.

[^same]: I am giving approximately true answers here to focus on the central ideas and concepts. I am hoping that those who recognize where my answers aren't precisely true will cut me some slack for expository purposes.

So Alice might encrypt the message "Attack at dawn" with the key 13 (this is not blowfish, but it is symmetric encryption) and produce the ciphertext "Nggnpx ng qnja". She can send that ciphertext to Bob, and if Bob knows that the key is 13, he can transform the cipher text back into the original messages. Sometimes Alice doesn't want to send a message to another person, but she wants to store it for herself to read at a later time. We can think of those cases as Alice sending a message to future Alice.

With Blowfish (instead of the encryption that I used in my example), it is effectively impossible for anyone who doesn't have the key to learn anything about the original message (other than its length and existence) from the ciphertext. Just an aside, AES is a preferable symmetric encryption algorithm over Blowfish, but I will continue to use Blowfish in my descriptions as that is what was asked about.

Hashing

A cryptographic hash algorithm (bcrypt is special kind, with some extras, but I will start with a simpler case) does not take a key and is not practically reversible. So a cryptographic hash algorithm like SHA256 would take a message (let's say a password) like "JaguarsRule!" and transform it into a hash.

$ echo -n 'JaguarsRule!' | shasum -a256
13f26e5a60e402d895c5ce1d9492d080563c5079a8b5f52a25953fd24a2cb1da  

One of the several properties of a cryptographic hash is that you can't feasibly compute "JaguarsRule!" from 13f26e5a60e402d895c5ce1d9492d080563c5079a8b5f52a25953fd24a2cb1da. There is no key in this case.

So a straight forward hashing mechanism like this might seem like a good way to check passwords on a server. Suppose you are running some web service that has people log on. You have a user, let's say with username jason who sets up his account with the password JaguarsRule!. Because you don't want anyone who can read what is on your server to learn Jason's password, you don't store the password itself. Instead you store its hash.

When jason (or someone pretending to be jason) logs in your server takes the password that gets entered and performs the hashing operation on it and compare that hash that is stored. If the hashes match then the system knows that the correct password was entered.

Salts and slow cooking

The scheme described above, using a general cryptographic hash like SHA256, would work great except for one thing: People pick guessable passwords. If people were as likely to use passwords like wZFoO_SKrgEw as they are to JaguarsRule! then we wouldn't need specialized hashing functions like bcrypt. But people, being human, are not going to pick passwords randomly and uniformly over a sufficiently large space. Instead, they are far more likely to pick some possible passwords over others.

Salt

Consider another user of your system, pillboy. Pill Boy is also a fan of the Jacksonville Jaguars and happens to have also picked JaguarsRule! as his password. Using (unsalted) hashing, the hash for Pill Boy's password will be the same as the hash for Jason's password. An attacker only needs to compare hashes to see that those two users have the same password.

Furthermore, if an attacker discovers one way or another that Jason's password is JaguarsRule!` then she immediately knows Pill Boy's password. Furthermore, she might create a big list of hashes for the most common passwords.

The solution to this is to salt the password with something unique before hashing it. So when first creating a the hash of Jason's password (when Jason first signs up and creates the password), you'd add some random stuff to it. Suppose that at this time your system creates the salt 9cb3779f69e271c1. It then sticks that in front of the password before hashing, so it is really hashing something like 9cb3779f69e271c1JaguarsRule! (This is not how the salt actually gets added, but this is close enough to explain what salt is for.). The hash of 9cb3779f69e271c1JaguarsRule! will be completely different than the hash of JaguarsRule!.

The system will need to know what the salt was when it comes time to verify a password. So it will store the salt and the hash along with the Jason's username. When Pill Boy creates his account, it will be set up with a different randomly generated salt, and so his salt and password combination will produce a different hash.

So what you store on your server might look like this

[
    {
        "algorithm": "sha256",
        "username": "jason",
        "salt": "9cb3779f69e271c1",
        "hash": "37a932267ed055facf03cc5d09ca90f927a1eed47a8cd4856e57cd67434426be"
    },
    {
        "algorithm": "sha256",
        "username": "pillboy",
        "salt": "0a19471dab710025",
        "hash": "4be637a8c85455dce0cdc1c7670f062764100276b6ed64141c06fbef4578f185"
    },

]

By having different salts for these users, we can't see that they have the same password, and the attacker won't be able to compare the hashes to a list that she's precomputed of hashing of common passwords.

So bcrypt, because it is designed for password hashing, takes a salt.

Taking it slow

Salting is absolutely essential, but it still doesn't stop an attacker who gets hold of what is stored on the server from automated guessing. She can take the salt and combine that with common passwords and compute hashes until she finds a match. She might be able to perform tens or hundreds of millions of password guesses per second.

One of the reasons that she can compute some many guesses so quickly is that cryptographic hash algorithms like sha256 are designed to be fast and efficient while still meeting their security properties. So things like bcrypt do are "slow hashing". The idea is so that getting from some message (salt and password) to a hash might take so much computation that a typical computer will take a quarter of a second to perform the hash.

The user won't notice a quarter of a second delay, but if the attacker is using the same kind of computer, then she ends up being reduced to making four guesses per second instead of 10 million. (In practice, she will have specially configured computing equipment, but slowing her down from millions of guesses per second to tens of thousands of guesses per second is a good thing.)

bcrypt is one of several "slow hashes". Others are PBBKDF2, scrypt, and Argon2. Those last two not only require a lot of computation to create the hash, but also require a lot of memory.

Bcrypt is great, but ...

Salting and slow hashing is something that every password hashing system should do. But it also yields diminishing returns. There is just so much that slow hashing can do for the defender, and so we still need to (as long as passwords exist) get people to use harder to guess ones. I've written more about that in Bcrypt is great, but ...

NB: Thank you https://security.stackexchange.com/users/6253/schroeder for correctly deleting my previous answer in which I took "like I'm 5" literally. And my apologies to the original poster for having been snarky instead of helpful.

Jeffrey Goldberg
  • 6,420
  • 17
  • 21
  • Have an updoot :) – schroeder Mar 27 '19 at 20:13
  • I saw that someone posted an answer, but when I tried to take a look it disapeared, I didn't read about it... so I don't know what was the answer, but anyways, this answer is amazing, and explained. Thank you – Skizo-ozᴉʞS ツ Mar 28 '19 at 10:58
  • 1
    Thank you @Skizo-ozᴉʞS. My original answer was correctly deleted by a schroeder, a Moderator, because it was obnoxious and unhelpful. So I tried to make up for it and make a good faith effort at responding to the intent of "like I'm 5". – Jeffrey Goldberg Mar 28 '19 at 16:12
8

Bcrypt uses Blowfish symmetric-key block cipher and accepts 3 parameters; cost, salt, and password.

  • The cost is determined by the system level so that the admin can decide the timing of password search attack, see hashcat. It determines the number of iterations as iter= 2^cost where cost is between 2 and 31.

  • The salt is a random 16-byte. It can be generated any good random source as \dev\urandom

  • The password is the user's password to be processed. The size of the passwords is 1 to 72 bytes.

  • The output of Bcrypt is 24 bytes.

The Bcrypt encrypts the text OrpheanBeholderScryDoubt in a chained ECB mode.

 state = EksBlowfishSetup(cost, salt, password) //Special key scheduling 
 ctext = "OrpheanBeholderScryDoubt"
 repeat (64)
      ctext = EncryptECB(state, ctext)
 return Concatenate(cost, salt, ctext)

update on the cost

EksBlowfishSetup ( Extensive Key Setup ) shortly as follows;

state = InitialState()
state = ExpandKey(state, salt, password)
repeat (2^cost)
    state = ExpandKey(state, 0, password)
    state = ExpandKey(state, 0, salt)

The cost plays a role on the above function of BCrypt. With the cost increased the repeat loop will run exponentially. This will increase the attack time as you can see from this answer's table. If you want to compare it with hashcat see a benchmark for Tesla K80

Note: that Bcrypt is old (1999), you should use Argon2i as the winner of the password hashing competition.

kelalaka
  • 5,474
  • 4
  • 24
  • 47
  • Can you explain to me how the cost works? I mean, it does 2^cost times the iterator and each iterator is genering a random value? – Skizo-ozᴉʞS ツ Mar 27 '19 at 14:07
  • @Skizo-ozᴉʞS updated the answer. – kelalaka Mar 27 '19 at 16:19
  • 1
    Bcrypt and PBKDF2 are the same age, so I don't understand the comment about PBKDF2 being a better alternative. – PwdRsch Mar 27 '19 at 16:50
  • 2
    @PwdRsch Yes, you are right, updating the answer. – kelalaka Mar 27 '19 at 16:55
  • 1
    If we are talking about comparing these things, bcrypt is almost always going to be the better choice than PBKDF2 irrespective of age. Likewise, people should use AES over Blowfish for symmetric encryption, not because of age, but because of other technical differences. – Jeffrey Goldberg Mar 28 '19 at 16:29