Is Bcrypt a hashing algorithm or is my study material wrong?

Question

I'm currently studying for my Comptia Security+ exam and on a practice test online I got this question:

Or, represented as text:

▶ Which of the following are hashing algorithms? (Select all that apply)
----------------------------------
 ✔️ ☑️  MD5 (☑️ Your answer)
 ✔️ ☑️  RIPEMD (☑️ Your answer)
 ☐ ➖  Bcrypt (❌ Your answer)
 ✔️ ☑️  HMAC (☑️ Your answer)
 ✔️ ☑️  SHA (☑️ Your answer)
----------------------------------
➖ Your answer to this question is incorrect or incomplete.

Was I wrong to include Bcrypt as a hashing algorithm?

This practice yes is fundamentally confused, HMAC is definitely not a hashing function by any measure. HMAC requires a secret key as input parameter, hashing functions don't normally require a key. If HMAC is considered a hash, then PGP/RSA-signature would have been a hash as well. — Lie Ryan, Oct 16 '19 at 03:04
For those of us behind corporate firewalls, can you post the question as text, not an image? — Sean Reid, Oct 16 '19 at 09:02
@SeanReid Just for you, I did my best to convert it all to text. — Ismael Miguel, Oct 16 '19 at 10:56
@IsmaelMiguel You even got the mis-alignment correct on the Bcrypt answer checkbox. +1 — Greg Schmit, Oct 16 '19 at 19:50
@GregSchmit Thank you. As I said, I tried my best. I couldn't do the blackened square with a minus inside, but I got close enough. Honestly, the misalignment was an artifact, but it looks really close enough. — Ismael Miguel, Oct 16 '19 at 20:48
they should have included CRC32 and murmurhash in there just to make it interesting — Richie Frame, Oct 19 '19 at 01:56

Squeamish Ossifrage · Accepted Answer · 2019-10-16T13:43:55.813

85

It's not a good question. You're not wrong to call bcrypt a ‘hashing algorithm’, but they're not wrong that it is qualitatively different from the others—although it is curious that they single out bcrypt and not HMAC too. We can group them into three categories:

bcrypt is meant to be a password hash, also known as a password-based key derivation function, whose purpose is to be expensive to compute so that even if it doesn't take many trials on average for an adversary to guess what the input was, there's a high cost to testing each guess. Other examples include PBKDF2, scrypt, and Argon2, and Balloon hash.
MD5, RIPEMD, and SHA are meant to be collision-resistant hashes, whose purpose is to make it hard to find distinct messages that have the same hash. As it happens, MD5, RIPEMD, SHA-0, and SHA-1 (though not SHA-2 or SHA-3) are all broken—collisions have been reported for all of them. They're also designed to be preimage-resistant, so that it is hard to find a message having a prescribed hash, and second-preimage-resistant, so that it is hard to find a second message having the same hash as a prescribed message; none of them have had these properties broken. Other examples—which, unlike MD5, RIPEMD, and SHA-0/SHA-1, are unbroken to this day—include BLAKE2b and SHAKE128.
HMAC is a technique for constructing pseudorandom function families (PRFs) or message authentication codes (MACs) out of certain hash functions like SHA-256; sometimes HMAC is loosely called a ‘keyed hash’, but that doesn't really make it clear what the security goal is, which is unforgeability as a MAC or pseudorandomness as a PRF: if the key is chosen uniformly at random, an adversary who doesn't know the key can't guess the MAC for any message, and can't distinguish the hash of any message from a uniform random bit string.

The word ‘hash’ alone in cryptography just means a function that scrambles its input, or the scrambled output. There are other kinds of hash functions: the FNV-1 string hash function, sometimes used in hash tables; universal hash families like Poly1305, with guaranteed bounds on collision probabilities, sometimes used to make extremely cheap message authentication codes with high security; key derivation functions like HKDF, used to turn high-entropy but nonuniform bit strings like Diffie–Hellman shared secrets into uniform random bit strings; etc. It's not wrong to call any of these ‘hashes’, but it may be helpful to use more specific terminology like collision-resistant hash or pseudorandom function family or universal hash family.

edited Oct 16 '19 at 13:43

answered Oct 15 '19 at 22:51

Squeamish Ossifrage

2,646
9
17

16

I actually selected them all as they all used hashing in one way or another but it counted bcrypt as not a hash function. I was just curious about the distinction so thank you! Hopefully if the actual Security+ exam has something like this, it won't have some semi-ambiguous choices or wording. – treefidy Oct 16 '19 at 00:00
5

Minor technical quibble: A password hashing function and a PBKDF are not formally the same thing. In practice, good PBKDFs are usually also good password hashing functions, and in some cases good password hashing functions can also be good PBKDFs, but they are not generally interchangeable. – James_pic Oct 16 '19 at 11:07
@James_pic What is the distinction you're drawing? – Squeamish Ossifrage Oct 16 '19 at 13:47
3

@SqueamishOssifrage KDFs will generate keys for use in symmetric ciphers. There's no requirement that a password hashing function generates something useful as a key. In practice, the best constructions have both desired characteristics, but you can't in general safely assume that the two are interchangeable. – James_pic Oct 16 '19 at 13:57
1

@James_pic While strictly true that the _goal_ of verifying passwords does not necessarily require using a function whose outputs are indistinguishable from uniform random given high-entropy inputs, as a subroutine, in practice _all_ serious password hashes (not just the ‘best’ ones) work this way—for example, it was a design criterion for the [password-hashing competition](https://password-hashing.net/cfh.html). If you know a counterexample, I'm all ears—but maybe better to put a warning on the counterexample's documentation than here! – Squeamish Ossifrage Oct 16 '19 at 14:09
3

@Treefidy CompTIA is notorious for having blatantly wrong (or at least obtuse) questions on their exams. Don't sweat 1-2 questions that are unclear what they're asking. The brain dumps and practice exams are even worse. Source: I have an A, N, S, and CySA cert. – Monica Apologists Get Out Oct 16 '19 at 14:20
Bcrypt was never intended for use as a KDF, and its outputs are indeed distinguishable from uniform random - there are particular outputs that are not possible. In practice, it's probably good enough if someone were to use it it as a KDF, but it wasn't designed for this. – James_pic Oct 16 '19 at 14:34
@James_pic There are $2^{64} + 3\cdot 2^{96}$ strings that bcrypt cannot produce—specifically, those with repeated 64-bit blocks in the 192-bit output; _i.e._, it excludes less than $1/2^{94}$ of the output space. A distinguisher making $q$ queries has probability at most $q/2^{94}$ of finding one of those outputs in a uniform random function; otherwise a uniform random function into all 192-bit strings and a uniform random function into all 192-bit strings _except those_ are indistinguishable. While it may not have been intended this way, that's not a practically meaningful distinction. – Squeamish Ossifrage Oct 16 '19 at 15:34
To put that figure in perspective, an adversary has the same distinguishing advantage against an application that uses AES-CTR under a single key for a couple megabytes of data, yet we routinely do so for much larger data volumes. This doesn't strike me as a reason to draw a pedagogical distinction between password hashes and password-based KDFs. (Similarly, one might say a password-based KDF oughta support arbitrary-length output…but then that would exclude PBKDF2 from being a password-based KDF because while it technically can yield arbitrary length, the mechanism actually hurts security!) – Squeamish Ossifrage Oct 16 '19 at 15:52

score 35 · Answer 2 · answered Oct 16 '19 at 00:47

Strangely enough, two of those things are not like the others, but apparently it only marked one of them as wrong.

All of the options above are arguably hash functions, in that they all take some input message and produce a "digest" or "hash": a fixed-length set of high-entropy bits that is deterministic on its inputs but is not reversible. There are differences in both inputs and usage between them, though.

MD5, RIPEMD, and SHA (now usually called SHA1) are all very much the same class of function, taking just one input (an arbitrary message) and quickly producing the digest of that message (which, at least in theory, possesses collision and pre-image resistance). Their particular characteristics (both mechanically, such as length of digest, and practically, such as security) vary, but at a high level they are interchangeable.

Bcrypt takes two additional parameters: a salt, and a cost/work factor. Additionally, it is intended for hashing only short strings (passwords or similar), and constrains the input length to 72 useful bytes. It produces output like a standard hash function, but (possibly much) more slowly.

HMAC is a construction that requires an externally-defined hash function. As a function, HMAC requires inputs of a message, a salt, and a hash function to use. Unlike bcrypt, it does not introduce any constraints on the input message - in this way, it is more like the other three - but unlike all four of the others it does not actually specify an algorithm for computing a digest, delegating that operation to the specified hash function.

In summary, you could make a case for any of the following answers:

All five are hash functions, because they take a message and produce a one-way, fixed-length, deterministic, high-entropy digest.
Bcrypt is the only one that's not a hash function, because it cannot operate on arbitrary-length messages.
HMAC is the only one that's not a hash function, because it doesn't define a procedure for computing a digest.
Only MD5, RIPEMD, and SHA are true hash functions.

Evidently, your study material takes the second view. I personally lean more toward the third or fourth view. HMAC-SHA-256 is a hash function (that differs from the "pure" ones by requiring a key), but HMAC by itself isn't a hash function. It would be like saying "obtain flour, salt, water, and a person who knows how to make tortillas; give the flour, water, and salt to the person; tell the person to produce a tortilla" is a tortilla recipe.

score 5 · Answer 3 · answered Oct 16 '19 at 04:31

5

One can find many "certification" questions that are less than ideal trying to exploit nuances of wording to test understanding of the topic, but failing to have any real meaning in a practical sense. While you may validly be able to argue the point in the reality of the real world, it really doesn't matter in the face of what the certification test is trying to assess.

Coming from my network background, I could easily find many examples around "classful" networking. This is part of the reason certification questions are generally off topic on the Network Engineering stack. It seems pointless to answer why Test X indicates that 10.250.0.0/255.255.0.0 is not a valid network (answer - a "Class A" network has a mask of 255.0.0.0) when it really has no bearing on running a network in today's world.

Also not sure why the popular answers are fixated on "hashing functions". The certification question asks, "Which of the following are hashing algorithms?" (emphasis mine). There is no mention of hashing functions.

So, of the five provided, four can be considered algorithms in their own right, whether they have their own hashing functions built into them. On the other hand, bcrypt (while a function) uses an algorithm based on blowfish, it is not an algorithm unto itself.

It may be a stupid distinction, but again it is a certification question.

answered Oct 16 '19 at 04:31

YLearn

3,967
1
17
34

1

What is the distinction you're drawing between ‘hashing function’ and ‘hashing algorithm’? Why would using Blowfish as a subroutine disqualify something from being an algorithm? – Squeamish Ossifrage Oct 16 '19 at 13:56
1

@SqueamishOssifrage, both provide a set of operations to follow, but a function is typically considered more of a programmatic device while an algorithm is considered a mathematical device. It may be a pretty thin distinction, but keep in mind it is a certification question. It is roughly the same as saying something like “I took a car to the airport” and then drawing out the difference between driving your own car or taking a taxi/limo/etc. Sure, they may be “different” but in any practical sense they are really the same. Certification questions often draw such trivial distinctions. – YLearn Oct 16 '19 at 14:26
There are certainly a few different algorithms that can implement (say) SHA-256, all computing the same function with slightly different logic by exploiting certain bitwise operation identities—but you specifically said bcrypt _is not_ an algorithm unto itself. Why are you making that distinction for bcrypt but not for MD5? – Squeamish Ossifrage Oct 16 '19 at 14:57
@SqueamishOssifrage, I am simply trying to provide the answer to how the certification test is attempting to draw this distinction on an *entry level* certification. In their simplistic stance, MD5 uses MD5 where bcrypt uses Blowfish. That's all there is to it. I am not arguing that they are correct in drawing this distinction or even using terminology properly. But in the context of the certification in question, that is the difference. – YLearn Oct 17 '19 at 02:48
What is the distinction you're suggesting that the (obviously a bit silly) certifiers are making? That a function which is defined in terms of another function cannot be an algorithm? I'm struggling to see how the distinction you hypothesized applies to bcrypt but not to MD5 or SHA-256, which are defined in terms of internal compression functions. It seems more likely to me that they meant bcrypt is not designed to be collision-resistant (which doesn't really explain HMAC but one might suppose they meant HMAC with a collision-resistant function inside). – Squeamish Ossifrage Oct 17 '19 at 02:52
@SqueamishOssifrage, clearly you don't deal with people taking entry level certification tests very often. The questions on these tests often make little sense to someone knowledgeable in the field. As a network professional, I can tell you I personally am sick of questions people new to networking ask about classful networking since it has been *obsolete for over 20 years* yet questions based on this concept are still components of current entry level networking certifications (Network+, CCENT/CCNA R&S, etc) and their associated study materials. It doesn't make sense, it just is. – YLearn Oct 17 '19 at 03:28
To be clear, I don't mean to criticize you for a silly distinction that the authors of the certification tests are making. I think it is fair to do what you seem to be doing—to reverse-engineer what the test authors meant, silly as it may be. I just don't understand the distinction you have reverse-engineered. My suspicion is that they meant to ask about _collision-resistant hash functions_, which bcrypt is not designed to be; but you seem to think it's about ‘algorithm’ _vs._ ‘function’, and I'm struggling to see how _even a dimwitted test author_ could see bcrypt as ‘not an algorithm’. – Squeamish Ossifrage Oct 17 '19 at 03:32
@SqueamishOssifrage, I think you are overestimating the test authors when it comes to anything but surface nuance. Any number of references state that Bcrypt uses Blowfish. I doubt they dig any deeper than that. It is in my mind no more idiotic than [the question referenced int his post](https://networkengineering.stackexchange.com/q/62949/33) as an example. People can argue back and forth about why/why not the answer is correct, but in the end you need to learn it for the certification the way *they* view it, not the way that anyone more clued in would answer it. – YLearn Oct 17 '19 at 04:08

kelalaka · Answer 4 · 2019-10-16T14:00:26.813

1

From Wikipedia:

bcrypt is a password hashing function designed by Niels Provos and David Mazières, based on the Blowfish cipher, and presented at USENIX in 1999.

Note: Password hashing algorithms need not be collision-resistant, see Thomas's answer. That is the distinction from Cryptographical Hash algorithms.

edited Oct 16 '19 at 14:00

answered Oct 15 '19 at 22:51

kelalaka

5,474
4
24
47

I selected bcrypt as correct and was marked wrong. From your citation it is a "hashing function" so all answers should be correct, right? – treefidy Oct 16 '19 at 00:02
1

It is a **password hashing function**. – kelalaka Oct 16 '19 at 06:12
Why shouldn't password hashing algorithms be collision resistant? If it's easy to find dozens of passwords that have the same hash, or easy to find a string that matches a given hash, then it's easy to brute force a password login. – Rup Oct 16 '19 at 13:58
1

@Rup In password cracking, given a hash value (from a stolen database, e.g.), you are looking for a pre-image. See the pre-image, 2nd pre-image and hash definitions [here](https://stackoverflow.com/a/52957495/1820553). In collision resistance it is computationally infeasible to find any two distinct inputs `x`, `x'` which hash to the same output, i.e., such that `h(x) = h(x')`. The collision that you find may not be the hash of a password. And generic attacks on Cryptographic hash functions has sqrt(2^n) complexity where n is the output size. Also, look for the link in the answer. – kelalaka Oct 16 '19 at 14:08
3

@Rup Finding a password that matches a prescribed hash (finding preimages, or password-cracking) is _much harder_ than finding a _pair_ of passwords that happen to have the same hash (finding collisions), for the same reason that if you want to find someone whose birthday is January 1st you have to get a lot more people in the room than you need if you just want to find two people who have the same birthday with no restriction on what that birthday is. Finding a collision would only let you log in to your _own_ account with two different passwords; it wouldn't let you log into anyone _else's_. – Squeamish Ossifrage Oct 16 '19 at 14:16

Is Bcrypt a hashing algorithm or is my study material wrong?

4 Answers4