43

There seem to be many different 'kinds' of entropy. I've come across two different concepts:

A) The XKCD example of correcthorsebatterystaple. It has 44 bits of entropy because four words randomly chosen from a list of 2048 words is 4 * log2(2048) = 44 bits of entropy. This I understand.
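A quick check of that arithmetic in Python:

import math
# four independent picks from a 2048-word list
print(4 * math.log2(2048))
# => 44.0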

B) The Shannon entropy of the actual string i.e. the entropy is calculated based on frequencies of the letters/symbols. Applying the Shannon formula on correcthorsebatterystaple the result is 3.36 bits of entropy per character.

# from http://stackoverflow.com/a/2979208
import math

def entropy(string):
    "Calculates the Shannon entropy of a string"

    # get probability of chars in string
    prob = [float(string.count(c)) / len(string) for c in dict.fromkeys(list(string))]

    # calculate the entropy
    entropy = -sum([p * math.log(p) / math.log(2.0) for p in prob])

    return entropy

print(entropy('correcthorsebatterystaple'))
# => 3.363856189774724

Wikipedia only adds to my confusion:

It is important to realize the difference between the entropy of a set of possible outcomes, and the entropy of a particular outcome. A single toss of a fair coin has an entropy of one bit, but a particular result (e.g. "heads") has zero entropy, since it is entirely "predictable".
-- Wikipedia: Entropy (information theory)

I don't quite understand the distinction between the entropy of the toss (generation) and the entropy of the result (the string).

  1. When is B used and for what purpose?
  2. Which concept accurately reflects the entropy of the password?
  3. Is there terminology to differentiate between the two?
  4. True randomness could give us correctcorrectcorrectcorrect. Using A we still have 44 bits. Using B the entropy would be the same as that of correct. When is the difference between the two important?
  5. If a requirement specifies that a string needs to have 20 bits of entropy—do I use A or B to determine the entropy?
techraf
mds

6 Answers

28

The Wikipedia article explains mathematical entropy, which isn't quite what people mean when they talk about password entropy. Password entropy is about how hard a password is to guess under certain assumptions, which is not the same thing as the mathematical concept.

A and B are not different concepts of password entropy; they just use different assumptions about how the password is built.

A treats correcthorsebatterystaple as a string of English words and assumes that the words are randomly selected from a collection of 2048 words. Under these assumptions each word contributes exactly 11 bits, giving 44 bits of entropy for correcthorsebatterystaple.

B treats correcthorsebatterystaple as a string of characters and assumes that the probability of any character appearing is the same as it is in the English language. Under these assumptions correcthorsebatterystaple has 84 bits of entropy.

So which definition you use really depends on what assumptions you make about the password. If you assume the password is an XKCD-style password (and that each word indeed has a one-in-2048 chance of appearing in the password), then A is the correct way to calculate entropy. If you don't assume the password is built from a collection of words but do assume that the probability of any character appearing equals its frequency in the English language, then B is the correct way to calculate entropy.

In the real world none of these assumptions are correct. So if you have a "requirement that specifies that a string needs to have 20 bits of entropy" and this is for user generated passwords it's very difficult to give a precise definition of entropy. For more on this see Calculating password entropy?.

If, on the other hand, you can use computer generated strings (and are using a good PRNG) then each alphanumeric character (a-z, A-Z, 0-9) will give almost 6 bits of entropy.
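For illustration, a minimal sketch of generating such a string and tallying its entropy budget, using Python's secrets module (a CSPRNG) for the random choices; the printed password is of course just one example:

import math
import secrets
import string

alphabet = string.ascii_letters + string.digits          # 62 possible characters
password = ''.join(secrets.choice(alphabet) for _ in range(10))

bits_per_char = math.log2(len(alphabet))                  # ~5.95 bits per character
print(password, round(len(password) * bits_per_char, 1))  # e.g. aZ3kQ9Lm2x 59.5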

David Wachtfogel
  • Entropy is a property of the generation method. If your *generation method* has x bits, the attacker can't do better than x bits. – PyRulez Apr 14 '15 at 23:04
  • @PyRulez citation needed? Wikipedia lists entropy as a measurable property. I am no authority on the formal theory/terminology of entropy. In the practical context of password attacks, entropy represents how difficult it is to crack a password, so how much the attacker knows about the password is very significant. – AMADANON Inc. Feb 09 '23 at 19:55
23

What it means

Coin-toss entropy assumes that each toss is independent: the result of the previous toss does not affect the result of the next one. So each toss adds one bit of entropy.

Shannon entropy assumes that the value of the next letter is in fact partially determined by the value of the previous letter (and perhaps others). Facts like "h" often follows "t" and "e" often follows "h" are taken into consideration, so common patterns are assigned a lower entropy value. So with an English dictionary, the string "the" would have a much lower Shannon entropy value than the string "exu".

What it means to you

The direct implication of this with respect to passwords is pretty insignificant. The real (and only) important question with respect to passwords is this:

What dictionary is your password in?

That is to say, if you were to construct a list of potential passwords to conduct a brute-force attack, how big would the dictionary have to be to contain your password?

For example:

  • Your password is in the top 500 most commonly-used passwords
  • Your password is in the dictionary of lowercase English words
  • Your password is in the list of lowercase or title-case English words with a one-digit or two-digit suffix
  • Your password is in the list of random-case English words with haxor numeric substitutions (i.e. A=>4, L=>1, S=>5)
  • Your password is in the list of all strings 8 characters or less using numbers and upper and lower case letters.

All of the above are examples of frequently used real-world password cracking dictionaries.
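To get a feel for how dictionary size translates into bits, here is a rough sketch; the dictionary sizes are illustrative guesses, not measured figures:

import math

# illustrative sizes only
dictionaries = {
    "top 500 common passwords": 500,
    "lowercase English words (say ~50,000)": 50_000,
    "lowercase/title-case word + 1- or 2-digit suffix": 2 * 50_000 * 110,
    "all alphanumeric strings of 8 characters or less": sum(62**n for n in range(1, 9)),
}

for name, size in dictionaries.items():
    print(f"{name}: ~{math.log2(size):.1f} bits")

# top 500 common passwords: ~9.0 bits
# lowercase English words (say ~50,000): ~15.6 bits
# lowercase/title-case word + 1- or 2-digit suffix: ~23.4 bits
# all alphanumeric strings of 8 characters or less: ~47.7 bits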

In other words

The purpose of password complexity is to stand up against a brute-force attack. The size of the smallest available dictionary that contains your password determines the amount of time required to crack your password. We can guess at what dictionaries will be available to the attacker, but we can't know for certain. Therefore, as a proxy for dictionary size, we instead use entropy. It's a poor substitute because it doesn't reflect the actual attack mechanics, but it's potentially better than nothing.

Comparisons of passwords based on entropy calculations may potentially be fruitful, but you should be careful to avoid ascribing too much value to a number which is, in the end, only indirectly related to how well the password will hold up.

tylerl
6

I suppose the simplest way to illustrate it is with an example.

Let's say we have a random number generator that has a provable output entropy of 3 bits per digit of output. That generator's "toss" entropy is 3 bits per digit. Now, let's say you run it for 20 digits, and despite the ridiculously small probability, every digit in the stream comes out as 6. The "toss" entropy is still 3 bits per digit, so 60 bits. The actual "result" entropy of the password is tiny - one could argue that it's as low as 3 or 4 bits.

The difference is that the "toss" entropy represents the expected entropy of the output, based on probabilistic modelling of the generator, whereas the "result" entropy represents the actual information entropy of the data it produced in a real case.
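As a small illustration of that gap, here is the question's frequency-based formula applied to the freak all-sixes output; for that particular string it collapses all the way to zero:

import math

def shannon_per_symbol(s):
    # frequency-based entropy per symbol, as in the question's function
    probs = [s.count(c) / len(s) for c in set(s)]
    return sum(-p * math.log2(p) for p in probs)

print(20 * 3)                             # "toss" entropy: 20 digits at 3 bits each => 60
print(20 * shannon_per_symbol('6' * 20))  # "result" entropy by symbol frequency => 0.0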

Polynomial
  • Thanks Polynomial. Are there (formal) terms to distinguish between the different kinds of entropy? Or does it solely depend on context? – mds Oct 14 '12 at 08:53
  • I'd say "statistical entropy" is a good term for the provable calculated entropy of the generator or a set sample. Shannon entropy is a particular type of entropy calculation that falls under this category. In terms of a formal term for the real entropy of a sample, I'm not sure I know of one. – Polynomial Oct 14 '12 at 09:00
  • The "real entropy" of a sample is known as Kolmogorov complexity: http://en.wikipedia.org/wiki/Kolmogorov_complexity – dhasenan May 31 '13 at 14:48
4

A single byte can contain up to 8 bits of entropy. This is the upper limit. As you learn more about your data, the amount of entropy in each of those 8-bit blocks goes down. Oh, all your bytes are ASCII characters? That means the highest bit must be 0; you're down to 7 bits of entropy. No control characters? Of the ASCII set, 0-31 are control characters: tab, enter, bell, end-of-file. That narrows the options further. Alphabetic, lower case only? Now you're reducing the available options hugely. English word? Not many of those: an entire English word, RANDOMLY selected, may only have about, say, 12 bits altogether, even though the word may be 5 characters long.
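Putting rough numbers on that narrowing, per character or per word (the word-list size at the end is an illustrative guess):

import math

print(math.log2(256))    # any byte at all:                        8.0 bits
print(math.log2(128))    # ASCII only (high bit is 0):             7.0 bits
print(math.log2(26))     # lowercase letters only:                 ~4.7 bits
print(math.log2(4096))   # one whole word from a ~4096-word list:  12.0 bits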

Passwords chosen by humans are even worse; not because the possibilities are smaller, but because some are chosen more frequently than others. If certain passwords are common, they're easier to guess; that affects the entropy. If 10% of your users have "password", that reduces the entropy of your list of passwords, i.e. it is easier to guess.

So the more information you have about a password, the lower the entropy you can assign to it. In the case of the Shannon formula, it assumes the biases in natural language and calculates the entropy at 3.36 bits × 25 characters ≈ 84 bits. When you get the additional information (4 words, each from a list of 2048), that drops to 44 bits.

Look at it this way - if someone was hacking this password, knowing only that it was some natural language, and then suddenly finding out it was 4 words from a list of 2048 (and knowing the list), they would suddenly find their work a LOT easier.

AMADANON Inc.
2

The thing you're missing here is that an entropy measurement is tied to some specific probability distribution. You can't talk about the entropy of a password without making some assumption, explicit or implicit, about which probability distribution the password is randomly drawn from. Which bottoms out to the nature of the process by which the password is generated.

In the XKCD comic, Munroe is telling you that he generated the passphrase by successively and independently picking four words at random from a dictionary of about 2^11 words. This tells you precisely the probability distribution that the password is being drawn from: the discrete uniform distribution over a set of 2^44 distinct passphrases.

But then when you do this:

B) The Shannon entropy of the actual string i.e. the entropy is calculated based on frequencies of the letters/symbols. Applying the Shannon formula on correcthorsebatterystaple the result is 3.36 bits of entropy per character.

...you're choosing a different probability distribution than what Munroe used, so you are going to get a different estimate, one that assigns about 84 bits of entropy to the string (25 × 3.36).


If you're evaluating the entropy of passwords for which you don't know how they were generated, one fruitful and intuitive perspective is to adopt the idea that the probability distribution in play is the attackers' knowledge and hypotheses about how users select passwords. If you can form yourself a reasonable idea of what this distribution looks like, then a password's entropy is its message length in an optimal code for that distribution. This is, very roughly, the approach in the better password strength meters like zxcvbn (although they formulate it in terms of average number of guesses to hit the password).

When you look at the 44- vs. 84-bit estimates from your question in this light, what's going on is this: Munroe gets a much lower entropy estimate for the password because he assumes that the attacker has a 100% accurate hypothesis about how the passphrase was generated, so they can rule out, a priori, huge numbers of strings that are just not concatenations of four words in the dictionary. The Shannon entropy calculation that you illustrate gets a much higher estimate because it's not as "smart" a strategy for guessing XKCD-style passphrases. zxcvbn is cleverer, though: it estimates a password cracker would crack correcthorsebatterystaple in about 10^14.43696 guesses, which is about (14.44 × 3.32) + 1 ≈ 49 bits of entropy. (The calculation is a base-10 to base-2 logarithm conversion, plus one bit to convert from number of guesses to entropy.) That's a bit more than Munroe's estimate, but zxcvbn is coded to attack other passwords than just XKCD-style passphrases.
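The last conversion, spelled out in Python (10^14.43696 is the zxcvbn guess estimate quoted above):

import math

log10_guesses = 14.43696                   # zxcvbn's estimate, as log10(guesses)
bits = log10_guesses * math.log2(10) + 1   # change of base, plus one bit for guesses -> entropy
print(round(bits, 1))
# => 49.0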

Luis Casillas
0

tl;dr: Your Shannon entropy calculation (B) isn't doing what you think it's doing. A & B give the same answer, you just aren't passing the right argument to B.

Full answer:

Shannon entropy calculates the minimum number of bits needed to represent (e.g., to send or store) some information. In the case of your implementation B, it calculates the number of bits needed to store a single character, given the set of characters that occur in string. It does not calculate the entropy of string itself. (This hopefully clears up question #1.)

So "correct" and "correctcorrectcorrectcorrect" are the same because your function is calculating how many bits it would take to represent a given character in the set [ 'c', 'o', 'r', 'e', 't' ] where the probability of each character occuring is derived from their occurences in the string "correct" (or "correctcorrectcorrectcorrect"), so 'c' has a 2/7 (or 8/28) chance of occuring, 'o' has a 1/7 (or 4/28) chance, etc. (This hopefully clears up questions #4)

To properly use your function, the argument string should be the probability space of all possible characters in your passwords, not just the specific password you are using. E.g., presuming your passwords allow alphanumeric characters [A-Za-z0-9], pass a string containing each of those values, enumerated out, to your function:

print( entropy('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0987654321') )
# => 5.954196310386873 

This tells us that you need about 6 bits to store a single item from that set of alphanumeric characters. If your "correcthorsebatterystaple" were a 25-character random string drawn from the set of characters [A-Za-z0-9], it would have an entropy of roughly 149 bits.

But your password is specifically not a random 25-character string, it is a dictionary lookup with 2048 possible options and 4 selections.

So now that we are using the Shannon entropy function the correct way, let's provide it the correct argument for your specific situation. First split your function in two: one to calculate probabilities from a string and one to calculate the entropy given some probabilities:

import math

def prob(string):
    return [float(string.count(c)) / len(string) for c in dict.fromkeys(list(string))]

def entropy(prob):
    return -sum([p * math.log(p) / math.log(2.0) for p in prob])

print(entropy(prob('correcthorsebatterystaple')))
# => 3.363856189774724

Reminding ourselves that we pass entropy the probability of each item in our total set of possible items: in your case there are 2048 possible words in the entire set, and (presumably) each word has an equal 1/2048 chance of appearing. We can skip calculating probabilities from a string and pass these probability values directly to entropy:

print( entropy( [ 1.0/2048.0 for y in range(0,2048) ]) )
# => 11

11 bits. The same result as A. Your 4-word passphrase does indeed have a Shannon entropy of 44 bits.
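And because the four words are chosen independently, the per-word figure simply adds up; continuing with the split prob/entropy functions above:

print(4 * entropy([1.0/2048.0 for y in range(2048)]))
# => 44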

schroeder
nexus