
Many online entropy calculators make simplifying assumptions (like assuming a password is as rare as the set of random characters that could generate it) or handwave away the math. I want to know how to calculate the entropy of a password distribution that is made up of one or more sub-rules, each drawn from a known distribution. Here are some examples to calculate.

1. Simple 8 character password

A-Z, a-z, 0-9, random sequence

2. Passphrase chosen from hypothetical corpus of pop culture

  1. 5 word phrase chosen from corpus of 10,000,000 phrases
  2. First letter may be capitalized

3. Diceware-style with additional "strength" rules

  1. 4 random words drawn from list of 7776 words (6^5, 6-sided die, 5 rolls)
  2. The first or last word in the sequence may start with a capital letter, but not both (one capital letter in passphrase)
  3. The words are joined with space, -, or no spacing character at all (ex. correcthorse) - the same spacer is used for each word
  4. The sequence starts or ends with 1 of the 10 symbols on a US keyboard number row
  • We already have questions for each of these. – schroeder Jan 19 '23 at 08:09
  • And ultimately, this is more of a math question than a security question if you just want to calculate entropy for certain constraints. – schroeder Jan 19 '23 at 08:10
  • @schroeder I am in awe that this SE can have questions for calculating the entropy of an unknown password distribution. Even before someone thinks of them! Wow! And I'd love to see existing answers for #2 and #3, which I made up on the spot. This question is about explaining the math of entropy in an accessible manner, *in service of information security*. This is not a theoretical problem but one with concrete applications in security, as is evidenced by how commonly "entropy" and password strength are discussed. They are referenced often, yet I could find no accessible resource for the calculation. – Jimmy Carter Jan 19 '23 at 18:44
  • Alleged "duplicates": (1) xkcd, "Short complex password, or long dictionary passphrase?". Does not ask about calculating the entropy of arbitrary rules. Asks if comic is "accurate" in comparing two password generation techniques. Answers handwave the math and require people to understand logarithm manipulation. (2): Does not ask to walk through calculation of entropy. Asks for "alternate equations" which opens the door to Bonneau's paper. Does not ask for probabilistic approach. (3) is Perl-heavy and more of an implementation question. Asserts an "approximate" calculation. – Jimmy Carter Jan 19 '23 at 18:53
  • Right, so, as I said, your whole goal here is to provide math, not security. The duplicates explain the concepts and the general math. You provide arbitrary schemes and calculate the entropy for these specific schemes, making this hyper-relevant to these schemes only. Therefore: math problems. The duplicates provide the foundation from which you can calculate any of these schemes. – schroeder Jan 19 '23 at 19:37

1 Answer


The general process is to count the combinations for each sub-rule and then determine how the sub-rules interact. If they are statistically independent (e.g., separate random dice rolls), their combination counts are multiplied together in the entropy calculation. This assumes each combination is equally likely to be chosen.
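Written out, for independent sub-rules with N_1 through N_k equally likely outcomes (symbols introduced here only for illustration), the counts multiply and the base-2 logarithm turns that product into a sum, so each sub-rule contributes its own bits:

```latex
H = \log_2\left(\prod_{i=1}^{k} N_i\right) = \sum_{i=1}^{k} \log_2 N_i
```

For example, 3 joiner options and 10 symbols give log2(30) = log2(3) + log2(10) ≈ 1.585 + 3.322 ≈ 4.91 bits.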

Using the notation here

  • L = Password Length; Number of symbols in the password
  • S = Size of the pool of unique possible symbols (character set)
  • Number of Possible Combinations = S^L
  • Entropy = log2(Number of Possible Combinations)
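The formulas above can be sketched as a pair of small Python helpers. The names `entropy_bits` and `combined_entropy_bits` are hypothetical, made up for this illustration, and the sketch assumes every combination is equally likely (a uniform choice), which is the situation log2 of the count describes.

```python
import math


def entropy_bits(pool_size: int, length: int = 1) -> float:
    """Hypothetical helper: entropy in bits of `length` symbols, each drawn
    uniformly and independently from `pool_size` symbols.

    log2(S^L) = L * log2(S), so the integer S^L never has to be built.
    """
    if pool_size < 1 or length < 0:
        raise ValueError("pool_size must be >= 1 and length >= 0")
    return length * math.log2(pool_size)


def combined_entropy_bits(*counts: int) -> float:
    """Hypothetical helper: entropy of independent sub-rules with the given
    numbers of equally likely outcomes. log2 of the product of the counts
    equals the sum of their individual log2 values."""
    return sum(math.log2(c) for c in counts)


print(round(entropy_bits(62, 8), 2))                     # → 47.63
print(round(combined_entropy_bits(10_000_000, 2), 2))    # → 24.25
```

Python integers are arbitrary precision, so `math.log2(62 ** 8)` would give the same answer; the `L * log2(S)` form just mirrors the notation.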

1. Simple 8 character password

L = 8

A-Z, a-z, 0-9, random sequence

A-Z = 26 characters, a-z = 26 characters, 0-9 = 10 characters

Each character is drawn from (26 + 26 + 10) = 62 symbols, so S = 62

S^L = 62^8

Entropy = log2(62^8) = 8 * log2(62) ≈ 47.6 bits
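The arithmetic for example 1 can be checked with a few lines of Python. This sketch assumes "A-z0-9" means exactly the 62 characters A-Z, a-z and 0-9; a literal ASCII range from A to z would also include the six punctuation characters that sit between Z and a.

```python
import math
import string

# Assumed character set: A-Z, a-z, 0-9 (62 characters)
charset = string.ascii_uppercase + string.ascii_lowercase + string.digits
S = len(charset)          # pool size
L = 8                     # password length
combinations = S ** L     # 62^8, computed exactly as a Python int
bits = math.log2(combinations)
print(S, combinations, round(bits, 2))  # → 62 218340105584896 47.63
```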

2. Passphrase chosen from hypothetical corpus of pop culture

  1. 5 word phrase chosen from corpus of 10,000,000 phrases
  2. First letter may be capitalized

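The phrase length is irrelevant; what matters is the number of combinations. The whole phrase is a single choice from the corpus, so it behaves like one symbol drawn from a pool of 10,000,000.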

For the phrase part:

  • L = 1
  • S = 10,000,000

The caps rule doubles the number of combinations, since each phrase can appear with or without a capital first letter.

Entropy = log2(10,000,000 * 2) ≈ 24.3 bits
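Example 2 in the same style. The sketch assumes each of the 10,000,000 phrases is equally likely to be picked and that capitalizing is an independent 50/50 choice. A real pop-culture corpus is not uniform, so an attacker who tries popular phrases first faces fewer effective bits than this figure.

```python
import math

corpus_size = 10_000_000    # phrases in the hypothetical corpus: one pick, so L = 1
capitalization_options = 2  # first letter capitalized or left as-is

combinations = corpus_size * capitalization_options
bits = math.log2(combinations)
print(combinations, round(bits, 2))  # → 20000000 24.25
```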

3. Diceware-style with additional "strength" rules

  1. 4 random words drawn from list of 7776 words (6^5, 6-sided die, 5 rolls)
  2. The first or last word in the sequence may start with a capital letter, but not both (0-1 capital letters in passphrase)
  3. The words are joined with space, -, or no spacing character at all (ex. correcthorse) - the same spacer is used for each word
  4. The sequence starts or ends with 1 of the 10 symbols on a US keyboard number row

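  1. The 4 words each have S = 7776 = 6^5, so the words alone give 7776^4 combinations
  2. Capitalization has 3 distinct outcomes: no capital, a capital on the first word, or a capital on the last word. (Multiplying "capital or lowercase" by "first or last word" gives 4, but the two all-lowercase cases are the same password, so that overcounts by one.)
  3. The joining multiplies the possibilities by 3 (space, -, or nothing; the same spacer for every word)
  4. The symbol is 1 of 10 characters placed at either the start or the end, multiplying the possibilities by 10 * 2 = 20

S = (6^5)^4 * 3 * 3 * 20

Entropy = log2(S) ≈ 59.2 bits

Note: The additional rules only multiplied S by 180 (3 * 3 * 20) vs. a simple 4 word diceware phrase. This added ~7.5 bits of entropy. A simple 5 word phrase has more entropy than the rules above, because the fifth word multiplies S by 7776 (~12.9 bits).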
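Hand counts like this are easy to get wrong, so one way to check them is to enumerate every password for a tiny word list and compare the number of distinct strings with the formula. This is a sketch under stated assumptions, not a reference implementation: the 10 symbols are taken to be the shifted US number row !@#$%^&*(), exactly one symbol goes at either the start or the end (20 placements), capitalization is none, first word, or last word (3 distinct outcomes, since an all-lowercase phrase is the same password whichever word was "chosen"), and `generate` and `formula_count` are hypothetical helper names.

```python
import itertools
import math

# Assumption: the 10 number-row symbols are the shifted US keys !@#$%^&*()
SYMBOLS = "!@#$%^&*()"
# Space, hyphen, or nothing; the same spacer joins every word
SPACERS = [" ", "-", ""]
# No capital, capital on the first word, or capital on the last word
CAPS = ["none", "first", "last"]


def generate(words):
    """Hypothetical helper: yield every password the example 3 rules allow
    for a given word list (4 words, repetition allowed, as with dice)."""
    for combo in itertools.product(words, repeat=4):
        for cap in CAPS:
            w = list(combo)
            if cap == "first":
                w[0] = w[0].capitalize()
            elif cap == "last":
                w[-1] = w[-1].capitalize()
            for spacer in SPACERS:
                body = spacer.join(w)
                for sym in SYMBOLS:
                    yield sym + body  # symbol at the start
                    yield body + sym  # symbol at the end


def formula_count(n_words):
    """Hypothetical helper: words * capitalization * spacer * symbol * position."""
    return n_words ** 4 * len(CAPS) * len(SPACERS) * len(SYMBOLS) * 2


# Toy list: 3 two-letter lowercase words, so no two different choices can
# produce the same string and brute force stays cheap (14,580 passwords).
toy = ["ab", "cd", "ef"]
distinct = len(set(generate(toy)))
print(distinct, formula_count(len(toy)))  # → 14580 14580

# The same formula for the real 7776-word list, and a plain 5-word phrase.
print(round(math.log2(formula_count(7776)), 1))  # → 59.2
print(round(math.log2(7776 ** 5), 1))            # → 64.6
```

If the enumerated count came out smaller than the formula, the formula would be counting some passwords twice. With a real word list and the no-spacer option, two different word sequences could occasionally concatenate to the same string, so the true count may be slightly below the formula.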

Further Reading

  1. Calculating password entropy? and Statistical metrics for individual password strength
  2. What is the best way to calculate true password entropy for human created passwords?
  3. How genuine are password entropy calculations?
  4. https://diceware.rempe.us/#eff
  5. XKCD #936: Short complex password, or long dictionary passphrase?
  6. https://explainxkcd.com/wiki/index.php/936:_Password_Strength
  7. https://theworld.com/~reinhold/dicewarefaq.html#calculatingentropy
  • I hope I got the math right. This was prompted by the lastpass breach and trying to put some bounds on what kinds of passphrases were at risk. I was surprised to not find an easy breakdown of the calculations anywhere online, especially if you have a homebrew scheme. – Jimmy Carter Jan 19 '23 at 07:39