I am contributing to the Word Sequencer plugin for the KeePass password manager, which can generate diceware-style passwords using a high-quality PRNG. In particular, I'm working on estimating the strength of passwords generated with the tool. I'm having trouble figuring out how to account for one of the configuration options, which gives one of the words in the sequence a configurable probability of appearing in the generated password; i.e. an option to randomize the password length.
For the sake of example, suppose you're choosing 2 words from a wordlist of 8 words for your password (obviously you'd want a much larger wordlist and more words; this is just a toy example). If you always choose both words, then the entropy of the password, in bits, is:
lg(8*8) = lg(64) = 6
or alternatively:
lg(8) + lg(8) = 3+3 = 6
Now, say that you've configured the second word to be omitted some of the time. You then get either a one-word password (8 possibilities) or a two-word password (64 possibilities), for a total of:
lg(8 + 8*8) = lg(9 * 8) = lg(9) + lg(8) ≈ 6.17
...which is a tiny bit more than the previous entropy of 6. This should be the entropy if an attacker were guessing JUST THIS ONE PASSWORD and knew exactly how the password was generated.
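To check the arithmetic above, here is a minimal Python sketch. The function and variable names are mine, purely for illustration; they are not from the plugin. It only counts the possible passwords and takes the base-2 logarithm, so it treats every candidate password as equally likely, which is an assumption that the random-length option does not necessarily satisfy.

```python
import math

WORDLIST_SIZE = 8  # toy wordlist size from the example above


def fixed_length_entropy(wordlist_size: int, num_words: int) -> float:
    """Bits of entropy when exactly num_words words are always chosen."""
    return num_words * math.log2(wordlist_size)


def union_entropy(wordlist_size: int, min_words: int, max_words: int) -> float:
    """lg of the total number of passwords of any allowed length.

    Assumes (for illustration only) that all of these passwords
    are equally likely.
    """
    total = sum(wordlist_size ** n for n in range(min_words, max_words + 1))
    return math.log2(total)


print(fixed_length_entropy(WORDLIST_SIZE, 2))  # lg(64)
print(union_entropy(WORDLIST_SIZE, 1, 2))      # lg(8 + 64) = lg(72)
```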
But it doesn't actually matter if the password COULD have been 2 words long. If it's only 1 word long, and the attacker just guesses all 1-word passwords, the possibility of a second word doesn't really make the one-word password any stronger. So assuming there is a 25% chance of including the second word, maybe a better strength estimate would be the entropy of the expected value of the password space:
lg(8) + lg(3/4 * 0 + 1/4 * 8) = 3 + 1 = 4
Or, maybe it would be the expected value of the entropy itself:
lg(8) + [3/4 * 0 + 1/4 * lg(8)] = 3 + 0.75 = 3.75
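For comparison, the two expected-value candidates can be written as a short Python sketch. Again, the names are hypothetical and not taken from the plugin, and the code simply mirrors my two formulas above (including counting an omitted word as contributing 0) rather than claiming either one is the right measure.

```python
import math


def entropy_of_expected_choices(wordlist_size: int, p_second: float) -> float:
    """lg(N) + lg(expected number of choices for the optional word).

    Mirrors the first formula: an omitted word is counted as 0 choices.
    Raises ValueError if p_second is 0, since lg(0) is undefined.
    """
    expected_choices = (1 - p_second) * 0 + p_second * wordlist_size
    return math.log2(wordlist_size) + math.log2(expected_choices)


def expected_entropy(wordlist_size: int, p_second: float) -> float:
    """lg(N) + expected bits contributed by the optional word.

    Mirrors the second formula: an omitted word contributes 0 bits.
    """
    optional_bits = (1 - p_second) * 0 + p_second * math.log2(wordlist_size)
    return math.log2(wordlist_size) + optional_bits


# Toy example: 8-word list, 25% chance of including the second word.
print(entropy_of_expected_choices(8, 0.25))  # 3 + lg(2)
print(expected_entropy(8, 0.25))             # 3 + 0.25 * 3
```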
So my question is: which method of calculating the expected entropy of this generated password is correct?
- Should I treat the random length as adding additional possible passwords, thus slightly increasing the strength of a 2-word password?
- Should I treat the random length as possibly decreasing the length of the password, so I take the entropy of the average number of password choices, for a strength somewhere between a 1-word and 2-word password?
- Or should I take the expected value of the entropy, again for an in-between strength?
- Maybe it's something else entirely?
If the "correct" calculation does in fact decrease the password strength when the option is enabled, then that raises a follow-up question: is there any useful reason to even have this option, if it's just going to reduce the work an attacker must do on average?