3

XKCD #936 uses a limited subset of the English language: only about 2,000 words (2^11 = 2048, i.e. 11 bits per word). I just looked it up, and the English language has over a million words, a sizeable subset of which contain special characters, numbers and punctuation.

1,000,000 is just under 2^20, so 4 random words from that set would have just under 80 bits of entropy. If we want, we can even remove part of the dictionary so we don't have to deal with monstrous words like antidisestablishmentarianism, which would break many password inputs.
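To sanity-check that arithmetic, here is a minimal sketch. It assumes every word is drawn uniformly and independently from the list; the function name is just illustrative.

```python
import math

def passphrase_entropy_bits(wordlist_size: int, num_words: int) -> float:
    """Entropy in bits of num_words picked uniformly and independently
    from a list of wordlist_size words."""
    return num_words * math.log2(wordlist_size)

# The comic's list (2^11 words) versus a hypothetical million-word list.
print(round(passphrase_entropy_bits(2048, 4), 1))       # 44.0
print(round(passphrase_entropy_bits(1_000_000, 4), 1))  # 79.7
```

The figure only holds if the words really are picked at random; a human choosing "memorable" words from the big list would get far less.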

Would this be a reasonable way of improving on the entropy? Or am I missing a vital point?

SilverlightFox
Nzall

3 Answers

5

Yes, you are correct, and yes, you are missing something.

Sure, you could easily increase the entropy many times over by using a larger word list; you could also achieve that by using 8-word passphrases, or by just using raw random bytes directly without the words.

The entire point of that xkcd is balance.
Balance between "enough entropy" and "easy enough to remember".

You could argue about whether 44 bits is enough entropy, or whether you need more. But if you do, you must take into account the non-negligible cost of reduced memorability. It is always a tradeoff.

As I stated in my answer to the canonical XKCD 936 question:

AviD's Rule of Usability:
Security at the expense of usability comes at the expense of security.

So yes, go ahead and use the full language as your dictionary - but you are paying a price, which many would consider to be a bad tradeoff.

As Randall (xkcd's author) explains here (and in agreement with many studies on the subject), he was basing it not just on all possible words in a dictionary, but words that are EASY for a typical person to remember (and type, I will add).

Another option, more aggressive than xkcd but not as ridiculously difficult as the full language, is something like Diceware's dictionary - larger than 11 bits per word, but not by much (7776 words, so just under 13 bits).
So 4 words of that would be ~51.7 bits. Or, add one more simple word and get almost 65 bits of entropy.

Yes, that improves it a bit, without costing much usability, since the list still sticks to short, common words. (Personally, I find there are still a few "filler" entries, like numbers, that I would prefer to do without.)
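The Diceware numbers above can be reproduced with a short sketch. It assumes a 7776-entry list (6^5, one entry per roll of five dice); the tiny `DEMO_WORDS` list and the function name are hypothetical stand-ins, not the real Diceware list or tooling.

```python
import math
import secrets

DICEWARE_SIZE = 6 ** 5  # 7776 entries, one per roll of five dice

# Hypothetical stand-in list; a real run would load all 7776 entries.
DEMO_WORDS = ["correct", "horse", "battery", "staple", "cloud", "river"]

def generate_passphrase(words, num_words=4, sep=" "):
    """Pick num_words uniformly at random using a CSPRNG."""
    return sep.join(secrets.choice(words) for _ in range(num_words))

bits_per_word = math.log2(DICEWARE_SIZE)
print(round(bits_per_word, 2))      # 12.92
print(round(4 * bits_per_word, 1))  # 51.7
print(round(5 * bits_per_word, 1))  # 64.6
print(generate_passphrase(DEMO_WORDS))
```

Using `secrets` rather than `random` matters here: the entropy estimate assumes an unpredictable, uniform choice, which is what physical dice give you.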

As always, it is about balance.

AviD
3

You'll "break" many password inputs anyways, since the password scheme won't include numbers or special characters that many sites require, and will likely already be longer than the maximum if the site has a maximum password length allowed.

The real solution isn't to use some trick to memorize a highly entropic password (although if that's what you really want to do, Diceware is the way to go). The real solution is to use password management software that'll remember truly random, unique passwords for each different site you visit. If you're reusing passwords across sites, high entropy isn't going to do you any good when one of those sites is storing your password in plaintext and they get compromised: your accounts on other sites will get compromised too, right alongside everyone who uses letmein for all their accounts.
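For a rough idea of what "truly random, unique" means in practice, here is a sketch of the kind of generator a password manager might use. The 94-character alphabet, the default length of 20 and the function names are my assumptions, not any particular product's behaviour.

```python
import math
import secrets
import string

# Assumed alphabet: printable ASCII letters, digits and punctuation (94 chars).
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def random_password(length=20):
    """Build a password from characters chosen uniformly with a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def password_entropy_bits(length=20):
    """Entropy in bits of a uniformly random password of this length."""
    return length * math.log2(len(ALPHABET))

print(random_password())
print(round(password_entropy_bits(), 1))
```

Under those assumptions a 20-character password carries roughly 131 bits, well beyond any of the word-based schemes, and you never have to memorise it. Generate a fresh one per site so one breach cannot spill over to the others.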

Aron Foster
0

The reason word lists matter is that they dramatically lower the amount of entropy an attacker has to search through. You could simply increase entropy by using words from some other language. That way, crackers without a matching word list would be forced to fall back on naive letter frequencies for their guesses.

munchkin