
I apologise for the perhaps confusing title; I'll try to elaborate a little better.

Many discussions I see surrounding password entropy focus on the specific context of the range of choices available for that nominated data. In isolation this is fair enough, but it seems to ignore the possibility of choosing that range of data in the first place. Perhaps an example would better illustrate.

In another very recent question, a bible verse was considered to have drawbacks as a password, because there are (apparently) ~33,000 bible verses. From my understanding, this leads to an entropy of about 15 bits. A case-insensitive Latin-alphabet password has about 4.7 bits of entropy per symbol, so if you compare entropies, a 4-character case-insensitive password (4.7 * 4 = 18.8 bits) should be 'harder' to guess than a randomly chosen bible verse.

Wouldn't it be fair to say, though, that a password is much more likely to undergo a brute force attack than to be checked against valid bible verses? Is the 4-character password not significantly weaker than a bible verse? Can we ever judge the entropy of a password while ignoring the likelihood of the data range being selected? I'm just curious how valid it is to say password XYZ is bad because there isn't much variety in a dataset, given that it's extremely unlikely that an attacker will be able to narrow down the dataset used for password choice so specifically.

Sorry, security newbie here, but it just struck me as odd that someone may use entropy to justify why something such as a long bible verse may be a bad password.

Rory Alsop
Peleus
  • You could make the same argument comparing a random 4-digit pw to the name of a US state... Brute forcing the digits vs having the thought of "hey, lets try state names!"... Just trying to simplify the position a bit – Brian Adkins Feb 20 '13 at 04:19
  • You can't really talk about the entropy of a single password. Entropy is a property of a password generation process. When we talk about the entropy of a single password we implicitly assume a process that fits the way real people choose their passwords. That process can only be approximated. – CodesInChaos Feb 20 '13 at 11:22
  • CrDj”(;Va.*NdlnzB9M?@K2)#>deB7mN – SDsolar Jun 02 '17 at 02:43
  • Bible verse references are [fairly common as passwords](https://boingboing.net/2017/01/07/bible-references-make-very-wea.html) so it's reasonable to assume that dictionary attacks do include them. – Tgr Dec 03 '17 at 01:27

1 Answer


"Entropy" for a password is (roughly) a description of how many different passwords you could have obtained. In the case of the "Bible verse" password, the password generation process is a choice among about 33000 possible verses, which means about 15 bits of entropy (because 2^15 = 32768). Entropy qualifies the process, not the result.
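As a rough illustration of the arithmetic, here is a minimal Python sketch. It assumes the process picks uniformly at random among its possible outputs; the function name `entropy_bits` is made up for this example.

```python
import math

def entropy_bits(num_choices: int) -> float:
    """Entropy, in bits, of a process that picks uniformly among num_choices outputs."""
    return math.log2(num_choices)

# Process 1: pick one of roughly 33,000 Bible verses at random.
verse_bits = entropy_bits(33_000)

# Process 2: pick 4 letters at random from a 26-letter, case-insensitive alphabet.
four_letter_bits = entropy_bits(26 ** 4)

print(f"Bible verse:      {verse_bits:.2f} bits")        # about 15.01
print(f"4 random letters: {four_letter_bits:.2f} bits")  # about 18.80
```

The numbers describe the two processes, not any particular verse or letter string that came out of them.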

The usual security stance is to assume that the attacker is smart. In particular, he more or less knows what password generation process we used. This makes a lot of sense in corporations and big organizations, where the security administrators publish guidance to users about how to choose a password: it seems fair to assume that most users will follow these rules, especially if the security admins implemented automatic tools to verify (to some extent) that the password follows these rules. For "personal" passwords, attackers may assume that the user is following one of the few "password generation methods" which are published throughout the Internet. Indeed, on this very site, we are discussing password generation methods, and we do it in plain view.
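To put a number on "the attacker more or less knows the process", here is a hedged Python sketch. The list of methods is an assumption for illustration only, and `guesses_to_cover` is a made-up name. It counts how many guesses an attacker needs in order to exhaust every method on the list, rather than betting on one.

```python
import math

# Hypothetical shortlist of generation methods an attacker might consider,
# mapped to the number of passwords each method can produce.
METHODS = {
    "bible verse": 33_000,
    "4 random letters": 26 ** 4,
    "8 random letters": 26 ** 8,
}

def guesses_to_cover(methods: dict) -> int:
    """Worst-case number of guesses needed to try every output of every method."""
    return sum(methods.values())

total = guesses_to_cover(METHODS)
largest = max(METHODS.values())

# Extra work, in bits, of covering all methods instead of only the largest one.
extra_bits = math.log2(total) - math.log2(largest)

print(f"total guesses: {total}")
print(f"extra cost over the largest method alone: {extra_bits:.6f} bits")
```

Under these assumptions, adding the verse list to a brute-force run costs the attacker almost nothing, and keeping secret which of k equally likely methods you used can add at most log2(k) bits.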

Any reasoning along the lines of "the attacker will surely try random passwords first" or "the attacker will never think of using a sentence from book X" is inherently flawed in the following way: it assumes that the attacker is just a random anonymous hacker who cracks passwords for the fun of it, without knowing who he is attacking. It is a tame attacker who is allowed to try passwords but not to try very hard. It denies the existence of smart attackers who are after you, specifically. Incompetence is the most widespread human trait and many attackers, being people, are afflicted with it; but experience shows that smart attackers exist nonetheless.

The psychological mechanisms behind this are reasonably clear: we want to consider ourselves as "smart". It is a humiliating experience to realize that, possibly, attackers are not only evil and naughty, but also more intelligent than us, and that we will not necessarily be able to beat them in a "battle of wits". This might be the most important insight that apprentice cryptographers must obtain at some point in their learning: we do not defeat the attackers by being smarter than them; we defeat them by throwing mathematics at them. In the case of passwords, "mathematics" means "quantifiable randomness".

Quantifiable means that you can measure how random your password is. "Eight random letters" is highly quantifiable: that's 26^8 = 208827064576 possible passwords, i.e. about 37.6 bits of entropy. "The attacker will never think of that": how much does this "never" represent?
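A minimal sketch of such a quantifiable process in Python, assuming lowercase letters and the standard library's `secrets` module as the random source (the helper names are made up for this example):

```python
import math
import secrets
import string

ALPHABET = string.ascii_lowercase  # 26 letters

def random_letters(length: int = 8) -> str:
    """Pick each letter independently and uniformly with a cryptographic RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def process_entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of the generation process: length * log2(alphabet_size)."""
    return length * math.log2(alphabet_size)

password = random_letters(8)
bits = process_entropy_bits(len(ALPHABET), 8)
print(password, f"({bits:.1f} bits, {26 ** 8} possibilities)")
```

The entropy figure holds even if the attacker reads this code, because it depends only on the random choices and not on the method staying secret.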

To a large extent, this is the same problem as security through obscurity, which is not recommended for pretty much the same reasons.

Thomas Pornin
  • Excellent response and I think it addresses what I was getting at. To rephrase my question with my new understanding, it was along the lines of "Why do discussions surrounding password entropy seemingly ignore the likelihood of the password generation process being selected?" My answer would now probably be that we can only evaluate the knowns in the equation, and that's the password generation process itself. The selection of that generation process adds its own entropy, but as with most security we just assume the attacker knows it and evaluate the strength of the remaining part. – Peleus Feb 21 '13 at 01:32