25

Whenever I look at password entropy, the only equation I ever see is E = log2(R^L) = L * log2(R), where E is password entropy, R is the range of available characters, and L is the password length.

I was wondering if there are any alternate equations for calculating entropy, which factor weak passwords into the equation. For instance, passwords with sequential characters (0123456789), common phrases (logmein), repeating words (happyhappy) or words with numbers appended (password1) would all receive a lower entropy grade due to their various shortcomings.

Does such an equation exist? If so, is it commonly used in the security field, or do people tend to stick with the "standard equation"?

Shurmajee
Moses

3 Answers

27

There are equations for when the password is chosen randomly and uniformly from a given set; namely, if the set has size N then the entropy is N (to express it in bits, take the base-2 logarithm of N).

For instance, if the password is a sequence of exactly 8 lowercase letters, such that all sequences of 8 lowercase characters could have been chosen and no sequence was to be chosen with higher probability than any other, then entropy is N = 26^8 = 208827064576, i.e. about 37.6 bits (because this value is close to 2^37.6).
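
As a minimal sketch of that calculation (the function name `uniform_entropy_bits` is made up for illustration, and the result only means something if every candidate really is equally likely):

```python
import math

def uniform_entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a password drawn uniformly at random.

    log2(alphabet_size ** length) == length * log2(alphabet_size)
    """
    if alphabet_size < 1 or length < 0:
        raise ValueError("alphabet_size must be >= 1 and length >= 0")
    return length * math.log2(alphabet_size)

# 8 lowercase letters, all sequences equally likely
print(26 ** 8)                                # 208827064576
print(round(uniform_entropy_bits(26, 8), 1))  # 37.6
```

The same function covers any uniform process: only the size of the set and the number of independent draws matter, not what the resulting password looks like.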

Such a nice formula works only as long as uniform randomness occurs, and, let's face it, uniform randomness cannot occur in the average human brain. For human-chosen passwords, we can only do estimates based on surveys (have a look at that for some pointers).

What must be remembered is that entropy qualifies the password generation process, not the password itself. By definition, "password meter" applications and Web sites do not see the process, only the result, and uniformly return poor results (e.g. they will tell you that "BillClinton" is a good password). When the process is an in-brain one, anything goes.

(I generate my passwords with a computer, not with my head, and I encourage people to do the same.)

Thomas Pornin
  • I quite agree on using a generator to give uniformly random passwords, though a phrase of 40+ characters can be managed in the brain and tends to beat most attacks ;) – ewanm89 Oct 03 '12 at 17:21
  • 1
    100% agree that the *only* secure password is the one that doesn't involve a human. Keepass, Lastpass, 1Password, etc. are the way to go. – Polynomial Oct 03 '12 at 19:03
  • 1
    +1 for pointing out that entropy can only be measured in context of the password generation process, and cannot be determined by only examining the outcome. – Stephen Touset Oct 03 '12 at 19:52
  • This is an even better way to generate passwords than computers: dice. – PyRulez Nov 07 '14 at 01:38
  • So if I'm correct, correcthorsebattlestaple can be calculated as 4 words chosen totally randomly from a 2048-word dictionary, which leads to 44 bits of entropy (log2(2048) * 4). Or as 24 characters chosen totally randomly from a 26-letter alphabet, which leads to 112 bits of entropy (log2(26) * 24)? I'm not opening a new question, I'm just browsing existing ones. – Guillaume Beauvois Dec 04 '17 at 13:46
  • 1
    "Entropy qualifies the password generation process". So the entropy is 44 bits if you generated the password as four random words in a list of 2048. If you generated as 24 random letters, then the entropy is about 112.8 bits. – Thomas Pornin Dec 04 '17 at 16:33
12

Joseph Bonneau from the University of Cambridge has done extensive research in the area of user chosen passwords. In a recent paper (PDF) Bonneau proposed using "statistical metrics for individual password strength". In this paper he describes

"several possible metrics for measuring the strength of an individual password or any other secret drawn from a known, skewed distribution. In contrast to previous ad hoc approaches which rely on textual properties of passwords, we consider the problem without any knowledge of password structure. This enables rating the strength of a password given a large sample distribution without assuming anything about password semantics."

When we talk about the entropy of a password, we're really interested in how hard it is to guess it. Bonneau's paper describes how this can be measured based on statistical information of actual passwords.
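
To give a rough feel for what "measured based on statistical information of actual passwords" can mean, here is a small sketch. It is not the set of metrics proposed in the paper: it only computes two generic measures of a sample distribution (Shannon entropy and min-entropy) plus the rank at which a frequency-ordered attacker would try a given password. The sample list and the function names are made up for illustration.

```python
import math
from collections import Counter

def distribution_entropies(sample):
    """Return (Shannon entropy, min-entropy) in bits of the empirical distribution."""
    counts = Counter(sample)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    shannon = -sum(p * math.log2(p) for p in probs)
    min_entropy = -math.log2(max(probs))  # driven by the single most common password
    return shannon, min_entropy

def guess_rank(sample, password):
    """1-based position of `password` when guessing in decreasing frequency order.

    Returns None if the password never occurs in the sample.
    """
    ranked = [pw for pw, _ in Counter(sample).most_common()]
    return ranked.index(password) + 1 if password in ranked else None

# Hypothetical toy sample; a real study would use a large leaked or surveyed corpus.
sample = ["123456"] * 4 + ["password"] * 2 + ["qwerty", "x7#Lq9"]

print(distribution_entropies(sample))  # (1.75, 1.0)
print(guess_rank(sample, "password"))  # 2
```

The point of the exercise is that the numbers come from the observed distribution alone; nothing in the code looks at what characters the passwords contain.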

David Wachtfogel
7

From a purely combinatorial point of view, 0123456789 is no weaker than any other 10-character string. The equations you are referring to are based on combinatorial math.

From a statistical point of view, however, it is weaker: people commonly use it because it is easy to remember, so attackers building common-password dictionaries put it near the top of the list, and it is likely to be among the first passwords cracked. You could create slightly more complex equations, or simply say: since you used only numerical digits, even though I allow more, I will calculate the strength using just the digits as the character set. This helps account for the statistical issues but will not perfectly match the real situation.
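
A minimal sketch of the "only count the character classes actually used" idea follows. The class definitions and function names are assumptions made for this example, and the result is still only a crude estimate, for the reasons given above.

```python
import math
import string

# Character classes considered here (an assumption; anything outside them is ignored).
CHAR_CLASSES = (
    string.ascii_lowercase,  # 26
    string.ascii_uppercase,  # 26
    string.digits,           # 10
    string.punctuation,      # 32
)

def used_charset_size(password: str) -> int:
    """Total size of the character classes that actually appear in the password."""
    return sum(len(cls) for cls in CHAR_CLASSES if any(ch in cls for ch in password))

def naive_bits(password: str, charset_size=None) -> float:
    """L * log2(R), with R defaulting to the classes the password actually uses."""
    r = used_charset_size(password) if charset_size is None else charset_size
    return len(password) * math.log2(r) if r > 0 else 0.0

pw = "0123456789"
print(round(naive_bits(pw), 1))      # 33.2 (digits only, R = 10)
print(round(naive_bits(pw, 94), 1))  # 65.5 (all four classes allowed, R = 94)
```

The gap between the two figures shows how much the estimate drops once the unused classes are no longer counted, even though the password itself is unchanged.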

You could also check the password against a password-cracking dictionary to see whether it is in there and, if not, how close the most similar word is. This only gives a strength relative to that particular dictionary, though, and another attacker may use a different one.
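
Such a dictionary check could be sketched roughly as below, using the similarity ratio from Python's standard `difflib` module as the notion of "closeness" (one possible choice among many). The three-word list is a hypothetical stand-in for a real cracking dictionary, and `closest_word` is a name invented for this example.

```python
import difflib

def closest_word(password, wordlist):
    """Return (most similar dictionary word, similarity ratio in [0, 1]).

    A ratio of 1.0 means the password, lowercased, is in the dictionary.
    """
    candidate = password.lower()
    best_word, best_ratio = None, 0.0
    for word in wordlist:
        ratio = difflib.SequenceMatcher(None, candidate, word.lower()).ratio()
        if ratio > best_ratio:
            best_word, best_ratio = word, ratio
    return best_word, best_ratio

# Hypothetical tiny dictionary; a real one would be loaded from a file.
wordlist = ["password", "logmein", "letmein"]

for pw in ("logmein", "password1", "x7#Lq9"):
    word, ratio = closest_word(pw, wordlist)
    print(pw, "->", word, round(ratio, 2))
```

A high ratio flags passwords such as password1 that are only a small edit away from a dictionary entry, but the score says nothing about dictionaries the checker has not seen.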

ewanm89