
I'm looking for analyses of how users choose and use passwords. I'm sure there are many resources out there that analyze user passwords. For instance, I've seen many people analyze dumps of password databases from sites that got hacked. I'd like to learn more about what's known on this topic, and find resources where I can learn more.

I'm particularly interested in data on questions such as:

  • How much entropy do users' passwords have?
  • How many users choose a guessable password?
  • What are the most common passwords?
  • What does the distribution of password entropy or password length look like?

Let's collect as many of these resources and analyses as we can find. The more, the merrier!


Also feel free to post resources that analyze other related questions about user behavior surrounding passwords, e.g.:

  • How many users use the same passwords on multiple sites?
  • How many different passwords does a single user typically have?
  • How do users manage their passwords?
  • How many users write down their passwords somewhere?
  • Do users pick passwords more securely for critical tasks, like online banking?
  • How many users are fooled by phishing sites or can be tricked into revealing their passwords?

And so on.

D.W.
  • I'd start with http://security.stackexchange.com/questions/6175/are-there-lists-of-most-common-words-or-ngrams-used-in-passwords-and-passphrases. Maybe others will come up with more. – Jeff Ferland Aug 29 '11 at 17:53
  • As to entropy, you might be interested in http://crypto.stackexchange.com/questions/374/how-should-i-calculate-the-entropy-of-a-password – this.josh Aug 30 '11 at 07:43
  • Just collect plaintext passwords and get the stats yourself! (but please, don't do that) – Jojodmo May 13 '16 at 04:59

5 Answers


The big web password leaks (particularly RockYou, where the leak was of plaintext passwords, so even the strongest passwords are visible) have been analysed several times. See e.g. Imperva and Troy Hunt -- or just get hold of some of the password lists and do your own analysis to calculate entropy etc.
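
If you do get hold of a plaintext list, a minimal Python sketch of the basic statistics might look like the following. The function name `analyse` and the sample list are made up for illustration. It reports the most common passwords, the length distribution, the Shannon entropy of the empirical password distribution, and the min-entropy (how guessable the single most popular password makes the whole set). Treat the entropy figures as rough indicators only, since a leak is just a sample of the real distribution.

```python
from collections import Counter
import math

def analyse(passwords):
    """Basic frequency statistics for a list of plaintext passwords."""
    counts = Counter(passwords)
    total = len(passwords)
    # Three most common passwords with their share of all accounts
    top = [(pw, n / total) for pw, n in counts.most_common(3)]
    # Length distribution: length -> number of accounts
    lengths = Counter(len(pw) for pw in passwords)
    # Shannon entropy (bits) of the empirical password distribution
    shannon = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # Min-entropy (bits): driven only by the most popular password
    min_entropy = -math.log2(counts.most_common(1)[0][1] / total)
    return top, dict(sorted(lengths.items())), shannon, min_entropy

# Hypothetical sample, not real leak data
sample = ["123456", "123456", "password", "123456",
          "iloveyou", "password", "abc123", "qwerty"]
top, lengths, shannon, min_entropy = analyse(sample)
print(top)      # [('123456', 0.375), ('password', 0.25), ('iloveyou', 0.125)]
print(lengths)  # {6: 5, 8: 3}
```

On a real dump you would feed in millions of entries; the gap between Shannon entropy and min-entropy is one quick way to see how skewed the distribution is toward a few very popular choices.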

Troy Hunt and a group from Cambridge both found significant password re-use across sites, of around 70%.

An interesting Cisco, RedJack and Florida State paper gives some stats on the charset mix and entropy of leaked passwords (longer passwords also tend to use a greater charset mix) and on the effect of password policies such as "must contain a digit" on password strength. The analysis shows that most (>70%) users faced with a "must contain a digit" policy simply add a numeric prefix or suffix, and many of the remainder use l33t-speak substitutions; neither provides much protection against John the Ripper (JtR)-style cracking tools. Similarly, 30% of passwords containing a "special character" have just one, at the end. The paper also shows that the NIST "entropy model" is a poor indicator of password crackability in the wild, because it fails to account for the use of common words as the basis for the vast majority of passwords.
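
To see why a charset-based estimate misleads, here is a minimal Python sketch. `charset_entropy` is a simplified "length × log2(charset size)" estimate, assumed for illustration; it is not the NIST heuristic itself, but it shares the weakness described above of ignoring dictionary words. The two regular expressions are hypothetical detectors for the "word plus numeric pre/suffix" and "one special character at the end" patterns.

```python
import math
import re
import string

def charset_entropy(pw):
    """Naive estimate: len(pw) * log2(size of the character classes used).

    Simplified illustration only; it knows nothing about words or patterns.
    """
    size = 0
    if any(c in string.ascii_lowercase for c in pw):
        size += 26
    if any(c in string.ascii_uppercase for c in pw):
        size += 26
    if any(c in string.digits for c in pw):
        size += 10
    if any(c in string.punctuation for c in pw):
        size += len(string.punctuation)  # 32 ASCII symbols
    return len(pw) * math.log2(size) if size else 0.0

# Letters with a numeric prefix or suffix, e.g. "password1" or "1password"
SIMPLE_DIGIT = re.compile(r"^(\d+[A-Za-z]+|[A-Za-z]+\d+)$")
# Exactly one non-alphanumeric character, at the very end, e.g. "monkey!"
TRAILING_SPECIAL = re.compile(r"^[A-Za-z0-9]+[^A-Za-z0-9]$")

for pw in ["password1", "monkey!", "p4ssw0rd"]:
    # All three score 40+ "bits" although a dictionary attack finds them fast
    print(pw, round(charset_entropy(pw), 1),
          bool(SIMPLE_DIGIT.match(pw)), bool(TRAILING_SPECIAL.match(pw)))
```

Note that "p4ssw0rd" matches neither pattern; catching l33t-speak needs a substitution step (4→a, 0→o and so on) before the dictionary lookup, which is what JtR-style rule sets do.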

That paper references another, which showed what we all know: password expiration policies lead users to make small incremental changes to produce each new password, and attackers can exploit this to break a "new" password, given a previous one, much faster than a brute-force or dictionary attack would allow. That paper tentatively recommends non-expiring passwords with much stricter length/complexity requirements (e.g. a Diceware-style passphrase).
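
A minimal Python sketch of the idea: given an expired password, generate a handful of likely successors by small edits. The function name and the specific edit rules are hypothetical examples chosen for illustration, not the transformation set used in the referenced paper.

```python
import re

def next_password_candidates(old):
    """Guess likely successors of an expired password via small edits.

    The edit rules below are hypothetical examples only.
    """
    candidates = []
    # Split around the last run of digits: "Summer2011!" -> "Summer", "2011", "!"
    m = re.match(r"^(.*?)(\d+)(\D*)$", old)
    if m:
        prefix, digits, suffix = m.groups()
        for step in (1, 2, -1):
            n = int(digits) + step
            if n >= 0:
                # Preserve zero-padding, e.g. "07" -> "08"
                candidates.append(f"{prefix}{n:0{len(digits)}d}{suffix}")
    else:
        # No digits yet: try appending a single digit
        candidates.extend(old + str(d) for d in range(10))
    # A couple of other cheap tweaks
    candidates.extend([old + "!", old.swapcase()])
    return candidates

print(next_password_candidates("Summer2011!")[:3])
# ['Summer2012!', 'Summer2013!', 'Summer2010!']
```

Even a short list like this is tried in microseconds, which is the point: knowing the old password shrinks the search space from billions of guesses to a handful.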

In their OWASP 2011 presentation, KoreLogic showed a slide with the "proportion cracked" for various (hashed/encrypted) password leaks, which suggests that fewer than 10%, and probably fewer than 2%, of users have passwords complex enough and long enough to resist a combination of dictionary, rainbow-table and brute-force attacks. We can also infer that brute-force attacks do noticeably worse than rainbow-table attacks: the two examples on that slide that include a salt have significantly lower proportions cracked than the plain, unsalted MD5 cases, and a salt defeats precomputed rainbow tables, so the attacker has to brute-force or dictionary-attack each hash individually.
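
To make the salt point concrete, here is a minimal Python sketch. It uses a plain dictionary of precomputed MD5 hashes as a stand-in for a rainbow table (real rainbow tables use hash chains to trade lookup time for space, but answer the same question). The word list and function names are assumptions for illustration. An unsalted hash of a common password falls to a single table lookup; a salted one does not, so the attacker must hash every guess again for each account's salt.

```python
import hashlib

COMMON = ["123456", "password", "letmein", "qwerty"]

def unsalted(pw):
    return hashlib.md5(pw.encode()).hexdigest()

def salted(pw, salt):
    # Illustration only: real systems should use a slow KDF
    # (bcrypt, scrypt, PBKDF2) rather than a single MD5
    return hashlib.md5(salt + pw.encode()).hexdigest()

# Precomputed once, reusable against every unsalted leak
TABLE = {unsalted(pw): pw for pw in COMMON}

def crack_with_table(digest):
    return TABLE.get(digest)

def crack_salted(digest, salt, wordlist):
    # No precomputation possible: hash each guess with this account's salt
    for guess in wordlist:
        if salted(guess, salt) == digest:
            return guess
    return None

salt = b"\x8f\x01\x22\x9a"  # hypothetical per-user salt
print(crack_with_table(unsalted("letmein")))  # letmein
print(crack_with_table(salted("letmein", salt)))  # None
print(crack_salted(salted("letmein", salt), salt, COMMON))  # letmein
```

The last line shows why salting alone does not save weak passwords: the per-account dictionary attack still succeeds, it just costs more for each account.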

Re: Do people use more secure passwords for their banking etc. accounts:

The KoreLogic analysis indicates that "corporate" passwords are rather more complex than typical "web passwords". This difference appears to be due to typical corporate password policies (e.g. mandating a minimum length and charset usage), which both make the typical password more "complex" and lead to some commonly repeated password derivation patterns. I don't think we can assume that passwords on banking/financial sites will be any more complex in the absence of corporate-style policy enforcement.
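
A minimal Python sketch of how a composition policy can both raise apparent complexity and funnel users into the same shape. The policy rules (minimum length 8, at least one lowercase letter, uppercase letter and digit) are an assumed "typical" corporate policy, and the pattern classifier is a hypothetical example of one common derivation pattern.

```python
import re

def meets_policy(pw, min_length=8):
    """Assumed corporate-style policy: minimum length plus
    at least one lowercase letter, one uppercase letter and one digit."""
    return (
        len(pw) >= min_length
        and re.search(r"[a-z]", pw) is not None
        and re.search(r"[A-Z]", pw) is not None
        and re.search(r"\d", pw) is not None
    )

def derivation_pattern(pw):
    """Flag the common 'Capitalised word + digits + optional symbol' shape."""
    if re.fullmatch(r"[A-Z][a-z]+\d+[^A-Za-z0-9]?", pw):
        return "Word+digits(+symbol)"
    return "other"

for pw in ["password", "Password1", "Summer2011!", "xK9#qv2Lm"]:
    print(pw, meets_policy(pw), derivation_pattern(pw))
```

"Password1" and "Summer2011!" both satisfy the assumed policy yet share one predictable template, which is exactly what cracking rule sets target.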

The "blanked out" entry on the Hash EXchange screenshot in the KoreLogic presentation presumably relates to the "unnamed financial site". That might not be a bank, but its 70% cracked proportion suggests that, while users may use somewhat stronger passwords there (compared with Gawker etc.), a large majority still use weak passwords.

Misha
  • Actually, the group from Cambridge estimated 31% password re-use, and Hunt found 67%. The Cambridge group goes on to filter out some of the passwords, which, I would say, accounts for an overly "positive" estimate. I think they were more on track with their 70% estimate, considering Hunt's work. – Bono Feb 25 '16 at 16:45

This blog post from Troy Hunt gives an interesting analysis based on data from the Sony, Gawker, and other breaches.

jrdioko

One of the sites that seems useful is PasswordResearch.com - they have analysis sorted into:

  • User Password Practices
  • Authentication Policies, Practices, or Procedures
  • Password Lifetime Policies or Practices
  • Password Length Policies or Practices
  • Password Character Usage Policies or Practices
  • Authentication Related Criminal Incidents
  • Opinions on Authentication
  • Market Use of Authentication Technologies
  • Costs Associated with Authentication
  • Authentication Business Impacts
Rory Alsop

Kind of late to the party, but "The Tangled Web of Password Reuse" (2014) by Anupam Das, Joseph Bonneau, Matthew Caesar, Nikita Borisov and Xiao Feng Wang takes an in-depth look at password re-use.

They study eleven websites with several hundred thousand leaked passwords, and estimate roughly 50% password re-use across multiple sites.
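
A minimal Python sketch of a re-use rate calculation, assuming accounts have already been matched across leaks by some user identifier such as an email address. The data layout and function name are hypothetical. This sketch counts only exact matches, so a near-identical variant like "secret" vs "secret1" is not counted as re-use.

```python
def reuse_rate(accounts):
    """Fraction of users seen on 2+ sites who use the exact same password twice.

    `accounts` maps a user identifier to a {site: password} dict.
    """
    multi = [sites for sites in accounts.values() if len(sites) >= 2]
    if not multi:
        return 0.0
    # A user re-uses if they have fewer distinct passwords than sites
    reusers = sum(1 for sites in multi if len(set(sites.values())) < len(sites))
    return reusers / len(multi)

# Hypothetical matched-leak data
accounts = {
    "alice@example.com": {"siteA": "hunter2", "siteB": "hunter2"},
    "bob@example.com": {"siteA": "tr0ub4dor", "siteC": "correcthorse"},
    "carol@example.com": {"siteB": "letmein"},  # one site only: excluded
}
print(reuse_rate(accounts))  # 0.5
```

Choices such as whether to count near-matches, or which accounts to exclude, can move the headline figure a lot, which may explain some of the spread between the estimates quoted in this thread.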

Bono

Here are some more statistics on password use and password re-use:

  • Too Many People Reuse Logins, Study Finds (PCWorld, Feb 2010) reports that 73% of users re-use their online banking password on at least one other website. 50% re-use both their online banking username and password on at least one other site.

  • Sophos (Mar 2009) reports that 33% of users admit to using the same password for every website they use. 48% say they use a few different passwords.

  • A Large-Scale Study of Web Password Habits (Florencio and Herley, May 2007) studies half a million users and measures their password use: the number of passwords and accounts they have, how often they share passwords among sites, and how strong their passwords are. For instance, they report that the typical user has an average of 6.5 passwords, each shared (on average) across 3.9 different sites. They estimate the average strength of a user password at 40 bits, and that at least 1.5% of Yahoo users forget their passwords each month.
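
The two averages quoted above (passwords per user, and sites per password) can be computed from per-user data along these lines. This is a minimal Python sketch; the data layout and function name are assumptions, not the study's actual measurement method.

```python
from collections import Counter
from statistics import mean

def habit_stats(users):
    """`users` maps a user id to a {site: password} dict.

    Returns (average distinct passwords per user,
             average number of sites each distinct password is used on).
    """
    per_user = []
    sites_per_pw = []
    for sites in users.values():
        pw_counts = Counter(sites.values())  # password -> number of sites
        per_user.append(len(pw_counts))
        sites_per_pw.extend(pw_counts.values())
    return mean(per_user), mean(sites_per_pw)

# Hypothetical data for two users
users = {
    "u1": {"mail": "x", "forum": "x", "bank": "y"},
    "u2": {"mail": "p", "shop": "q"},
}
print(habit_stats(users))  # (2, 1.25)
```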

D.W.