36

A few months ago, kutschkem answered a question about HIBP with this:

Let's say every person on earth has used ~1000 passwords so far. That makes approximately 10 trillion passwords, which is ~2^43 if I am not mistaken. Choosing any existing password at random is thus about as good as a truly random 8-9 character case-sensitive character password. Not very good. See this answer.

That basically means that, in theory, not only should one not reuse a password, one should not reuse a password that has been used by anyone ever. Passwords that have been used before are basically one big dictionary attack waiting to happen.

I was reminded again by the discussion between Mike Ounsworth and Conor Mancone in the comments on this answer about blacklisting the top N passwords versus the entire HIBP database.

Is kutschkem's math right? Does this make the conclusion right?

Michael
  • 1
    If this was a poll, I would vote “Yes”. If you use a password that has been used by someone else before, you are more likely (I think) to use a password that has been used frequently before than once before. Those passwords are less secure, but how great an impact it has depends on the system the password is used in. Compare Facebook to SSH login to a hacked Windows machine where the NTLM hashes have been dumped. – Birb Nov 15 '19 at 22:16
  • 28
    By knowing that someone else has used the password already, it must have been published on a list of known passwords, and those are more likely to be broken. – Bergi Nov 16 '19 at 11:37
  • The conclusion is definitely right. The Akamai State of the Internet report states that most attacks nowadays are credential stuffing attacks, so checking your password against a database like haveibeenpwned.com makes a lot of sense and is recommended by security professionals. – Sir Muffington Nov 16 '19 at 13:46
  • 8
    1000 passwords on average for every person on earth? Seems quite unrealistic. Most people use like 5 passwords in their whole life (unfortunately). For every person that uses a password manager (and probably uses tens of thousands of passwords in their lifetime..) there are like 100 that use a number closer to 10...I'd guesstimate the 2^(43) guesstimate is like 5-6 orders of magnitude (base 2) wrong at least... – Bakuriu Nov 16 '19 at 21:33
  • @Bakuriu also, there must be many collisions, with many people choosing "password", "1234" or the more secure "1234567" – Eric Duminil Nov 16 '19 at 22:29
  • I would also note that the conclusion says it is as good as 8-9 characters, while the general new standard is 12 and the last one I heard was at least 20 random characters. – awsirkis Nov 17 '19 at 03:05
  • 6
    @Bakuriu I think that's the point, it's an optimistically high number. The conclusion becomes stronger if the number of used passwords is actually less. – Kat Nov 17 '19 at 03:23
  • 23
    I use "correct horse staple battery". That's so random I'm pretty sure nobody uses that. – Harper - Reinstate Monica Nov 17 '19 at 11:09
  • 5
    Counterpoint: How do you know that nobody has ever used your password before? – Justin Time - Reinstate Monica Nov 17 '19 at 18:15
  • Yes it's wrong to do so. There are plenty of passwords to choose from and there's no excuse for not choosing randomly. – Awn Nov 18 '19 at 12:21
  • I may be stating the obvious here, but this entire discussion hinges completely on knowing your threat. Are attacks commonly composed of X? Then don't use X. We can all sit here and try to imply that X or Y or Z are good or not good, but unless those implications are backed up with some degree of relationship to actual threats (do people actually do attack X?), it's all just daydreaming. – dwizum Nov 18 '19 at 16:18

8 Answers

32

The math may be right. One could refine and complicate it as much as desired, but it doesn't really add to the point. So I'll leave it be.

Also, in practice it is easier, and might be faster, to check every random-character password of a fixed length than to check unique passwords from a list. A password list with 2^43 passwords and an average password length of 8 characters would be about 64 TB in size, if my calculation is correct. This would have to be stored somewhere in close proximity to the processor to be read at the same speed as the processor calculates the hashes.

The conclusion, however, is not right: the important question is not whether a password has ever been used, but whether the password has ever been included in a breach.

If the breached passwords were thereafter publicly disclosed, they are now available on the internet. The passwords are now not just any passwords that have been used, but a very small subset of them. And to make things worse, this subset is used in wordlists by a lot of people around the world to check if they have been reused. So the chance that someone checks a hash against this password is a lot higher than the chance of him or her checking a hash against an unknown password, even if it has been used somewhere.

So I would not use a password that is included in the HIBP database, simply for the reason that those passwords have a higher chance to be included in wordlists.
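One way to act on this is to look a candidate password up in the HIBP Pwned Passwords data before using it. The sketch below assumes the public range endpoint at `api.pwnedpasswords.com`, which (as far as I know) takes the first five hex characters of the SHA-1 digest and returns `SUFFIX:COUNT` lines. The function names are hypothetical and chosen only for illustration.

```python
import hashlib
import urllib.request

# Assumed endpoint of the Pwned Passwords range API (k-anonymity lookup).
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def sha1_prefix_suffix(password: str) -> tuple[str, str]:
    """Return the first 5 hex chars of the SHA-1 digest and the remaining 35."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_response(suffix: str, body: str) -> int:
    """Look for `suffix` in a response body made of 'SUFFIX:COUNT' lines."""
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0  # suffix not listed: not in this breach corpus


def breach_count(password: str) -> int:
    """Query the service; only the 5-character prefix leaves the machine."""
    prefix, suffix = sha1_prefix_suffix(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "password-check-sketch"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return count_in_range_response(suffix, response.read().decode("utf-8"))
```

A count of zero only says the password is absent from this particular corpus, not that it has never been used or leaked elsewhere.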

Michael
Martin Weil
  • 1
    I don't understand your answer. You say the conclusion is not right... and then conclude the same thing? The answer is still: "yes, it is a mistake to re-use passwords", or do I misunderstand you? – Luc Nov 16 '19 at 22:48
  • 15
    @Luc It's a mistake to reuse passwords, but not the same set. The original question included all passwords ever used. This answer only includes passwords in any breach - that's many, many, many fewer passwords. – daboross Nov 17 '19 at 00:47
  • 1
    But any password that is known to be in use can (by definition) be guessed, so if you know that a password has been used by anyone ever, you should not reuse it. Doesn't matter if it was from a breach, because someone told you their password, because it's a well-known phrase or wordplay, or any other reason. I don't see the distinction between "oh I should not reuse this password from list" or "I should not reuse this password because someone else already used it". Don't reuse passwords, period. – Luc Nov 17 '19 at 13:47
  • 3
    `The conclusion however, is not right. The question is not if a password has ever been used, but if the password has ever been included in a breach.` I mean sure, you can't _actually_ know "all passwords that have ever been used", so "included in a breach" is really the only list that exists. On the other hand, the (theoretical) argument is still somewhat valid because, given you had such a list, you could never know if a password will be included in a breach _in the future_. If reusing your own passwords is bad, then so is reusing other peoples passwords. – kutschkem Nov 18 '19 at 07:25
  • 1
    @Luc True, every known password can be guessed, but also, if the password is for example between 8-20 characters long and consists of a known set of characters, every password is known. The point is, if you know that a password has been used, it is breached. Even if your friend told you about a password that he is using, I would consider it breached. The only exception is of course passwords that you yourself use: Even if they have not been included in a breach, don't reuse them. – Martin Weil Nov 18 '19 at 07:35
  • 4
    Maybe it would be better to say: Don't use passwords that you know are, or have been, in use either by yourself or anyone else. – Martin Weil Nov 18 '19 at 07:35
  • 1
    One could make the point that we don't know of all the breached passwords that are not included in HIBP (which is by some estimated to be ten times the number), but that again is moot because you wouldn't be able to check if a password is from that pool anyways. – PlasmaHH Nov 18 '19 at 12:04
  • One could use a *bloom filter* here, requiring much less space to check if a password is in the password list. https://en.wikipedia.org/wiki/Bloom_filter – lynn Nov 19 '19 at 12:20
23

Mike Ounsworth here (author of the thread you're referencing)

This is a great excuse to do some back-of-the-envelope math! The factor to think about here is that when you're getting to numbers like 2^43, you have to start factoring in the number of hard drives, CPUs, and electricity required to store and use that data.

To make the math easy, let's say each of those 2^43 passwords is stored as a SHA-1 hash (as is the case with the HIBP database). Each SHA-1 value is 160 bits, or 20 bytes. 2^43 * 20 bytes = 176 terabytes. Larger than my laptop, but chump change for a cloud service.

Going the other direction, imagine you have a database of all 2^43 plaintext passwords. You get your hands on the hash of an admin's password and you want to brute-force it against your database. Let's take the simplest and most insecure case; it's an unsalted SHA-256 hash. This is the problem that bitcoin mining rigs were built for, baby! Let's take this bitcoin miner as a rough benchmark: $3,000 USD, 50 TH/s (tera-hashes per second), and consumes 1975 W.

According to my hasty math, one of those units would take 2^43 / (50,000,000,000,000 / s) = 0.2s to try all passwords, assuming that a database can feed 176 TB of data to it that quickly.

In reality, passwords are (well, should be) stored with salted PBKDF2 or Argon2. This changes the game considerably, as these hash functions are intended to prevent this kind of attack. They can be tuned to be as slow as you want at the time that you store the password as a hash. Say you tune it to be ~0.1 s per hash. Now suddenly you're looking at numbers like "tens of thousands of years" (2^43 × 0.1 s ≈ 28,000 years of compute per salted hash), and "power consumption of the planet".
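The back-of-the-envelope figures above can be reproduced in a few lines. This is only a sketch of the arithmetic; the constants are the ones quoted in this answer (2^43 passwords, 20-byte SHA-1 values, a 50 TH/s miner, 0.1 s per slow hash), and the variable names are mine.

```python
PASSWORDS = 2**43               # size of the hypothetical "every password ever" list
SHA1_BYTES = 20                 # one SHA-1 digest is 160 bits
MINER_HASHES_PER_S = 50e12      # the 50 TH/s benchmark quoted above
SLOW_HASH_SECONDS = 0.1         # assumed tuned cost of one PBKDF2/Argon2 evaluation
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Storage needed to keep one SHA-1 digest per password (decimal terabytes).
storage_tb = PASSWORDS * SHA1_BYTES / 1e12

# Time for the miner to try every password against a fast, unsalted hash.
fast_seconds = PASSWORDS / MINER_HASHES_PER_S

# Time for one sequential worker to try every password against a slow hash.
slow_years = PASSWORDS * SLOW_HASH_SECONDS / SECONDS_PER_YEAR

print(f"storage:   {storage_tb:.0f} TB")
print(f"fast hash: {fast_seconds:.2f} s")
print(f"slow hash: {slow_years:,.0f} years")
```

By this rough arithmetic, the slow-hash case comes out to tens of thousands of years for a single sequential worker, and that cost has to be paid again for every differently salted hash.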


TL;DR: this is a great question to ask!

The answer is that if you're going to choose a password that you can remember and that might collide with someone else's on the internet, then your choice of password matters less than whether the site you're giving it to stores it securely.

IMHO, in choosing a password, you're not trying to prevent a dedicated enough attacker from ever cracking it; instead you're trying to make it hard enough that they'll go after a softer target. "I don't need to outrun a bear, I just need to outrun my friend".

Of course, if you use a password manager with a completely random 32-char password, then you're getting into the cryptographic strength realm of "age of the universe", and "power output of a large star". So do that!

Mike Ounsworth
  • 14
    An internal system I developed actually starts returning "heat death of the universe" for 'length of time to hack', because it's really educational to see passwords go from '2 minutes' to '6 years' to 'basically forever' as you make them more complex. It checks against dictionaries/HIBP and insults you terribly if you try to reuse a password. So far it's prevented the wider company using over 100 terrible passwords. – Cyclical Nov 17 '19 at 21:16
  • 5
    @Cyclical Are you able to share this, by chance? That sounds pretty cool! – Coldblackice Nov 18 '19 at 12:43
  • 2
    @Coldblackice it's one of those projects that needs decoupling/refactoring before it's truly releasable, but I have a plan to do this at some point - possibly as a separate API and UI, since there is a nice UI with areas for non-techies (dumbed down explanations of why your password is bad, charts of how hackable it is), techies (hash/encode your dumb password), and some very flexible endpoints for generating secure passwords and critiquing any entered choice of password in a very thorough and insulting manner. – Cyclical Nov 20 '19 at 22:50
  • 1
    @Coldblackice https://github.com/dropbox/zxcvbn might be similar enough to be similarly interesting. Doesn't insult users who hit the blocklist though. – Iiridayn Sep 10 '20 at 23:25
  • @Iiridayn Indeed, very cool, thanks for sharing! – Coldblackice Sep 24 '20 at 06:42
11

I see some logical errors with that statement - first of all, how would you ever know it?
If Joe Schmoe used a specific password in 2007 - 2009 for his Windows PC, and it was never hacked, and the machine is trashed and burned, there would be no record of it anywhere.
Therefore, unless a password was hacked or published in any other way, you cannot know, and so cannot avoid reusing it.

Aside from that, of the estimated 2^43 passwords ever used, probably 2^42.9 are duplicates, and the list fits on one hard disk.

Aganju
  • This is the only answer I see as remotely correct. Nobody else seems to have picked up on the fact that the model itself is just wrong, and people seem to have gotten lost in "math land" (or more correctly, arithmetic land), forgotten that mathematics is about modeling, and forgotten to question the model itself. – Steve Sether Jan 31 '20 at 17:21
4

A mixed-case alphanumeric password for lengths between 1 and 9 (inclusive) has a key space of 13,759,005,997,841,642, which is between 2^53 and 2^54.
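That key space figure is easy to check: it is the sum of 62^length over lengths 1 through 9. A minimal sketch, with variable names of my own choosing:

```python
import math

ALPHABET_SIZE = 26 + 26 + 10  # lower case, upper case, digits

# Number of mixed-case alphanumeric strings of length 1 through 9.
keyspace = sum(ALPHABET_SIZE**length for length in range(1, 10))

# Equivalent size in bits.
bits = math.log2(keyspace)

print(f"{keyspace:,} passwords, about 2^{bits:.1f}")
```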

The math is a decent ballpark guess, but not a reasonable back-of-the-napkin guess.

However, just because the math is wrong does not mean that conclusion is invalid.

Humans are bad at passwords. We memorize them, reuse them, and generate them from easy to remember words.

So, a naive brute force of passwords will generate a lot of possibilities that people have never memorized, have never reused, and aren't similar to words in a human language.

Using a dictionary of previously leaked passwords is probably the fastest way to guess passwords, because you know that someone, somewhere has used that password before. Humans being human, it's more probable that this password will be used again than it's probable that any random value matches a password.

Because of this, my firm opinion is that it's a mistake to use a password that wasn't randomly generated, but I'll agree with the sentiment that it's a mistake to use a password that anyone has used before regardless of how it was generated.

Ghedipunk
1

Is kutschkem's math right?

What kutschkem seems to be saying is:

  1. If about 7⋅10^9 people chose 1000 passwords each, there would be about 2^43 passwords in use.

    This seems like a reasonable approximation: log2(1000⋅7⋅10^9) ≈ 42.7; round it up to 43. (I am not assessing the empirical question of how many passwords people have chosen, only verifying the multiplication!)

  2. There are about 2^43 8-character passwords.

    This is a slightly low estimate: If we count only US-ASCII alphabetic passwords, with case distinctions (‘truly random 8-9 character case-sensitive character password’), there are 2⋅26 possible characters, and log2[(2⋅26)^8] ≈ 45.6; round it down to 43.
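Both estimates can be checked numerically. The sketch below only redoes the logarithms; the population and per-person figures are the ones assumed in the question, not measured values. Note that 45.6 bits corresponds to 8 alphabetic characters, while 9 characters gives about 51.3 bits.

```python
import math

PEOPLE = 7 * 10**9        # assumed world population
PASSWORDS_EACH = 1000     # assumed passwords per person
ALPHABET_SIZE = 2 * 26    # case-sensitive US-ASCII letters

used_bits = math.log2(PEOPLE * PASSWORDS_EACH)   # passwords ever used
alpha8_bits = 8 * math.log2(ALPHABET_SIZE)       # 8-character alphabetic passwords
alpha9_bits = 9 * math.log2(ALPHABET_SIZE)       # 9-character alphabetic passwords

print(f"used: 2^{used_bits:.1f}, 8 chars: 2^{alpha8_bits:.1f}, 9 chars: 2^{alpha9_bits:.1f}")
```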

But if everyone chose 8-character alphabetic passwords uniformly at random like this, it is essentially guaranteed that they'd collide at some point!

Suppose we choose passwords uniformly at random from a space of k possibilities. If there are n passwords in the world, the probability of a collision by the birthday paradox is at most n^2/k. When k and n are the same, that bound doesn't mean anything, but the probability is extremely close to 1.

But suppose we all picked our 1000 passwords each independently and uniformly at random from 2^128 possibilities: say, 10-word diceware phrases with a 7776-word list, or 20-character graphic US-ASCII strings. Then n = 2^43 and k = 2^128, so the probability of a collision between any two of the passwords the seven billion people have chosen is at most n^2/k = (2^43)^2/2^128 = 2^(86−128) = 1/2^42, which is less than one in a trillion.
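The bound can be evaluated exactly with rational arithmetic. This is a sketch of the n^2/k bound used above (capped at 1, since a probability cannot exceed it); `collision_bound` is a name I made up for illustration.

```python
import math
from fractions import Fraction


def collision_bound(n: int, k: int) -> Fraction:
    """Birthday-style upper bound n^2 / k on the collision probability, capped at 1."""
    return min(Fraction(1), Fraction(n * n, k))


n = 2**43    # passwords in the world, per the estimate above
k = 2**128   # size of the space the passwords are drawn from

bound = collision_bound(n, k)
print(f"collision probability <= 2^{math.log2(bound):.0f}")

# Both suggested schemes clear 128 bits.
diceware_bits = 10 * math.log2(7776)   # 10 words from a 7776-word list
ascii_bits = 20 * math.log2(94)        # 20 graphic US-ASCII characters
print(f"diceware: {diceware_bits:.1f} bits, ASCII: {ascii_bits:.1f} bits")
```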

I recommend that if you want password security you should let a computer pick a password for you uniformly at random from over 2^128 possibilities. (For services that use unsalted password hashes, maybe double the length to mitigate multi-target attacks.)
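As an illustration of letting a computer pick, here is a minimal sketch using Python's `secrets` module. The alphabet (the 94 graphic US-ASCII characters) and the default length of 20 are assumptions matching the example above; the function names are hypothetical.

```python
import math
import secrets
import string

# The 94 graphic US-ASCII characters: letters, digits, punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length: int = 20) -> str:
    """Pick each character independently and uniformly with a CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def entropy_bits(length: int, alphabet_size: int = len(ALPHABET)) -> float:
    """Size of the password space in bits: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


print(random_password())
print(f"{entropy_bits(20):.1f} bits")
```

Some sites reject part of the punctuation set, in which case a longer password over a smaller alphabet gives the same number of bits.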

Does this make the conclusion right?

The conclusion (one should not reuse a password that has been used by anyone ever) seems to take as a premise that my goal as the user is to prevent anyone from guessing the password. Maybe I as a user don't care if someone can guess my password and it's more important that I can just remember it. One might make throwaway accounts all the time (see, e.g., BugMeNot) to subvert advertising-driven mass surveillance that relies on tracking users by login for higher-value advertisements.

Squeamish Ossifrage
0

I think it would depend on how passwords are handled on the targeted system.

For a system that uses best practice salting and hashing, password lists are only useful in a brute-force attack. An attacker would have to compile a hash lookup for each account, using its specific salt. That's effectively a brute-force attack on the password file (or table); with cryptographically secure hashing, it's infeasible on a large password space (hashing each password takes a non-trivial amount of time). An attacker might prioritize known passwords ahead of all other possibilities, but that's still a large space.
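To make the per-account salt point concrete, here is a minimal sketch using PBKDF2 from Python's standard library. The iteration count, salt length, and function names are illustrative assumptions, not a recommendation for production settings. The same password stored for two accounts yields two unrelated hashes, so a single precomputed lookup table does not help the attacker.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor; tune for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is drawn when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


# Two accounts with the same password end up with different stored hashes.
salt_a, hash_a = hash_password("hunter2")
salt_b, hash_b = hash_password("hunter2")
print(hash_a != hash_b)
```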

For a system that uses less than best practice, it would depend on the specific security flaws as to how a list of known used passwords might accelerate an attack.

Since you can't be certain what measures are in place on any given system, it might be prudent to avoid likely known passwords, but you aren't necessarily giving a hacker an open door by using an obscure password that happens to have been used by someone else at some time.

Zenilogix
0

I don't think any math is needed other than set theory. The purpose of a password is to act as a method of authenticating: you are who you say you are because you know the secret. This "secret" ideally should be random to prevent brute-force attacks, that is, attacks on the platform the credentials are for. Brute forcing is a last resort as it isn't efficient: you're literally blindly trying every permutation possible for that "secret." Here you have three sets:

  1. The set of all possible permutations.
  2. The subset of #1, the set of all possible permutations used by everyone, ever, known or not.
  3. The subset of #2, the set of all possible permutations that are known (breaches).

Number 3 is only useful as a way to trim the attack because its permutation count is lower than #2 and certainly lower than #1. Logically, one can assume that #2 isn't feasible just by the fact that no one has a collection of every password used ever. However, the important point I think is that #2 wouldn't be that useful on its own. The idea is to trim down your attack to increase its efficiency. Unless the target is a high-value target, #2 is likely already too large to be useful for trimming the attack. A dictionary attack, using actual dictionary terms or just common password variants, is useful largely because the permutation space is that much smaller than exhaustive brute force. #2 increases that space to the point of being impractical for the attacker just as much as the user.

Ironically, I'd argue that if #2 actually were released by some magic, avoiding any permutation in that list may make you more susceptible, as you're decreasing the potential permutation space an attacker would need to cover for the same password length.

An attacker, if #2 was available, would very likely still try and only use it as a tool to make a better brute force attack by creating a dictionary from the highest frequency passwords from that set.

With that said, it is worth noting that the entirety of the HIBP database still represents a relatively small subset of all permutations. Thus, it is still efficient to use the entirety of it in a dictionary attack. An attacker may still trim to the highest-frequency entries if they want more efficiency, but it wouldn't be a requirement, unlike with #2.

-2

There are no realistic conditions under which it is beneficial to reduce the set from which passwords are selected. Intentionally eliminating some passwords from consideration just shrinks an attacker's search space.

David Schwartz