7

I'm wondering, would it be clearer to declare a company-wide requirement with regard to theoretical password entropy, rather than the usual "at least one big letter, and a small letter, and special character..."?

So we would target an entropy level that is reasonable for humans to remember, say 60 bits, and then calculate the entropy of each proposed password.

This can be calculated dynamically and locally, giving the user feedback as needed.

Is this not a better, language/region agnostic way to do a password policy?

Woodstock
  • 16
    The only problem is getting the average person to understand it. – schroeder Jun 02 '20 at 12:30
  • 8
    How do you define "better"? – schroeder Jun 02 '20 at 12:32
  • 17
    It depends on the level of security you would like to achieve. Purely looking at the entropy also allows common passwords such as: 'Password1234' (50.8 bits) and 'MyPassword101' (58.7 bits). – roy.stultiens Jun 02 '20 at 12:36
  • 10
    @roy.stultiens No, it doesn't, because *passwords* do not have entropy. *Password generation methods* have entropy, and any generation method with 50+ bits is extraordinarily unlikely to produce those passwords. If a generation method has 50 bits of entropy, that means (to first approximation) that there are at least 2^50 passwords it could have selected from. – Ray Jun 03 '20 at 13:49
  • 1
    @Ray, usually the cracker does not know the generation method, in which case the entropy with respect to the cracking method, basically the probability they'll try that password, is more interesting. Of course since any cracking method will probably try those two examples in the first thousand or so, their entropy is 8 bits give or take. – Jan Hudec Jun 03 '20 at 19:01
  • 1
    @JanHudec While the cracker does not KNOW the generation method she takes a reasonable guess at it which leads to passwords like "12345" getting tried early and ":C – Jens Schauder Jun 04 '20 at 08:33
  • 1
    @JensSchauder, of course; but if you come up with some words that are easy to remember for you for some specific reason, but are not common words, then an attacker who does not know the method will still not guess them even though the method used to generate them actually had a very low entropy. A targeted attacker may of course learn them in some way and then you are in trouble. – Jan Hudec Jun 04 '20 at 09:20
  • 1
    @schroeder *The only problem is getting the average person to understand it.* Many interfaces nowadays have a dynamically-updating bar that measures the "strength" of my new password while I type it. I think that the average user can understand this kind of UI just fine. – Federico Poloni Jun 05 '20 at 14:20
  • @FedericoPoloni that's a UI, not a policy – schroeder Jun 05 '20 at 14:39
  • @schroeder The policy is "your password is acceptable when the bar reaches 'strong'", and I think that's all the final user needs to know. – Federico Poloni Jun 05 '20 at 14:43

5 Answers

37

The fundamental issue is that entropy can only be estimated from the password itself, and that estimate can be very very wrong. The entropy is determined by the password generation method. You can't measure the entropy of the method from a single password.

Let's look at a practical example. I find it easiest to memorize very long passwords generated from a small password space, so I'm going to use numbers only and make it very long. Your algorithm looks at my password, sees that it only contains numbers (aka character set size is 10) and that it is 20 characters long. This gives it an entropy of:

log2(10^20) = 66.4

It passes your test! However let's stop and look at the password:

01234567890123456789

Hmmm... turns out that the actual entropy is pretty much zero.
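
A minimal sketch of the kind of estimator described above, assuming it works purely from the character classes it detects in the password. The function name and the pool sizes are illustrative assumptions, not any particular library's method:

```python
import math
import string


def naive_entropy_bits(password: str) -> float:
    """Charset-based estimate: length * log2(size of the detected character pool)."""
    pool = 0
    if any(c in string.ascii_lowercase for c in password):
        pool += 26
    if any(c in string.ascii_uppercase for c in password):
        pool += 26
    if any(c in string.digits for c in password):
        pool += 10
    if any(c in string.punctuation for c in password):
        pool += len(string.punctuation)
    if pool == 0:
        return 0.0
    return len(password) * math.log2(pool)


# The estimator only sees "20 digits", not the obvious counting pattern.
print(round(naive_entropy_bits("01234567890123456789"), 1))  # 66.4
```

The estimate clears a 60-bit threshold even though the password is one of the first things a cracker would try, which is the point of the example.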

I could get a lot more technical but in this case I think it's better to keep the answer simple. I believe this example should provide a sufficient answer to your question.

Conor Mancone
  • 1
    thank you, upvoted! I am mandating that people choose a random password, but I agree that's an ask for the average user. – Woodstock Jun 02 '20 at 13:48
  • 2
    If you mandate that then perhaps you might as well make your system choose the random password for them. If you do that then of course you will control how much entropy goes in to it. – bdsl Jun 03 '20 at 11:50
  • I found that a good option is to use a password manager such as KeePass and let it generate passwords (and store them, of course.) By using cloud storage (my own server, encrypted password database) it is possible to access passwords from anywhere and have them automatically entered into login forms when needed. – Hans-Martin Mosner Jun 03 '20 at 12:31
  • 2
    Actually, the entropy is determined by the method that *will be used to crack it*, which may or may not be related to the one used to generate it. – Jan Hudec Jun 03 '20 at 19:04
  • @JanHudec unfortunately not true. The entropy is intrinsic to the password generation method. A 4 digit number has 10,000 possibilities - that determines the entropy. If an attacker doesn't realize that and tries to brute force all 4 character alphanumeric strings then they will make quite a lot more work for themselves, but the entropy of the password hasn't changed. Your comment is like saying my age is determined by my drivers license - while my drivers license may be one way to find out my age, my age is still intrinsic to me. – Conor Mancone Jun 03 '20 at 23:50
  • 3
    @ConorMancone But what *matters*, at least much of the time, is the method that will be used to crack the password. The length of time that a password cracker will take to find your password depends on your password and the cracker, but not the method that was used to generate the password. For that matter, the age printed on your driver's license is what determines whether or not you can buy beer. :) What the password generation method does, of course, is provide a practical guarantee that a password will take at least such-and-such many attempts to crack. – Tanner Swett Jun 04 '20 at 01:27
  • @TannerSwett yes, I 100% agree. When talking about entropy though it only makes sense to talk about how passwords are generated. You don't need to specify how the password will be cracked to decide how much entropy it has. In a conversation about calculating cracking time then obviously you *do* need to know something about how it will be cracked – Conor Mancone Jun 04 '20 at 09:54
  • To complicate matters, it's possible (though highly unlikely) for a high-entropy generator to produce a terrible password such as your example. – OrangeDog Jun 04 '20 at 12:22
31

The tests for any policy are:

  • people know about it
  • people understand it
  • people know if they are complying with it
  • people know how to comply with it

Your approach is about 2 out of 4 on that scale for the average user.

The better option is to demand randomly generated passwords. That's easy to understand, easy to implement, and easy to provide processes and tools for ("just use this password manager").

With your approach, you are basically trying to get people to be their own random generator. This is going to result in a lot of trial and error as people try to figure out what password will pass the test. This will result in frustration and confusion.

But that's assuming that you are writing a policy for the average user and assuming your calculation of entropy is valid (which seems beside the point of your question right now, and I have some serious reservations about it).

schroeder
  • 6
    Good point on how this basically asks people to act like random number generators. One (of the many) reasons why people are so bad at passwords is because we are, inherently, terrible random number generators. – Conor Mancone Jun 02 '20 at 12:39
  • It is not always possible (or rather - practical) to use random passwords. An example is Windows login. In that case a xkcd-like passphrase is better. Otherwise yes - password managers. – WoJ Jun 03 '20 at 12:41
  • 2
    @WoJ "random" does not have to mean "random string". "Random 3 words" is a standard by the UK's NCSC. – schroeder Jun 03 '20 at 13:29
  • 4
    @WoJ And to follow up schroeder's comment, the xkcd method you mention specifically assumes that the words are all selected uniformly at random. The entropy's way lower if you use passphrases that mean something. – Ray Jun 03 '20 at 14:09
  • 1
    @schroeder: yes, my comment was in the context of *The better option is to demand randomly generated passwords. (...) ("just use this password manager")* The result from a password manager will not be a word-based passphrase (and now that I think of it, that is something which is missing in the one(s) I use(d)). – WoJ Jun 03 '20 at 15:10
  • @Ray: yes, it is way lower if the passphrase means something (even more so when this is "something well known"). It does not even have to mean something; it is already a problem when the words are merely connected ("blue red yellow green"). All this said, it may be much better than "password1234". The **real** problem is how to correctly define your "way lower" and my "much better" in the actual context of the OP's (and anyone else's) use – WoJ Jun 03 '20 at 15:14
  • @WoJ: [Bitwarden](https://bitwarden.com/) has an xkcd-style passphrase generator. E.g., it just generated this for me: "Decibel-Basin-Resurface-Ideally-Shelve1". (This is not an endorsement of Bitwarden, it just happened to be the only password manager that had clients for all OSs and plugins for all browsers I needed at the time I made the decision. The tradeoff that it is using its own cloud sync service instead of allowing me to choose was a tradeoff I was willing to make at the time.) – Jörg W Mittag Jun 05 '20 at 06:32
  • @JörgWMittag: I do use Bitwarden (and self host it as well) and I was about to open a feature request for that :-| -- I never realized that there was something else than "password" in the choices. Thanks a lot for opening my eyes (it is truly a great product BTW) – WoJ Jun 05 '20 at 09:41
11

A key thing to understand when selecting a password policy (or a password) is that entropy isn't a property of the password. It's a property of the method used to generate it. More generally, it's a property of probability distributions that tells us roughly how much additional information you would need to uniquely identify an element drawn from that distribution if you know what the distribution is. I go into a bit more detail in a previous answer if you're interested, but for passwords, this roughly means that if there are 2^n passwords that you might have generated, you have an entropy of n.
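
As a rough illustration of entropy as a property of the method, here is a small sketch for methods that pick each symbol uniformly and independently. The function names and the tiny demo word list are hypothetical; 7776 is the size of a standard Diceware word list:

```python
import math
import secrets


def method_entropy_bits(choices_per_symbol: int, symbols: int) -> float:
    """Entropy of a method that picks `symbols` items uniformly and independently."""
    return symbols * math.log2(choices_per_symbol)


def random_passphrase(wordlist, words=4, sep="-"):
    """Draw words uniformly with a CSPRNG; entropy depends only on len(wordlist) and words."""
    return sep.join(secrets.choice(wordlist) for _ in range(words))


print(round(method_entropy_bits(10, 4), 1))    # 4-digit PIN: 13.3
print(round(method_entropy_bits(7776, 4), 1))  # four Diceware words: 51.7

# Hypothetical 8-word list, far too small for real use: 4 * log2(8) = 12 bits.
demo_words = ["amber", "basin", "cedar", "delta", "ember", "fjord", "grove", "harbor"]
print(random_passphrase(demo_words))
```

Note that nothing here inspects the resulting password: the number of bits is fixed before a single word is drawn.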

If the users generate their own passwords, you can't know what method they used. You can only set policies that make it more likely that the users will select a method that has high entropy. When doing so, you should keep in mind that users will generally find the laziest way of complying with a policy, which is why requiring that a password must contain capital letters and numbers is basically the same as requiring that the first letter be capitalized and that there be a single digit at the end.

The best password policy I've seen is Stanford's, which makes the special character requirements less onerous the longer the password is, to encourage the use of long passphrases instead of Password1$. If the password contains fewer than 12 characters, it requires every sort of character type. This restriction is relaxed as length increases, and once the password contains at least 20 characters, there are no additional restrictions. (There is also no upper bound for password length. Nothing is more annoying than a password policy that forces me to use short passwords in the name of security.) It then suggests randomly selecting 4 words as an easy way to get passwords that long, which is a password generation method with high entropy.
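
A length-tiered check along these lines could be sketched as follows. Only the two end points come from the description above (every character type below 12 characters, no restrictions from 20). The 8-character minimum and the two intermediate tiers are assumptions made for illustration, and the function name is hypothetical:

```python
def missing_requirements(password: str) -> list:
    """Return the requirements the password still fails; an empty list means accepted."""
    n = len(password)
    if n < 8:  # assumed absolute minimum length
        return ["at least 8 characters"]

    checks = {
        "a lowercase letter": any(c.islower() for c in password),
        "an uppercase letter": any(c.isupper() for c in password),
        "a digit": any(c.isdigit() for c in password),
        "a symbol": any(not c.isalnum() and not c.isspace() for c in password),
    }

    if n >= 20:
        required = []  # long passphrases: no character-class rules
    elif n >= 16:      # assumed intermediate tier
        required = ["a lowercase letter", "an uppercase letter"]
    elif n >= 12:      # assumed intermediate tier
        required = ["a lowercase letter", "an uppercase letter", "a digit"]
    else:
        required = list(checks)  # under 12 characters: every character type

    return [name for name in required if not checks[name]]


print(missing_requirements("password"))
print(missing_requirements("correct horse battery staple"))
```

Note that no upper bound on length is checked, in keeping with the policy as described.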

Under this policy, the good approach is also the laziest one, which means the users might actually do it.

Stanford Password Policy

Ray
  • 1
    @Woodstock You might also find [DiceWare](https://theworld.com/~reinhold/diceware.html) interesting. I personally use their method but with some of the EFF word lists (linked at the bottom of that page) – GalacticCowboy Jun 03 '20 at 18:34
  • 1
    I like that Stanford approach. Does anyone know how they implement it? I know how to set length minimums and complexity requirements (essentially just complex or not-complex) in Active Directory for Windows accounts, but I don't know how I would go about implementing the staggered requirements. – Doug Deden Jun 03 '20 at 21:47
  • 2
    I think the presentation of this password policy to a normal user would lead to an inherently unsafe adoption of it by users. The sample password only consists of nouns and an average user is unlikely to choose between more than 500 different nouns when asked for some. So you just end up with no more than 60 billion different passwords which is still pretty easy to brute force once hashes are leaked. And I think the 500 nouns are still vastly exaggerated because a normal user would for example just look around to find some nouns to use. – SpaceTrucker Jun 04 '20 at 06:14
  • 2
    @SpaceTrucker: Well, 60 billion is about 36 bits of entropy (assuming each PW is equally likely). That's not _that_ bad for a user-generated password... – sleske Jun 04 '20 at 12:35
  • @DougDeden Although I'm not a Windows person, I doubt you can do it in such a case. These things are easier when you have full control over the login process. – Conor Mancone Jun 04 '20 at 13:10
  • 1
    @SpaceTrucker That doesn't really matter. You're imagining an attacker that says, "I bet this person just picked some nouns. Let me load up a list of common nouns and use those to build my bruteforce list". While such a thing _may_ happen if that style of passwords becomes common, it's not the case now, so such a password would absolutely be safe from the sort of attacks that are a threat to most users. – Conor Mancone Jun 04 '20 at 13:12
  • 1
    @ConorMancone At least for accounts related to Stanford University, I would guess that those kinds of passwords may already be common enough. – SpaceTrucker Jun 04 '20 at 13:58
  • 1
    @SpaceTrucker As GalacticCowboy mentioned, Diceware works nicely in conjunction with this policy to solve the "500 nouns" problem. The dictionary could be linked to on the policy page. Four Diceware words gives 51.7 bits of entropy. – Ray Jun 04 '20 at 15:30
  • @SpaceTrucker I agree with you. It's important to randomly choose words from a large dictionary. If users choose words that just come to their mind, the dictionary will be reduced to a few favorite words or common objects. Something like "desk coffee lamp window". It's probably not worse than most regular passwords chosen by users but it's nowhere near diceware entropy. – kapex Jun 05 '20 at 13:57
  • 2
    @ConorMancone If there apparently are infographics out there that recommend choosing 4 nouns as best practice, then it's not unreasonable for an attacker to think that users will do exactly that. – kapex Jun 05 '20 at 14:02
  • 1
    @kapex More to the point, it shouldn't matter if the attacker knows what method you use. [The enemy knows the system](https://en.wikipedia.org/wiki/Kerckhoffs%27s_principle). A good password will be strong because there are billions of billions of possibilities to go through even if you know the exact method used. *My* passwords are usually 4 words uniformly selected from the Unix dictionary file, all lowercase, separated by dashes. If a system insists on capitals and numbers, the first letter is capitalized and there's a 3 at the end. Knowing this information won't let you guess them. – Ray Jun 05 '20 at 14:20
  • @Ray based on that your password is clearly `Leonel's-Lauri-Jacobin-Lucretius's` – Conor Mancone Jun 05 '20 at 15:08
  • based on this most hacked together line of code ever (note: not for production use, lol!) `for i in {1..4}; do sed -n "$(($RANDOM % 100000))p" /usr/share/dict/words; done | tr '\n' '-'` – Conor Mancone Jun 05 '20 at 15:09
  • @ConorMancone Leonel's-lauri-jacobin-lucretius's_**3**_. – Ray Jun 05 '20 at 16:23
  • @Ray lol! You've officially won the internet for the day. – Conor Mancone Jun 05 '20 at 16:47
4

I actually did this once upon a time with a few hundred users.

I estimated entropy based on the approximate alphabet size they used, where common dictionary words (taken from an English dictionary) counted as one "letter" each, and unknown words were divided into lower alpha, upper alpha, numbers, symbols, whitespace, etc. There were a few other common patterns it would identify which I won't go into the details of since it's not relevant.

If the calculated entropy was too low, the password was rejected and the user was shown some hints on how to improve it. There was certainly room for improvement but it worked very well for filtering out clearly weak passwords.

The problem was that the users hated it because it was difficult to understand (in particular it was difficult for them to make a weak password strong enough to use without making it really long).


Instead, these days, I would recommend enforcing only a minimum length, but checking user passwords against a database of known-breached passwords (e.g. https://haveibeenpwned.com/Passwords) and warning the user if their password is found.
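
A hedged sketch of such a check, assuming the Pwned Passwords range endpoint at `api.pwnedpasswords.com`, where only the first five hex characters of the password's SHA-1 hash are sent and the response lists `SUFFIX:COUNT` lines. The function names and the User-Agent string are illustrative:

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"


def sha1_prefix_suffix(password: str):
    """Split the uppercase SHA-1 hex digest into the 5-char prefix and 35-char suffix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_body(suffix: str, body: str) -> int:
    """Look for our suffix among the 'SUFFIX:COUNT' lines of a range response."""
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0


def breach_count(password: str) -> int:
    """Network call: the full hash and the password itself never leave the machine."""
    prefix, suffix = sha1_prefix_suffix(password)
    request = urllib.request.Request(
        RANGE_URL + prefix, headers={"User-Agent": "password-policy-sketch"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return count_in_range_body(suffix, body)
```

A signup form could call `breach_count(candidate)` and show a warning whenever the result is greater than zero, rather than rejecting the password outright.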

It's tempting to block passwords which you know are bad, but if a user won't listen to a warning, it's because they don't care about the account anyway. If you force those users to pick a harder password, they're likely to compromise it some other way (e.g. by writing it on a post-it note on their monitor).


Finally, please consider whether you even need passwords at all. We're long past the days when every web service needed its own login. There are a large number of single-sign-on services which you can integrate with to offload login management and make things easier for your users (as well as offering MFA, etc.), and for the more security-sensitive things, certificates are the better option anyway (browser support for mTLS is pretty good now!).

Dave
3

rather than the usual "at least one big letter, and a small letter, and special character..."

Any policy that contains those requirements in 2020 is broken and needs to be revoked.(*)

The main things that regular users need to know about passwords are:

  • it should be long (10-12 characters recommended)
  • it should not be guessable (not "password" or "1234567890" or your name, birthday, etc. etc.)

On the IT level, you should have a blacklist (the most common 1000 passwords or so).
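
Taken together, these rules could be sketched roughly like this. The tiny blocklist, the `personal_terms` parameter, and the function name are placeholders for illustration; a real deployment would load an actual list of the most common passwords:

```python
MIN_LENGTH = 10

# Placeholder entries only; a real list would hold the ~1000 most common passwords.
COMMON_PASSWORDS = {"password", "1234567890", "qwertyuiop", "iloveyou123"}


def check_password(password: str, personal_terms=()) -> list:
    """Return a list of problems; an empty list means the password is accepted."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append(f"shorter than {MIN_LENGTH} characters")
    lowered = password.casefold()
    if lowered in COMMON_PASSWORDS:
        problems.append("on the common-password blocklist")
    # Reject passwords that contain the user's name, birthday, etc.
    if any(term and term.casefold() in lowered for term in personal_terms):
        problems.append("contains personal information")
    return problems


print(check_password("1234567890"))
print(check_password("velvet-orbit-canyon"))
```

There are deliberately no character-class rules here, only length and guessability.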


(*) Complexity rules are wrong. In almost every case, they make passwords easier to compromise. Don't use them. Seriously, don't. It's not the 1980s anymore.

Tom
  • @Mark I mean to count each pattern as equivalent to 1 character long. I'll re-post my comment to clarify this. – Cœur Jun 05 '20 at 04:40
  • Instead of a 1000 passwords blacklist, better have a 1000 subpatterns greylist, where each part of your password that matches "1234", "9999", "abcd", "qwerty", "pass", "word", ... will only count for a length of 1. – Cœur Jun 05 '20 at 04:43
  • @Cœur it's not a random blacklist, it's "the most common 1000 passwords". The reason I definitely recommend blacklisting those is that people actually use them; that's why they're on that list. And brute force tools use those lists. You definitely don't want someone to have a password that any automated attack will try, and early on. – Tom Jun 05 '20 at 04:58