56

Wouldn't it be smarter to measure password entropy and reject low entropy passwords?

This would allow short passwords that use the whole character set to pass, as well as long passwords that use only part of the character set.

Is the above scheme possible, or do implementation details prevent something like this from being done?

Does any site or program already incorporate such password requirements?
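The scheme described in the question can be sketched as a naive estimator: password length times log2 of the size of the character classes that appear. This is only an illustration. The class sizes, the `MIN_BITS` threshold, and the function names are assumptions, and characters outside the four ASCII classes are simply ignored.

```python
import math
import string


def naive_entropy_bits(password: str) -> float:
    """Length * log2(pool size), where the pool is the union of the
    ASCII character classes that occur in the password."""
    pool = 0
    if any(c in string.ascii_lowercase for c in password):
        pool += 26
    if any(c in string.ascii_uppercase for c in password):
        pool += 26
    if any(c in string.digits for c in password):
        pool += 10
    if any(c in string.punctuation for c in password):
        pool += len(string.punctuation)  # 32 ASCII punctuation characters
    if pool == 0:
        return 0.0
    return len(password) * math.log2(pool)


MIN_BITS = 50  # hypothetical policy threshold


def accept(password: str) -> bool:
    return naive_entropy_bits(password) >= MIN_BITS


for candidate in ["g)yJa#Hu", "hresdahresda", "hresda", "Password123"]:
    print(candidate, round(naive_entropy_bits(candidate), 1), accept(candidate))
```

A short mixed-class password and a longer lowercase-only one both pass, which is what the question asks for. The estimator also accepts `Password123`, because it only sees character classes and length, not how the password was chosen.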

HopefullyHelpful
  • 49
    Using a password entry form like this might be a bit annoying if all it told me is "your password isn't strong enough" without telling me how to make it stronger ("add 27 bits of entropy" isn't very helpful). – Matti Virkkunen Nov 17 '16 at 11:42
  • 24
    How much entropy does my random 8-char password based on the whole printable/typable character space, which incidentally and totally randomly only contains ASCII characters, contain? – I'm with Monica Nov 17 '16 at 15:50
  • 2
    @Matti 'Entropy' isn't a useful reference for the user, but the value could be represented on the usual weak-to-strong slider, while giving the user the usual advice on how to make their password stronger. I think the OP's point is: why do requirements such as 'must contain one special character' force users into certain password choices, when a longer password without special characters could be just as strong as a shorter one with them? – Carrosive Nov 17 '16 at 17:22
  • 2
    The problem is that there is no reliable way of computing the entropy of a string... there will always be a few bad passwords that make it through. By using static limits like "at least 12 characters, must contain letters, digits and punctuation" you have at least some minimum guarantee... – Bakuriu Nov 17 '16 at 20:42
  • Stanford Univ implemented an adaptive password policy system for their users a few years ago: https://uit.stanford.edu/service/accounts/passwords/quickguide – PwdRsch Nov 17 '16 at 20:56
  • 65
    Passwords don't have entropy. *Password-generation methods* have entropy. – Mark Nov 17 '16 at 21:37
  • @n00b Everyone knows "correcthorsebatterystaple" is the strongest password, and is in fact uncrackable! Everyone should be using this one! -END JOKE MODE- The real problem is that people will choose weak passwords almost every time. What you need is a system that automatically assigns a password, where the entropy is known (and high). The users shouldn't be able to change it "just because they don't like it" because that reduces the entropy. – CJ Dennis Nov 19 '16 at 05:04
  • @CJDennis then you have users doing *other* unsafe practices like writing their PW down on a sticky note next to their monitor instead of ever having any chance of memorizing `?_2Amd=,_}eZ<#j`. – MattDMo Nov 19 '16 at 18:42
  • @MattDMo I wasn't suggesting assigning unmemorisable passwords. I believe the XKCD comic demonstrates that random passwords can be both strong and easily memorised. If you were assigned `correcthorsebatterystaple` would you find it easier or harder to remember than `?_2Amd=,_}eZ<#j`? – CJ Dennis Nov 20 '16 at 12:08
  • @CJDennis obviously, the word-based one. The problem is, most companies still stick to the "more weird characters the better" idea, forcing me to take a pretty strong password (according to ZXCVBN) and put in `@` for `a`, `$` for `s`, etc., just to meet their strength criteria. Unfortunately, I've had to reduce the character count because it takes that much longer to type in the random capitals and symbols (it *is* a good carpal tunnel producer if you had to type it 1000 times in a row), whereas if it was just 4-6 random words with spaces or underscores, it'd be impossible to crack. – MattDMo Nov 20 '16 at 17:15
  • @Mark Password entropy can be defined as the smallest entropy which is enough to generate it using a common password generation method. That's how it will be cracked, anyway. – Dmitry Grigoryev Nov 21 '16 at 10:19

11 Answers

76

After the famous XKCD strip, there were a few projects started up to deal with exactly this kind of entropy checking. One of these was the ZXCVBN password checker, made by a Dropbox employee.

It is possibly the most thorough password checker of its kind. It checks for patterns, words, and more, adding to (or subtracting from) an entropy score accordingly. It is explained in detail on their blog.
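To make the idea of pattern-aware scoring concrete, here is a toy sketch. It is not zxcvbn's algorithm or API: the word lists, the leet table, the fixed factor of 100 for case and substitution variants, and all function names are assumptions made for illustration. The estimate is expressed as a number of guesses, then converted to bits.

```python
import math

# Tiny illustrative lists; a real checker ships far larger ones.
COMMON_PASSWORDS = ["password", "123456", "qwerty", "letmein"]
DICTIONARY = ["troubador", "correct", "horse", "battery", "staple", "dictionary"]
# Undo a few common substitutions: 0->o, 1->i, 3->e, 4->a, @->a, $->s
LEET = str.maketrans("0134@$", "oieaas")


def estimate_guesses(password: str) -> int:
    lowered = password.lower()
    if lowered in COMMON_PASSWORDS:
        # Guessed at its rank in the common-password list.
        return COMMON_PASSWORDS.index(lowered) + 1
    unleeted = lowered.translate(LEET)
    for rank, word in enumerate(DICTIONARY, start=1):
        if unleeted.startswith(word):
            suffix_len = len(password) - len(word)
            # word rank * assumed 100 case/leet variants * brute-forced suffix
            return rank * 100 * 95 ** suffix_len
    # No pattern found: fall back to brute force over 95 printable ASCII chars.
    return 95 ** len(password)


def estimate_bits(password: str) -> float:
    return math.log2(estimate_guesses(password))


for candidate in ["password", "dictionary", "Tr0ub4dor&3", "ioictnadry"]:
    print(candidate, round(estimate_bits(candidate), 1))
```

Under these assumptions a dictionary word with substitutions scores far below a same-length string with no recognizable pattern, which is the behavior the character-class rules cannot express.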

ChristianF
  • 14
    I believe this is the web-based version: https://dl.dropbox.com/u/209/zxcvbn/test/index.html – mythofechelon Nov 17 '16 at 14:24
  • 23
    Um... I don't think I would type a password into a random web tool... – Ionoclast Brigham Nov 18 '16 at 02:35
  • 2
    @IonoclastBrigham but this particular tool doesn't seem to be making calls behind your back, no? – s.m. Nov 18 '16 at 11:24
  • 16
    One should always be wary of typing an actual password into a web-based tool. But in this case it's just a demo to show how ZXCVBN works, and as such it is meant to be used with example passwords. It is not a replacement for actually implementing it yourself on your site. – ChristianF Nov 18 '16 at 11:45
  • 7
    @s.m You forgot to add "at the time of this writing" – Peter Nov 18 '16 at 15:29
  • if you're really paranoid you can turn off your network adapter while using the form. – Nacht Nov 21 '16 at 00:20
  • It looks like it's all client-side JavaScript anyway. But how to easily verify that? – reinierpost Nov 22 '16 at 10:19
  • If none *have* been made, that doesn't prove none *will* be made. Combining this with a web proxy that temporarily disables requests, or something similar, will at least ensure that any requests that are made will fail. – reinierpost Nov 22 '16 at 10:23
48

This is a great idea; in fact, it is the only proper way of measuring password strength.

But how would you measure password entropy?
Entropy is an aspect of the generation process, not of the output.

For example, what would be the output of such a measurement for Tr0ub4dor&3? By any reasonable measure of possible entropy based on a given password, that would be rather decent - over 70 bits of entropy. Or maybe, taking into account a supposed password generation process, I might be smart enough to realize it is actually capped to only 28 bits, since each character is not selected randomly, but first a whole word is selected. But in reality I should junk this whole idea altogether, since I obviously just copied it directly from that comic.
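As a rough check of the first figure, assume each of the 11 characters were drawn independently and uniformly from the 95 printable ASCII characters (an assumption, since the answer does not name a character set):

$$H = L \log_2 N = 11 \cdot \log_2 95 \approx 11 \cdot 6.57 \approx 72.3 \text{ bits}$$

The 28-bit figure corresponds to only $2^{28} \approx 2.7 \times 10^{8}$ equally likely candidates, so the two estimates for the same string differ by a factor of about $2^{44}$ in guessing effort.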

The same issue would apply if the password was correct horse battery staple (one of the most popular passwords amongst a certain population).

So yeah, password requirements should be based on the password entropy, but you cannot apply this requirement to a given password after the fact.

(Btw, as I mentioned in another answer on this topic (from a different direction), it could be a good idea to implement a system where passwords / passphrases are auto-generated for a given level of entropy, and provided to the user, instead of asking the users to come up with one that meets our requirements. Of course, this is what a good password manager would do on the client, anyway...)
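A minimal sketch of that generate-to-a-target idea, assuming words are drawn uniformly and independently from a fixed list, so each word contributes log2(list size) bits. The eight-word `WORDS` list and the function names are placeholders; a real deployment would load a curated list with thousands of unique entries.

```python
import math
import secrets

# Hypothetical stand-in word list (entries must be unique).
WORDS = ["correct", "horse", "battery", "staple",
         "umbrella", "cheese", "monitor", "sticky"]


def words_needed(target_bits: float, list_size: int) -> int:
    """Number of uniformly chosen words needed to reach target_bits."""
    return math.ceil(target_bits / math.log2(list_size))


def generate_passphrase(target_bits: float, words=WORDS, sep: str = " ") -> str:
    count = words_needed(target_bits, len(words))
    # secrets.choice uses the OS CSPRNG, unlike random.choice.
    return sep.join(secrets.choice(words) for _ in range(count))


print(words_needed(44, 2048))   # a 2048-word list gives 11 bits per word
print(generate_passphrase(12))  # 8-word list: 3 bits per word
```

Because the entropy comes from the process, it is known exactly here, and it only stays valid if users cannot reroll until they get a phrase they like.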

AviD
  • I think the only way possible would be a system standard library for all popular OS that has a state of the art password dictionary in it and then looks up the passwords position. Otherwise you would have to send the password dictionary/cracker in js which would cost a lot of resources. Or you could send the password in plaintext or rsa encoded to the server. At which point the process would be too expensive for most servers I guess. – HopefullyHelpful Nov 17 '16 at 09:03
  • 6
    I don't understand - only for what to be possible? In order to calculate a password's entropy, you don't actually need to crack it, or any dictionary at all - you just need to examine the measure of randomness in the generation process. That's it, no need for brute force guessing. – AviD Nov 17 '16 at 09:08
  • 1
    Yeah, but password entropy is only a measure to describe how long it would take to crack a password approximately. If you want to make sure the user isn't using `Tr0ub4dor&3` then you would have to guess his creation process which you can't, so you could make a dictionary to lookup his password and see if it's strong or not. – HopefullyHelpful Nov 17 '16 at 09:14
  • 3
    @HopefullyHelpful So you suggest cracking the password on the server side to know if it's weak. If your server was not able to crack the password in one week, you let it pass. That could be a solution, if you had weeks of computing power to spare. – A. Hersean Nov 17 '16 at 09:26
  • 4
    @HopefullyHelpful That was exactly my point - that solution doesn't make sense, and cannot work. You can't measure the entropy based on the password (via cracking, dictionaries, or anything else), since entropy is an attribute of the process. Which is why I suggested generating the strong passwords, instead of guessing if the password was strong. – AviD Nov 17 '16 at 09:32
  • 1
    Btw, your server is almost guaranteed to not be the most efficient at password cracking, so that solution is even less worthwhile. – AviD Nov 17 '16 at 09:33
  • If people get presented with hard or impossible to remember passwords, they just write them down and plaster a collection of sticky notes all over their desktop, laptop back, and in their calendar and cellphone case... – jwenting Nov 17 '16 at 13:47
  • @jwenting of course. That is why I suggested either providing them with memorable passphrases, or encouraging / enabling the use of a password manager. – AviD Nov 17 '16 at 13:59
  • 7
    If I gzip and base64 "hello" I get a long sequence of 'random looking' that you might think was high entropy, but really isn't. The only way you can _ensure_ that the generation of the password is 'high entropy' is to generate it yourself, and issue it ... but that has the other problem of safe exchange, memorability etc. – Sobrique Nov 17 '16 at 17:45
  • 5
    I think this answer is much too negative, because as ChristianF's answer (indirectly) highlights, there *is* a sensible approach to providing useful measures of password entropy, embodied in the [zxcvbn password meter](https://blogs.dropbox.com/tech/2012/04/zxcvbn-realistic-password-strength-estimation/): use a statistical model that estimates how the password would hold up against as clever an attacker as we can model. This is not foolproof, could still be improved quite a bit, and has an undesirable "arms race" flavor to it, but it's better than password composition rules. – Luis Casillas Nov 17 '16 at 19:24
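Sobrique's gzip-and-base64 example from the comments above can be reproduced in a few lines. This is only a sketch with made-up function names; `mtime=0` (Python 3.8+) is used so the gzip header is deterministic.

```python
import base64
import gzip


def disguise(word: str) -> str:
    """Compress and base64-encode a word so it looks random."""
    raw = gzip.compress(word.encode("utf-8"), mtime=0)
    return base64.b64encode(raw).decode("ascii")


def undisguise(blob: str) -> str:
    return gzip.decompress(base64.b64decode(blob)).decode("utf-8")


blob = disguise("hello")
print(blob, len(blob))
print(undisguise(blob))
```

The output is much longer than `hello` and mixes upper case, lower case, and digits, so a character-class meter rates it highly, yet an attacker who knows the transformation only has to guess the five-letter word.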
16

Static password policies are chosen for two major reasons: usability and the body of research demonstrating acceptable effectiveness. Most of my answer comes from the excellent research paper on an advanced password-strength meter, Telepathwords.

First, to summarize some of the research used to back up current password policies:

Password-composition rules date back at least to 1979, when Morris and Thompson reported on the predictability of the passwords used by users on their Unix systems; they proposed that passwords longer than four characters, or purely alphabetic passwords longer than five characters, will be “very safe indeed” [19]. [However] Bonneau analyzed nearly 70 million passwords in 2012, 33 years later, to measure the impact of a six-character minimum requirement compared with no requirement [2]. He found that it made almost no difference in security...

This includes the work of Komanduri et al. [13] and Kelley et al. [12], who used similar study designs to perform comparative analyses of password composition rules. These prior studies found that increasing length requirements in passwords generally led to more usable passwords that were also less likely to be identified as weak by their guessing algorithm [13, 12]. Most recently, Shay et al. studied password-composition policies requiring longer passwords, finding the best performance came from mixing a 12-character minimum with a requirement of three character sets [25].

Usability is a huge reason why more complex criteria like password entropy aren't used more frequently:

In a study of the distribution of password policies, Florencio and Herley found that usability imperatives appeared to play at least as large a role as security among the 75 websites examined [8]. ...

Ur et al. also studied the effect of password strength meters on password-creation. They found that when users became frustrated and lost confidence in the meter, more weak passwords appeared. [28] ...

While [Dropbox's] zxcvbn provides a much-needed improvement in the credibility of its strength estimates when compared to approaches relying solely on composition rules, this credibility is unlikely to be observed by users. In fact, its perceived credibility may suffer if users, who have been told that adding characters increases password strength, see scores decrease when certain characters are added. For example, when typing iatemylunch, the strength estimate decreases from the second-best score (3) to the worst score (1) when the final character is added. Even if users find zxcvbn’s strength estimates credible, they are unlikely to understand the underlying entropy-estimation mechanism and thus be unsure how to improve their scores. [30]

Finally, for the sake of completeness, we have to realize that defining entropy in this example is very difficult (but far from impossible). There are many different assumptions we can make about the sophistication of a password cracker's guessing algorithm or dictionary, and they all lead to different answers for the entropy of passwords like "Tr0ub4dor&3" or "correct horse battery staple". The most sophisticated password entropy measures are based on dictionaries of millions of passwords and advanced study of password patterns, and this level of sophistication is difficult for many administrators (and hackers) to achieve.

Cody P
13

Entropy is calculated from how you create a password. To calculate it, you don't need to know the password; you need to know how it was created. Having the password doesn't help you calculate entropy; it only lets you make a very poor estimate of it.

Example:

Password123

If our password "Password123" was chosen from a list of the 3 most used passwords that contain letters, numbers, upper and lowercase, and are longer than 10 characters, the entropy of Password123 is ridiculously low.

If the same password "Password123" was chosen by a perfect random generator that creates 11-character passwords, with each character chosen from 5000 possible Unicode code points, the entropy of Password123 is ridiculously high.
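The two cases can be put into numbers. The sketch below assumes every outcome of each process is equally likely, so the entropy is just log2 of the number of possible outcomes; the function name is made up for this example.

```python
import math


def uniform_choice_bits(num_outcomes: int) -> float:
    """Entropy in bits of one uniform pick among num_outcomes possibilities."""
    return math.log2(num_outcomes)


# Process 1: pick one of the 3 most used passwords meeting the rules.
top3_bits = uniform_choice_bits(3)

# Process 2: 11 positions, each drawn from 5000 code points.
random11_bits = uniform_choice_bits(5000 ** 11)

print(round(top3_bits, 2), round(random11_bits, 1))
```

The same string comes out at under 2 bits in the first process and at roughly 135 bits in the second, and nothing in the string itself tells you which process produced it.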


In other words: You're on to something, but "entropy" is the wrong word - "entropy" already has a different meaning. What you're looking for is "strength" of a password. And strength of a password is hard to measure right, and even harder to communicate. The fact that strength changes whenever the attack methods change doesn't help either.

Peter
  • "Strength" as used by almost all password meters online is total bollocks. Try QWEqwe123!"$ or some such shit (simple keyboard patterns) and marvel at them regularly rated very secure. Despite the fact that every password cracker software explicitly checks for keyboard patterns. – Tom Nov 22 '16 at 11:04
  • @Tom Indeed *"strength of a password is hard to measure right"* Even the tool linked in the accepted answer thinks "correcthorsebatterystaple", **the most common 25 digit password in the world**, is incredibly strong. And it thinks "HowILearnedtoStopWorryingandLovetheBomb" takes centuries to crack. – Peter Nov 22 '16 at 14:10
7

You cannot measure password entropy, you can only measure an upper bound for it. So any password strength estimator is flawed.

Using a password estimator and imposing annoying rules have the same effect: users try to meet the requirements while keeping the password as easy as possible to remember. So the harder the requirement, the harder they will try to build an easy to remember password, for example Pa$$word1 or passwordpasswordpassword. The problem is that an easy to remember password is also an easy to guess password.

When the service you provide is optional, you also risk alienating users with overly strict requirements and losing customers.

However, you can enforce a lower bound of 10 characters, because all passwords shorter than 10 characters are weak and the requirement is not too difficult to meet. You can also give users advice on building strong passwords.

For your last question "Does any site or program already incorporate such password requirements?", I guess you can find such sites. However, I would not recommend following their practice. It's not because someone else does it that it's a good idea to do the same.

A. Hersean
  • 1
    *an easy to remember password is also an easy to guess password.* that is wrong. Passphrases are easy to remember, but hard to guess due to the sheer size of the search space. (their problem is that if you're not a fast typer, they are hell to enter, but that's a different story) – Tom Nov 22 '16 at 11:06
  • @Tom Many would disagree with you. The research on this topic is already old news: https://www.schneier.com/blog/archives/2012/03/the_security_of_5.html http://arstechnica.com/business/2012/03/passphrases-only-marginally-more-secure-than-passwords-because-of-poor-choices/ – A. Hersean Nov 22 '16 at 12:45
  • From your source: *"This is far better than passwords"* -- 20 bits of entropy instead of 10 bits of entropy is actually **a lot** better. That passphrases are not a panacea should be obvious ("I am a god" is a 4-word passphrase, but has only 7 non-space characters). – Tom Nov 22 '16 at 15:47
6

How do you measure password "entropy"?

It's impossible.

A password like "hresda" may have 'low entropy' because it was chosen from lower case letters, but if it was randomly generated from a set of characters containing upper/lower case letters, digits and symbols and the result just happened to only contain lower case letters, then it has higher entropy. A password like "A63ba!" may have lower entropy than "hresda" if it was generated specifically as [upper case][digit][digit][lower case][lower case][symbol] rather than just randomly chosen.
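A small sketch of that comparison, treating each position as an independent uniform draw from a pool of a given size. The pool sizes (26 letters per case, 10 digits, 32 ASCII symbols) and the names are assumptions for illustration.

```python
import math
import string


def process_entropy_bits(pool_sizes) -> float:
    """Sum of log2(pool size) over the positions of the generation process."""
    return sum(math.log2(size) for size in pool_sizes)


LOWER, UPPER, DIGIT = 26, 26, 10
SYMBOL = len(string.punctuation)          # 32 ASCII punctuation characters
FULL = LOWER + UPPER + DIGIT + SYMBOL     # 94 characters

# "hresda" drawn from lowercase letters only
hresda_lower = process_entropy_bits([LOWER] * 6)
# "hresda" drawn from the full set; it just happened to come out lowercase
hresda_full = process_entropy_bits([FULL] * 6)
# "A63ba!" built as [upper][digit][digit][lower][lower][symbol]
a63ba_pattern = process_entropy_bits([UPPER, DIGIT, DIGIT, LOWER, LOWER, SYMBOL])

print(round(a63ba_pattern, 1), round(hresda_lower, 1), round(hresda_full, 1))
```

Under these assumptions the fixed-pattern process behind "A63ba!" has less entropy than even the lowercase-only process, although the resulting string looks more complex.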

pscs
  • 5
    Of course, entropy in the generation process doesn't help you if the random box spits out a dictionary word. What really needs to be estimated is "work of cracking", and that is indeed lower for "hresda" independent of how large the original character set was. – Ben Voigt Nov 17 '16 at 16:01
  • 2
    Pragmatic answer: entropy is measured as the number of guesses an off-the-shelf password cracking tool will need to make. Perfectly possible, and reasonably objective. You can argue that a password has _less_ entropy than it appears if you know the rule used to generate it, but a password can never have _more_ entropy than it appears ... that just makes no sense. – Mike Ounsworth Nov 17 '16 at 17:36
  • 1
    The odds against any "low entropy" password being generated by a high-entropy process are incredibly low. Like "it is more likely there is a bug in the heavily audited open source software that generated it that caused it to lock into lower case in some situation" low. Like "I don't believe you" low. Like "you flipped a coin 1000 times and they all came up heads" low. Given a high entropy space randomly selected from, any particular low-entropy subspace is going to be ridiculously small and unlikely. It "could" happen, probably but not in the lifetime of the universe, low. – Yakk Nov 17 '16 at 18:09
  • 1
    Suppose your system has 2000 bits of entropy in password generation (strong system!). The space of 6 character lower case English letters has about 28 bits. The **maximum** chance that such a password is generated is 1 in 2 to the power of 1972. – Yakk Nov 17 '16 at 18:12
  • 4
    I would also add that if in some alien language "A63ba!" meant "password", your password would basically be cracked instantly if the aliens decided to try that first. The lower bound of the entropy of any password can always be 0. The entropy of a password is what we give it using our own information and knowledge. – Bloc97 Nov 17 '16 at 19:49
2

Yes, there are programs that measure the entropy of a password to decide if it is good enough or not. One such program is Wabol Talk. The feature is implemented using the estimate_quality method in the program's main module. Ultimately, that method is called by the method just above it (error) to validate password fields that are used to generate keys and initialization vectors. The estimation is only of minimal quality since it does not judge passwords based on their frequency of use, but it demonstrates one of the simplest ways to find how many bits of entropy are present in a password.

Noctis Skytower
  • While `estimate_quality` does a decent job at rejecting bad passwords it isn't really an estimate of entropy. For example, just taking the password length would be a better entropy estimation from a theory standpoint (although would be pretty poor at rejecting bad passwords: `aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa`). – grochmal Nov 17 '16 at 23:20
  • Without knowing the length or design of your "bad" example password, some would argue that it is still a fairly good password. – Noctis Skytower Nov 18 '16 at 17:05
  • Your theory is bunk, when you see a password that matches a common pattern it is far more likely that it was created using the pattern than it happened to fall out of a random number generator. – Peter Green Nov 18 '16 at 18:32
  • It is not my personal theory. If a password had one billion characters in it and nothing more was known, it could be seen as inherently secure regardless of how it was constructed (assuming its construction method was unknown to an attacker). – Noctis Skytower Nov 18 '16 at 18:39
  • More realistically, it is pointless to create a password that has more bits of data than the number of bits returned from that hash that secures it. – Noctis Skytower Nov 18 '16 at 18:41
  • @Noctis Skytower Do you happen to know why WiFi WPA2 generates a 256-bit key from a 160-bit SHA-1 hash using PBKDF2? – Dick99999 Nov 18 '16 at 20:14
  • Why is anything the way it is? Either it was designed to be that way (consciously or unconsciously), or a mistake was made. Choices made in the past could have been made differently if the decision maker had so desired. To answer your question, see [Deriving 256-bit key from PBKDF2-SHA1](http://crypto.stackexchange.com/questions/34686). That does not make a ridiculously long password any better. If the password space is larger than the hash space, there is probably a better way of solving the "problem" than trying to generate the original password. – Noctis Skytower Nov 18 '16 at 21:29
  • I actually fully agree with your point (in the comment) that my example password has high entropy because it is long. The amount of randomness in generating `g)yJa#Hu` and `aaaaaaaaa` is absolutely the same. What I do not agree is that the `estimate_quality` procedure in the linked code estimates entropy. Using the same example, that procedure would say that `g)yJa#Hu` is better than `aaaaaaaaa`, although their entropy should be similar if not equal. – grochmal Nov 19 '16 at 04:10
  • That is because `estimate_quality` makes at least one assumption: the attacker will know something about what sets of characters were used during the creation of the password. The password `aaaaaaaaa` requires ~43 bits to store in an efficient format, but the password `g)yJa#Hu` requires ~52 bits to store in an efficient format. – Noctis Skytower Nov 21 '16 at 17:28
1

A note about the international environment. If the password is created for a website open to the whole world (not uncommon), chances are the user does not speak English as a native language, and may instead choose, say, Swedish words as a basis for the xkcd password creation method.

"paraplyost" (umbrella cheese) would probably rank lower in an entropy checker that knew about the Swedish dictionary than in one that didn't. If an attacker knows that the user is Swedish (not the most common scenario, but it may happen), he might be able to crack the user's password more easily than the entropy checker predicted, unless the entropy checker is loaded with hundreds (or thousands?) of dictionaries.

One nice side effect of this is that if you speak a language that isn't one of the most common in the world, you can make even safer passwords as an anonymous user on an international web site. :)

TV's Frank
1

Does any site or program already incorporate such password requirements?

Several websites do that. For example, go to https://www.dropbox.com/login and try to register with a password which is a common dictionary word of 8 characters or more (e.g. dictionary). You'll see the strength meter is showing you the lowest score. Now try to shuffle the letters randomly (e.g. ioictnadry) and use that - you'll see the strength meter going up.

The obvious downside to this method is that you need to upload a dictionary of common passwords to every client using the login form. This recently became OK but was totally unacceptable 5 or 10 years back.

Dmitry Grigoryev
0

My impression is that most cracks use existing word & password lists. And a few variations on those, like adding numbers or repeating a word. Exception noted below.

So a password is also weak if it appears on a list, however strong the entropy calculation (of the generation process) says it is. 'A list' here means the cracker's list: if that list does not contain your password, the password is strong, regardless of the entropy calculation. Because the cracker's list is unknown, it is almost impossible to check against it beforehand. Trying anyway, with some 65 million words, does no harm; see Stanford's SUNet ID Passwords. 65M is a lot, but this list contains almost 2.5 billion passwords: MySQL Passwords

A high-entropy PW generation process, and a tool checking that entropy, should both be complemented with a rejection step that uses lists and patterns. Even high-entropy processes can generate easy-to-guess passwords, such as aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa (see below) and password1.
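Such a rejection step might look like the sketch below. The tiny `BLOCKLIST`, the two pattern rules, and the function name are illustrative assumptions; a real deployment would check against a list with millions of entries and many more patterns.

```python
import re

# Hypothetical stand-in for a real list of known or cracked passwords.
BLOCKLIST = {"password1", "123456", "qwerty", "correcthorsebatterystaple"}


def rejection_reason(candidate: str):
    """Return a reason to reject the candidate, or None if it passes."""
    if candidate.lower() in BLOCKLIST:
        return "appears on a known-password list"
    if len(set(candidate)) <= 2:
        return "uses at most two distinct characters"
    # A block of 1 to 4 characters repeated to fill the whole password.
    if re.fullmatch(r"(.{1,4})\1+", candidate):
        return "repeats a short block"
    return None


for candidate in ["a" * 38, "password1", "abcabcabc", "g)yJa#Hu"]:
    print(candidate, "->", rejection_reason(candidate))
```

The check runs after generation or entry, independently of any entropy figure, so it catches the rare unlucky output of a good generator as well as poor user choices.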

I asked the crackers of Ashley Madison's hashes how they could crack the long passphrases they published. I got just a generic answer, but after analyzing the published ones, I found that most of those long phrases are common expressions or hashtags; see source of some cracked AM phrases

The only exception to lists as sources is passwords that follow a pattern like: a capital, then lowercase letters, followed by a number. Common patterns, with hit percentages, are available on the Internet. Those could indeed easily be checked beforehand.
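Checking for such a pattern can be sketched by reducing a password to the sequence of character classes it uses. The class letters (`u`, `l`, `d`, `s`), the `COMMON_SHAPES` set, and the function names are assumptions for this example; published pattern lists with hit percentages would replace the hard-coded set.

```python
import string


def mask(password: str) -> str:
    """Map each character to its class: u(pper), l(ower), d(igit), s(ymbol/other)."""
    classes = []
    for ch in password:
        if ch in string.ascii_uppercase:
            classes.append("u")
        elif ch in string.ascii_lowercase:
            classes.append("l")
        elif ch in string.digits:
            classes.append("d")
        else:
            classes.append("s")
    return "".join(classes)


def shape(password: str) -> str:
    """Collapse runs of the same class, e.g. 'ulllddd' becomes 'uld'."""
    m = mask(password)
    return "".join(ch for i, ch in enumerate(m) if i == 0 or ch != m[i - 1])


# Hypothetical set of frequently seen shapes.
COMMON_SHAPES = {"uld", "ld", "l", "d"}


def has_common_shape(password: str) -> bool:
    return shape(password) in COMMON_SHAPES


print(mask("Password123"), shape("Password123"), has_common_shape("Password123"))
print(shape("g)yJa#Hu"), has_common_shape("g)yJa#Hu"))
```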

-- edit adjusted 2.5 billion; 'aaaaaaa' part added; rephrased rejection

Dick99999
0

One word: Policy

For the past decade or so, the main drift in information security is towards compliance, control and policies. Essentially, what we do today is we build up an ISMS (Information Security Management System) based on C-Level commitment and corporate policies.

And when you get to writing the password policy, you are not at the technical level. You are writing a document that employees need to read, understand and possibly sign. You are writing a document that management has to accept and support. You cannot get too deep into mathematics, or you will not get acceptance.

Finally, when you think about how to actually implement the password policy, in almost all organisations you aren't free to invent your own rules. You will have something like Active Directory running and your implementation has to work with it, or the IDM system, or the legacy stack or whatever.

So, with tears in our eyes, we write password policies we know to be half stupid, because it's the best that we can actually get done.

Tom