I'm conducting a study involving passwords, and need a method that assigns a numerical value to the complexity of a password

Question

I'm currently a student and am conducting a science fair project for my high school. The study involves collecting human-generated passwords and testing them for complexity, or "crack-ability," to determine which password requirements/criteria best encourage users to generate secure passwords.

I need some system that rates each password for how secure it is. I considered using NIST's formula for calculating data entropy, but later realized how unrealistic entropy is for measuring crack-ability. I have also read some of Matthew Weir's studies and would like to, if possible, use his system, PCFG Generator. I've found the download and instructions here, but as a student, I'm still unable to figure out how to use it exactly.

Any reliable system for predicting the guessability of a password would be very helpful, so long as it considers common cracking tools (dictionary attacks, like JtR). Any help would be greatly appreciated.

Thanks!

I'm not sure what NIST formula you're referencing, but see [here](https://security.stackexchange.com/a/174744/151903) and [here](https://security.stackexchange.com/a/6096) for how to calculate password **generation** entropy (you can't really calculate entropy from the password itself). — AndrolGenhald, Dec 06 '17 at 15:40
Also, I think [zxcvbn](https://github.com/dropbox/zxcvbn) is pretty cool. — AndrolGenhald, Dec 06 '17 at 15:48

score 9 · Answer 1 · edited Dec 22 '17 at 17:39

In short, there is no reliable way to do this. Your program would have to implement a set of rules, and humans then work around those rules. E.g. requiring a number makes 99% of average users' passwords no more secure: they just add a 1 to the end of their existing password. The same goes for symbols.

"p@55w0rd!" would score highly on a lot of earlier attempts at rating passwords. So would "This is my password!". Meanwhile "lbvgjnwhha" (randomly generated) would score poorly because it's a fairly short string of all lowercase letters.
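To illustrate why "p@55w0rd!" is weak despite looking complex, here is a minimal sketch of a dictionary attack with mangling rules, the general idea behind tools like JtR. The wordlist, the leetspeak table, and the suffix list are hypothetical examples; real tools use much larger rule sets with their own syntax, which this does not reproduce.

```python
from itertools import product

# Hypothetical leetspeak substitutions; real rule sets are far larger.
LEET = str.maketrans({"a": "@", "s": "5", "o": "0", "e": "3", "i": "1"})

# Hypothetical suffixes humans commonly append to satisfy "number"/"symbol" rules.
SUFFIXES = ["", "1", "!", "123", "1!"]

def mangle(word: str) -> set[str]:
    """Apply a few common human 'complexity' tricks to one dictionary word."""
    bases = {word, word.capitalize(), word.translate(LEET)}
    return {base + suffix for base, suffix in product(bases, SUFFIXES)}

def guesses(wordlist: list[str]) -> set[str]:
    """All candidate guesses produced from a wordlist."""
    candidates: set[str] = set()
    for word in wordlist:
        candidates |= mangle(word)
    return candidates

candidates = guesses(["password", "letmein", "dragon"])
print("p@55w0rd!" in candidates)   # True
print("Password1" in candidates)   # True
print("lbvgjnwhha" in candidates)  # False
```

Three dictionary words already yield 45 guesses that include "p@55w0rd!" and "Password1", while the random string is never produced.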

Instead, I would consider a different approach. This page, for example, can check a password against hundreds of millions of passwords known to have been leaked. Rejecting passwords found there should defeat most non-targeted dictionary attacks (targeted attacks use personal attributes like surname, date of birth, etc.). You could demonstrate this (or download the full password hash lists and write a tool that calculates the hash of an entered password and checks it against the list) while explaining password best practices and the benefits of two-factor authentication.
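A minimal sketch of the "write a tool" idea, assuming the downloaded list is a text file with one uppercase hex SHA-1 hash per line, optionally followed by `:COUNT`. The file format and the function names are assumptions; check the format of whatever list you actually download.

```python
import hashlib
from typing import Iterable

def sha1_hex(password: str) -> str:
    """Uppercase hex SHA-1 of the UTF-8 encoded password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()

def load_leaked_hashes(lines: Iterable[str]) -> set[str]:
    """Parse lines of the assumed form 'HASH' or 'HASH:COUNT' into a set of hashes."""
    hashes = set()
    for line in lines:
        line = line.strip()
        if line:
            hashes.add(line.split(":", 1)[0].upper())
    return hashes

def is_leaked(password: str, leaked: set[str]) -> bool:
    """True if the password's SHA-1 hash appears in the leaked set."""
    return sha1_hex(password) in leaked

# Usage with a tiny in-memory sample instead of the real file:
#   with open("leaked-hashes.txt") as f:   # hypothetical filename
#       leaked = load_leaked_hashes(f)
sample = [sha1_hex("password") + ":1", sha1_hex("123456") + ":1"]
leaked = load_leaked_hashes(sample)
print(is_leaked("password", leaked))    # True
print(is_leaked("lbvgjnwhha", leaked))  # False
```

For the full list, a set of hundreds of millions of hashes may not fit in memory; a sorted file searched with binary search, or a database, would be the more practical design.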

score 2 · Answer 2 · answered Dec 06 '17 at 17:33

First, I think it is prudent to understand mathematical password entropy. The wiki page on Password Strength is a good place to start, but in short, password entropy is a measure of how likely a password is to be guessed, given the pool of symbols it was drawn from.

This is neither complexity nor "crack-ability": entropy stands on its own, whereas those two concepts are not independent of each other.

Entropy is important to understand first because it depends on just two things:

Number of symbols
The length of those symbols

Number of Symbols

The number of symbols available sets the baseline number of bits of entropy per symbol. The uppercase alphabet alone doesn't give the same number of bits per symbol as uppercase plus lowercase. If you include numbers, the entropy per symbol increases further. In other words, the larger your pool of symbols, the more bits of entropy each symbol carries (log2 of the pool size).

Length of Symbols

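Length is important because each added symbol multiplies the number of possible passwords by the size of the pool, i.e. it adds log2(pool size) bits of entropy (one extra bit doubles the search space; one extra lowercase letter adds about 4.7 bits). This is why length alone is such an important concept in password security.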
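For a password whose symbols are each chosen uniformly at random from a pool of N symbols, an L-symbol password has H = L × log2(N) bits of entropy. As the comments on the question point out, this describes how a password was generated, not a human-chosen password. A small sketch, assuming pools of 26 letters, 10 digits, and the 32 ASCII punctuation characters in Python's `string.punctuation`:

```python
import math
import string

# Assumed pool sizes for common character classes.
POOLS = {
    "lowercase": len(string.ascii_lowercase),                               # 26
    "lower+upper": len(string.ascii_letters),                               # 52
    "lower+upper+digits": len(string.ascii_letters + string.digits),        # 62
    "all printable (no space)": len(string.ascii_letters + string.digits
                                    + string.punctuation),                  # 94
}

def bits_per_symbol(pool_size: int) -> float:
    """Entropy contributed by one uniformly random symbol from the pool."""
    return math.log2(pool_size)

def generation_entropy(length: int, pool_size: int) -> float:
    """Entropy in bits of `length` symbols, each drawn uniformly at random from the pool."""
    return length * math.log2(pool_size)

for name, n in POOLS.items():
    print(f"{name:>26}: {bits_per_symbol(n):.2f} bits/symbol, "
          f"8 chars = {generation_entropy(8, n):.1f} bits, "
          f"12 chars = {generation_entropy(12, n):.1f} bits")
```

Under these assumptions, a random 12-character lowercase password (about 56.4 bits) beats a random 8-character password drawn from all 94 printable characters (about 52.4 bits), which is the sense in which length matters so much.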

What is complexity?

Password complexity refers to the rules we put in place to enforce "randomness" in passwords. Here are the basic rules:

Uppercase
Lowercase
Number
Special Character

These are familiar to most people. Websites use them all the time to try to force people into randomness. We, however, are not random. This is why attackers can use things like dictionary attacks and predictive algorithms to guess passwords.
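The four rules above are easy to check mechanically, which is also why they are easy to satisfy predictably. A minimal sketch, assuming "special character" means one of the ASCII punctuation characters in Python's `string.punctuation` (real sites define these classes differently):

```python
import string

def complexity_rules(password: str) -> dict[str, bool]:
    """Check the four common complexity rules: uppercase, lowercase, number, special."""
    return {
        "uppercase": any(c in string.ascii_uppercase for c in password),
        "lowercase": any(c in string.ascii_lowercase for c in password),
        "number": any(c in string.digits for c in password),
        "special": any(c in string.punctuation for c in password),
    }

def passes_all(password: str) -> bool:
    """True only if every rule is satisfied."""
    return all(complexity_rules(password).values())

print(passes_all("Password1!"))  # True: a very predictable password passes
print(passes_all("lbvgjnwhha"))  # False: a random string fails
```

The rules say nothing about predictability, so "Password1!" passes every one of them while a randomly generated string is rejected.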

It goes beyond that, though, to include security measures such as:

Encryption or hash algorithm used
The strength of a salt
Security of the private key

and so on.

Crack-ability

This is way more complicated than a simple post here. It is, however, an interesting topic (at least to me :) ). Crack-ability is subjective, and the part you need to understand is that you can only measure it against a defined set of rules. (I can't think of a better word; I don't like "crack-ability", but I'm pretty sure I understand what you're asking.)

PCFG Generator

Depending on what you're trying to define using the PCFG, it can be pretty complex. The basic idea of a PCFG (probabilistic context-free grammar) is that if you feed it something (common words, phrases, etc.), it provides a probability of what might be selected next. You can see something similar in everyday life with predictive text on smartphones. (Did you ever notice that when you first started using it, the guesses were meh, and now it's pretty good at knowing what you are going to say?)

You could feed it common passwords and then check the probability of a person using a given password.
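As a rough illustration of that idea, here is a toy sketch in the same spirit. It is not Matthew Weir's PCFG tool and does not reproduce its format or options. It splits each training password into runs of letters (L), digits (D), and symbols (S), learns how often each base structure such as `L8D1` appears and how often each run appears for its kind and length, then scores a password as P(structure) × the product of P(run). The training list is a hypothetical placeholder, and anything unseen gets probability 0 because there is no smoothing.

```python
import re
from collections import Counter

# Letters (L), digits (D), and everything else (S), as named groups.
TOKEN = re.compile(r"(?P<L>[A-Za-z]+)|(?P<D>[0-9]+)|(?P<S>[^A-Za-z0-9]+)")

def segments(password: str) -> list[tuple[str, str]]:
    """Split into (kind, text) runs, e.g. 'dragon12' -> [('L', 'dragon'), ('D', '12')]."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(password)]

def structure(segs: list[tuple[str, str]]) -> str:
    """Base structure string, e.g. [('L', 'dragon'), ('D', '12')] -> 'L6D2'."""
    return "".join(f"{kind}{len(text)}" for kind, text in segs)

class ToyPCFG:
    """Toy model: P(password) = P(structure) * product of P(run | run kind and length)."""

    def __init__(self, training: list[str]):
        self.structures = Counter()  # structure string -> count
        self.terminals = {}          # e.g. 'D2' -> Counter of 2-digit runs
        for pw in training:
            segs = segments(pw)
            self.structures[structure(segs)] += 1
            for kind, text in segs:
                self.terminals.setdefault(f"{kind}{len(text)}", Counter())[text] += 1
        self.total = sum(self.structures.values())

    def probability(self, password: str) -> float:
        """Estimated probability; 0.0 for anything never seen in training."""
        segs = segments(password)
        if self.total == 0 or not segs:
            return 0.0
        p = self.structures[structure(segs)] / self.total
        for kind, text in segs:
            counts = self.terminals.get(f"{kind}{len(text)}")
            if not counts:
                return 0.0
            p *= counts[text] / sum(counts.values())
        return p

# Hypothetical training data; a real study would use a large leaked-password corpus.
model = ToyPCFG(["password1", "password1", "dragon12", "monkey1", "letmein!"])
print(model.probability("password1"))   # 0.4
print(model.probability("monkey12"))    # 0.1
print(model.probability("lbvgjnwhha"))  # 0.0
```

Note that "monkey12" never appears in training but still gets a nonzero score because its structure and runs were seen separately; that kind of generalization is what makes grammar-based guessing stronger than a plain wordlist.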

For your project, unless you are trying to gauge the likelihood that a person will select a particular password, I would suggest looking into something else.
