29

We should all know the XKCD comic on password strength, suggesting (appropriately) that a password based on multiple common words is more secure and memorable than a password such as Aw3s0m3s4u(3 or something.

I have an application (multi-platform) that I want to generate somewhat secure passwords for, and my password requirements are much less demanding: if the password has no spaces I expect the 'multiple symbols, numbers, mixed alpha and 6+ characters', but if the password has more than one nonconsecutive space I'm relaxing the symbol/number/mixed case constraint, and instead require at least two words that are no less than 4 characters individually, with a minimum password length of 15 characters.

The question isn't about that aspect, but about generating: assuming I want to generate an easy-to-remember and hard-to-guess password for the user, is it cryptographically safe to generate a password based on 5 or so dictionary words from a 10k word list? (Literally 10k words sit in my database, scraped from various sources, emails, etc.) They're all pretty common words, no less than 3 characters in length.

Now I don't want to make these one-time passwords, but I'm suspecting I should at least require the user to change it to something else upon logging in after using this generated password, which is fine and I can, but I also want users to have the option (on changing a password) to generate a 'secure' password that fits my requirements.

From a cracking standpoint, how easy/difficult would it be to attack a password generated using this scheme? There's no fixed length, words in this database table range in length from 3 characters to 11 characters (environment is a word in the database, for example)? The programme generating the passwords will not pick two words with 4 or fewer characters (so the shortest password could be one three-character word, 4 five-character words, and 4 spaces, for a total of 27 characters), and it will not use the same term twice in a password.

Based on samples I've run against it, the average password length generated by the programme is ~34 characters, which seems acceptable to me. Even if we assume that each of the 27 minimum non-space characters (so 23 characters in the end) can be 26 possible states (a-z), that's 23^26 or 2.54e+35 possibilities.

There are 994 words in the database with 3 to 4 characters in length.

We can also assume that the attacker has the dictionary, and the generation parameters/algorithm. Is this still secure, can I get away with taking one word away from the generated password (that's still 21 characters, for 18^26 possibilities (4.33e+32) based on entropy alone), the only problem I see is that this isn't based on character entropy, but on word entropy, which would mean the 5-word password is 10000*9006*9005*9004*9003 possibilities, or 6.5e+19 possibilities, and the 4-word password is 10000*9006*9005*9004 possibilities, or 7.30e+15. Compared to a normal 6-character password ((26+26+10+33)^6 or 7.35e+11 possibilities: 26 lower alpha, 26 upper alpha, 10 numbers, 33 symbols) it's significantly stronger.

Another assumption I made: users will write this down, they always do. I suspect that five random words on a piece of paper (hopefully not in direct sight, but alas that's the most likely scenario) are less-likely to be picked up as a potential password than a, well, complex term that looks like a traditional password.

Lastly, before I get to my actual questions, the passwords are all salted before stored in the database, then hashed with the SHA-512 algorithm 100 times, with the salt being appended between each hash. If the user logs in successfully then the salt is changed and a new password hash is created. (I assume this doesn't help much in a brute-force offline attack, but it should help against active online attacks I would think.)

DatabasePassword = SHA512(...SHA512(SHA512(SHA512(password + salt) + salt) + salt) + salt)...)

So, finally, my actual questions:

  1. Is my math correct? (You don't necessarily have to answer this, I'm sure it's close enough in principle to demonstrate my concerns.)
  2. Is this generation secure or should I stick to the 'traditional' password generation? Do note that an attacker doesn't have any idea on whether the users' password was generated with this algorithm or selected by the user, the attacker can make an assumption if they know the length, but that may or may not be a safe one.
  3. Lastly, did I make any assumptions that would significantly alter (increase or decrease) the security of this 'idea'? (By assuming the per-character entropy of a 6-character password is 95, for example.)

Apologies for the length, I'm used to over-explaining myself to hopefully alleviate confusion.


It was pointed out that my question is extremely similar to this one, I want to point out the differences in my generation method (though, honestly, it's still similar enough that it could be considered a duplicate, I leave that up to the community to decide):

  1. Each word is separated by a space, this means that all but the first and last three characters have an additional potential state.
  2. The password is not selected by a human, it's (mostly) uniform-random generation. No words are preferred over others except to only allow one ultra-short (3 or 4 character) word, once the random generator selects a word of that length no more of those may be selected. (Though the position that word will be in the list of words is random still, and there may not be an ultra-short word selected.)
  3. This is mixed in with a separate password restriction, which means the attacker has two vectors to attempt to crack. The user could have selected a password meeting the 'traditional' requirements or a password meeting the 'XKCD' requirements.
Der Kommissar
  • 490
  • 1
  • 4
  • 12
  • 10
    I think you might've missed the most popular question on this SE, which is exactly "How accurate is the XKCD password generation argument?"http://security.stackexchange.com/questions/6095/xkcd-936-short-complex-password-or-long-dictionary-passphrase/6096#6096 . I recommmend you reword your question to ask feedback on your specific implementation, otherwise i suspect it will be marked as duplicate. – J.A.K. Feb 13 '17 at 03:02
  • 5
    @J.A.K. I've *attempted* to demonstrate the differences between my question and that other one, *hopefully* it's not too similar now, not that it matters, that one *mostly* answers my concerns. No idea how I missed it, thanks for pointing it out. :) – Der Kommissar Feb 13 '17 at 03:16
  • 1
    I do see that your question does cover a bit that isn't mentioned, it was well-worded. It's just that in my experience sometimes posts get flagged mercilessly here ;) – J.A.K. Feb 13 '17 at 09:57
  • 2
    just make sure your random generator is actually random enough. – Matija Nalis Feb 13 '17 at 11:13
  • The letter-based entropy calculations seem to be wrong, if you use a real dictionary, the probability distribution of each of the letters will be very far from uniform. And that's without considering the conditional probability of each letter based on their position in the password. – biziclop Feb 13 '17 at 13:20
  • Besides, is `SHA-512` considered okay for storing password hashes? I understand you do it a 100 times, which could be fine now, but will it still be okay in five years? Would an algorithm that adapts its number of iterations not be more suitable? – biziclop Feb 13 '17 at 13:38
  • 5
    FWIW, all these restrictions: "The programme generating the passwords will not pick two words with 4 or fewer characters (so the shortest password could be one three-character word, 4 five-character words, and 4 spaces, for a total of 27 characters), and it will not use the same term twice in a password." really only serve to make the resulting password _weaker_, not stronger. Probably not much weaker, but weaker nonetheless. – Ajedi32 Feb 13 '17 at 14:19
  • 2
    Also, "The password is not selected by a human, it's (mostly) uniform-random generation." applies to the XKCD method as well. The comic says "four _random_ words". – Ajedi32 Feb 13 '17 at 14:20
  • @biziclop Regarding the entropy calculations: what *would* be the right calculation if there are **two** password schemes being used? Because that's what is happening with this right now. – Der Kommissar Feb 13 '17 at 16:55
  • @Ajedi32 But would allowing the shorter words multiple times (for a minimum generation length of 19 characters) weaken it to brute-force at all? If not I'll happily remove those restrictions, shorter words *tend* to be easier to remember. – Der Kommissar Feb 13 '17 at 16:57
  • @EBrown I guess theoretically the algorithm could randomly choose a password which would be weak against a brute force algorithm optimized for a different password generation method (e.g. if your algorithm randomly chose the word "a" 5 times in a row that would be bad, even though that's unlikely to actually happen in practice). In general though, a password based on random dictionary words doesn't have to worry about getting cracked by an algorithm which is just trying random characters, since a dictionary attack will almost always be more effective. – Ajedi32 Feb 13 '17 at 17:18
  • @EBrown For example, even an incredibly simple, all-lowercase, single-word password, "cat", from a 10,000-word dictionary would be cracked in 10,000/2=5000 attempts on average by a dictionary attack, but would take ~26^3/2=8788 attempts by brute force on all possible lower-case letter combinations. – Ajedi32 Feb 13 '17 at 17:23
  • 3
    The math for the 6-character password is wrong. It should be 95^6, rather than 6^95, and that makes it much weaker than the 5-word passphrase. – Josh Townzen Feb 13 '17 at 18:48
  • @EBrown That heavily depends on the assumptions you make about the kind of "freeform" passwords that people give. For example in the rockyou list you get 50 times as many `a`s as `q`s. So any brute-force algorithm that takes that into account can jiggle around the order in which it tries the letters to account for that fact. But even that is just a naive analysis, if you create a table of which character is likely to start a password (typically uppercase) or end it (typically number), you can get even better results. – biziclop Feb 13 '17 at 18:51
  • @JoshTownzen Updated my question, thanks for pointing that out! – Der Kommissar Feb 13 '17 at 19:12
  • ... Does running SHA multiple times really make the hash significantly more secure than running it once? – Shadur Feb 14 '17 at 09:22
  • @Shadur not at all; it may make it slightly weaker (though this hasn't been demonstrated, AFAIK). It does make it slower, and acts as its own salt. Both properties may be desirable (and both should be implemented differently, e.g. PKDBF - cycling hashes is a case of rolling one's own crypto). – LSerni Feb 14 '17 at 11:34
  • ''if the password has no spaces I expect multiple symbols, numbers, mixed alpha and 6+ characters'' - ugh. I hate enforced rules like this. Years ago I came up with a perfectly good 32 character password full of random letters and numbers, but now because I don't have a symbol in it I have to generate and somehow memorise another new password? This kind of thing means users are more likely to pick PassW0rd! than actually come up with a new strong password. – PaulG Feb 14 '17 at 16:03
  • @PaulG The sad thing is this is now the de-facto standard. You can't create an account anywhere without having some super obscure/difficult password. – Der Kommissar Feb 14 '17 at 16:04
  • My point was more along the lines of suggesting you don't do that. By all means have some kind of indicator that suggests how secure the password is (based on the rules you mention), but to limit the validity of passwords based on some artificial (and impossible to judge) criteria is, imho, a **bad idea**. For example, I guarantee a password of 32 random characters (but without a symbol) is stronger than a 6 character password with a symbol. – PaulG Feb 14 '17 at 16:14

4 Answers4

35

First, there is no such concept as a cryptographically secure password. The aim of a password is to be hard to guess for an attacker and how hard it should be to guess depends on how the password is used: if the account is locked after three failed attempts the password can be more weak compared to when an attacker can try an unlimited number of passwords or when the attacker has access to the hashed passwords.

In your case you create a password from randomly choosing 5 words from a set of 10k words. Assuming that the attacker knows your dictionary (not unlikely because your requirement is easy to remember words) and the way the password is constructed from the dictionary this means that there are (10^4)^5 = 10^20 variants. This is similar to guessing 20 digits or a password with 12..13 random alpha-numeric mixed-case characters. Such passwords are usually considered secure enough for most purposes.

As for the storage of the password: don't invent your own method but use methods proven to be good. For details see How to securely hash passwords? In the current form I would consider the chosen method weak because you are using only 100 iterations with a hash function which is designed to be fast. Just for comparison: PBKDF2 recommended at least 1000 iterations in 2000 already and LastPass used 100.000 iterations for server side hashing in 2011. Fortunately you kind of make up for the weaker password storage by having more complex passwords.

Steffen Ullrich
  • 190,458
  • 29
  • 381
  • 434
  • 1
    Unfortunately the storage is restricted by a different system, and that system cannot be modified, otherwise I would have used PBKDF2 for them, but I don't have a choice. Next, would I be safe in removing the limitation of one word of length 3/4 characters, so that it's possible that all words could have that length (but no duplicates)? – Der Kommissar Feb 13 '17 at 16:50
  • 2
    @EBrown: as for the password storage: see my edited response how your algorithm compares with PBKDF2 (badly, but your passwords have a least a higher complexity of the average password used in practice). And in my estimation I assumed that the attacker knows which words are in the dictionary and that the password only consist of these words. This means the actual size of each word does not matter. – Steffen Ullrich Feb 13 '17 at 17:41
  • 1
    @EBrown - could you pre-hash the password using a stronger algorithm, then pass _that_ to the "other system" as the password itself? Just a thought from a non-expert. – FreeMan Feb 13 '17 at 18:30
  • 2
    @FreeMan Depending on the setup, that might not help. For example, if the other system is the server -- not just the database, but the whole server -- then any client-side slowdown of hashes doesn't do anything worthwhile except burn CPU cycles. On the other hand, if the other system is just the database (or a database with a layer around it) then it could certainly help. – Nic Feb 13 '17 at 18:37
  • @FreeMan *Unfortunately,* the users have to be able to login directly through the other system. It's an older server that houses the user account information and *also* does some high-end processing for invoicing, sales, etc. by itself. The BL means that server *has* to support user login. Short story: the hashing mechanism *cannot* be changed without changing that other system which will literally cost tens of thousands, which is not worth it for 1500 customers. – Der Kommissar Feb 13 '17 at 19:08
  • @EBrown Couldn't you just change the code that handles... Oh, wait; it's spaghetti code, isn't it? – wizzwizz4 Feb 13 '17 at 20:43
  • It is only 10^20 if the user always goes with the first suggested password and doesn't hit generate if the words are too hard to remember and goes with a different one. – dave Feb 14 '17 at 02:07
  • @dave: That's an interesting aspect, i.e. how much passwords are rejected by the user because they are too hard to remember. But my guess is that it changes the estimate by maybe factor 10 because then the user will give up to find a usable password and accept the proposal and write it down somewhere. – Steffen Ullrich Feb 14 '17 at 05:31
  • 2
    @EBrown Now is a good time to go to your managers and say, "Look, our old system uses a hash that's too fast; it's no longer secure in the event our database is stolen (or an inside person decides to become an attacker) because computers have gotten faster. We have to integrate the new system anyway; if we upgrade the old one at the same time, then we don't have to upgrade 2 systems. Also, newer hash systems are designed to easily turn up the work factor and would make this simpler to do again in the future." – jpmc26 Feb 14 '17 at 07:48
  • @jpmc26 Not when we just spend the same amount of money to go from plain-text to hashed passwords to begin with. Let's take this as a *very* large step from terrible to not-completely-crap. – Der Kommissar Feb 14 '17 at 07:49
  • @EBrown =/ Implementing bcrypt or PBKDF2 probably wouldn't have been any harder... It's not like we're talking about things that don't have standard implementations out there. – jpmc26 Feb 14 '17 at 07:51
21

It seems to me that your calculations are correct. Even though, please consider the following weaknesses:

  1. An attacker can and will sign up for your service with multiple (a LOT of) accounts. From there, it is trivial to create a close-to-original word list. So assume the attacker knows the word list.
  2. The attacker will also test your password length limitations. As described here, minimum password length of 12 character is a good hint to an attacker that consecutive dictionary words will be used, thus reducing the search space.
  3. The actual word list makes a difference too. Consider a scenario where your word list consists of an amount of longer than 8 characters long words, and your maximum password length is, say 30. This means the attacker can safely assume that no password will be constructed with 4 consecutive 8 character words. Similar to this, many assumptions can exist according to your choice of word list. So another consideration that I think you've missed is word length entropy.
  4. Lastly, the fact that you are supplying your clients with a secure and easy-to-remember password generator means that they will use it. So the attacker can also safely assume that all but the most security-aware will use your word list and you password generator.

Despite all of that, I think that what you are proposing is secure. Even when considering all of the above, the resulting search space is still huge.

These are all the problems/assumptions that I found in your description that may decrease the security of your password generator. For your consideration.

MiaoHatola
  • 2,284
  • 1
  • 15
  • 22
  • 3
    A huge word list would help *a little*, as would allowing shorter words (as there are so many of them) to account for some but not all of the password. Perhaps rate-limiting account creation (at least per IP) could be used to slow down 1. Of course unfamiliar words would increase the chance of typos. – Chris H Feb 13 '17 at 11:56
  • 1
    Regarding your point #2, just because their site allows passwords of 12+ characters does not signal to attackers that the users are choosing to make their passwords longer. Most attackers assume people will use common password formats regardless of whether the max is 12 or 50 characters. They won't know the ave password length until they crack at least a decent sample of the passwords, and to do that they must be trying these word combos. – PwdRsch Feb 13 '17 at 16:43
  • 2
    I'm assuming the attacker *already has* the entire dictionary, I don't keep it a secret because it's used for other things. Passwords have a *very* large max length in this system (65536 chars). – Der Kommissar Feb 13 '17 at 17:04
  • @EBrown People are not likely to use more than a fraction of the available length. But it does solve the 'max password length' problem. – MiaoHatola Feb 13 '17 at 19:48
  • @PwdRsch I was referring to *minimum* password length. You are correct, of course. Maximum password length does not hint to attackers. I will edit my answer to hopefully make it clearer. – MiaoHatola Feb 13 '17 at 19:49
4

Building off of MiaoHatola's answer:

By MiaoHatola's first point, your attacker could narrow down the list and know both dictionaries(the one with short words and the other with 5-letter words). You said your short word list had 994 entries. They have a 1/994 chance of getting the short word on the first try. Now, let's say that you use a list of 10,000 common 5 letter words. These fractions apply:

1/994 * 1/10000 * 1/9999 * 1/9998 * 1/9997

This problem illustrates the combined chances of guessing everything on the first try, assuming they know your word lists.

That is a 1 in 9,934,037,100,000,000,000 chance of guessing just the words, not the order of the words. Factor in the order, and you get quite a high number. My calculator doesn't count that high.

This number doesn't include the possiblity of what MiaoHatola mentioned in their third point because you said you don't have a maximum length. One more thing:

Instead of randomly choosing just 5 words every time, why not randomize that number too? Throw in an extra 7 letter word every once in a while. Maybe include a 2 digit number at the end. Try making the attacker guess. That significantly increases your security. At this point, you could leave the hashed passwords unguarded, because the sheer amount of possibilities is overwhelming. So to answer your question, yes. This increases security. By a lot.

user139260
  • 41
  • 1
  • Wouldn't it be `1/994 * 1/9006 * 1/9005 * 1/9004 * 1/9003`, because the 994 short words are *part* of the 10k long words, and once you choose a short word you cannot choose another short word, which means you would have to choose one of the `9006` remaining long words, or am I assuming incorrectly? – Der Kommissar Feb 13 '17 at 18:58
  • @EBrown It's assuming that the short words aren't in the 5-letter word list. There are 10000 + 994 = 10994 total words. – wizzwizz4 Feb 13 '17 at 20:46
  • @wizzwizz4 Ah, it's actually the opposite of that. The short words are part of the 10k list. – Der Kommissar Feb 13 '17 at 22:19
1

You cannot base uniqueness on 27 characters because the characters are not random.

If you have 10K words and use 5 words then 10^20 unique passwords (or phrases)
100,000,000,000,000,000,000

Is that unique enough to thwart a brute force attack
At 100 a millisecond that is is over 30,000 years

But the cost is users are entering 27 characters. I don't want to enter 27 characters. I will probably enter it wrong 1/2 the time and have to reenter. 5 words in a random order are not going to be easy to memorize.

Consider 6 characters from A-Z, a-z, 0-9, and _
That is 62,523,502,209 unique
As a user I would rather memorize and enter 6 random

I don't think 5 random words written down would be a good way to disguise a password. Especially if people know your application does that.

paparazzo
  • 181
  • 7
  • 2
    *Unfortunately*, in my experience, users are more likely to forget a six-character password with random alpha/numeric and any symbols, or they'll write it in a **very** conspicuous place, which is the other problem I wish to avoid. Do note that this is *not* a requirement, any user can still create a password like you mentioned, but in the case you pick the randomly generated one it will follow these requirements. – Der Kommissar Feb 13 '17 at 16:29
  • 1
    Experience? Have you actually tested ability to memorize 5 random words? – paparazzo Feb 13 '17 at 16:32
  • 1
    It doesn't matter if they memorize it or not: if you see `john grade tractor python plant` written down on a paper, you're *far* less likely to associate it with a password than if you see `Tr0ub4d0r&3` (et. al.). – Der Kommissar Feb 13 '17 at 16:33
  • 1
    As stated in my answer not if they know your application uses 5 word passwords. – paparazzo Feb 13 '17 at 16:35
  • 5
    "I don't think 5 random words written down would be a good way to disguise a password." But your alternative suggestion is six random characters which, when written down, jumps in the air screaming LOOK AT ME I'M A PASSWORD!!!! – David Richerby Feb 13 '17 at 18:30
  • 1
    @Paparazzi it took me about a week to memorize a 5-word diceware generated password. So yes, I can and have memorized 5 random words. You can too. – FreeMan Feb 13 '17 at 18:33
  • @FreeMan I did not assert it could not be done. My words are pretty clear. "As a user I would rather memorize and enter 6 random". I don't want to enter 27 characters even if I can memorize them or hide them in plain site. – paparazzo Feb 13 '17 at 18:38
  • 1
    That's entirely your choice, @Paparazzi. Reading through the other answers (here and at the linked 'XKCD' question), though, my impression is that 27 characters is much more secure and, _to me_, easier to remember than 6 random ones. To each his own... – FreeMan Feb 13 '17 at 19:40
  • @FreeMan But it not 27 random characters as I covered in my answer. Is 6 random from a set 63 enough? If the data is that sensitive then something that encourages them to write it down is the wrong direction in my opinion. Clearly takes longer to enter 27. – paparazzo Feb 13 '17 at 19:45
  • Any program that forced me to use a 27 character password would immediately be discarded for something less tedious. I don't care if it is made up for easy-to-remember words or not, I'm not going to waste my time on it. – Dave Feb 14 '17 at 08:46