18

Disclaimer: as you will see from my question I'm a total outsider in this subject, just very curious.

I was wondering how easy it would be to crack a password-protected RAR5 file, and I found many answers along the lines of "a truly random password would be much more difficult to crack than a password based on real words". Also, a lot of answers refer to password randomness.

I know that passwords based on real words are easily cracked by dictionary attacks and probably this is what those answers refer to, but I'm still not clear about what "random" means in the context of password creation, for the following reason.

Even if I generate a sequence of characters using the best "randomizer" ever, the chances that I get HelloWorld and the chances that I get f.ex. gkwwpBnePU are in my understanding exactly the same, so does "random" in this context mean "as distant as possible from any real word"? But if yes, doesn't this make the password not-so-random after all?

The thought that started my doubt - which I believe is the same concept but I'm not sure - is: if I choose a password which is a real word but from an obscure dialect of a very uncommon language whose dictionary no attackers would feed to their cracking tools, would such a password still be more crackable than gkwwpBnePU? (assuming of course that gkwwpBnePU isn't actually a real word in any language, see what I mean?).

techraf
SantiBailors
  • Because attackers know that passwords are created by lazy humans, so real words and variations are more likely to exist (within a set of passwords) than any other individual possibility that is not within the set of (words and variations). – Sophistifunk Mar 24 '15 at 10:15
  • Thanks for all the useful answers. So my understanding is that - since even the "Truly Random Generator" can give me f.ex. `password` - expressions like "truly random password" and even "truly random generation process" can be misleading if taken literally, because to be _truly_ random I cannot discard any password on the basis of a non-random criterion, so I should accept f.ex. `password`. I also understand that the thing whose randomness can be measured is the generation process, not the passwords. And this generation process should not be 100% random or I might end up using `password`. – SantiBailors Mar 25 '15 at 15:50
  • @SantiBailors the probability that a truly random generation process would generate "password" is infinitesimally small, in other words: extremely unlikely, such that you really really don't need to take it into consideration. – AviD Mar 26 '15 at 21:07

6 Answers

22

"Random" means: "that which the attacker does not know".

The important point to understand is that attack costs are always on average. They don't make sense on a single data point. An attacker may always get lucky and find the right password on his first try. This is merely improbable.

If you generate passwords as sequences of purely random characters, then you may obtain "HelloWorld"; but usually you won't, and, crucially, the attacker won't be able to guess with non-negligible probability that your password consists of two concatenated English words because, on average, it does not.

One way to say it is that password entropy is not a property of the password, but of the process that generated the password; and it does not impact the contents of a single password, but the average contents of passwords, taken over sufficiently many experiments. More on password entropy here.
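A minimal sketch of that idea, assuming two hypothetical generation processes (the 7000-word list is an assumption made only for illustration). The entropy is computed from the number of equally likely outcomes of the process, never from the characters of one particular output:

```python
import math


def process_entropy_bits(equally_likely_outcomes: int) -> float:
    """Entropy, in bits, of a process that picks uniformly among N outcomes."""
    return math.log2(equally_likely_outcomes)


# Process A: 10 characters, each drawn uniformly from the 52 letters a-z and A-Z.
ten_random_letters = process_entropy_bits(52 ** 10)

# Process B (hypothetical): two words drawn uniformly from a 7000-word list.
two_random_words = process_entropy_bits(7000 ** 2)

print(f"10 random letters: {ten_random_letters:.1f} bits")
print(f"2 random words:    {two_random_words:.1f} bits")
```

Both processes could in principle emit "HelloWorld", yet the first carries roughly 57 bits and the second roughly 25.5 bits, because the number belongs to the process and not to the string.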

Averages are still the important notion because the attacker, like everybody else, thinks in terms of economics (although he, like most other people, is not completely aware of it). The attacker won't bother attacking your password if his chances of breaking it are lower than his chances of winning millions of dollars at the lottery. Even if he may always "get lucky", the lottery is much less effort, and 50 million dollars is a lot more rewarding than access to your Facebook account.

Thomas Pornin
  • 3
    I'd say "that which the attacker does not expect". – isarandi Mar 24 '15 at 00:52
  • 2
    @ThomasPornin _One way to say it is that password entropy is not a property of the password, but of the process that generated the password;_ This seems to help my understanding particularly. So would it be correct to say that - given 2 passwords and no information about the processes that generated them - it wouldn't make sense to deem one more "random" than the other one ? – SantiBailors Mar 25 '15 at 12:03
  • 1
    In a general sense, yes, you are right. However, practically, whenever I see someone using "Password1" as password, I am inclined to _bet_ that his password generation process is not very random, and I am rarely wrong. Probabilities work both ways. Also note that "no information about the process that generated them" is an abstract condition that is not often encountered in practice; for instance, it is often known that a human being was involved, a human with an Internet access even; this actually says a lot on the kind of password generation process he may have used. – Thomas Pornin Mar 25 '15 at 12:09
  • @ThomasPornin Absolutely. I was just wondering about the general meaning of "random password", without any consideration about what goes on in practice. This settles it for me, thanks. – SantiBailors Mar 25 '15 at 12:15
6

“Random” means that all possibilities in the search space (passwords of up to N characters chosen from a set S) have the same probability (up to a small tolerance). The intent is that the adversary (the guy who wants to break the encryption by guessing the password) has no better strategy than to try all possible passwords. With a random password, the adversary has to try half the passwords in the search space to get a 50% chance of guessing right.

Let's say you're generating a 10-character password where each character is a lowercase or uppercase letter. That's 52^10 ≈ 1.45⋅10^17 possibilities, i.e. over a hundred million billion. The probability of generating gkwwpBnePU as a password is thus one in a hundred million billion and some change. The probability of generating HelloWorld is exactly the same, so there is no advantage to you to pick one over the other: the two choices are equally strong.
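The arithmetic can be checked with a short sketch. It assumes a generator that draws each of the 10 characters uniformly and independently from the 52 ASCII letters; the function name `probability_of` is made up for this example:

```python
from fractions import Fraction
import string

ALPHABET = string.ascii_letters  # 52 characters: a-z and A-Z
LENGTH = 10

# Number of distinct 10-letter passwords the generator can produce.
SEARCH_SPACE = len(ALPHABET) ** LENGTH


def probability_of(password: str) -> Fraction:
    """Probability that the uniform generator emits exactly this password."""
    if len(password) != LENGTH or any(c not in ALPHABET for c in password):
        return Fraction(0)  # the generator can never produce it
    return Fraction(1, SEARCH_SPACE)


print(f"search space: {SEARCH_SPACE:.3e}")
print(probability_of("HelloWorld") == probability_of("gkwwpBnePU"))
```

Using `Fraction` keeps the comparison exact: the two probabilities are the same number, not merely close.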

Sure, the attacker could guess HelloWorld. But they have an equal chance of guessing gkwwpBnePU.

If you know that the attacker is working off a dictionary, then you may want to avoid words in this dictionary. However this is only useful if the dictionary represents a significant fraction of your password space. If that's the case, your password space isn't large enough.

Let's say the attacker's dictionary consists of a million words and he'll try two words together. That's pretty large already: 10^12 cracking attempts will require a small cluster of computers to carry out in a reasonable time. There's less than one chance in 100,000 that your randomly-generated password is in that search space. You gain a tiny advantage in avoiding this search space, but there's a cost. First, you're adding complexity (so adding a risk of bugs, e.g. to accidentally eliminate more of the search space than you intended). Second, you don't really know what the adversary will do. Maybe one adversary uses this particular dictionary, but another adversary doesn't (and even the first guy will change their strategy once they find out what your password generation policy is). For any adversary who doesn't use this particular dictionary, you're helping them by restricting your password space. So this is counterproductive.
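A rough check of that estimate, assuming the 10-letter, 52-symbol password space from the earlier paragraph and generously counting every two-word combination as if it were a valid 10-letter password (an upper bound, since most combinations have a different length):

```python
DICTIONARY_WORDS = 10 ** 6                 # a million words
TWO_WORD_GUESSES = DICTIONARY_WORDS ** 2   # every ordered pair of words
PASSWORD_SPACE = 52 ** 10                  # 10 letters, upper or lower case

# Upper bound on the chance that a uniformly random password is a word pair.
fraction_covered = TWO_WORD_GUESSES / PASSWORD_SPACE
one_in = PASSWORD_SPACE / TWO_WORD_GUESSES

print(f"fraction of the space covered: {fraction_covered:.2e}")
print(f"that is about one chance in {one_in:,.0f}")
```

Even with this generous counting, the dictionary covers well under one part in 100,000 of the space, which is why filtering it out buys almost nothing.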

Choosing a password from an obscure language would be even worse. No matter how obscure the language is, if you have a dictionary for it, then you can assume that your adversary has one. Restricting to dictionary words would immensely reduce the search space and raise the capacity to find the password by brute force from infeasible to easy.

Gilles 'SO- stop being evil'
  • I'm slowly processing the answers, I have a doubt about why choosing a password from an obscure language would be even worse: even if both I and the attacker have the dictionary for that obscure language, he/she doesn't know that I chose from that obscure language, so wouldn't a password from that language still be stronger than f.ex. an English word ? – SantiBailors Mar 23 '15 at 12:45
  • 1
    @SantiBailors [Don't assume that the attacker doesn't know what dictionary you're using](http://en.wikipedia.org/wiki/Kerckhoffs%27s_principle). And even if the attacker doesn't know yet, a password from an obscure language is still a lot weaker than a random password. – Gilles 'SO- stop being evil' Mar 23 '15 at 13:05
  • 1
    I have the perfect obscure language for you to use: it happens that it has 10^52 distinct words in it, all of which are spelled with 10 cased alphanumeric characters! Just pick a random word from that language's dictionary. – Russell Borogove Mar 23 '15 at 22:29
  • @RussellBorogove There are many more. The best for PC-only passwords is the dictionary containing 10^95 ten-character passwords; if you want to enter it on a smartphone, you better keep it lower (10^62 [A-Za-z0-9] to 10^70, since some common punctuation are also always available on soft keyboards) – Alexander Mar 24 '15 at 10:24
  • @SantiBailors assuming that there are a 100-1000 obscure languages that could be used, using all obscure languages in addition to the current dictionaries (including English words with autogenerated variations - number replacements, extra numbers, multiword combinations) doesn't make a big increase to the size of the total dictionary. Also, obscure languages often are smaller. Oxford English dictionary has 600k words, for many smaller languages the largest dictionaries available online have 60-100k words, and really obscure languages tend to have even smaller resources, e.g. only 10k words. – Peteris Mar 24 '15 at 17:47
  • @RussellBorogove _Just pick a random word from that language's dictionary._ If I do that I might end up with `MyPassword`. – SantiBailors Mar 25 '15 at 15:41
  • So glance at the randomly generated password, and gen a new one if it looks pronounceable. – Russell Borogove Mar 25 '15 at 16:29
4

Even if I generate a sequence of characters using the best "randomizer" ever, the chances that I get HelloWorld and the chances that I get f.ex. gkwwpBnePU are in my understanding exactly the same, so does "random" in this context mean "as distant as possible from any real word"? But if yes, doesn't this make the password not-so-random after all?

Yes. This is a known problem with excluding so-called "weak keys" in cryptography. By excluding certain classes of weak keys, the remaining key-space has been reduced. From time to time, key selection algorithms have popped up that accidentally excluded almost all keys, leaving a very small search space for attackers.

The reasoning you describe is a typical precursor to this type of error: If certain keys are "weak", then surely the complete opposite of those keys would be "strong", right? However, if you try to find a key that is "the exact opposite" of a common phrase like Hello World, it will be guessable to an attacker that applies your "exact opposite" mapping function to common phrases.

There is a huge difference between avoiding weak keys, and choosing only keys that are maximally distant from a weak key according to some distance metric (the latter is a serious mistake stemming from a misunderstanding about threat models and probabilities; don't do that).

So, avoid weak keys like Hello World, but not to the extent of narrowing the key choice down to a search space that is as small as the set of "weak" keys.
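The difference between the two approaches can be put in numbers. The sketch below is only an illustration under assumed figures that do not come from this answer: a uniform space of 52^10 keys and a "weak" set of 10^12 keys.

```python
import math

SPACE = 52 ** 10   # assumed size of the full key space
WEAK = 10 ** 12    # assumed number of keys considered weak


def bits_lost_by_excluding(space_size: int, excluded: int) -> float:
    """Entropy lost when `excluded` keys are removed from a uniform space."""
    return -math.log2(1 - excluded / space_size)


def bits_when_restricted_to(allowed: int) -> float:
    """Entropy left when keys are drawn uniformly from only `allowed` keys."""
    return math.log2(allowed)


print(f"full space:              {math.log2(SPACE):.2f} bits")
print(f"lost by excluding weak:  {bits_lost_by_excluding(SPACE, WEAK):.6f} bits")
print(f"restricted to a set as small as the weak set: {bits_when_restricted_to(WEAK):.2f} bits")
```

Excluding the weak set costs a tiny fraction of a bit, while a selection rule that only ever lands in a set the size of the weak set throws away most of the entropy.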

techraf
Atsby
  • +1. Ex.: For an attacker who knows that you used an English dictionary to concatenate a ten-character password, `HelooWorld` is as unlikely as `fgNtZxhsSY`, while `HelloWorld` and `HellAWorld` are equally likely. – Alexander Mar 24 '15 at 10:28
2

Random here means "what will be time-consuming for the attacker".

Passwords are technically made up from various characters (which may be lowercase, uppercase, ...) because they are handled as a string. There are two kinds of passwords:

  1. the ones which are handled by the users
  2. the ones which are handled by a password manager

The second case is easy: go for long, complicated combinations of whatever you can, up to the limits of the application asking for the password (they put some limits, sometimes these limits are horrendous). !sg8Itp2%hjxXxo6a6TGMbJs8Jcxtk205XgZ@M^C2CmAgfC*q6 is a great password I just came up with, random and everything.
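For the password-manager case, a password like the one above could be produced along these lines. This is a sketch, not how any particular password manager does it; the alphabet and the default length of 50 are assumptions, and a real site may reject some punctuation or cap the length:

```python
import secrets
import string

# Assumed alphabet: ASCII letters, digits and punctuation (94 symbols).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length: int = 50, alphabet: str = ALPHABET) -> str:
    """Draw each character independently from a cryptographically secure source."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(random_password())
```

The `secrets` module is used instead of `random` because the latter is not designed for security-sensitive values.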

The first case is more complicated as you have a conflict between your memory, what is "random" for you and what is a "character".

Let's make it clear from the start that anything other than the truly random password above considerably reduces the so-called 'key space', that is, the number of possibilities for a password. The larger the key space, the better. But this is not a problem when approached carefully.

Your only aim is to build a password which takes a "long time" to be cracked. You decide for yourself what counts as a long time. This can be 1 minute or 100 years. Due to statistics and the fact that CPU is cheap, let's aim for 1000 years: 1000 years of effort by a dedicated cracker who will have the computing power of the NSA at his disposal.

The cracker will attack your passwords by trying all possible combinations (I am making some assumptions here about the application owner storing passwords properly). This means that he will try a, b, ... aa, ab, ...

Well, no. He will not try that because he knows that your password is likely to be at least 6 characters long. And here comes one of the key points regarding randomness: you must absolutely assume that the cracker knows how your password is built. No "security by obscurity" here (it is useful in other places, but not here).

Which leads me to the last part: your memory. You will not be good at remembering the 50-character password I gave earlier. You will end up with something which resembles words and their variations. Embrace that! Go the full monty and build a password made up of a few (4 or 5) random words!
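If you would rather let a machine pick the words than trust your own imagination, a sketch could look like this. The eight-word list is a placeholder so the example runs; a real list would hold thousands of words, and the function name is made up for illustration:

```python
import secrets
from typing import Sequence

# Placeholder list for illustration only; far too short for real use.
WORDS = ["correct", "horse", "battery", "staple", "orbit", "candle", "river", "mango"]


def random_passphrase(word_count: int = 4,
                      words: Sequence[str] = WORDS,
                      separator: str = " ") -> str:
    """Pick each word uniformly and independently, repeats allowed."""
    return separator.join(secrets.choice(words) for _ in range(word_count))


print(random_passphrase())
```

With a list of N words and k words per passphrase, this process has N^k equally likely outcomes, which is the quantity that matters for the attacker.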

But "everybody says that using dictionary words is bad". The ones who did not do the math do.

Let's compute that: you have a password made up of 4 words, each of them being one out of 7000 possibilities (this is the average vocabulary of the population). Please note that:

  • the total vocabulary is in the range of 70,000-100,000 words (French or English) ...
  • ... so we take a small subset of words you can actually think of ...
  • ... as you will probably not choose your words with the dice method but rather out of the blue

You therefore have 7000^4 ~= 10^15 passwords. Top-end dedicated cracking environments can try 10^10 combinations per second, which is about 1 day with the 4-word version. 5 words bring you to 30 years. This is for offline attacks, when the attacker has got hold of your password database. An online attack is impractical (about 1000 combinations per second, best case).

As a comparison, an 8-character password with lower case, upper case and digits has 62^8, about 2⋅10^14, combinations.
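These orders of magnitude can be reproduced with a few lines. The sketch assumes the rate of 10^10 guesses per second quoted above and reads the times as averages, i.e. the attacker finds the password after searching half of the space; that reading is an assumption:

```python
GUESSES_PER_SECOND = 10 ** 10
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def expected_seconds(keyspace: int, rate: int = GUESSES_PER_SECOND) -> float:
    """Average time to hit the password: half the key space at the given rate."""
    return keyspace / 2 / rate


four_words = 7000 ** 4    # 4 words from a 7000-word vocabulary
five_words = 7000 ** 5    # 5 words from the same vocabulary
eight_chars = 62 ** 8     # 8 characters from a-z, A-Z, 0-9

print(f"4 words: {four_words:.2e} combinations, "
      f"{expected_seconds(four_words) / SECONDS_PER_DAY:.1f} days")
print(f"5 words: {five_words:.2e} combinations, "
      f"{expected_seconds(five_words) / SECONDS_PER_YEAR:.1f} years")
print(f"8 chars: {eight_chars:.2e} combinations, "
      f"{expected_seconds(eight_chars) / 3600:.1f} hours")
```

Under these assumptions the 4-word passphrase falls in a bit more than a day, the 5-word one in a few decades, and the 8-character password in a few hours.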

Notes:

  • the information above is pertinent when attacking passwords either online or offline, for correctly stored passwords. This is to say that if the password is stored in plain text or is a non-salted short one, it will be cracked immediately
  • I strongly believe that the password strategy must be chosen wisely. Sometimes no password is OK, sometimes multifactor authentication is the way to go. There is no "one size fits all" solution.
  • I also strongly believe that the password strategy must be pragmatic and address real-world constraints. Not everyone can remember 253 passwords which are 25 characters long and change every three months.
  • sadly, many standards and official recommendations are dumb as [censored] and we still end up with Hello1 passwords because they happen to fit the construction rules.

I recently did a comparison of password recommendations from NIST, ISO27002, HIPAA, SANS, PCI-DSS, French and German governmental agencies, ISF and CobIT (specifically on password expiration). The order above is from "well thought" to "I took whatever was invented by a random guy or gal in the 90s and put that on paper because I stopped math in 6th grade and my 4 neurons do not allow me to concentrate more than 20 seconds on a problem".

EDIT: following a request in comments, below is the conclusion from the review of the standards. This is in contrast with research showing that password expiration does not improve security.

  • no recommendation at all (NIST)
  • no recommendation other than “regular changes” (ISO27002)
  • risk-assessment based changes, less than 2 years (HIPAA)
  • 180 days (SANS)
  • 90 days (PCI-DSS, French and German governmental agencies, ISF, CobIT)

The ISF recommendation is particularly disappointing, taking into account the great work they do.

WoJ
  • Do you have a link to your password recommendations comparison ? – JB. Mar 24 '15 at 16:35
  • @JB. I don't, this was an internal review. I will update my answer with the final conclusions. – WoJ Mar 25 '15 at 07:51
  • Although I accepted another answer because it's the one that helped my understanding of the subject the most, of all the one-liners I got I find yours to be the most fitting: _Random here means "what will be time-consuming for the attacker"._ – SantiBailors Mar 25 '15 at 13:54
  • @SantiBailors: like the old adage says "thou should never hesitate to change your mind regarding answers acceptance ":) – WoJ Mar 25 '15 at 13:58
  • :) When I accepted the other answer I had already read yours too. The other answer is the one that triggered my understanding. I guess it's just a matter of how my brain is wired. Your one-liner is for me the one that fits the reality most elegantly as a quote, but I could realize that only after I understood the concept due to the explanations in the other answer. – SantiBailors Mar 25 '15 at 14:17
  • Fantastic then, the important part is the aggregation of answers IMHO. Since I routinely speak about passwords and their weaknesses I will keep the link handy. – WoJ Mar 25 '15 at 14:19
  • Your computation is wrong. 4 words chosen (uniformly and independently) from a vocabulary of 7000 is not 4^7000 but 7000^4 which is about 2e15 or 51 bits. 3000^4 is about 8e13 or 46 bits. – dave_thompson_085 Mar 29 '15 at 01:37
  • @dave_thompson_085: oh the shame! I must have had a bad day (particularly in line of an article about that I wrote recently, which has the right numbers). Corrected, thanks! – WoJ Mar 29 '15 at 06:52
1

Even if I generate a sequence of characters using the best "randomizer" ever, the chances that I get HelloWorld and the chances that I get f.ex. gkwwpBnePU are in my understanding exactly the same, so does "random" in this context mean "as distant as possible from any real word"? But if yes, doesn't this make the password not-so-random after all?

No, that's where your fallacy is. If your reasoning was true, it would also be trivial to win the lottery because the winning numbers are just as likely to come up as the other numbers.

The reason is that for every one HelloWorld that makes sense, there are billions of "gibberish" passwords. So an attacker who knows that you use a dictionary password has a lot fewer passwords to try.

The definition of random is simply that all allowed characters have an equal chance of occurring. Usually, another criterion is added: each letter is also independent of the preceding letters.

Both of these criteria rule out using a dictionary from any language. In every conceivable language, some letters are more common than others (for instance, in English, e, n, s and t are more common than q or x). In cryptography (separate matter from passwords, just mentioning it for interest), this allows cracking some codes with statistical analysis.

Also in any conceivable language there are rules about which letters are more likely to follow other letters. For instance, in English an h is far more likely to follow a t than following a q.
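The unequal letter frequencies can be made visible with a small sketch. It measures the entropy per letter of the observed letter distribution in a sample sentence (chosen here only for illustration) and compares it with the value for 26 equally likely letters. A short sample gives only a crude estimate, so treat the output as a demonstration of non-uniformity and nothing more:

```python
import math
from collections import Counter

SAMPLE = ("the attacker will not be able to guess that your password "
          "consists of two concatenated english words")


def entropy_per_letter(text: str) -> float:
    """Shannon entropy, in bits, of the observed letter frequencies in text."""
    letters = [c for c in text.lower() if c.isascii() and c.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


uniform_bits = math.log2(26)  # every letter equally likely
print(f"uniform letters: {uniform_bits:.2f} bits per letter")
print(f"sample sentence: {entropy_per_letter(SAMPLE):.2f} bits per letter")
print(Counter(c for c in SAMPLE if c.isalpha()).most_common(3))
```

This only looks at single letters. The dependence between neighbouring letters mentioned above (an h after a t versus after a q) lowers the real figure further.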

techraf
Kevin Keane
  • I might have caused a misunderstanding here. I was not wondering whether `HelloWorld` and `gkwwpBnePU` are equally strong because they are equally likely to be generated by a randomizer. I was just wondering whether the fact that `gkwwpBnePU` would be considered more "random" than `HelloWorld` indicates that "random" means "as distant as possible from any real word". Does your opening "No" refer to the first or to the second questionmark in the quoted text ? – SantiBailors Mar 25 '15 at 11:48
  • 1
    Interesting question! The "No" really refers to both (and because the second question was phrased as "doesn't...?", it technically probably should have been yes). To determine the randomness, we really divide passwords into two classes: real-world-related (such as HelloWorld, as well as He110w0r1d), and non-real-world-related (such as gkwwpBnePU). You can then measure the randomness simply by the size of each class. gkwwpBnePU is more random than HelloWorld not because it is "more distant" but because it is part of a class that has vastly more members. – Kevin Keane Apr 01 '15 at 05:31
  • +1, nicely put with the distinction of the two classes. – SantiBailors Apr 01 '15 at 09:55
0

What does “random” mean in the context of password creation?

Creating a string based on variables that (can) change (aka random).

A password can be "built" from many things: words, numbers, letters on their own, a combination of the aforementioned. A password could also be built from variables, such as the temperature of your GPU/CPU, amount of RAM, internet browser version, etc.

That being said, the password still has to be encrypted. The encryption method does not care whether your phrase is merely "hello" or a phrase based on truly random variables. It will simply encrypt the phrase with the same method, though the output should always be different.

Once the password is stored, e.g. in a database, and retrieved by a hacker, an attacker can start bruteforcing the password. Even though you have used an incredibly extraordinary method to generate a random password, it's still going to be decrypted at some point.