30

I often see passphrase suggestions written as a sentence, with spaces. In that format, are they more susceptible to a dictionary attack because each word stands on its own, as opposed to a large unbroken 20+ character 'blob'?

v15

6 Answers

25

No (with a minor exception at the bottom).

The passphrases "correct horse battery staple" and "correcthorsebatterystaple" are equivalent entropy-wise. Putting spaces in unexpected spots, or sometimes including spaces and sometimes not, will give you a few extra bits of entropy, but it's not worth the extra difficulty of remembering the pattern. A weird spacing pattern gains you a few bits for the entire passphrase, while just adding another word adds about 13 bits (assuming a diceware dictionary of 7776 words, corresponding to 5 rolls of a six-sided die; note that lg(6^5) = 12.92, lg being the base-2 logarithm). (There's no disagreement between my answer and Thomas's: an attacker would have to check for passphrases both with and without spaces unless he had extra information about whether you tend to use spaces in your passphrases.)
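As a rough check of the numbers above, here is a minimal Python sketch. The function names are made up for illustration, and the "one fair coin flip per gap" spacing scheme is an assumption about what a random spacing pattern could look like, not something the answer prescribes.

```python
import math

DICEWARE_LIST_SIZE = 6 ** 5  # 7776 words: five rolls of a six-sided die


def word_bits(num_words: int, list_size: int = DICEWARE_LIST_SIZE) -> float:
    """Entropy in bits of num_words drawn uniformly and independently."""
    return num_words * math.log2(list_size)


def spacing_bits(num_words: int) -> int:
    """Extra bits if each gap gets a space or not by a fair coin flip (assumed scheme)."""
    return max(num_words - 1, 0)


four_words = word_bits(4)
print(f"one word:            {word_bits(1):.2f} bits")
print(f"four words:          {four_words:.2f} bits")
print(f"random spacing adds: {spacing_bits(4)} bits")
print(f"a fifth word adds:   {word_bits(5) - four_words:.2f} bits")
```

Under these assumptions, a fully random spacing pattern on four words is worth 3 bits, far less than one more word.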

Beware the distinction between random words and meaningful sentences. A passphrase like "quantum mechanics is strange" has much lower entropy than, say, "heat fudge scott canopy". Why? In meaningful English, certain words combine frequently (quantum mechanics) and certain patterns must appear for a sentence to be grammatically correct (subject, predicate, subject complement). In principle a sufficiently sophisticated attacker could exploit these patterns (even though I am not aware of any cracking algorithms that currently do). The informational entropy of grammatically correct written English is about 1 bit per character, so the first passphrase has ~30 bits [1], while the second has about 4×12.9 ≅ 52 bits of entropy, so it would take about 2^22 ≅ 4 million times longer to crack.
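The "4 million times" figure is just the difference of the two bit counts turned back into a ratio. A small sketch, using the answer's rounded estimates (30 and 52 bits) as inputs; the helper name is hypothetical:

```python
ENGLISH_BITS = 30   # the answer's rounded estimate for the meaningful sentence
DICEWARE_BITS = 52  # the answer's rounded estimate, 4 x 12.9


def work_factor(weaker_bits: int, stronger_bits: int) -> int:
    """How many times more guesses the stronger passphrase needs."""
    return 2 ** (stronger_bits - weaker_bits)


ratio = work_factor(ENGLISH_BITS, DICEWARE_BITS)
print(f"2^{DICEWARE_BITS - ENGLISH_BITS} = {ratio:,}")
```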

Be wary of incorrect analyses like http://www.baekdal.com/insights/password-security-usability that make many fundamental information theory mistakes. E.g., "this is fun" is incredibly weak: it consists of some of the most common English words in a very common, syntactically correct sentence ('this' ~ 23rd most common; 'is' ~ 7th most common; 'fun' ~ 856th most common word) [2]. If you tested just three random words from the top 1000 English words, it would take only about 1 second to crack, assuming a modern GPU and that you have acquired the (salted) hash. This is roughly equivalent to 5 random alphanumeric characters (not counting special symbols). If you search Google for the quoted phrase "this is fun", it appears 228 million times.
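To see where the "1 second" and "5 alphanumeric characters" comparisons come from, here is a sketch of the arithmetic. The guess rate of 10^9 hashes per second is an assumption picked to match the answer's estimate; real rates depend heavily on the hash function and hardware.

```python
import math

GUESSES_PER_SECOND = 1e9  # assumed rate, chosen to match the "1 second" estimate


def seconds_to_exhaust(keyspace: int, rate: float = GUESSES_PER_SECOND) -> float:
    """Time to try every candidate in the keyspace at the given rate."""
    return keyspace / rate


three_common_words = 1000 ** 3  # three words from a 1000-word list
five_alphanumerics = 62 ** 5    # a-z, A-Z, 0-9, five characters

for label, size in [("3 common words", three_common_words),
                    ("5 alphanumerics", five_alphanumerics)]:
    print(f"{label}: {size:,} candidates, {math.log2(size):.1f} bits, "
          f"{seconds_to_exhaust(size):.2f} s")
```

Both keyspaces come out just under 30 bits, which is why the answer treats them as roughly equivalent.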

EDIT: Minor exception: in the rare case when consecutive words in your passphrase form another word in your dictionary (or your attacker's dictionary), then not having spaces (or another separator) between words lowers your passphrase's entropy significantly. For example, if the random words forming your passphrase were "book case the rapist" and you had no spaces, an attacker could get in by trying all combinations of just two words 'bookcase therapist'.
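The exception can be illustrated by enumerating every way an unspaced passphrase splits into dictionary words. This is a sketch with a tiny made-up dictionary, not a real word list; the point is only that the shortest split decides how many words the attacker actually has to guess.

```python
def segmentations(text: str, dictionary: set) -> list:
    """Return every way to split text into words from dictionary."""
    results = []

    def walk(start: int, parts: list) -> None:
        if start == len(text):
            results.append(tuple(parts))
            return
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in dictionary:
                walk(end, parts + [word])

    walk(0, [])
    return results


# Toy dictionary (an assumption for illustration only).
toy_dictionary = {"book", "case", "the", "rapist", "bookcase", "therapist"}
splits = segmentations("bookcasetherapist", toy_dictionary)

for split in splits:
    print(len(split), "words:", " ".join(split))
print("fewest words needed:", min(len(s) for s in splits))
```

With a separator between words, only the four-word reading matches the typed passphrase, so the two-word shortcut disappears.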

dr jimbob
  • 5
    "in principle could be exploited by a sufficiently sophisticated attacker" is there any actual evidence of this? I always view such claims rather skeptically since there are so many ways to construct a sentence. – Jeff Atwood Jan 20 '12 at 20:01
  • 1
    @JeffAtwood - I'm not aware of these attacks in the field, where one iterates over computer-constructed meaningful English. The possibility is there, as the intrinsic randomness of the phrase is (relatively) low and comparable to things that are crackable. I agree this sort of attack would be more difficult to construct than simply brute-forcing a character set or running a dictionary attack, but it wouldn't surprise me if NSA/others have done research along these lines. – dr jimbob Jan 20 '12 at 20:54
  • 2
    @Jeff, DrJ - I believe this loss of entropy is used in text-message autocompletion assistance in many/most cellphones. – Ed Staub Jan 20 '12 at 20:59
  • 4
    +1 for the almost xkcd reference. http://xkcd.com/936/ – fire.eagle Jan 20 '12 at 20:59
  • @EdStaub, Jeff - Google auto-complete can (for my content bubble) figure out "**quan** tum **m** echanics **is** **str** ange" when I type only the bold letters, while it can only fill in the ge of "**heat** **fud** ge **scott canopy**" in the diceware passphrase, so it's at least computationally/algorithmically feasible to reduce its complexity significantly (to ~10 chars in patterns that can start words). I doubt most password cracking algorithms would reach a bizarre low-entropy password like `666666555554444333221`, but that's security by obscurity vs utilizing true randomness. – dr jimbob Jan 20 '12 at 21:19
  • This is incorrect. Spaces do add entropy to any real password/pass phrase system. And your parenthesized comment admits it. – President James K. Polk Jan 21 '12 at 21:03
  • Only in the somewhat rare circumstances where the separation between words is not clear, and then only when it gives a possibility that solves it with one less word. If you just have two different possibilities for the same passphrase, you've barely reduced the entropy. E.g., if you had a four-word passphrase and two of the words could be interpreted two ways without spaces, like `these men` or `the semen`, then yes, you've cut out one possibility, reducing it from lg(6^20)=51.7 bits to lg(6^20-1) (reducing the entropy by 3.9x10^-16 bits). – dr jimbob Jan 21 '12 at 22:10
  • If you used the EFF diceware word list, then adding spaces doesn't add any entropy, because the EFF wordlist is a prefix-free encoding. The original diceware word list isn't prefix-free, and requires a separator to maintain its calculated strength. If you used the EFF wordlist, then you should skip the separator. It's much easier to type that way, and it also prevents an attacker with a sound recording of you typing your password from using the distinctive spacebar sound to reduce the size of their search space. – Lie Ryan Dec 03 '19 at 07:35
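The prefix-free property mentioned in the last comment is easy to test for any word list. A minimal sketch follows; the two lists are toy examples invented for illustration, not excerpts from the EFF or original diceware lists.

```python
def is_prefix_free(words) -> bool:
    """True if no word in the list is a prefix of another word."""
    ordered = sorted(set(words))
    # After sorting, a word that is a prefix of any other word is
    # also a prefix of its immediate successor, so checking
    # neighbouring pairs is enough.
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


toy_ambiguous = ["book", "bookcase", "the", "therapist", "case", "rapist"]
toy_prefix_free = ["acid", "bison", "cedar", "dwarf"]

print(is_prefix_free(toy_ambiguous))
print(is_prefix_free(toy_prefix_free))
```

With a prefix-free list, a concatenated passphrase can be split back into its words in only one way, so dropping the separator loses nothing.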
18

Spaces in a passphrase add entropy only insofar as they could equally well have been left out. An important point is that an attacker cannot test for a partial match on a password; contrary to what Hollywood movies tend to suggest (in a most graphic way), there is no such thing as a "partial decryption" (where the text is partly legible, but blurred) or a "partial password". The attacker either has the exact expected password, down to the last comma, or nothing at all. This is a login system, not a game of Mastermind.

For instance, suppose that you make passwords by randomly selecting four words in a list of 2048 "common" words, and appending them (the "correct horse battery staple" method). We assume that any attacker knows that you are selecting passwords that way (e.g. that's the "official password selection method" promulgated by the sysadmin). How much entropy is there in such a password? That's easy to compute (assuming you are really selecting things "randomly", with dice, not with your brain): there are 2048*2048*2048*2048 = 2^44 possible passwords, which all have the same probability of being selected. Hence, 44 bits of entropy.

Now, suppose that the selection process also states: "You shall concatenate the four words without any space". There are 2^44 possible passwords, so 44 bits of entropy. Assume now that the rules say: "You shall always put a single space between two words". There still are 2^44 possible passwords, so still 44 bits of entropy. But suppose that the rules say: "you shall either separate the words with spaces, or concatenate them all together (throw a coin to decide one way or another when you choose your password)", then there suddenly are 2^45 possible passwords (still with equal probability): entropy is now 45 bits.

Even more generically, if the password selection process entails throwing a coin three times, to decide for each slot between two words if there should be a space or not, then entropy rises to 47 bits (44 bits from the words plus one bit per coin throw). But note that this is not "free": you get more entropy, but you have to remember more, too (namely where you put the spaces).
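The counting argument can be written out directly: entropy is the base-2 logarithm of the number of equally likely passwords a rule can produce. The sketch below assumes, as the answer does implicitly, that different word and spacing choices never collapse into the same typed string. Each coin throw doubles the count, so three throws add three bits to the 44-bit baseline.

```python
import math

LIST_SIZE = 2048
NUM_WORDS = 4
GAPS = NUM_WORDS - 1

word_choices = LIST_SIZE ** NUM_WORDS  # 2^44 equally likely word sequences

# Number of equally likely passwords under each selection rule.
rules = {
    "always concatenated": word_choices,
    "always single spaces": word_choices,
    "one coin: all spaces or none": word_choices * 2,
    "one coin per gap": word_choices * 2 ** GAPS,
}

entropy_bits = {rule: math.log2(count) for rule, count in rules.items()}
for rule, bits in entropy_bits.items():
    print(f"{rule}: {bits:.0f} bits")
```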


On a practical note: on a typical keyboard, the space bar, when pressed, emits a slightly different sound. If your office colleague has a keen ear, he may notice whether you use it or not, and possibly at what places. Also, your colleague knows perfectly well the password selection rules which are advertised in your organization, since he is, by definition, in the same organization as you. So I would advise against using spaces as a source of entropy. Especially if you use the "four words rule" and not all words in the list have the same length: the long-eared colleague may deduce the length of each word by hearing the spaces when they are typed.

Thomas Pornin
  • not to defend hollywood, but...timing attacks. – jmoreno Jan 21 '12 at 11:00
  • @jmoreno - timing attacks wouldn't work on a hashed passphrase; or even comparing a plaintext passphrase in a secure manner (no quicker response when wrong). – dr jimbob Jan 21 '12 at 21:53
5

The simple answer is yes, but not very much. Think about the character space: if you are looking at alphanumeric characters including upper and lower case, that gets you 62 chars (a-z, A-Z, 0-9). Adding {space} makes it 63 chars, so you have enlarged the character set by only 1/62.

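Contrast that with adding an extra character, which multiplies the number of possible passwords by the size of the whole character set, so the search space grows exponentially with length.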
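A quick sketch of that comparison for a randomly generated password; the length of 8 is an arbitrary choice for illustration:

```python
def keyspace(charset_size: int, length: int) -> int:
    """Number of possible passwords of the given length."""
    return charset_size ** length


LENGTH = 8  # example length (assumption)

# Allowing spaces: 63 symbols instead of 62, same length.
gain_from_space = keyspace(63, LENGTH) / keyspace(62, LENGTH)

# Keeping 62 symbols but adding one more character.
gain_from_extra_char = keyspace(62, LENGTH + 1) / keyspace(62, LENGTH)

print(f"allowing spaces:     x{gain_from_space:.2f}")
print(f"one extra character: x{gain_from_extra_char:.0f}")
```

This only models passwords drawn uniformly from the character set; it says nothing about passphrases built from words.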

Iszi
Rory Alsop
  • You answered for a password that randomly samples characters from a 62-charset (my answer assumed spaces in a passphrase like `let me in` vs `letmein`). The sample space of a random N-character password with spaces would be `(63/62)^N ~ 1 + N/62` times larger than one without spaces; e.g., an 8-character password would be about 1.14 times more secure if spaces were allowed. Adding an extra character would make the password `(62^(N+1))/(62^N) = 62` times more secure. – dr jimbob Jan 20 '12 at 23:34
  • @drjimbob: When `let me in` becomes a possibility as well as `letmein` entropy has increased by definition. – President James K. Polk Jan 21 '12 at 21:05
  • @GregS - Sure. If you forbid users from using spaces in their passphrases, you've cut down the attacker's checking time. But if user A has passphrase `correct horse` and user B has `correcthorse`, you can't say user A's or B's passphrase is stronger (while user C's `correct horse battery staple` is higher entropy due to the extra two words). – dr jimbob Jan 21 '12 at 21:58
  • well, to be fair - correct horse is slightly stronger than correcthorse, but only slightly. – Rory Alsop Jan 22 '12 at 11:02
1

Adding spaces may or may not add security, but adding spaces makes passphrases easier to read and type. For example, Lastpass can generate OTPs (one-time passwords) if you need them while traveling and using a public computer. These OTPs look like this:

da6ed4bff36860012a563dd3560c289b

You are supposed to print them and type them when you need to log in from a public computer, but these are extremely difficult to read and type. Instead, their OTPs should have looked like this:

raise erase lifeless wipe determine quit cause forest them wife

that is, words with spaces, which are easy to read and type. There are no cons to including spaces in passphrases, but the pros are obvious, as in this OTP case.

Jens Erat
user12480
1

You can't use a dictionary attack, because you can't try each word individually. This is a passphrase. You have to get it all to get it right. I'm sure your dictionary might contain "mike", but it doesn't contain "mike is the smartest man alive". Those are two different passwords. Adding spaces adds more characters. The more characters, the more combinations, and the more entropy.

k to the z
  • This is misguided. Say you have a (salted) sha1 hash of 'mike is the smartest man alive'. Being written in meaningful English, the phrase has an information entropy of about 30 bits (i.e., about 2**30 possibilities), so a single modern GPU with a well-written routine should be able to break it in about 1 second. – dr jimbob Jan 20 '12 at 16:17
  • His question is "does adding spaces add entropy". – k to the z Jan 20 '12 at 16:24
  • I realize what you were saying now. Could you direct me to an example of a well written routine that assumes common English uses when guessing passwords? – k to the z Jan 20 '12 at 17:10
  • "The entropy rate of English text is between 1.0 and 1.5 bits per letter, or as low as 0.6 to 1.3 bits per letter, according to estimates by Shannon based on human experiments." [1](http://en.wikipedia.org/wiki/Entropy_(information_theory)). You have an extra trillion in your math; your rate corresponds to sample size of 1.4 x 10^51. Even the most naive analysis of the password 30 random characters from 27 char set is only 27^30 ~ 9x10^42 ~ 142 bits. But that's an absurd way to compare -- the letters weren't random. – dr jimbob Jan 20 '12 at 17:21
  • The letters weren't chosen at random; they are full English words, all of which appear in standard diceware dictionaries. So even if the words were all randomly chosen, you'd only have an entropy of ~78 bits, which according to the diceware FAQ should be within the range of large organizations to break by 2014. But that was for six random words, which would look like 'greta tort cocky sewn cult river' and have much higher entropy than a meaningful English phrase like 'mike is the smartest man alive' (that can be found on Google). – dr jimbob Jan 20 '12 at 17:22
  • I'm not aware of any natural language generation algos for cracking passphrases. However, theoretically it's doable, and it wouldn't surprise me if secret gov't agencies (NSA, Chinese) have or work on such algorithms. One could even just cull popular sentences from sources of English text/Google and try common variations of them (try all common names in sentences of the category "is the smartest man in the world"). I'd rather my security come from information theoretic properties, as opposed to my not personally being aware of known attack routines. – dr jimbob Jan 20 '12 at 17:43
  • Thank you for your information. Any particular reading material you suggest? – k to the z Jan 20 '12 at 18:19
  • For general introduction to information theory; see http://www.inference.phy.cam.ac.uk/mackay/itila/book.html ; also I apologize if I was a little harsh originally. – dr jimbob Jan 20 '12 at 18:40
  • @drjimbob Ultimately, doesn't this all still descend from a _general_ understanding of attack routines - otherwise, every password has entropy of 0 as the sole entry in a dictionary consisting only of that password (or, alternately, an all-lowercase password is just a particular instance of an alphanumeric password). Also, to reach 2048^4, you can't reject it if the four words form a common english phrase, or even if they happen to be "correct horse battery staple". - so the entropy of a given password method is limited by the set of weaker attack methods that can find some of the passwords. – Random832 Jan 21 '12 at 01:57
  • @Random832 - I agree. The entropy of your passphrase should be determined by the simplest method of generation; not necessarily the one you used. In rare cases where a generator (say 8 random characters) gives something like `PasswOrd`, it is in your interest to reject it. By rejecting passwords you estimate could be generated with under 35 bits of entropy (should happen ~1 in 1552 passwords), you do reduce the entropy slightly; e.g., from lg(52^8) ~ 45.6035 to lg(52^8 - 2^35) ~ 45.6026. (And note `PasswOrd` probably has only about 5-10 bits of entropy). – dr jimbob Jan 21 '12 at 16:20
1

It depends. Of course it matters what the circumstances are in detail.

If the attacker does not know that you're combining words and does a brute force attack over the whole space of possibilities, the blank enlarges the alphabet.

If the attacker knows or assumes that you're using a sentence or a list of words, but does not know whether you use blanks or not, the number of candidate sentences with or without blanks is nearly twice as big as the number without blanks alone. Cases like `bookcase` / `book case`, as mentioned by dr jimbob, are the exception, of course.

Twice sounds like a lot, but if you think about it, an attacker who runs a brute force attack for 1 month will probably run it for 2 months too.

The question of sound is an interesting thought. I have another one:

If you enter your password in a text field of a GUI, maybe the browser, a blank should always work. But if you happen to pass it from a shell, in an indirect process:

  foologin -u JoeDoe -p you will never guess it

might not work, because shells tend to use blanks as separators. You would need to use

  foologin -u JoeDoe -p "you will never guess it"
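The difference between the two command lines can be seen by splitting them the way a POSIX shell would. The sketch below uses Python's standard `shlex.split`; `foologin` is the hypothetical command from this answer, and nothing is actually executed.

```python
import shlex

unquoted = shlex.split('foologin -u JoeDoe -p you will never guess it')
quoted = shlex.split('foologin -u JoeDoe -p "you will never guess it"')

# Without quotes, the argument right after -p is only the first word.
print(len(unquoted), "arguments; value after -p:", repr(unquoted[4]))

# With quotes, the whole passphrase arrives as a single argument.
print(len(quoted), "arguments; value after -p:", repr(quoted[4]))
```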

My thought is that the resulting error might help an attacker guess its cause, and therefore leak information about the way the password is built. There might be a rule requiring at least 8 characters, and "you" happens to have only 3, so people who use sentences without quoting might be typical victims of this error message.

Of course this is very speculative.

Using blanks may, on the other hand, help you memorize a very long passphrase. You may, as a French user on a US site, use a Spanish sentence, or insert foreign names like Kolmogorov and spelling errors you remember easily, which will be hard to plan for in brute force attacks. :) Of course, you can do that without blanks too.

user unknown