2

I was thinking of a way to generate a password that is easy to remember, but hard to crack, like the famous "correct horse battery staple" suggested by XKCD, also discussed here, and I've realized I've never heard the suggestion to combine words from different languages. Other questions on this site deal with non-English passwords or dictionaries, but they don't consider mixing languages, which is what I am focusing on.

Let's make an example. My native language is Italian. If I translated the last two words and came up with "correct horse batteria graffetta", wouldn't it be a lot harder to crack than the English sentence?

Lots of people have at least basic knowledge of a second language, so it should be easy for them to choose a foreign word. Granted, if their level is low, they are likely to choose very simple words (like "hello", "cat", "dog") which would be easy to guess, and this would be bad. But excluding this case, that is, assuming the chosen words are not completely trivial, would this suggestion work? Would it provide stronger passwords, all else being equal (that is: the total length, the presence of lowercase/uppercase letters, numbers, and other symbols, the entropy...)?

In my opinion the effectiveness of this strategy depends on the attack. If it is purely brute-force (i.e. no dictionary), I'd say it's the same. But if the attack is dictionary-based (as is likely the case), wouldn't this technique thwart many attacks? How many dictionaries are there that combine words from more than one language? I think my example phrase "correct horse batteria graffetta" would be quite hard to guess.

Let's take this approach even further and add one more language. "Horse" in Italian is "cavallo", and in German it's "Pferd" (with a capital P, but let's ignore this and keep everything lowercase). Wouldn't "horse cavallo pferd" be even harder? And this is just one word translated into 3 languages, which is probably a very bad idea. Let's take 3 unrelated words: window, shark, apricot. What about "window squalo aprikose"? I think this would be really hard to guess.

Of course, this is "security through obscurity", that is, this method works well if the attacker doesn't know that it is being used. Let's assume the worst case: he is aware of this, and he even knows what languages I speak. Clearly my idea would be mostly ineffective in this case, but still: wouldn't it be better than choosing words from a single language? The attacker would be forced to use a larger dictionary, and his attacks would take longer.

To conclude: I think creating a password by combining words from different languages can greatly reduce the effectiveness of a non-targeted dictionary attack (and possibly thwart it completely). If, instead, the attack is targeted, the benefit is smaller, but on the other hand there is no drawback. Therefore, this method can be seen as an improvement over XKCD's suggestion.

Am I right? Is it a good idea to adopt/recommend this technique, or am I missing something?

  • You're correct that the effectiveness depends on the attack. So by using the same character set you aren't improving the time for a general brute force attack. Now if you were using words that contained special characters... ęķß... – Ramrod Feb 13 '16 at 04:18
  • Probably not a good idea to use this technique after publishing it with your name. – Neil Smithline Feb 13 '16 at 05:43
  • Not a bad idea...but also not a *new* idea either. AgileBits and probably others already suggested that, and even prepared word lists. See https://blog.agilebits.com/2013/04/16/1password-hashcat-strong-master-passwords/ in the "Beyond diceware" section, about halfway down the page. – Ben Feb 13 '16 at 23:44
  • @NeilSmithline The idea would be to spread this so that everyone does it, just as XKCD did. And if attackers started to take this into account... Well, it would force them to expand their dictionary, using more languages, thus slowing them down. Still a victory. – Fabio says Reinstate Monica Feb 14 '16 at 14:51

2 Answers

6

All that matters when calculating the strength of a password generation strategy is how much entropy is involved in generating the password.

Using multiple dictionaries does not weaken the passphrase. How much it strengthens the passphrase, though, depends on exactly how you choose your passphrase.

From the perspective of calculating entropy, using a combined dictionary from multiple languages is exactly the same as increasing the number of words in your dictionary.

For example, the standard diceware strategy uses a dictionary of 7776 words. The strength of a 4-word diceware passphrase is 7776^4 ≈ 2^51.7, or about 51 bits. If you modify this strategy to use a combined dictionary of 7776 English words and 7776 Italian words, the strength of your passphrase becomes (7776+7776)^4 ≈ 2^55.7, or about 55 bits, which is just 4 bits stronger than standard diceware.
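The figures above can be checked with a short Python sketch. The function name `passphrase_bits` is made up for illustration, and the calculation assumes every word is drawn uniformly at random and independently from the list.

```python
import math

def passphrase_bits(dictionary_size: int, word_count: int) -> float:
    """Entropy in bits of word_count words, each drawn uniformly
    at random from a list of dictionary_size words."""
    return word_count * math.log2(dictionary_size)

# Standard diceware: 4 words from a 7776-word list.
english_only = passphrase_bits(7776, 4)

# Combined list: 7776 English words plus 7776 Italian words.
english_plus_italian = passphrase_bits(7776 + 7776, 4)

print(f"{english_only:.1f}")                          # 51.7
print(f"{english_plus_italian:.1f}")                  # 55.7
print(f"{english_plus_italian - english_only:.1f}")   # 4.0
```

Doubling the list adds exactly one bit per word, which is where the 4-bit gain for a 4-word passphrase comes from.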

If, on the other hand, you decide that the first two words will be English and the last two Italian, or that you will use a combined dictionary of 3888 English words and 3888 Italian words, then neither choice increases the strength of the passphrase from an entropy perspective.
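To see why, here is a rough Python sketch that sums the entropy contributed by each word position. The helper `positional_bits` is hypothetical, and the sketch assumes the attacker knows which list each position is drawn from and that the Italian list also has 7776 words.

```python
import math

def positional_bits(list_sizes):
    """Total entropy in bits when position i is drawn uniformly
    at random from a list of list_sizes[i] words."""
    return sum(math.log2(size) for size in list_sizes)

# Standard diceware: every position uses the same 7776-word list.
standard = positional_bits([7776, 7776, 7776, 7776])

# English, English, Italian, Italian: each position still has only
# 7776 candidates, because the language of each slot is fixed.
fixed_positions = positional_bits([7776, 7776, 7776, 7776])

# Combined list of 3888 English + 3888 Italian words: 7776 in total.
half_and_half = positional_bits([3888 + 3888] * 4)

print(f"{standard:.1f} {fixed_positions:.1f} {half_and_half:.1f}")  # 51.7 51.7 51.7
```

In all three cases each position has 7776 equally likely candidates, so the total stays at roughly 51.7 bits.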

Generally, it's more fruitful to increase the number of words in the passphrase than to increase the size of the dictionary. If you use a 5-word standard diceware passphrase, the strength of your password is 7776^5 ≈ 2^64.6, which is a nearly 13-bit increase over 4-word diceware. To get a comparable increase by enlarging the dictionary instead, you need a dictionary 9 times the size of the standard diceware dictionary: (9*7776)^4 ≈ 2^64.4 (i.e. you need to use 9 languages).

It wouldn't hurt to increase the size of the dictionary, but it doesn't add much entropy. From an entropy perspective, passphrase length matters more than the size of the dictionary.
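The comparison can be reproduced with a few lines of Python. This is only a sketch under the same uniform-random assumption; the variable names are invented for illustration.

```python
import math

BASE = 7776  # size of the standard diceware list

# Option A: add a fifth word from the same list.
five_words = 5 * math.log2(BASE)

# Option B: keep four words but use a list nine times larger.
nine_lists = 4 * math.log2(9 * BASE)

# List-size multiplier k that matches option A exactly:
# (k * BASE)^4 = BASE^5  =>  k = BASE^(1/4)
multiplier = BASE ** (1 / 4)

print(f"{five_words:.1f}")   # 64.6
print(f"{nine_lists:.1f}")   # 64.4
print(f"{multiplier:.1f}")   # 9.4
```

So one extra word is worth slightly more than multiplying the word list by nine.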

Lie Ryan
  • It should be noted that this answer only applies to brute-force attacks. If an English speaker were using English dictionaries for rule-based attacks, they would not uncover the password. – cremefraiche Feb 13 '16 at 10:14
  • @cremefraiche: to be precise, my answer assumes the most sophisticated type of attacker, i.e. one that knows exactly how you choose your password ([Kerckhoffs's principle](https://en.m.wikipedia.org/wiki/Kerckhoffs%27s_principle)). If you design your password to resist an attacker who knows how you choose it (maybe they watched you posting this question here), you don't have to worry about less sophisticated attackers. – Lie Ryan Feb 13 '16 at 11:26
-1

It is an interesting observation. I have seen the web page that you linked. According to the webcomic, the phrase "correct horse battery staple" has 44 bits of entropy. However, if you replace "horse" with "cavallo" you will have almost the same entropy, because you have added only 2 characters. Of course you are correct that you are increasing the security of the password, but not significantly. Commonly, just putting two words together in one password is enough to frustrate a dictionary attack.

Nick C.
  • You seem to have a misconception on how that entropy was calculated. For the "correct horse battery staple" entropy, character counts are irrelevant. The 11 bits per word is not based on character count at all, it's based on how many words are in a wordlist which each word is chosen from *at random* (e.g. with dice). See [diceware](http://world.std.com/~reinhold/diceware.html). – Ben Mar 02 '16 at 15:45
  • You're right, Ben. The entropy is calculated from the number of words in the selected dictionary, according to the diceware strategy (by the way, thanks for the link). So if you change one word (in this case to an Italian one), as Lie pointed out, you're adding only 4 bits of entropy (by using both English and Italian dictionaries). – Nick C. May 17 '16 at 13:23
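
As the comments above point out, the per-word entropy only holds when each word is picked at random from the list rather than chosen by the user. A minimal Python sketch of such a selection follows, using the standard library's `secrets` module. The tiny `WORDS` list and the function name `random_passphrase` are placeholders for illustration; a real combined list would have thousands of entries.

```python
import math
import secrets

# Placeholder mixed English/Italian list; far too small for real use.
WORDS = ["correct", "horse", "battery", "staple",
         "cavallo", "batteria", "graffetta", "squalo"]

def random_passphrase(words, count):
    """Join `count` words, each picked uniformly at random
    (with replacement) using a cryptographically secure source."""
    return " ".join(secrets.choice(words) for _ in range(count))

phrase = random_passphrase(WORDS, 4)
bits = 4 * math.log2(len(WORDS))  # 8 words -> 3 bits per word

print(phrase)  # output varies from run to run
print(bits)    # 12.0
```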