
I have been reading a bit about passphrase strengths and weaknesses. I will take this answer as an example.

TTT says, among the other points:

Although there are hundreds of thousands of words in the English language, we (probably) only need to try brute-forcing passphrases using the set of the most common words. We'll assume there are 3000 words in that set.

Leading to an entropy (correct?) of

3000^7 = 2.2 * 10^24

This makes the (generally understandable) assumption that the passphrase will be in English.

My question is: if we remove that assumption, i.e. suppose each word comes from a different language, and that I do not necessarily need to speak a language to remember how to spell one (or two) of its word(s) [*], how would one compute the entropy of such a passphrase and estimate whether the passphrase strength benefits?

[*] : this is to remove the possible drawback of reducing the languages space by knowing the target, i.e.: "I know they speak only English and Spanish, thus the passphrase can only contain those two languages"

Federico
  • Adding any word which isn't in the top 3000 (for this example) would make it harder to find the passphrase (with some exceptions: I suspect that "boathouse" probably isn't in the top 3000 English words, although "boat" and "house" may well be). – Matthew Jul 31 '18 at 10:51
  • @Matthew I see your point. Mine is that by adding languages, that 3000 might (I don't have the numbers) increase more rapidly, since an attacker also has to guess the language. – Federico Jul 31 '18 at 10:54
  • yes, the taller the stack you draw cards from, the harder it will be to guess your hand. – dandavis Jul 31 '18 at 15:50

2 Answers


See Kerckhoffs's principle/Shannon's Maxim. It's very difficult to accurately estimate if an attacker will know or guess the word list you use, so to get an accurate lower bound on passphrase entropy it's simply assumed that they know exactly how you created the passphrase.

If we assume they know you're creating a passphrase by choosing words randomly from a list, and they know that list, it's clear that including words from different languages doesn't really matter. What matters is the size of the list and how many words you randomly take from it. A list of 2,000 English words and 2,000 Spanish words will result in passphrases of the same strength as a list of 4,000 English words.
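To make that concrete, here is a minimal sketch of the usual entropy calculation. It assumes each word is drawn uniformly and independently from the list, and that the English and Spanish lists share no words; the function name `passphrase_entropy_bits` is just an illustrative choice.

```python
import math


def passphrase_entropy_bits(list_size: int, num_words: int) -> float:
    """Entropy in bits of a passphrase made of num_words words,
    each drawn uniformly and independently from a list of list_size words."""
    return num_words * math.log2(list_size)


# 2,000 English + 2,000 Spanish words (assumed non-overlapping)
# versus 4,000 English words: only the total list size matters.
mixed = passphrase_entropy_bits(2000 + 2000, 7)
english_only = passphrase_entropy_bits(4000, 7)
print(mixed == english_only)  # True

# The 3000-word, 7-word example from the question, expressed in bits.
print(round(passphrase_entropy_bits(3000, 7), 1))  # 80.9
```

Note that 3000^7 is the number of possible passphrases; the entropy in bits is the base-2 logarithm of that number.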

It could be that, in practice, this will make it more difficult for the attacker, but it is hard to say by how much. Using words from multiple languages is one of the more common ideas to "strengthen" passphrases, so it may also be that it doesn't accomplish much of anything. It's more useful to know that "it will take an attacker at least X guesses to have Y% chance of finding my passphrase" than "I think this passphrase is pretty strong".
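A statement of the "X guesses for a Y% chance" kind can be sketched as follows. This assumes the attacker knows the list and the number of words, and tries each candidate passphrase at most once in some order; the function name and the integer-percent parameter are illustrative choices, not anything standard.

```python
import math
from fractions import Fraction


def guesses_for_success(list_size: int, num_words: int, percent: int) -> int:
    """Minimum number of distinct guesses needed for a `percent`% chance
    of hitting a passphrase drawn uniformly from list_size ** num_words."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    keyspace = list_size ** num_words
    # After g distinct guesses the success probability is g / keyspace,
    # so we need the smallest g with g / keyspace >= percent / 100.
    return math.ceil(Fraction(percent, 100) * keyspace)


# 7 words from a 3000-word list, 50% chance of success:
print(guesses_for_success(3000, 7, 50))
```

`Fraction` keeps the arithmetic exact, which avoids floating-point rounding on numbers of this size.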

AndrolGenhald

Passphrases, like passwords, need a strength that matches what you are trying to protect. Ideally you calculate the entropy and have some idea of which brute-force or guessing attacks are possible. For example, in KeePass, which uses a decent password-based key derivation function, a moderately strong password is enough. In a web app that hashes passwords with MD5 and has had its DB dumped, you will need a lot more entropy.

If you take 5 random words from a list spanning all languages (for example, 100 000 words), an attacker needs up to 100 000 ^ 5 guesses. If you choose 5 words out of 3000 English words, you end up with only 3000 ^ 5 guesses.
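As a rough check of those two figures, the snippet below compares the two search spaces. The list sizes are the example numbers from this answer, not measured word counts, and it assumes the words are picked uniformly at random.

```python
import math

# Example list sizes from the text above (illustrative, not measured).
multilingual_space = 100_000 ** 5   # 5 words from a 100 000-word list
english_space = 3_000 ** 5          # 5 words from a 3000-word list

print(multilingual_space)                   # 10^25 candidates
print(english_space)                        # 2.43 * 10^17 candidates
print(multilingual_space // english_space)  # roughly 41 million times larger

# The same gap expressed in bits of entropy (about 25 bits).
extra_bits = math.log2(multilingual_space) - math.log2(english_space)
print(round(extra_bits, 1))
```

The gain comes entirely from the larger list, so a 100 000-word list in a single language would give the same numbers.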

If you add upper case and l33tspeak in there, the complexity increases further. Troy Hunt has some very good blog posts on this topic.

Silver