-1

I have been thinking about how to generate random passphrases from a public dictionary of words (similar to XKCD/Diceware passphrases).

One thing in particular I was thinking about is that the length of such a passphrase will leak a lot of information about the phrase (assuming the dictionary contains words of varying length, as most Diceware lists seem to). Say I have a list of 1024 words; then generating a random five-word phrase should provide 50 bits of entropy if the length is hidden. However, say the length is not hidden and there are only 128 words of length three (and none of length one or two) in the dictionary. Now say we know a passphrase is 15 characters long (with the words written without separators). Then a five-word passphrase of length 15 could only have been produced from those 128 words of length three, giving a much lower entropy of 35 bits.
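The arithmetic in this example can be checked with a minimal sketch. It assumes each word is drawn uniformly and independently from the pool; the function name `passphrase_entropy_bits` is made up for illustration.

```python
import math

def passphrase_entropy_bits(pool_size: int, num_words: int) -> float:
    """Entropy in bits of num_words drawn uniformly and independently
    from a pool of pool_size words (assumption: uniform, independent picks)."""
    return num_words * math.log2(pool_size)

# Full 1024-word list, length hidden.
print(passphrase_entropy_bits(1024, 5))  # 50.0
# Only the 128 three-letter words remain once the length (15) is known.
print(passphrase_entropy_bits(128, 5))   # 35.0
```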

I am wondering if this loss of entropy is something I should worry about.

Particularly, I am interested in whether it is fair to assume that someone breaking a passphrase does not know the length of the phrase. Put another way, is it reasonable to assume that in most common systems the length of the phrase is hidden from a potential attacker?

If not, do passphrase generators take this into account somehow?

I should add that I ask because I am not so familiar with how passwords/phrases are protected. However, I assume they are often sent to a server in some encrypted form, and as far as I am aware encryption does not necessarily protect the length of the plaintext.

This is not similar to questions about revealing password length. This is because each character in a password is of the same length (namely 1). In a passphrase, however, the equivalent of a character is a word from the dictionary. Assuming these words have different lengths, the length of the entire passphrase will reveal what types of words were used. In the example above, a passphrase of length 15 reveals that only words of length 3 were used. For a password, this is equivalent to something like revealing that only the letters a, b, c, d, e, f, g, h, i and j were used in generating the password.

I also read the question about the security of XKCD style passwords, but as far as I can see none of the answers deal with this issue.

Guut Boy

2 Answers

2

Questions about password length, entropy, and so on have been asked a lot on this site, so if you hunt around a bit you'll probably find the answer you're looking for.

There is one thing that I want to address directly from your question though. You said:

Say I have a list of 1024 words then generating a random five word phrase should provide 50 bits of entropy if the length is hidden . . .

Unless I'm misreading it, the whole premise of this paragraph is that an attacker who learns the length of your password also 1) knows that you're generating your passwords from a list of words, and 2) has access to that list. If this is what you're getting at, then we need to ask some questions like: "how did they get the list?", "do they have access to anything other than the list?", "do they know enough about you to craft personalized phishing emails?", etc. As you can see, as soon as we start assuming that the attacker knows additional information, the entropy (or strength) of your password stops being the most important point.

I gave an answer to a similar question in which I built up an argument that

Once an attacker is spending effort to learn things about you, the whole idea of password strength / entropy no longer makes sense.

Mike Ounsworth
  • 2
    Well, the assumption that the word list is public is really just to make as few assumptions as possible. Which seems to be a sound security principle. For example, as far as I know, diceware passphrases are usually generated from a public wordlist downloaded from the internet. – Guut Boy Aug 18 '15 at 13:51
  • hmm, I wasn't aware of diceware as a thing, so I don't have a fully-formed opinion about that. Let me see if I can do some math here. – Mike Ounsworth Aug 18 '15 at 14:02
0

This is a mathy answer directly addressing your question about passphrases which are generated from a known wordlist (e.g. the Diceware technique). Since I'm at work I'm gonna take a very rough pass at the math using standard diceware as my model and make a lot of guesses. Someone else can do a more careful job if they want.

Without knowing the length of your passphrase, randomly dicing 5 words gives you log_2 ((6^5)^5) =~ 64.6 bits of entropy.

The first thing we need to know is how many words there are of each length in the English Diceware List. I could write a script to count them, but I'm lazy so let's say that the 7776 words in the list are evenly distributed over lengths [2,10], giving 864 words of each length.
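For anyone who does want real counts instead of the even-distribution guess, a minimal sketch of such a script follows. It assumes each non-empty line of the list ends with the word as its last whitespace-separated token (so both "11111 a" and a bare "a" work); the function name and the tiny in-memory sample are made up for illustration, not taken from any real list.

```python
from collections import Counter

def count_word_lengths(lines):
    """Return a Counter mapping word length -> number of words.

    Assumption: the word is the last whitespace-separated token on
    each non-empty line, e.g. "11111 a" or just "a".
    """
    counts = Counter()
    for line in lines:
        tokens = line.split()
        if tokens:
            counts[len(tokens[-1])] += 1
    return counts

# Illustrative sample only; for a real list, pass an open file instead,
# e.g. count_word_lengths(open("wordlist.txt")) with a hypothetical path.
sample = ["11111 a", "11112 ab", "11113 cat", "11114 dog", "11115 word"]
length_counts = count_word_lengths(sample)
for length in sorted(length_counts):
    print(length, length_counts[length])
```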

Assuming you follow the best-practice for diceware of putting spaces between your words, the shortest passphrase you could have is length 14: 5xlen(2) + 4 spaces. This gives you log_2 ((864)^5) =~ 49 bits of entropy. You've lost some, but that's still quite a bit.

If your passphrase is length 15 then it has to be 1xlen(3) + 4xlen(2) + 4 spaces, giving `log_2 (864*864^4)` =~ 49 bits of entropy (I'm leaving it expanded so that if someone actually counts the number of words of each length, they can plug in numbers).

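Length 17 has a choice of length patterns, because it could be 1xlen(5) + 4xlen(2) + 4 spaces, or 1xlen(4) + 1xlen(3) + 3xlen(2) + 4 spaces, or 3xlen(3) + 2xlen(2) + 4 spaces, giving log_2 (864*864^4 + 864*864*864^3 + 864^3*864^2) =~ 50 bits of entropy. (This rough count ignores which position each word length sits in; counting the orderings would add a few more bits.)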
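A more careful count can be sketched with a small dynamic program. This is only an illustration under the same guess as above (864 words at each length from 2 to 10, five words, single spaces between words); the function name `phrases_with_length` is made up. Unlike the rough figures, it counts every ordering of word lengths, so its results come out a few bits higher.

```python
import math

def phrases_with_length(length_counts, num_words, total_length, separators=True):
    """Count ordered phrases of num_words words with the given total length.

    length_counts maps word length -> number of words of that length.
    If separators is True, one space between adjacent words is included
    in total_length.
    """
    letters = total_length - (num_words - 1 if separators else 0)
    # ways[s] = number of ordered word sequences so far that use s letters
    ways = {0: 1}
    for _ in range(num_words):
        nxt = {}
        for used, n in ways.items():
            for wlen, cnt in length_counts.items():
                s = used + wlen
                if s <= letters:
                    nxt[s] = nxt.get(s, 0) + n * cnt
        ways = nxt
    return ways.get(letters, 0)

# Assumed distribution from the answer: 864 words at each length 2..10.
uniform = {length: 864 for length in range(2, 11)}
for total in (14, 15, 17):
    n = phrases_with_length(uniform, 5, total)
    print(total, round(math.log2(n), 1))
```

Under these assumptions the three lengths work out to roughly 48.8, 51.1 and 53.9 bits. Plugging real per-length counts into `length_counts` would give figures for an actual word list.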


If my math is even remotely right, then even in the worst case where you only have words of length 2 in your passphrase, you still have a strong 49 bits or so of entropy. (That assumes the attacker is trying to reverse-dice your passphrase. Note that many of the passphrases generated by the diceware technique may have significantly less entropy against a standard dictionary / rainbow table, since they are all common words.)

I would be very upset if this got accepted as an answer because, while it answers the question as asked, I think it's a very naive answer which overlooks some important security questions like: "how did the attacker learn that you're using diceware?", "how did the attacker learn that you're using exactly 5 words?", "if the attacker is willing to throw a small server farm at breaking your passphrase, are you sure this is the weakest link in your security?", etc.

Mike Ounsworth
  • 1
    Thanks for your answer. However, this question is of a general nature, i.e., not specific to diceware, but about any scheme using a public list. I am well aware, as my example shows, that entropy is reduced by revealing the length. So what is really interesting is whether it is reasonable to assume that the length is hidden from an attacker or not. – Guut Boy Aug 18 '15 at 15:09
  • 2
    How many questions are you asking in this single question? **Now** you want to include `whether it is reasonable to assume that the length is hidden to an attacker or not?` ? For that you should definitely open a new question. Also, see this answer: http://security.stackexchange.com/a/92240/61443 – Mike Ounsworth Aug 18 '15 at 15:14
  • Btw, if you look at the diceware list (the first appearing on google) there are only 52 words of length one. I.e., if the length of your passphrase is revealed to be 9 (= 5*1 + 4 spaces) you only have about 29 bits of entropy! – Guut Boy Aug 18 '15 at 15:17
  • Sorry I thought it was clear that was what I was asking. I will edit the question to make it more clear. – Guut Boy Aug 18 '15 at 15:18
  • 1
    Editing your question is sorta irrelevant at this point cause the question has been closed, so nobody can post new answers. You should open a new question (or maybe 3 or 4) with a single focused question. – Mike Ounsworth Aug 18 '15 at 15:21
  • Sorry! I am very embarrassed I missed the answer in the above link. It is exactly what I was looking for. Thank you very much! – Guut Boy Aug 18 '15 at 15:31
  • Bah, don't worry about it. I also rush straight to posting when I'm excited about something. Passwords, lengths, and entropies have been beaten to death on this site, so people are quick to close-as-dup when a new one comes up. – Mike Ounsworth Aug 18 '15 at 15:36