[Disclosure: I work for AgileBits, the makers of 1Password]
One of the reasons I advocated for an XKCD-like scheme (before it got called that) in Toward Better Master Passwords back in 2011 is precisely that its strength does not rely on the attacker not knowing what scheme you used. If I may quote myself:
The great thing about Diceware is that we know exactly how secure it
is even assuming that the attacker knows the system used. The security
comes from the genuine randomness of rolling the dice. Using four or
five words should be sufficient against the plausible attacks over the
next few years given observed speed of password crackers [against
1Password Master Password]
What the XKCD comic does not effectively communicate is that the selection of words must be (uniformly) random. If you ask humans to pick words at random, you get a heavy bias for concrete nouns. Such biases can and will be exploited.
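As a minimal sketch of what "uniformly random" means in practice, the following draws each word independently from a list using Python's `secrets` module as the CSPRNG. The function name `generate_passphrase` and the eight-word demo list are made up for illustration; a real list would have thousands of entries.

```python
import secrets

def generate_passphrase(wordlist, n_words=5, sep=" "):
    """Pick n_words uniformly at random (with replacement) from wordlist."""
    # Duplicate entries would make some words more likely than others.
    if len(set(wordlist)) != len(wordlist):
        raise ValueError("word list must not contain duplicates")
    # secrets.choice draws from the operating system's CSPRNG.
    return sep.join(secrets.choice(wordlist) for _ in range(n_words))

# Tiny stand-in list for illustration only; far too small for real use.
demo_words = ["correct", "horse", "battery", "staple",
              "orbit", "lantern", "mosaic", "velvet"]

print(generate_passphrase(demo_words, n_words=4))
```

The point is that the machine (or the dice) makes the choice, not the human. Rejecting a generated passphrase because you dislike it and drawing again reduces the entropy you actually get.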
How much strength you want
In a perfect world we would want our passwords to be as strong as the keys we are protecting with them. Say 128 bits. But despite these techniques, humans aren't going to achieve that. So let's look realistically at attacks and at what we can get our puny little brains to do.
With the original Diceware word list of 7776 entries, you get approximately 12.9 bits per word that you use. So if you want at least 64 bits for your password, then five words will do it.
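The arithmetic behind those figures can be checked with a short sketch. It assumes each word is drawn uniformly and independently, so a list of N words contributes log2(N) bits per word; the helper names are hypothetical.

```python
import math

def bits_per_word(list_size):
    """Entropy of one uniformly chosen word from a list of list_size entries."""
    return math.log2(list_size)

def words_needed(target_bits, list_size):
    """Smallest number of words whose combined entropy reaches target_bits."""
    return math.ceil(target_bits / bits_per_word(list_size))

print(round(bits_per_word(7776), 2))  # 12.92
print(words_needed(64, 7776))         # 5
```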
Guessing passwords is slower than guessing keys
In this section I arrive at a very rough back-of-the-envelope estimate: for a constant amount of dollars, it is 2^13 times slower to test a password than it is to test an AES key.
Note that testing a password is a lot slower than testing a key. If the right sorts of password hashing schemes are used, it is possible to keep most attackers down to under 100,000 guesses per second. So while we might never want to use 50-bit keys, using 50-bit passwords might still make sense.
If we aren't going to limit ourselves to rolling dice as in Arnold Reinhold's original Diceware scheme from 1995, then we can use a longer list of words. The Strong Password Generator in 1Password for Windows uses a list of 17679 English words between 4 and 8 letters inclusive (stripped of taboo words and words that involve an apostrophe or hyphens). This gives about 14 bits per word. So four of these gives you 56 bits, five gives you 70.
Again, you do need to pay attention to cracking speeds. Deep Crack back in 1997 was able to run 92 billion DES tests per second. Assuming that a high-end specialized PC can perform one million guesses per second against a reasonably well-hashed password, passwords today are about 16 bits harder to crack than DES keys were in 1997.
So let's look at this Stack Exchange estimate for a dual core 3.8GHz processor: 670 million keys per second. If we were to assume $5000 in hardware, we can easily exceed 10 billion keys per second. So at a similar hardware cost, key cracking is still more than 2^13 times faster than password cracking.
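To make the "bits" comparison concrete: the gap between two guessing rates, expressed in bits, is just the base-2 logarithm of their ratio. The sketch below plugs in the figures used above (92 billion DES tests per second, an assumed 10 billion AES keys per second, and an assumed 1 million password guesses per second); the function name is hypothetical.

```python
import math

def speed_gap_bits(key_tests_per_sec, password_guesses_per_sec):
    """How many bits 'harder' a password guess is than a key test."""
    return math.log2(key_tests_per_sec / password_guesses_per_sec)

# Deep Crack (1997) versus an assumed 1 million password guesses per second
deep_crack = speed_gap_bits(92e9, 1e6)
# Assumed $5000 of hardware at 10 billion keys per second versus the same password rate
modern = speed_gap_bits(10e9, 1e6)

print(f"{deep_crack:.1f}")  # 16.5
print(f"{modern:.1f}")      # 13.3
```

Both results depend entirely on the assumed rates, so they are only as good as those rough inputs.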
Revised password strength goals
Working from my estimate that it is 2^13 times more expensive to test a well-hashed password than it is to test an AES key, we should consider a reasonably well-hashed password as being 13 bits stronger than its actual entropy with respect to cracking. If we want to achieve 90 bits of "effective strength" then 77 bits of password strength should do it. A six-word Diceware password from the original list achieves that (77.5 bits), and six words drawn from a list of 17679 words give about 84.7 bits.
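A small sketch of that bookkeeping, assuming the 13-bit hashing bonus from the rough estimate above (an assumption, not a measurement) and uniformly chosen words. The names `entropy_bits`, `effective_bits`, and `HASHING_BONUS_BITS` are made up for illustration.

```python
import math

# Rough 2^13 slowdown from the estimate above; an assumption, not a measurement.
HASHING_BONUS_BITS = 13

def entropy_bits(list_size, n_words):
    """Raw entropy of n_words uniformly chosen words from a list of list_size."""
    return n_words * math.log2(list_size)

def effective_bits(list_size, n_words, bonus=HASHING_BONUS_BITS):
    """Raw entropy plus the assumed bonus from slow password hashing."""
    return entropy_bits(list_size, n_words) + bonus

for name, size in [("Diceware (7776 words)", 7776), ("17679-word list", 17679)]:
    for n in (4, 5, 6):
        print(f"{name}, {n} words: {entropy_bits(size, n):.1f} bits raw, "
              f"{effective_bits(size, n):.1f} bits effective")
```

If the password is protected by a fast, unsalted hash instead, the bonus shrinks toward zero and only the raw figure applies.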
I don't expect most people to use passwords that long. I expect people will use things that are four or five words long. But if you are genuinely worried about the NSA going after your passwords, then six words should be sufficient, assuming that you use a decent password hashing scheme.
Very rough estimates only
I didn't spend a lot of time researching costs and benchmarks. There are lots of things in my estimates to quibble with. I attempted to be conservative (pessimistic about the scheme I'm advocating). I've been vague about "well-hashed passwords" as well. Again, I'm being very conservative with respect to the password hashing in 1Password. (For our new data format, attackers have been kept to under 20,000 guesses per second and for our older data format they've reached 300,000 guesses per second for multi-GPU machines. In my estimates here, I've picked 1 million guesses per second for a "reasonably well-hashed password".)
A few more historical notes
The overall idea for "XKCD-like" passwords goes at least as far back as the S/Key one-time passwords from the early 1980s. These used a list of 2048 one- to four-letter words. A six-word S/Key password got you 66 bits. I don't know if this idea of using randomly selected words from a list for a passphrase predates S/Key.
In 1995, Arnold Reinhold proposed Diceware. I don't know whether he was aware of S/Key at the time. Diceware was proposed in the context of developing pass phrases for PGP. It was also before most computers had cryptographically appropriate random number generators. So it actually involves rolling dice. (Although I trust the CSPRNGs on the machines that I use, I still enjoy "rolling up a new password").
In June 2011, I revived interest in Diceware in Toward Better Master Passwords with some additional modification. This resulted in my 15 minutes of fame. After the XKCD comic came out, I produced a geek edition that walked through some of the math.
In July 2011, Randall Munroe picked up on Diceware-like schemes and published his now famous comic. As I am not the inventor of the idea, I don't at all mind being upstaged by the comic. Indeed, as I said in my follow-up article:
What took me nearly 2000 words to say in non-technical terms, Randall
Monroe was able to sum up in a comic. This just shows the power of math ...
But there is one thing about how the comic has been interpreted that does worry me. It is clear to me and people who already understood the scheme that the words must be chosen through a reliably uniform random process. Picking words "at random" out of your head is not a reliably uniform process.