Possible Duplicate:
Do non-keyboard characters make my password less susceptible to brute forcing?
Every article on password security that I read tells people to make their passwords more complicated by using a wider range of characters: don't use only a-z, but mix in some A-Z, digits 0-9, and some punctuation. Basically, use all of the characters on your keyboard. However, I am building websites for a non-English-speaking audience, specifically Chinese users, and I have noticed that many Chinese websites also require passwords drawn from that same character set. This leaves me puzzled as to why passwords are limited to the core ASCII set. Why not use Chinese characters, or another script's characters?
For example, instead of using "!0*%y6#!7N@6" the user could use "胜0屿%y6#!7N景6", which is the same length but significantly more complex.
My application is built on UTF-8 and handles Chinese and other complex scripts, so there is no programming obstacle to allowing such characters in passwords.
By extending the possible character set of passwords to include Chinese, Japanese, Korean, and Arabic characters, I can increase the entropy of passwords enormously without making them longer or harder to remember. In fact, it may be easier for my Chinese users to remember a Chinese password than an English one. It would be very unlikely that someone could brute-force such a password or crack it with a rainbow table.
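To put rough numbers on that entropy claim, here is a back-of-the-envelope calculation in Python. The figures of ~95 printable ASCII characters and ~20,000 commonly used CJK characters are ballpark assumptions on my part, not exact counts:

```python
import math

# Per-character entropy, assuming the attacker knows the alphabet:
# ~95 printable ASCII characters vs. those plus ~20,000 common CJK
# characters (both figures are rough assumptions for illustration).
ascii_bits = math.log2(95)          # ~6.6 bits per character
cjk_bits = math.log2(95 + 20000)    # ~14.3 bits per character

length = 12
print(f"12-char ASCII password:     ~{ascii_bits * length:.0f} bits")  # ~79
print(f"12-char mixed-CJK password: ~{cjk_bits * length:.0f} bits")    # ~171
```

So, under those assumptions, the 12-character example above roughly doubles in strength, from about 79 bits to about 171 bits, at the same length.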
I can understand the character-set limit for Western users, where the characters used are the ones on the keyboard and it is quite awkward to enter anything that is not. However, Chinese users have the tools on their systems (input method editors) to enter the full Chinese character set, so there is no problem for them there.
So, to put the question briefly: is there any security issue in allowing users to make passwords from characters beyond the normal keyboard set?
To expand on AviD's point below and answer it:
When a password is entered, it doesn't remain as characters but is converted into a sequence of bits; those bits are the real password. The process of converting characters to bits is character encoding. ASCII is one such encoding, though it is now rather old and limited. Another is the Unicode standard, whose UTF-8 encoding is the one most websites are recommended to use today.
UTF-8 is backwards compatible with ASCII, so any ASCII-based password produces the same bits regardless of which of the two was used when the password was entered. However, there are still-popular encodings whose multi-byte sequences are not compatible with UTF-8, such as Big5 (used in Taiwan and Hong Kong) and the GB family (GB2312/GBK, used in mainland China); they encode the same Chinese character as entirely different bytes.
If someone entered their password one day under one encoding and another day under a different encoding, the sequence of bits sent as the password would differ.
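A quick Python test illustrates this (the password string is just a stand-in):

```python
# The same password yields different byte sequences under different
# encodings -- and the bytes, not the characters, are what the server
# ultimately hashes and compares.
pw = "中文A1"

for enc in ("utf-8", "gbk", "big5"):
    print(enc, pw.encode(enc))

# utf-8 -> b'\xe4\xb8\xad\xe6\x96\x87A1'  (the ASCII tail is unchanged)
# gbk and big5 map 中 and 文 to different two-byte sequences, so a hash
# computed over the raw bytes would differ across all three encodings.
```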
It is possible to detect the encoding and convert on the server side; my applications already do that, converting everything entered into UTF-8. However, I wonder how reliable that conversion is. Would Big5 converted to UTF-8 give the same result as GB converted to UTF-8?
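From a quick test, the conversion itself appears lossless for characters that both encodings cover, provided the server knows which encoding the bytes arrived in; as far as I understand it, the fragile part is the detection step, since Big5 and GBK use overlapping byte ranges and a sequence can be valid in both while decoding to different characters.

```python
pw = "中文A1"

# Simulate the same password arriving from two clients under two
# different encodings, then being normalized to UTF-8 on the server.
from_big5 = pw.encode("big5").decode("big5").encode("utf-8")
from_gbk = pw.encode("gbk").decode("gbk").encode("utf-8")

print(from_big5 == from_gbk)  # True -- identical once decoded to Unicode
```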
Additionally, there are character-encoding-based XSS attacks that use sloppy encoding handling as their vector. Could something similar be used to compromise user passwords, or my application, when few or no limits are placed on which characters can be input?
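For completeness, this is a minimal sketch of the defensive handling I have in mind: NFC-normalize the Unicode string (my assumption, to avoid mismatches between canonically equivalent forms), then hash the UTF-8 bytes with a slow key-derivation function so the raw characters never reach HTML, SQL, or any other interpreter.

```python
import hashlib
import os
import unicodedata

def password_digest(password: str, salt: bytes) -> bytes:
    # Normalize to NFC so canonically equivalent strings hash identically,
    # then derive a key from the UTF-8 bytes; only this digest is stored.
    nfc = unicodedata.normalize("NFC", password)
    return hashlib.pbkdf2_hmac("sha256", nfc.encode("utf-8"), salt, 100_000)

salt = os.urandom(16)
print(password_digest("胜0屿%y6#!7N景6", salt).hex())
```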