There is no good way. What you describe is, in practice, a measurement of how far apart two passwords are in our minds. It is clearly impossible to measure that directly.
Secondly, what you want to measure depends heavily on the person and often involves information known only to him. For example, one of your colleagues could use the names of his children as passwords on the different company servers. It is not possible to build a software solution that finds this, but a hacker or a colleague may have this information and use it to crack his account.
What you can do is take a step along the NSA's track: although you can't spy on people's minds directly, you can use Big Data to emulate something very similar.
What you need is publicly available information on the net. For example:
- Thesaurus
- Wikipedia (there is no simple ready-made way to measure the link distance between two keywords, but its database is freely downloadable and you can write a script to analyze its link connectivity).
- Or you could simply run automated Google searches through the Google search API and compare the hit counts for the first password, for the second password, and for a query containing both. For example, if the first password is "apple" and the second is "orange", then
Hits("apple")*Hits("orange")/Hits("apple", "orange")^2
would have to be below an experimental limit set by you (the more often the two words appear together, the smaller this ratio gets).
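A minimal sketch of that ratio check, in Python. It does not call any search API; it assumes you have already obtained the three hit counts in a way you trust. The function names `hit_ratio` and `looks_related`, and the reading that a ratio below the limit means "related", are my own assumptions for illustration.

```python
def hit_ratio(hits_a: int, hits_b: int, hits_both: int) -> float:
    """Compute Hits(a) * Hits(b) / Hits(a, b)^2 from precomputed hit counts."""
    if hits_both == 0:
        # The two words never appear together: treat them as maximally unrelated.
        return float("inf")
    return (hits_a * hits_b) / (hits_both ** 2)


def looks_related(hits_a: int, hits_b: int, hits_both: int, limit: float) -> bool:
    """Assumed interpretation: a ratio below the experimental limit means 'related'."""
    return hit_ratio(hits_a, hits_b, hits_both) < limit
```

The limit itself has to be found by experiment, for example by trying pairs of words you already consider related or unrelated.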
But beware: don't send queries containing the passwords to an untrusted public cloud; that would be a very serious security breach! Of course, which public cloud counts as trusted depends only on your own viewpoint, considerations and responsibility. For me, none would.
In your place, I would do the following:
- Get a Wikipedia mirror (they offer a simple MySQL database dump that is publicly downloadable).
- Create a link distance map (this would be very simple, although the result may be big).
- For each of the two passwords to compare, find the nearest Wikipedia article title (this would probably need a massive Levenshtein comparison, so you will need a lot of CPU).
- Finally, use the following formula: D("pwd1", "pwd2") = Levenshtein("pwd1", Lev_nearest("pwd1")) + Wiki_Link_Distance(Lev_nearest("pwd1"), Lev_nearest("pwd2")) + Levenshtein("pwd2", Lev_nearest("pwd2"))
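The formula above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the link graph is a tiny hand-made dictionary (`TOY_GRAPH`, hypothetical) standing in for the real Wikipedia link data, the nearest title is found by brute force, and the link distance is a plain breadth-first search over directed links. All function names are mine.

```python
from collections import deque


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def nearest_title(word, titles):
    """Lev_nearest: brute-force search for the closest article title."""
    return min(titles, key=lambda t: levenshtein(word, t))


def link_distance(graph, start, goal):
    """Wiki_Link_Distance: number of link hops via BFS, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def password_distance(pwd1, pwd2, graph):
    """D(pwd1, pwd2) as in the formula above; None if the titles are not connected."""
    titles = list(graph)
    t1, t2 = nearest_title(pwd1, titles), nearest_title(pwd2, titles)
    hops = link_distance(graph, t1, t2)
    if hops is None:
        return None
    return levenshtein(pwd1, t1) + hops + levenshtein(pwd2, t2)


# Hypothetical toy data: article title -> titles it links to.
TOY_GRAPH = {
    "apple": ["fruit"],
    "orange": ["fruit"],
    "fruit": ["apple", "orange"],
    "server": [],
}
```

With the toy graph, `password_distance("appel", "orangee", TOY_GRAPH)` combines two typo corrections with the two hops from "apple" through "fruit" to "orange". On the real data the brute-force `nearest_title` would be the expensive part.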
Extension: Wikipedia contains millions of articles, which makes a naive shortest-path search nearly impossible. You would surely have to implement this in C++ and use very well optimized algorithms, so it will be hard. As an alternative, you can use only the most common words from Wikipedia (which can be found from their usage statistics). Although the English Wikipedia has millions of articles, a native English speaker knows only some tens of thousands of them.
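As a rough sketch of that reduction in Python: assuming you already have a link graph (title to linked titles) and some usage count per title, you could keep only the top entries and drop every link that leaves the reduced set. Both inputs and the name `restrict_to_common` are hypothetical; where the usage statistics come from is left open.

```python
def restrict_to_common(graph, usage_counts, keep):
    """Keep the `keep` most-used titles and only the links between them."""
    ranked = sorted(graph, key=lambda t: usage_counts.get(t, 0), reverse=True)
    top = set(ranked[:keep])
    return {
        title: [n for n in links if n in top]
        for title, links in graph.items()
        if title in top
    }
```

Note that pruning can make some paths longer or disconnect titles entirely, so distances computed on the reduced graph are only an approximation of those on the full one.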
Somebody should really write this; it would be a wonderful open-source daemon somewhere on GitHub :-)