Is there a hash algorithm that will help you identify similar files or strings? For example, the hash for ABC and XBC would be similar rather than radically different as is usually the case. I know of one measure of similarity, Edit Distance (http://en.wikipedia.org/wiki/Edit_distance). But this doesn't give you a hash for each input to compare, only a score between any two inputs.
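To illustrate the limitation, here is a minimal sketch of edit distance (the Levenshtein variant, using the standard dynamic-programming recurrence). The function name `edit_distance` is just for illustration. It takes two inputs and returns a single score, so there is no per-input value that could be stored and compared later.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("ABC", "XBC"))  # 1
```

The score for ABC and XBC is small, as hoped, but it only exists for the pair; comparing N files this way needs on the order of N² comparisons rather than N hashes.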
Update
The comment by Andan (locality-sensitive hashing, LSH) is what I was looking for. My motive for asking is that I was wondering how LSH might be used in scanning for malware. Is it used for identifying malware? Why or why not?
Update
In line with Tom Leek's answer, I did some investigation of my own. I wrote a program that XORs the bytes of a file with a predetermined "random" pattern (the seed didn't change) and then counts the 1 bits in the result. This gives the Hamming distance from the random pattern to the file. It wasn't a very useful metric: on average about half of a file's bits differ from a random pattern, so the score is basically just half the file size in bits.
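The experiment can be sketched roughly as follows. This is a reconstruction under assumptions, not the original program: the names `fixed_pattern` and `pattern_score` and the seed value are made up for illustration, and Python's `random` module stands in for whatever generator produced the pattern.

```python
import random

SEED = 12345  # assumed value; the only requirement is that it never changes


def fixed_pattern(length: int, seed: int = SEED) -> bytes:
    """Return the first `length` bytes of a repeatable pseudorandom pattern."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(length))


def pattern_score(data: bytes, seed: int = SEED) -> int:
    """Hamming distance (in bits) between data and the fixed pattern."""
    pattern = fixed_pattern(len(data), seed)
    # XOR each byte pair and count the 1 bits in the result.
    return sum(bin(x ^ y).count("1") for x, y in zip(data, pattern))


data = bytes(10_000)  # stand-in for a file's contents
print(pattern_score(data), "out of", 8 * len(data), "bits")
```

For any input that is independent of the pattern, each bit differs with probability 1/2, so the score lands near 4 × (size in bytes) regardless of content. That is why the number tracks file size rather than similarity.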
Some examples:
Two related executables I scanned scored 2684964 and 2738772, for a difference of 53808. They are definitely related (different versions of programs I wrote), but the difference of 53k is close to half of the file size difference in bits: ~128k. So it's not a useful metric for determining similarity.
I scanned two similarly sized JPEGs that were definitely different images. They scored 3124915 and 3110981, for a difference of 13934. So their difference was "smaller" than the difference between the related executables, even though they aren't related. So it's not a useful metric for determining difference either.
Conclusion:
As Tom Leek said, it's an open problem for a reason.