For a passphrase that is UTF-8 encoded and then SHA-256'd, what is the minimum length to achieve a practically collision-free result?

The implementation is for a 32-byte cryptocurrency ed25519 seed, so it will not be stored anywhere by the implementation.

1 Answer

Of course I cringe at the idea of a simple hashing over a passphrase, but let's assume that it is actually OK in your setup.

The direct answer to your question is simple: practically, there are no collisions in SHA-256, regardless of input length. If you input two distinct sequences of bytes (e.g. two passphrases), you will get two distinct outputs. Mathematically, we know that collisions MUST exist (the SHA-256 input space is larger than the output space), but we have found none yet, and the best known methods for finding collisions have an average cost of 2^128, i.e. a lot more than is feasible with existing technology.
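As a minimal sketch of the setup described in the question (UTF-8 encode, then SHA-256), the following shows that the output is always 32 bytes and that short and long inputs are handled identically. The helper name `seed_from_passphrase` and the sample passphrases are made up for illustration; this only demonstrates the hashing step, not a recommended key derivation.

```python
import hashlib


def seed_from_passphrase(passphrase: str) -> bytes:
    """Hypothetical helper: UTF-8 encode the passphrase, then SHA-256 it."""
    return hashlib.sha256(passphrase.encode("utf-8")).digest()


# Illustrative inputs of very different lengths.
short_a = seed_from_passphrase("a")
short_b = seed_from_passphrase("b")
long_c = seed_from_passphrase("correct horse battery staple")

# The digest is 32 bytes whatever the input length.
print(len(short_a), len(short_b), len(long_c))

# Distinct inputs give distinct digests, even one-character ones.
print(short_a != short_b, short_a != long_c)
```

The one-character inputs are terrible passphrases because they are trivially guessable, not because their hashes collide.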

Input length and encoding have nothing whatsoever to do with the subject. How you imagined that input length, let alone a minimum input length, or encoding, had an impact on collisions is a mystery to me, so I have to assume that you do not actually worry about SHA-256 collisions. Instead, maybe you worry about the probability of two distinct people coming up, out of (bad) luck, with the same passphrase? (SHA-256 being a red herring there.)

If that is your actual question, then it depends on the process by which the passphrases are generated, which has only a very distant relationship with the passphrase length. See this answer for details on how to calculate entropy (because an entropy of "n bits" roughly translates to a probability of collision of about t^2/2^n among t passphrases, as per the birthday problem).
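To put rough numbers on the birthday problem, here is a small sketch using the standard approximation 1 - exp(-t(t-1)/2^(n+1)) for t passphrases drawn uniformly from 2^n possibilities. The function name and the sample values of n and t are assumptions chosen for illustration, and the uniform-choice model is itself an idealization of how people pick passphrases.

```python
import math


def collision_probability(entropy_bits: float, num_passphrases: int) -> float:
    """Approximate chance that at least two of t passphrases coincide,
    assuming each is drawn uniformly from 2**entropy_bits possibilities."""
    t = num_passphrases
    # Birthday approximation: 1 - exp(-t(t-1) / 2^(n+1)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-t * (t - 1) / 2.0 ** (entropy_bits + 1))


# Illustrative values only: one million users at various entropy levels.
for bits in (20, 40, 64, 128):
    p = collision_probability(bits, 10**6)
    print(f"{bits:>3} bits, 10^6 passphrases: p = {p:.3e}")
```

What matters in this model is the entropy of the generation process, so a long passphrase picked from a small set of popular phrases still lands at the low-entropy end.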

Thomas Pornin
  • For theory and practice of password hashing, read [this](http://security.stackexchange.com/questions/211/how-to-securely-hash-passwords/31846#31846). Now. – Thomas Pornin Apr 09 '14 at 20:00