5

I am developing a hash algorithm that produces 16-character hashes. I have a second algorithm that generates random 1024-character strings. This algorithm loops, hashing each string and checking whether the newly generated hash collides with any previous one.

How many iterations with no collision do I need before I can claim that my hash algorithm is "almost secure"?

Mike Ounsworth
secavfr
  • The question shows no understanding of what a secure hash is, i.e. what properties it should have (aside from collision resistance) and how to do proper cryptanalysis (not by doing N tries with fixed-length input data). There is also no such thing as "almost secure": whether it is secure enough depends on the specific use case, which is unknown. Therefore marked as a duplicate of [Why shouldn't we roll our own?](https://security.stackexchange.com/questions/18197/why-shouldnt-we-roll-our-own) – Steffen Ullrich Apr 17 '17 at 14:53
  • Thank you Steffen, my main aim is to be able to generate a unique 16-character string as an identifier for each unique combination of first & last name. – secavfr Apr 17 '17 at 15:03
  • @FlavienB. Are you saying that, given any combination of first and last name, you want to generate a unique, 16-character hash? – Nat Apr 17 '17 at 18:20
  • No @Nat, I want the principle of a classic hash algorithm: from a string, get a hash that does not conflict with any other hash. :) – secavfr Apr 17 '17 at 19:08
  • There is no useful hash algorithm that is collision-*free*. The best we can get is varying degrees of collision-*resistance* unless you have an uncommonly narrow use case to define usefulness. – David Foerster Apr 17 '17 at 20:47

2 Answers

43

How many iterations with no collision do I need before I can claim that my hash algorithm is "almost secure"?

Never. You can't claim your hash algorithm is secure. It needs to be carefully audited by talented cryptographers / mathematicians who know exactly what they're doing.

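You can't expect to have enough computing power or money to brute-force it for long enough, especially if you're working alone. Finding the famous SHA-1 collision, announced by Google and CWI Amsterdam in February 2017, took about two years of research and, by the researchers' own account, roughly 6,500 CPU-years plus 110 GPU-years of computation. Do you think you have that much time and money?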

Also, brute-forcing alone isn't the best way to break a hash function. The function can have major structural flaws that are very hard to find by brute force. That's why you need talented people to audit the algorithm.

Unless all that work has been done, you should really use standard hash functions which are considered secure nowadays.
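To illustrate what relying on a standard function could look like for the use case mentioned in the comments (a 16-character identifier derived from a first and last name), here is a minimal sketch. It assumes the 16 characters are hexadecimal, i.e. 64 bits of output. The helper name `name_id` and the NUL separator are hypothetical choices for illustration, not something from the question.

```python
import hashlib


def name_id(first: str, last: str) -> str:
    """Return a 16-hex-character identifier for a name (hypothetical helper)."""
    # Join with a separator that should not occur inside a name, so that
    # ("ab", "c") and ("a", "bc") are hashed as different byte strings.
    data = f"{first}\x00{last}".encode("utf-8")
    # BLAKE2b supports short digests natively: 8 bytes -> 16 hex characters.
    return hashlib.blake2b(data, digest_size=8).hexdigest()


if __name__ == "__main__":
    print(name_id("Ada", "Lovelace"))
```

Keep in mind that 64 bits is short: by the birthday bound, an attacker needs only about 2^32 attempts to find some colliding pair, so this is only reasonable when accidental collisions are the concern rather than deliberate ones.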

Tom K.
Benoit Esnard
9

I have a second algorithm that generates random 1024 char strings. This algorithm loops and detects if there's a collision between the newly generated hash and any previous one. How many iterations with no collision do I need [...]?

With a hash function you want a uniform distribution: if you hash random data many times, every possible output should appear about equally often.

This property is hard to translate into the test you describe. You want to count iterations without a collision, but every iteration has some chance of producing one, and that chance grows as more hashes accumulate. Because it is a matter of probability, a collision after 10,000 iterations does not by itself tell you whether your hash function is faulty or you were simply unlucky.

A better way would probably be to generate a lot of hash values and then look at the distribution of 0's and 1's in each bit position. In a uniform distribution, you would expect each bit to have a 50% chance of being 0 and a 50% chance of being 1. Passing this check is necessary for a good hash, but it does not prove the hash is secure.
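The per-bit check described above could be sketched roughly as follows. This assumes a 64-bit output (16 hexadecimal characters) and uses a truncated SHA-256 as a stand-in for the hash under test; `candidate_hash` and `bit_frequencies` are hypothetical names, and you would substitute your own function for `candidate_hash`.

```python
import hashlib
import os

HASH_BITS = 64  # assumption: 16 hex characters = 64 bits


def candidate_hash(data: bytes) -> int:
    """Stand-in for the hash under test (assumption: truncated SHA-256)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def bit_frequencies(hash_fn, samples: int, input_len: int = 1024) -> list[float]:
    """Return, for each output bit, the fraction of samples where it is 1."""
    ones = [0] * HASH_BITS
    for _ in range(samples):
        h = hash_fn(os.urandom(input_len))  # random 1024-byte input
        for bit in range(HASH_BITS):
            ones[bit] += (h >> bit) & 1
    return [count / samples for count in ones]


if __name__ == "__main__":
    freqs = bit_frequencies(candidate_hash, samples=20_000)
    worst = max(freqs, key=lambda f: abs(f - 0.5))
    print(f"most biased bit frequency: {worst:.4f}")
```

With n samples, the frequency of an unbiased bit has a standard deviation of about 0.5 / sqrt(n), so deviations from 0.5 that are many times larger than that point to a biased bit. The reverse does not hold: balanced bits say nothing about correlations between bits or about resistance to deliberate attacks.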

As for calculating the probability of a collision, this article can help.
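For a quick estimate, the standard birthday approximation can be computed directly. The sketch below assumes an ideal hash with a 64-bit output (16 hexadecimal characters); the function name `collision_probability` is a hypothetical helper, and the formula used is p ≈ 1 - exp(-n(n-1) / (2N)) for n hashes drawn from N possible values.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance of at least one collision among n ideal random hashes."""
    space = 2 ** bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


if __name__ == "__main__":
    for n in (10_000, 10**6, 2**32):
        print(f"n={n}: p ~= {collision_probability(n, 64):.3e}")
```

Under these assumptions, 10,000 hashes give a collision probability on the order of 10^-12, so a collision-free run of that length is exactly what any reasonable 64-bit function would produce and says very little about quality. Only at around 2^32 hashes does an ideal 64-bit function reach a collision probability of roughly 39%.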

This article gives numbers on the uniformity of commonly used hash functions.

Sjoerd