
What is the best method?

Assumption: I have a function that generates a number of medium-high entropy bytes

  • Step 1: I generate 3 of these medium-high entropy bytes.

  • Step 2: I hash these bytes using a known cryptographically strong algorithm (SHA-256).

  • Step 3: I cut a substring from the result.

  • Step 4: I use this string as my salt (a 3-byte salt).

So, I can do:

  • Steps 1 and 4;

  • Steps 1, 2, 3, and 4;

  • Steps 1, 3, and 4; etc.

The question is: which of these operations increase entropy, and which do not?

Some code:

function delicious_delicious_salt()
{
    $string = openssl_random_pseudo_bytes(3);
    // Do I hash this? Do I generate a longer byte string and cut it?
    return $string;
}

What I actually don't know

Does passing pseudo-random bits through a hashing algorithm and cutting a pseudo-random set of bits from the output produce high entropy bits?

I want to implement a salt-generating function for the SHA256 password hashes on my web server, and as an exercise in my own understanding I want it to be as cryptographically secure as I can reasonably make it without going overboard (for example, harvesting entropy from HIDs such as a mouse, mic, or video).

gal
  • The answers here are good, but I'd like to just make the solution 100% clear - you *don't need* a strong random number generator, you just need to generate a *unique* value for a salt. A 32-bit integer that increments per-user, e.g. an auto-increment ID, is completely valid as a salt. – Polynomial Feb 27 '13 at 20:05
  • Ah, thank you very much - a SQL database is involved, so I believe a 4-byte integer that increments automatically is very, very feasible/easy to implement. – gal Feb 27 '13 at 21:16

2 Answers


Entropy is a measure of the number of possible outcomes for the whole system. If you start with three random bytes, then there are 16777216 (2^24) possible inputs. Whatever you do with these inputs, you will get at most 16777216 possible outputs; no amount of hashing, cutting, and praying will change that, except possibly by reducing the number of possible outputs. With your hashing followed by truncation to 3 final bytes, there is no guarantee that all 16777216 values for these three bytes will be reachable. In fact, if the truncated hash behaves like a random function, you should expect only about 10.6 million of them (roughly 63%).
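To see the shrinkage for yourself, here is a rough sketch that enumerates every input, hashes it, truncates the digest back to the input width, and counts the distinct results. It assumes a 2-byte space (65536 inputs) rather than the 3 bytes in the question, purely so it finishes quickly; `count_truncated_outputs` is a made-up name. If SHA-256 behaves like a random function, about 1 - 1/e (roughly 63%) of the values should show up.

```php
<?php
// Count distinct salts when every input is hashed and truncated back to its own width.
// Assumption: a 2-byte space (65536 inputs) keeps the run short; the question uses 3 bytes.
function count_truncated_outputs(int $bytes = 2): int
{
    $total = 1 << (8 * $bytes);
    $seen = [];
    for ($i = 0; $i < $total; $i++) {
        // Encode $i as a fixed-width big-endian byte string.
        $input = substr(pack('N', $i), 4 - $bytes);
        // Hash, then keep only the first $bytes bytes of the raw digest.
        $salt = substr(hash('sha256', $input, true), 0, $bytes);
        $seen[bin2hex($salt)] = true;
    }
    return count($seen);
}

$distinct = count_truncated_outputs(2);
printf("%d distinct outputs from %d inputs\n", $distinct, 1 << 16);
```

The count printed should be well below 65536, which is the same effect the answer describes for 3 bytes.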

A salt is only as good as it is unique. It does not need to be secret or unpredictable, but you should strive for never using the same salt value twice, or at least to keep the reuse to the bare minimum. If your salts must fit in three bytes, and your source of randomness is three bytes, then the best you can do is to simply use these three random bytes as salt. Hashing them is just a needless complication, which has no security benefit, and actually reduces the space of possible salt values, thus implying a higher reuse rate.
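In code, "just use the random bytes" could look something like the sketch below. It assumes PHP 7 or later for `random_bytes()`; the function name `generate_salt` and the `$length` parameter are illustrative, with the default of 3 bytes taken from the question.

```php
<?php
// Return $length raw random bytes to use directly as a salt, with no hashing step.
// Assumption: PHP 7+ (random_bytes); the 3-byte default follows the question.
function generate_salt(int $length = 3): string
{
    return random_bytes($length);
}

$salt = generate_salt();
echo bin2hex($salt), "\n"; // hex encoding is only for display or a text column
```

If the 3-byte limit is not a hard constraint, passing a larger `$length` gives a larger salt space and therefore a lower chance of reuse.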

You could generate salts with a non-random process, if you can keep a state. Namely, maintain a counter, which is incremented for every generated salt. Counter management can be troublesome (especially in multi-frontend systems), which is why using randomness is often preferred. But that randomness is just a tool to achieve uniqueness. It is not a problem if salts are not random or follow a predictable sequence. The problem is when you reuse a salt value.
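A counter-based salt might be sketched as below. The helper name `salt_from_counter` is hypothetical, the 4-byte width is an assumption, and the code assumes 64-bit PHP so that 32-bit unsigned values fit in an `int`. Where the counter comes from (for example a database auto-increment column) and how it stays unique across frontends is left to the caller.

```php
<?php
// Hypothetical helper: turn a counter (for example an auto-increment ID)
// into a fixed-width 4-byte salt. Keeping the counter unique is the caller's job.
function salt_from_counter(int $counter): string
{
    if ($counter < 0 || $counter > 0xFFFFFFFF) {
        throw new InvalidArgumentException('counter must fit in 32 bits');
    }
    return pack('N', $counter); // unsigned 32-bit, big-endian
}

echo bin2hex(salt_from_counter(1)), "\n"; // 00000001
echo bin2hex(salt_from_counter(2)), "\n"; // 00000002
```

The salts are entirely predictable, which is fine here: the only property being relied on is that no two calls with different counter values return the same bytes.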

Thomas Pornin

If you start with 24 bits (3 bytes) of entropy, you can only ever yield 24 bits of entropy as a result, no matter what process you go through.

This is because if you start with 2^24 possible input values and perform some deterministic process on them, you will have at most 2^24 possible output values. The output values can be of arbitrary length (imagine seeding a stream cipher with your 24-bit key), but there will still be no more than 2^24 of them. If the process maps distinct inputs to distinct outputs, the probability distributions also match, which means that you have precisely the same amount of entropy as you started with; if some inputs collide, you have less.
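A tiny illustration of the "longer output, same number of values" point, under the assumption of a 1-byte seed instead of 24 bits so that every input can be enumerated instantly:

```php
<?php
// Feed every possible 1-byte seed through SHA-256 and count the distinct digests.
// Assumption: a 1-byte seed keeps the run tiny; the answer discusses 24 bits.
$digests = [];
for ($i = 0; $i < 256; $i++) {
    $digests[hash('sha256', chr($i))] = true; // each digest is 256 bits long
}
printf("%d seeds -> %d distinct 256-bit digests\n", 256, count($digests));
```

Each digest is 256 bits long, yet an attacker who knows the process only has 256 candidates to try, so the output carries no more than 8 bits of entropy.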

In key-stretching (which is pretty much what you're talking about), you aim to end up with a value that is time-expensive to reproduce. So you could take your 24 bit input, hash it N times where N is some integer large enough for the process to take several seconds on desktop hardware, and use the output for something.
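The "hash it N times" idea can be sketched as follows. `stretch` is a hypothetical helper and `$iterations` an assumed tuning knob; the figure of 100000 is arbitrary and would need to be measured on the target hardware.

```php
<?php
// Hypothetical helper: apply SHA-256 repeatedly so the result is slow to recompute.
// $iterations is an assumed tuning knob, not a recommended value.
function stretch(string $input, int $iterations): string
{
    if ($iterations < 1) {
        throw new InvalidArgumentException('iterations must be at least 1');
    }
    $digest = $input;
    for ($i = 0; $i < $iterations; $i++) {
        $digest = hash('sha256', $digest, true); // raw 32-byte output each round
    }
    return $digest;
}

$out = stretch(random_bytes(3), 100000);
echo bin2hex($out), "\n";
```

This raises the cost of each guess but does not add entropy: a 24-bit input still yields at most 2^24 possible results. For real password storage, PHP's built-in `password_hash()` or `hash_pbkdf2()` are probably a safer starting point than a hand-rolled loop.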

lynks