In matters of security, asking the right questions is extremely important, as is asking them clearly and precisely. Learning how to ask these questions is difficult, so I will restate your questions in more precise terms and also answer some questions that I think will further your understanding.
- Does the fact that the time-of-day seed overflows create a security vulnerability?
Basically no. I don't know of any modern system whose random seed wraps around every 24 hours, although some old BASIC implementations did this. The main attack I can think of is to make the program run at the same time the next day, which would give the same sequence of random numbers.
- Does the fact that the time-of-day is used as a seed create a security vulnerability?
Absolutely. If a program runs `srand(time(NULL))` and you know it started 2 hours ago, plus or minus 2 minutes, then you know it was seeded with one of about 240 possible values (a 4-minute window at one-second resolution). You can easily brute-force all 240 values, and if you can observe a couple of outputs you can quickly determine which seed was the actual one and how far along in the sequence the program has gone.
If this program happens to be one that is generating keys based on that seed alone, you can re-run the program with each of those 240 values and one of them will give the exact same keys.
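To make the brute-force search concrete, here is a minimal sketch. It assumes the target uses a small LCG modelled on the sample `rand()` in the C standard; `toy_next` and `recover_seed` are hypothetical names of mine, and a real library's `rand()` may use a different algorithm, in which case you would swap in that algorithm instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for rand(): an LCG modelled on the C standard's sample
   implementation. This is an assumption; real libraries differ. */
uint32_t toy_next(uint32_t *state) {
    assert(state != NULL);
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7FFFu;   /* 15-bit output */
}

/* Try every seed in [lo, hi]. If some seed reproduces the first n
   observed outputs, store it in *found and return 1; otherwise return 0. */
int recover_seed(uint32_t lo, uint32_t hi,
                 const uint32_t *observed, size_t n, uint32_t *found) {
    assert(observed != NULL && found != NULL && lo <= hi);
    for (uint32_t seed = lo; ; seed++) {
        uint32_t state = seed;
        size_t i = 0;
        while (i < n && toy_next(&state) == observed[i]) {
            i++;
        }
        if (i == n) {
            *found = seed;
            return 1;
        }
        if (seed == hi) {
            break;   /* checked here so that hi == UINT32_MAX cannot wrap */
        }
    }
    return 0;
}
```

With a 4-minute window, `lo` and `hi` would be the estimated start time minus and plus 120 seconds, and a handful of observed outputs is normally enough to single out one seed.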
- So what if we give a "true" random seed to a pseudo-random generator?
This is definitely better but is not necessarily good enough. Some PRNGs are "stronger" than others, but if you observe enough outputs of a PRNG you can eventually deduce its internal state and begin predicting future values. PRNGs are usually designed to be efficient to execute rather than to be cryptographically secure. There are exceptions, though: cryptographically secure PRNGs (CSPRNGs) are designed to resist exactly this.
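As an extreme illustration of deducing the internal state, consider a hypothetical fast generator that simply returns its whole 32-bit state (the LCG constants here are the ones popularised by Numerical Recipes; `weak_next` and `predict_after` are names I made up). One observed output is enough to predict everything that follows, no matter how good the seed was. Real generators usually hide more of their state, so treat this as a sketch of the idea rather than an attack on any particular library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "efficient" PRNG: a 32-bit LCG whose output is its state. */
uint32_t weak_next(uint32_t *state) {
    assert(state != NULL);
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* An observer who knows the constants can compute the output that
   follows any observed output, without ever learning the seed. */
uint32_t predict_after(uint32_t observed) {
    return observed * 1664525u + 1013904223u;
}
```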
- If we select a pseudorandom location in an audio file, does that give us "true" randomness?
No. The audio file can be considered part of your seed. If you run the same program with the same PRNG seed and the same audio file you will get the same sequence of numbers. Alternatively you can consider the audio file as a "secret" part of your algorithm, which is then security through obscurity.
There's more. An audio file is not random at all (unless it's white noise!). If you exploit some other weakness in the system to guess the outputs of the PRNG, you can start to build a picture of what is in the audio file. That lets you guess a value whenever the chosen position is close to one you have already seen, and you might even work out what the recording is. For example, if you reconstruct enough to guess that it is a recording of a human voice, it becomes easier to guess some of the statistical properties of the file.
Even if you don't know what the PRNG is doing, you will still be able to observe that the values taken from the audio are not uniformly distributed.
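The point that the audio file is just part of the seed can be sketched as follows. This assumes a hypothetical scheme in which a toy LCG picks positions in a buffer of 16-bit samples; `pick_from_audio` is my own name, not anything from your program. Nothing in it depends on anything other than the seed and the buffer, so the same inputs always give the same outputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scheme: a toy LCG chooses which audio sample to emit.
   The output is a pure function of (seed, audio), so it is no more
   random than those two inputs are secret and unpredictable. */
void pick_from_audio(uint32_t seed,
                     const int16_t *audio, size_t audio_len,
                     int16_t *out, size_t out_len) {
    assert(audio != NULL && out != NULL && audio_len > 0);
    uint32_t state = seed;
    for (size_t i = 0; i < out_len; i++) {
        state = state * 1103515245u + 12345u;      /* advance the LCG */
        out[i] = audio[(state >> 16) % audio_len]; /* index into the file */
    }
}
```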
- OK, forget the audio file. Let's just say we have a giant table of 10 million truly random numbers that we got somewhere. And we use a PRNG to index into it. Is that random?
What advantage does that have over just using the numbers in the table in the original order? Are you hoping to be able to re-use the table for a longer period of time? The longer you use it for, the more likely some weakness in the PRNG will be exploited to your detriment. Better to have each entry in the table have a fixed lifespan: single use only. In this case adding a PRNG does not make it any more random. Rather than storing big tables which might be leaked, you might as well get the numbers from a hardware generator exactly when you need them, then discard them.
- Exactly how vulnerable does using pseudorandom values really leave a system?
This depends hugely on the precise combination of application, seed source, PRNG algorithm, and threat model. If the seed source is `time(NULL)`, and the attacker has access to the binary, it's like leaving your keys in the front door. If you at least pull the seed out of `/dev/random` you might be making an attack infeasible, or it might just need the attacker to observe your program for a few weeks to get enough info to make some deductions.
If the attacker doesn't have your binary but correctly guesses you are using `srand(time(NULL))` then they can potentially collect enough observations to make their move.
An attacker could potentially launch a DDoS attack or something similar to crash your server or cause more instances to spin up, so that the new instances take their PRNG seed from a reasonably predictable timestamp.
If, however, you use a PRNG that doesn't make it easy to infer the hidden state, seed it with hardware entropy rather than a timestamp, and use the numbers in a way that doesn't leak a lot of information, it will not be easy to attack.
As with all things security though, just one mistake can completely compromise you, which is why it is advisable not to rely on all those assumptions about your seed, PRNG and application when you can instead simply use a sequence of one-time random values derived from hardware entropy.
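For completeness, here is a minimal sketch of drawing one-time random bytes from the operating system whenever you need them. It assumes a Unix-like system that provides `/dev/urandom` (the non-blocking sibling of the `/dev/random` mentioned above); `fill_random` is a hypothetical helper name, and other platforms offer their own interfaces for the same job.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Fill buf with len bytes from the OS entropy source.
   Returns 0 on success and -1 on failure.
   Assumption: /dev/urandom exists (Unix-like systems). */
int fill_random(unsigned char *buf, size_t len) {
    assert(buf != NULL);
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL) {
        return -1;
    }
    /* Disable stdio buffering so we read only the bytes we ask for. */
    setvbuf(f, NULL, _IONBF, 0);
    size_t got = fread(buf, 1, len, f);
    fclose(f);
    return got == len ? 0 : -1;
}
```

A caller would request, say, 32 bytes for a key, check the return value, use the bytes once, and then discard them rather than feeding them into `srand()`.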