71

It seems to me that a hardware component which generates random numbers would be extremely simple - just measure tiny vibrations in the hardware with a sensor, right? Maybe I'm wrong, but it seems like if you measured vibrations with very high precision, you could easily generate unguessable random numbers.

However, I'm a cryptography noob and know little. While reading up on the subject, I came across an article that says:

> and you just don’t need to be that sophisticated to weaken a random number generator. These generators are already surprisingly fragile, and it’s awfully difficult to detect when one is broken. Debian’s maintainers made this point beautifully back in 2008 when an errant code cleanup reduced the effective entropy of OpenSSL to just 16 bits. In fact, RNGs are so vulnerable that the challenge here is not weakening the RNG — any idiot with a keyboard can do that — it’s doing so without making the implementation trivially vulnerable to everyone else.

I'm surely missing some critical detail about how random number generation is "fragile". Can someone explain?

john doe
  • Vibration would probably actually be more mathematically consistent than you might think. One way I’ve seen it done for a simple hardware generator was to reverse-bias a transistor, which produces quantum-level randomness. This was just a simple Arduino circuit... I’m sure real hardware random packages are more sophisticated. But not all systems have hardware available to them... especially in virtualized systems. – nbering Jul 29 '18 at 01:24
  • It should also be noted that the section of the article that you're quoting is talking about a weakness discovered in a *software* random number generator. – Harry Johnston Jul 29 '18 at 01:52
  • @HarryJohnston I see... For some reason I assumed the important random numbers would always be generated server side with proper hardware. I'm trying to get into cryptography and it's a massive subject to learn. Should probably read a book – john doe Jul 29 '18 at 01:54
  • @johndoe When my phone opens an HTTPS website, both my phone and the server need to generate secret random numbers. Welcome to cryptography! It's a big field, but start with a corner that you understand and build out your knowledge from there. I found reading and answering questions on this site incredibly helpful for my learning. – Mike Ounsworth Jul 29 '18 at 03:23
  • I read that quote as someone who is not a crypto expert giving the general warning "UNLESS YOU KNOW WHAT YOU'RE DOING, DON'T MODIFY CRYPTO CODE, and even then, do it very carefully." I don't think RNGs are special in that regard; any crypto code would be fragile if people start making changes to code they don't understand. – Mike Ounsworth Jul 29 '18 at 03:26
  • @MikeOunsworth It's especially true given the fact that many CSPRNGs are based off of other cryptographic primitives like AES, ChaCha20, or SHA-1. Change a single line in SHA-1 and you lose nonlinearity, completely breaking a PRNG based on it. Crypto code should be thought of as sacred! – forest Jul 29 '18 at 03:36
  • It should be noted that the Debian RNG fiasco had very little to do with the fragility of cryptography per se — it was mostly a consequence of how easy it is to introduce silent errors (like invoking undefined behavior) in certain languages. – T. C. Jul 29 '18 at 07:00
  • Fragile here means "the maths is so complex that it is easy to break without it being obvious, and we were too lazy to create even basic test suites to check our code's performance". – PlasmaHH Jul 29 '18 at 09:40
  • @MikeOunsworth To be fair it was completely horrible and stupid crypto code that should not have been written in the first place. – user253751 Jul 29 '18 at 23:15
  • `Maybe I'm wrong but it seems like if you measured vibrations with very high precision` Assuming the vibrating part has a maximum range of movement, I can effectively "de-randomize" your random function by placing the computer in an environment with massive perturbations (thus ensuring a consistent vibration from limit to limit). Ignoring that loophole, you're still only ensuring _unpredictable_ data but not necessarily **unbiased** data. There is no expectation of equal distribution in your oscillator. – Flater Aug 01 '18 at 11:35

5 Answers

76

Hardware vs software RNGs

The first thing you mention is a hardware noise source. High-precision measurement of some metastable phenomenon is enough to generate unpredictable data. This can be done with a reverse-biased Zener diode, with ring oscillators, with an ADC, or even with a Geiger counter. It can even be done by measuring nanosecond-level delays in the timing between keystrokes. These noise sources can fail if the hardware itself begins to fail. For example, a transistor can break down if it is not specifically designed to operate in reverse at high voltages. While these techniques have varying levels of fragility, they are not what is being discussed in the text you quoted.

The second type of RNG you mention is a software RNG called a pseudorandom number generator (PRNG*). This is an algorithm that takes a seed, which is analogous to an encryption key, and expands it into an endless stream of data. It attempts to ensure that the data cannot be predicted, or told apart from pure randomness, without knowledge of the secret random seed that the algorithm started with. In this case, the PRNG is implemented in pure software, so breaking it only takes introducing a bug into the code, which is what the text you quoted is talking about. Here it is merely the code that is fragile, risking complete failure if changes are made that deviate from the algorithm's intended behavior.

A PRNG can be thought of as a re-purposed encryption algorithm. In fact, you can create a cryptographically secure PRNG by using a cipher like AES to encrypt a counter. As long as the encryption key (seed) is secret, the output cannot be predicted and the seed cannot be discovered. When you think about it this way, it becomes easier to understand how a small, seemingly inconsequential change in the code can completely break the security of the algorithm.
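
As a toy illustration of "encrypt a counter" (a minimal sketch using Python's `cryptography` package with a hypothetical helper name; a real design would use a vetted construction such as NIST's CTR_DRBG, with reseeding and backtracking resistance):

```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def toy_prng_stream(seed: bytes, nblocks: int) -> bytes:
    """Expand a secret 16-byte seed into nblocks * 16 bytes of output
    by encrypting an incrementing 128-bit counter under the seed."""
    assert len(seed) == 16  # 128-bit seed, used directly as the AES key
    encryptor = Cipher(algorithms.AES(seed), modes.ECB()).encryptor()
    out = bytearray()
    for counter in range(nblocks):
        # Without the key, AES outputs on counter inputs are computationally
        # indistinguishable from random data.
        out += encryptor.update(counter.to_bytes(16, "big"))
    return bytes(out)
```

Note how little slack there is: allow a short seed, or let the counter repeat, and the output becomes guessable or cyclic even though the code still "works".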

Collecting randomness

So how do modern devices actually collect randomness? Let's take a server running quietly in a datacenter somewhere. In order to support things like TLS, it needs a large amount of completely unpredictable data that cannot be distinguished from a truly random stream. Without a dedicated hardware noise source, the randomness must come from within. Computers strive to be fully deterministic, but they have plenty of input from non-deterministic devices. Enter... interrupts!

In modern hardware, an interrupt is a signal emitted by hardware to alert the CPU to a status change. It allows the CPU to avoid rapidly polling every hardware device for updates and instead trust that the device will asynchronously alert it when the time comes. When an interrupt occurs, an interrupt handler is called to process the signal. It turns out this handler is the perfect place to get randomness! When you measure the nanosecond-level timing of interrupts, you can quickly get a fair bit of randomness. This is because interrupts are triggered for all sorts of things, from packets arriving on the NIC to data being read from a hard drive. Some of these interrupt sources are highly non-deterministic, like a hard drive which relies on the physical motion of an actuator.

Once sufficient random bits have been collected by the operating system, a small seed of at least 128 bits can be fed into a cryptographically secure PRNG to generate an unlimited stream of pseudorandom data. Unless someone could predict exactly when every past interrupt occurred to nanosecond precision (along with several aspects of the CPU's execution context at the time of the interrupt), they will not be able to derive the seed and will not be able to predict future PRNG output. This makes the output completely suitable for TLS keys.
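
As a toy illustration of the pool idea (hypothetical class and method names; the Linux kernel's actual design is considerably more involved):

```python
import hashlib
import time

class ToyEntropyPool:
    """Mix unpredictable event timings into a hash-based pool."""

    def __init__(self):
        self._pool = hashlib.sha256()

    def mix_event(self):
        # Nanosecond timestamp of an "interrupt"; only its low-order,
        # hard-to-predict bits carry real entropy.
        self._pool.update(time.perf_counter_ns().to_bytes(8, "big"))

    def seed(self) -> bytes:
        # A 256-bit digest of everything mixed so far, usable as a CSPRNG
        # seed once enough unpredictable events have been absorbed.
        return self._pool.digest()
```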

* A security-oriented PRNG is called a cryptographically-secure PRNG, or CSPRNG. Using a regular PRNG when an application calls for a CSPRNG can result in security vulnerabilities.
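
In Python, for example, the distinction is the difference between the `random` module and the `secrets` module:

```python
import random
import secrets

# random.Random is a Mersenne Twister PRNG: fine for simulations, but its
# state (and thus all future output) can be recovered from 624 consecutive
# 32-bit outputs.
token_bad = random.getrandbits(128).to_bytes(16, "big")

# secrets draws from the OS CSPRNG and is the right tool for keys and tokens.
token_good = secrets.token_bytes(16)
```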

forest
  • zenner diodes aren't specifically designed to operate in reverse? – hytromo Jul 29 '18 at 08:26
  • It's Zener diode, not zenner – it was named after [Clarence Zener](https://en.wikipedia.org/wiki/Clarence_Zener). – Ruslan Jul 29 '18 at 12:56
  • @hytromo Not at breakdown voltage for long periods. You need an avalanche diode to handle that for a very long time. I reworded that sentence to be more specific (I didn't realize how silly it sounded!) – forest Jul 30 '18 at 01:21
  • I think you should remove the sentence about "not specifically designed to operate in reverse at high voltages". So-called "Zener diodes" may use Zener or avalanche effects or both and of course they are designed to operate indefinitely under reverse breakdown. Perhaps you got the idea because transistor E-B junctions are sometimes used in reverse breakdown, and they are not designed to be used in that way. – Spehro Pefhany Jul 30 '18 at 14:51
  • I like this answer, except that it doesn't even mention PRNGs vs CSPRNGs. Only one actually makes an attempt to be _unpredictable_; the other just tries to get you a lot of numbers that seem, at an uneducated glance, to be unrelated. (Not in that order.) That's an important distinction to make, especially on this site. Using a PRNG where you meant to use a CSPRNG can cause serious issues, and the reverse can lose a lot of performance (which can also cause issues) – Nic Jul 30 '18 at 18:07
  • @NicHartley I've added a footnote to explain the difference. Thanks! – forest Aug 02 '18 at 00:03
17

Historically, hardware RNGs suitable for cryptography weren't commonly available on the PC: for example, according to this question AMD only added support a few years ago, so even today a software vendor can't simply assume that it will be available. That's presumably why OpenSSL (as discussed in your quote) was using a software RNG, making it vulnerable to the bug found in the code.

(As extensively discussed in the comments, a standard PC does contain a number of "sources of entropy" that a software RNG can make use of - and I believe OpenSSL does, though I'm not terribly familiar with it - but obviously in that scenario a bug in the software can result in bad random numbers, as indeed happened.)

There are also concerns that hardware RNGs might have been backdoored, leading people to combine hardware RNGs with other sources of entropy rather than using them as-is. (Backdoored hardware is also mentioned in your linked article, a few paragraphs up from the bit you've quoted.)

It should also be mentioned that hardware RNGs aren't nearly as simple to implement as your question suggests. For one thing, naive implementations might be vulnerable to various physical attacks: if you're generating random bits based on vibrations, what happens if someone aims an ultrasound source at it? Even under ideal conditions, there's likely to be some sort of bias in the results that could make the generated bits unsafe for cryptographic use.

That's why real-world implementations use hardware noise but also process it cryptographically. But at that point you're back to the question of whether the algorithm (or its implementation) has been deliberately sabotaged, or perhaps just isn't as robust as believed.
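
A classic first processing step is the von Neumann extractor, which strips bias from the raw stream (a minimal sketch; it assumes the raw bits are independent, which real hardware rarely guarantees, so real designs also apply a cryptographic conditioner such as a hash over the samples):

```python
def von_neumann_debias(raw_bits):
    """Map bit pairs 01 -> 0 and 10 -> 1, discarding 00 and 11.
    Independent-but-biased input bits come out unbiased, at the cost
    of throwing most of the raw stream away."""
    out = []
    for a, b in zip(raw_bits[::2], raw_bits[1::2]):
        if a != b:
            out.append(a)
    return out
```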

Harry Johnston
  • One disclaimer: I'm no expert here. If you want any sort of further detail, you might like to follow up on the Stack Exchange Cryptography site. – Harry Johnston Jul 29 '18 at 02:19
  • Actually hardware RNGs were available for a very long time. You are confusing CPU randomness instructions with hardware RNGs in general. Even first-generation video consoles made use of what could be called a hardware RNG. – forest Jul 29 '18 at 03:22
  • @forest, but they weren't standard in PCs. – Harry Johnston Jul 29 '18 at 03:58
  • @HarryJohnston: most PCs have excellent sources of randomness. Turn the analog gain of a soundcard to max, and you get thermal noise. Turn the analog gain of a webcam to max and you get thermal noise. Measure variations in user input and you get noise. Measure fluctuations in the rotational speed of a HDD due to turbulence and you get noise. Measure voltage fluctuations on the mainboard and you get noise. You might google/bing for LavaRand, for example. What you didn't have until recently was a hardware RNG built into the CPU, with corresponding standardized instructions. – Jörg W Mittag Jul 29 '18 at 09:17
  • @JörgWMittag, most systems don’t have that hardware. No server in a rack has a sound card, web cam, or user input devices at all, and most newer ones have SSDs instead of HDDs. And they don’t have high precision measurement devices. A solution needs to apply to all kinds of machines, not just a traditional desktop or laptop. – John Deters Jul 29 '18 at 14:37
  • @JörgWMittag, the OP didn't ask about "sources of randomness". – Harry Johnston Jul 29 '18 at 19:16
  • @JohnDeters Well, another obvious option for where to get some randomness is to measure the timing of arriving packets on the network interface(s). (This probably works better on a busy server than on a home PC.) Servers typically have quite a few temperature sensors; small variations there can provide a little randomness. Same with voltage sensors. SSDs may not suffer from turbulence-induced variations, but I/O timing isn't completely deterministic on the nanosecond level anyway. Even being able to gather just one bit per second of good randomness gives you a solid random seed in two minutes. – user Jul 30 '18 at 05:54
  • @JohnDeters: Any computer with at least two clock crystals has an extremely good true entropy source: the rate of one with respect to the other. Clock crystals are subject to nonlinear frequency response to tiny changes in temperature and other environmental conditions, and these responses are subject to hysteresis. (Corollary: there is no such thing as cycle-perfect emulation of any computer with more than one clock source) – R.. GitHub STOP HELPING ICE Jul 31 '18 at 00:30
  • I'd like to point out that my answer deliberately doesn't discuss the issue of obtaining entropy for software random number generators, because I don't consider it to be directly relevant to the original question. Can the debate on this point please be taken elsewhere? – Harry Johnston Jul 31 '18 at 01:53
  • @JohnDeters: SSDs are excellent sources of entropy. The write cycle works like `while (cell_value != desired_value) write(cell, desired_value)`. This loop is needed because the `write` function isn't that reliable. This is especially a problem with modern TLC memory that uses 8 voltage levels. – MSalters Jul 31 '18 at 07:12
  • @R.. Corollary to the corollary: you don't need cycle-perfect emulation because the computer itself was not cycle-perfect. – user253751 Aug 01 '18 at 02:43
  • Is an ultrasound attack really feasible compared to other known attacks? There have been dozens of vulnerabilities discovered in RNG's over the years, and no high-profile ones that I recall required physical proximity. – Cody P Aug 02 '18 at 17:50
  • @CodyP, against real-world RNGs? I would hope not, but it might work against a naive hardware RNG of the sort the OP was asking about. (I chose ultrasound in particular because the OP suggested measuring vibrations.) If a naive RNG counts the number of times a vibration detector goes off in an N microsecond period, and generates a one if it is more than X or a zero otherwise - and then the software literally just strings those bits together to generate an encryption key - then presumably an ultrasound could easily make it generate all ones. – Harry Johnston Aug 02 '18 at 21:38
  • ... but this sort of approach was never likely to be a high-profile attack, because nobody builds RNGs that work that way. Not for cryptographic use, anyway. – Harry Johnston Aug 02 '18 at 21:39
12

Because they are difficult to test

While it's easy to test that a random number generator produces output in the right format, determining whether its output is statistically random is much more involved and unlikely to be included in an automated test suite. Most other code breaks far more visibly when you break it.
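
For a flavor of what such testing involves, here is a minimal sketch of the simplest test from NIST SP 800-22, the frequency (monobit) test. Passing it is necessary but nowhere near sufficient:

```python
import math

def monobit_pvalue(data: bytes) -> float:
    """Frequency (monobit) test from NIST SP 800-22: checks whether the
    counts of ones and zeros are consistent with fair coin flips."""
    n = len(data) * 8
    ones = sum(bin(byte).count("1") for byte in data)
    s_obs = abs(2 * ones - n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))  # consistently tiny p-values indicate bias
```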

Crypto needs to be right

In general, making sure that code is correct is difficult. A saving grace for a lot of code is that only a small proportion of correctness errors result in security vulnerabilities. But with cryptographic code - including random number generators - many correctness errors do result in vulnerabilities. Crypto code needs to be correct to be secure, and ensuring it's correct is hard.

The Debian maintainer made a major error

The code is not actually that fragile; making it insecure required major failings from the maintainer. Chopping out lines that produce warnings, with only cursory checks that nothing has been broken, is pretty shoddy.

Edit: it was not just the maintainer's fault; see Ángel's comment below.

paj28
  • Well, the Debian maintainer [did ask on openssl mailing list](https://marc.info/?t=114651088900003&r=1&w=2), noting that "the pool might receive less entropy", and was basically told "remove them" – Ángel Jul 31 '18 at 21:41
  • @Ángel - Thanks for letting me know about that, it moves the responsibility considerably. – paj28 Aug 01 '18 at 08:55
4

One of the most important aspects of any random generator is that the value of each random bit must not only be completely independent from the values of all other bits it generates, but it must also be completely independent from everything else in the universe an adversary might observe or influence. Unfortunately, that property is often one of the hardest to really guarantee. The best one can generally do is generate bits in a way that is unlikely to have any exploitable relationship to anything else in the universe.

As a simple example, suppose an adversary with a highly-focused RF transmitter could influence your random number generator so that certain samples of his choosing had a 95% probability of yielding ones, while the rest had only a 5% probability of doing so. If one were to simply read 128 bits from such a generator and build them into a 128-bit key, an attacker who focused a brute-force attack on bit patterns close to the one used to influence the generator would have a much greater likelihood of rapid success than if the generator had been unbiased. Suppose, however, that instead of picking bits one at a time, one picked groups of 7 bits and xor'ed them together. Doing that would increase sevenfold the time to generate a 128-bit key, but the attacker's influence would be cut from 95/5 to about 74/26, greatly reducing the likelihood that the key would end up close to the bit pattern the attacker was trying to force.
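
A minimal sketch of that folding step (hypothetical helper name; the raw bits are assumed independent of each other):

```python
def xor_fold(raw_bits, group=7):
    """XOR each group of `group` raw bits into one output bit. By the
    piling-up lemma, a 95/5 per-bit bias shrinks to roughly 74/26 with
    group=7, at the cost of consuming seven raw bits per output bit."""
    out = []
    for i in range(0, len(raw_bits) - group + 1, group):
        bit = 0
        for b in raw_bits[i:i + group]:
            bit ^= b
        out.append(bit)
    return out
```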

As an alternative, suppose one were to generate 128 random bits, hash them in some fashion, and then xor them with another 128 random bits. This would only require generating 256 bits rather than 896, but would make it very hard for an attacker to exploit any bias in the generator. Even though enumerating the most probable 95,000,000,000 128-bit patterns would have a roughly 50% chance of matching the group of 128 bits used before the hash, or the one with which the hashed value was xor'ed, the final distribution after the xor would be unlikely to have any exploitable bias or information leaks.

supercat
1

The commonly used secure RNGs like Linux's /dev/random, ChaCha20, or RdRand work well for many conventional cases. However, they're far from idiot-proof. Say you do something funny like fail to set up your real-time clock and then generate keys within a minute of first boot, before the entropy pool has been initialized. If you don't understand how that'll affect your RNG, someone who does might walk off with your private key. There's little room for error here, because a small amount of non-randomness can compromise an entire cryptographic operation like key generation.

While issues with naïve roll-your-own implementations of random number generators or physical interference with your hardware make for good discussions, most vulnerabilities in the news with random number generators, like the Debian issue you mentioned, are not due to these issues. The biggest issues I've seen repeatedly are developers thinking they have a good source of entropy to seed the random number generator when they actually don't, mistakenly allowing the state of the random generator to be discovered and exploited, and a lack of rigorous testing of the random number generator itself. The NSA doesn't need to backdoor your key generation if you're one of the 0.75% of TLS clients using low-entropy keys. In summary, developers ignore the few warnings they get, if any, and assume their RNG will work in any application.
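
The first failure mode is easy to fall into. A sketch of the classic mistake versus the safer habit (`weak`/`stronger` are illustrative names):

```python
import os
import random
import time

# Classic mistake: a timestamp has so little entropy that anyone who can
# guess roughly when the program started can replay the entire stream.
weak = random.Random(int(time.time()))

# Better seed material comes from the OS pool; note that random.Random is
# still not a CSPRNG, so for actual keys use the secrets module instead.
stronger = random.Random(os.urandom(32))
```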

What's entropy and where do I get some?

Since all computer programs produce the same outputs given the same inputs, they must read from a source of entropy (unpredictable data) in the operating system or hardware. Nowadays we have things like the RdRand instruction, which can generate tens or hundreds of MB of entropy every second. However, devices with hardware random number generators, like the Ferranti Mark 1 in 1951 or the Intel 82802 Firmware Hub in 1999, were the exception rather than the rule until the 2010s.

So historically, random number generators relied on relatively slow entropy sources like human input or computer timings, and legacy systems might have almost no built-in functions with good sources of entropy available. Linux's /dev/random, for example, may use startup clock time, timing of human input devices, disk timings, IRQ timings, and even modification of the entropy pool by other threads.

In many ways, random number generators are fragile because these standard ways of getting entropy are not foolproof. Anything that makes these entropy sources predictable or limited will compromise your RNG. For example:

  • The Debian bug you noted used only the Process ID for entropy.
  • If you use a headless, pre-configured operating system that generates keys at boot, a lot of Linux's entropy sources could be predictable.
  • Android's Java Cryptography Architecture was found to require explicit initialization from a good entropy source on some devices.
  • Generating random numbers too quickly in Linux will deplete the entropy pool faster than it can be replenished, leading to less-random numbers.

Figuring out the state and lack of reseeding

Often RNGs don't get new entropy with every function call like /dev/random does. Sometimes you can't get enough entropy fast enough, or you don't trust the entropy source completely. So instead the RNG is seeded with a known source of entropy, then produces independent values from that seed. However, when someone figures out the internal state of the generator things go poorly, leading to everything from cloning smart cards to cheating a slot machine in Vegas.
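
To see why a known state is fatal, here is a toy demonstration with a (non-cryptographic) linear congruential generator using glibc-style constants; the real attacks below take more work, but the principle is the same:

```python
A, C, M = 1103515245, 12345, 2**31  # glibc-style LCG constants

def lcg(state):
    while True:
        state = (A * state + C) % M
        yield state

victim = lcg(424242)   # secret seed
leaked = next(victim)  # attacker observes a single raw output...
clone = lcg(leaked)    # ...which *is* the state, so the clone now
                       # predicts every subsequent "random" number
assert all(next(victim) == next(clone) for _ in range(10))
```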

A buffer overflow or similar attack can reveal the state of the random number generator. Learning the state may also be possible with a brute-force attack, especially if the algorithm is known and is reversible, can be computed quickly, or a plaintext is known. This was the case for issues with Windows XP, the Dropbear SSH library, XorShift128+ in Chrome, and the Mersenne Twister algorithm, among many others.

Needing advanced mitigations against these known-state attacks makes the RNG fragile. The best mitigation is to use an algorithm that isn't vulnerable to them in the first place (i.e., a proper CSPRNG). This question also explains in more detail exactly what makes a good RNG secure. However, even CSPRNGs sometimes have weaknesses (for example, the RNG vulnerability in the Linux 2.6.10 kernel). So defense in depth requires mitigations like using separate states for random number generators (perhaps one per user), refreshing the seed frequently, and protecting against side-channel attacks and buffer overflows.

Passing blame between developers and users

Often these RNGs are fragile because of miscommunication of limitations between library developers or OS creators, who can't design a foolproof system, and users, who expect one. Linux, for example, forces users to choose between the high-latency /dev/random and the potentially low-entropy /dev/urandom. As another example, PHP prior to 5.3 had no support for strong PRNGs in Windows through interfaces such as mcrypt_create_iv(), and prior to 7.0 it didn't have a good built-in CSPRNG.

Difficulty in Detection

There's a popular talking point in discussions of random numbers: for a truly random sequence, every possibility is equally likely, and there is an infinite number of potential patterns. So how can you look at a sequence and say it isn't random? (relevant Dilbert)

In reality, detecting patterns in random numbers is a mature, although imperfect, field, and the question of whether non-randomness can be detected has been addressed since M.G. Kendall and B. Babington-Smith's 1938 paper. You can demonstrate that specific kinds of patterns are not significantly more likely to appear than chance would predict. For example, I can check whether the digit 1 is more common than other digits, with thresholds determined by a chi-squared test. As long as the tested patterns are at least remotely likely and you check a long enough run of generated numbers, the odds of a false positive are low. While some hidden issues with random number generators can go undetected for years, if you've done basic cryptanalysis and then apply industrial-grade testing as covered in this question, you can't go too wrong.
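
That digit-frequency check is only a few lines (a sketch; 16.92 is the 5% critical value of the chi-squared distribution with 9 degrees of freedom):

```python
from collections import Counter

def digits_look_uniform(digits: str) -> bool:
    """Pearson chi-squared test that the digits 0-9 occur equally often.
    Exceeding the critical value suggests bias; staying below it proves
    very little on its own."""
    n = len(digits)
    expected = n / 10
    counts = Counter(digits)
    chi2 = sum((counts.get(str(d), 0) - expected) ** 2 / expected
               for d in range(10))
    return chi2 < 16.92
```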

However, designers may also just underestimate their attackers (how were you supposed to predict people would reverse-engineer and time your slot machine?). Worse, sometimes the random-number generator or entropy generation is never inspected by an expert, and only the outcome of the RNG's use is examined, like when PS3 firmware signatures were signed with a constant "random" output.

At the end of the day, the issue here is similar to that in much of cybersecurity: you have a very complex set of protocols, requirements, and devices for random numbers. As always, if you don't understand the complexity, you're vulnerable to an attacker who does.

Cody P
  • Why would not setting up the RTC affect randomness in any way? – forest Mar 22 '19 at 02:57
  • @forest RTC is used for the Linux entropy pool, so if the RNG is used at a predictable time (say, generating keys at boot) it can lead to more predictable keys. I'll edit my answer to clarify. See this paper for more: https://factorable.net/weakkeys12.extended.pdf – Cody P Mar 22 '19 at 05:33
  • That's not the RTC, that's the TSC (reference cycle counter via the RDTSC instruction, though some MIPS32 systems use a different counter I believe). The RTC has only one second granularity and a significant latency and _many systems_ do not have one, but it has no effect on the randomness driver. And you can't disable the TSC in the kernel without intentionally patching it to forcibly set CR4.TSD, though I think that might not even have an effect on ring 0 code anyway. There's really very little you can do Kconfig-wise to significantly impair entropy collection. Even killing HPET is harmless. – forest Mar 22 '19 at 05:38
  • What I mean to say is that you probably mean the timestamp counter (TSC), not the real-time clock (RTC). – forest Mar 22 '19 at 05:46
Rereading the paper, it seems to be referring to initialization of the entropy pool, which affects keys generated with /dev/urandom at first boot, before the entropy pool has "filled". This pool initialization is done with startup time. Perhaps changing it to "like generating keys within a minute of first boot" would be clearer and less ambiguous. – Cody P Mar 22 '19 at 06:11