Random Number Generation

23 Nov 2022, 23:25

Random numbers are a very useful thing to have. Why else would we have created dice to roll and coins to flip? We use random numbers to make decisions where there would be indecision, or when we need digital security. However, if you have studied random numbers, you may come to the conclusion that truely random numbers are very difficult to produce. In theory, you could calculate which way a group of dice will land based on a certain throw. The same applies to code because it is meant to be predictable and output the same result everytime. So, how is it possible for us to use code to generate these random numbers even though the code itself is not random and is actually very predictable? And do the methods we use differ when using a microcontroller or a desktop computer? Let’s find out.

Let’s imagine for a second that we are the first people to ever write code to generate random numbers. How would we do it? Well, we would likely start out by creating an algorithm that seems almost impossible to reverse engineer. Ideally, the output would be different every time it is run, otherwise the number isn’t actually random. The simplest way to do that is just to use the previous output as part of the input of the next number in the sequence. That way, we can basically guarantee that the outputs will not be the same every time. But how can we turn the input into an output so that it generates a number that is completely unpredictable? Well for that, we can simply run a mathematical operation. Perhaps we can use multiplication of the previous two random numbers?

Let’s write a C program to test this idea. Let’s say that the previous two were 3,231 and 74, so we would get a result of 239,094. The next call to this algorithm would be 239,094 multiplied by 3,231 which gets us an answer of 772,512,714. And we can keep going for a long while. Seems pretty good and the numbers eventually overflow, based on the size of the integer being used. That is, it’s pretty good until we get a result of 0. Then, it would return 0 forever. We could simply just add something to this algorithm to counteract that. We could simply add 23 for example. Feel free to experiment with other algorithms yourself and create numbers that appear to be very random. And luckily for us, there actually is a very easy way to determine whether our code actually generates good random numbers.

We can use a program called ENT which will test how good our random numbers are. It runs several tests on the numbers that we generate. I will leave the link to webpage in the description. You can also install it using the AUR if you are on an arch-linux based distribution. Anyway we can utilize this program by either passing it a file full of our data or by feeding it data through standard input. I opted for the latter and piped 50,000 samples of our random data through. The scores are a bit confusing at first, but they all are useful in their own right. Entropy should be as close to 8 as you can get it. The chi-square test should be close to 50%. The edges, 0.01 and 99.99 percent are bad. The arithmetic mean value should be as close to 127.5 as possible. Monte Carlo Value for Pi should be close to 3.14. And finally, the serial correlation coefficient should be close to 0. You can read more about all of these tests on the ENT website. Make sure to input a lot of values to reduce luck. If we take a closer look we can actually see that the algorithm suffers from a repeating problem. Anyways, the algorithm that we made isn’t all that great, so we should try to find a better one.

Let’s take a break from toying around with making our own algorithms, so that we can take a look at one that is tried and true. The most famous random algorithm is the LCG algorithm, or Linear Congruential Generator. The equation goes like this: the next random number is equal to the previous random number multiplied by a multiplier and added to an increment. The whole thing is then taken to a modulus. The modulus and the multiplier should be greater than zero for obvious reasons. The multiplier and increment should also both be smaller than the modulus. Implementing the LCG algorithm results in good scores across the board, except for the Chi-square algorithm.

Great, we can start using our random numbers… almost. Everything that I have shown you so far in this video is not actually random. We’ve been dealing with pseudo-random numbers. If you take a closer look, you’ll realize that running our algorithms multiple times always results in the same numbers in the same order. Not so random after all. This is caused by what we call a seed. Normally, the next number is generated with some previous input. The problem arises on the first number. Since there is not a previous value to fall back on, we need to come up with one ourselves. That means that if someone knows the seed, they then know all future values. Pseudo-random numbers still have a use though. For non-critical applications, such as video games, the predictable nature of the generator doesn’t matter, because nothing is at stake and someone reversing the algorithm wouldn’t be a serious problem. If you have ever tried a game like Minecraft, you’d know that the seed determines which world the computer generates for example. And inputting the same seed a second time results in the same world. Pseudo-random numbers are also very fast to generate.

But what if you want a truely random number that a malicious user should not know? For that, we need a source of what can be called entropy: something so unpredictable that it is practically random. This entropy will assist in generating our seed. This is rather simple on desktop computers since we can simply use the keypresses on the user’s keyboard or the movement of the mouse or even what time the computer is turned on. You can see this in action for youself if you are running a UNIX system. Simply read the contents of /dev/random. Here you will find a stream of random bytes that were created using entropy that your computer found. This system is used for generating things like GPG keys. You’ll notice that GPG will prompt you to create entropy when generating a key because it wants to ensure that your key is sufficiently random to protect against attacks. And taking the output from /dev/random and placing it into ENT, we find amazing scores across the board. Sidenote: running hexdump on /dev/random kind of makes you look like one of those hollywood movie hackers. Anyways, while this is all interesting in theory, let’s get close to the hardware and generate some of our own random numbers.

To do that, I’ve pulled out this ATmega328p microcontroller. We can communicate between the PC and the microcontroller over a serial interface. I simply used this FTDI board to do so. Arduinos have serial-to-usb interfaces built in as well. Reimplementing the LCG algorithm from earlier, we find that we get the same numbers in the same order that we got on the PC version. And this would be good enough if you were implementing an electronic die. But, again, we are trying to get real random numbers.

The most common way to get a seed, it seems, is just to read an unconnected analog pin. This seems like a good idea since it seems like the analog return value could be anything, especially since there is a lot of radio frequency noise all around us. The only problem with this is that the analog pin usually returns similar values depsite floating. A way to counter act this is to read multiple floating pins, or just the same pin multiple times. Then you can add all of the samples together to amplify the difference caused by random interference. This way you will get numbers that are more random, with the cost being speed.

The list of entropy generators goes on: microphones, temperature sensors, noisy diodes, button presses, and clock jitter. Basically, you just need to find something which is unpredictable to the point that nobody can reasonably reverse it.

Ok great, we have created a random seed, but there’s still one problem. Somebody can still easily reverse our LCG algorithm if they know a few random numbers in the sequence. So, how can we overcome such a challenge? Well, the first thing we should do is combine multiple sources of entropy to make everything more unpredictable. So, for example, we would save five floating analog pins and value from a hardware timer. This makes it very difficult to piece together the seed at the start. The next thing we should do is mix-up the seed every now and again. So we could randomly decide that after 10 seconds, we should restart the random sequence with a new random seed. And finally, we need to separate the output from the generated numbers. What I mean is that we should hash the generated number and make it so that there is no easy way to determine which number the algorithm created, while still giving us a useful random number.

And one of the most well-known hashing algorithms is the SHA256 hashing algorithm. You may have heard of it from the bitcoin blockchain. Basically, anything that we feed into it will produce another number. The best part about this, however, is that it is not easily reversible. Meaning that you cannot easily take the ouput and deduce the input. That’s why bitcoin uses it by the way. The miners must spend huge amounts of processing power to reverse just one hash. This is great because by the time an attacker cracks the algorithm and finds our seed, it doesn’t matter because we have already collected new entropy and switched over to a new seed. Anyways, to implement it I simply used this avr-crypto-library from the user cantora on github. I will post a link to it in the description so that you can use it yourself. Also make sure to salt the hash as well so it is harder to brute force. I salted it using the current value of the hardware timer.

Using our new complete random number generator creates great results. We can also capture the results and feed them into ENT. It gives us good scores. This may not be the most secure way of generating random numbers, but it certainly is 100x more random than the pseudo-random numbers we created at the start of this video.

There is one more thing that I would like to mention. Some microcontrollers make generating random numbers easy. We had to go through the design process for the ATmega328p because it doesn’t provide any hardware random number generation. However, some more expensive microcontrollers, such as the ESP32 have onboard random number generators. Using it is trivial as well. All you have to do is access the RANDOM_REG32 memory location and you will get very good random numbers.

Anyways, I hope you enjoyed this video and now understand how computers can generate random numbers. If you did enjoy please consider subscribing so that you can see my other videos. Have a good one.

Random Number Generation

Share!