How does java.util.Random work and how good is it?

The java.util.Random class implements what is generally called a linear congruential generator (LCG). An LCG is essentially a formula of the following form:

numberi+1 = (a * numberi + c) mod m

In other words, we begin with some start or "seed" number which ideally is "genuinely unpredictable", and which in practice is "unpredictable enough". For example, the number of milliseconds— or even nanoseconds— since the computer was switched on is available on most systems. Then, each time we want a random number, we multiply the current seed by some fixed number, a, add another fixed number, c, then take the result modulo another fixed number, m. The number a is generally large. This method of random number generation goes back pretty much to the dawn of computing1. Pretty much every "casual" random number generator you can think of— from those of scientific calculators to 1980s home computers to currentday C and Visual Basic library functions— uses some variant of the above formula to generate its random numbers.

LCG parameters used by java.util.Random

The actual parameters used by java.util.Random are essentially taken from the UNIX rand48 generator (though with a slightly different seeding function). For reasons discussed later, only the top 32 bits of each 48 bits generated are used. With these parameters, the resulting random number generator appears to be about as "good as it gets" for an LCG.

Depending on the values chosen for a, c and and m, the quality of random numbers produced by this method varies between "unbelievably disastrous" and "OK for casual applications". For practical reasons, it is generally common to do one of the following:

With or without these constraints, values for the parameters are then generally sought so that:

Since for a given "current seed" value, the "next seed" will always be completely predictable based on that value, the series of numbers must repeat after at most m generations. This is called the period of the random number generator. In the case of java.util.Random, m is 248 and the other values have indeed been chosen so that the generator has its maximum period. Therefore:

The period of the java.util.Random generator is 248.

In decimal, 248 is a few hundred million million. That might sound like enough— and it is for certain applications— but it does mean some quite severe limitations in other cases. For example, consider an application where you pull out a number of 2-integer pairs (and where you use the full range of the integer). One integer has 232 possible values. So the number of possible combinations of 2-integer pairs is 232 * 232, or 264. In other words, java.util.Random will not be able to produce every possible combination. Of course, even a generator that produced "perfect" random numbers with a 248 period would have this limitation.

For some testing or scientific applications, that would be bad enough. But it turns out that with LCGs, things are actually worse:

Next...

See the two pages linked to above for more details on the flaws of the LCG method. For a better alternative that is trivial to implement in a few lines of Java, see the XORShift generator. See also:


1. It is generally attributed to Dick Lehmer, who appears to have intoduced it formally in a 1948 conference paper.
2. For example, 65539 is 216+3. So 65539x can be calculated as x<<16+x+x+x. Shift and addition/subtraction instructions are generally much faster operations than mutliplication.
3. For example, to calculate a number modulo 248— the modulus chosen in java.util.Random, we AND it with (248-1).