
How does java.util.Random work and how good is it?

The java.util.Random class implements what is generally called a linear congruential generator (LCG). An LCG is essentially a formula of the following form:

$$\text{number}_{i+1} = (a \times \text{number}_i + c) \bmod m$$

In other words, we begin with some start or "seed" number which ideally is "genuinely unpredictable", and which in practice is "unpredictable enough". For example, most systems can report the number of milliseconds (or even nanoseconds) since the computer was switched on. Then, each time we want a random number, we multiply the current seed by some fixed number, a, add another fixed number, c, and then take the result modulo another fixed number, m. The result becomes the new seed, and the random number returned is derived from it. The number a is generally large. This method of random number generation goes back pretty much to the dawn of computing [1]. Pretty much every "casual" random number generator you can think of, from those of scientific calculators to 1980s home computers to present-day C and Visual Basic library functions, uses some variant of the above formula to generate its random numbers.
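To make the formula concrete, here is a minimal sketch of a general-purpose LCG in Java. The class name SimpleLcg and the toy parameters (a = 5, c = 3, m = 16, seed 7) are hypothetical choices for illustration only, not anything used by java.util.Random. A real generator would use far larger values, and this sketch assumes that a * current + c never overflows a long.

```java
/**
 * A minimal linear congruential generator computing
 * next = (a * current + c) mod m.
 * The parameters used in main are hypothetical toy values,
 * not those used by java.util.Random.
 */
final class SimpleLcg {
    private final long a;
    private final long c;
    private final long m;
    private long current;

    SimpleLcg(long a, long c, long m, long seed) {
        if (m <= 0) {
            throw new IllegalArgumentException("m must be positive");
        }
        this.a = a;
        this.c = c;
        this.m = m;
        this.current = Math.floorMod(seed, m); // start from the seed, reduced into [0, m)
    }

    /** Applies the formula once; the result is both the output and the new seed. */
    long next() {
        // Assumes a * current + c fits in a long for the chosen parameters.
        current = Math.floorMod(a * current + c, m);
        return current;
    }

    public static void main(String[] args) {
        SimpleLcg lcg = new SimpleLcg(5, 3, 16, 7); // a = 5, c = 3, m = 16, seed = 7
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            out.append(lcg.next()).append(' ');
        }
        System.out.println(out.toString().trim()); // prints: 6 1 8 11 10
    }
}
```

With these toy values the generator happens to visit all 16 values from 0 to 15 before it repeats, which is the "maximum period" property discussed below.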

LCG parameters used by java.util.Random

The actual parameters used by java.util.Random are essentially taken from the UNIX rand48 generator (though with a slightly different seeding function). For reasons discussed later, only the top 32 bits of each 48 bits generated are used. With these parameters, the resulting random number generator appears to be about as "good as it gets" for an LCG.
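The paragraph above doesn't list the parameters themselves. According to the Javadoc for java.util.Random, they are a = 0x5DEECE66D (25214903917), c = 0xB (11) and m = 2^48, and constructing a Random with a seed first scrambles that seed by XORing it with a. The following sketch (RandomReplica is a hypothetical name) reproduces nextInt() from those documented formulas and compares it with the real class. It is an illustration rather than the JDK source, and it leaves out details such as thread safety.

```java
import java.util.Random;

/**
 * Sketch of the core of java.util.Random, following the formulas given in its
 * Javadoc: a = 0x5DEECE66D, c = 0xB, m = 2^48. Only nextInt() is reproduced.
 */
final class RandomReplica {
    private static final long MULTIPLIER = 0x5DEECE66DL; // a
    private static final long ADDEND = 0xBL;             // c
    private static final long MASK = (1L << 48) - 1;     // m - 1, with m = 2^48

    private long seed;

    RandomReplica(long seed) {
        // Initial scrambling of the seed, as documented for Random.setSeed(long).
        this.seed = (seed ^ MULTIPLIER) & MASK;
    }

    /** One LCG step, returning only the top 'bits' bits of the 48-bit state. */
    private int next(int bits) {
        // The multiplication may overflow a long, but overflow wraps modulo 2^64,
        // and since 2^48 divides 2^64 the masked result is still (a * seed + c) mod 2^48.
        seed = (seed * MULTIPLIER + ADDEND) & MASK;
        return (int) (seed >>> (48 - bits)); // discard the low (48 - bits) bits
    }

    int nextInt() {
        return next(32); // the top 32 of the 48 bits
    }

    public static void main(String[] args) {
        Random real = new Random(42);
        RandomReplica replica = new RandomReplica(42);
        for (int i = 0; i < 5; i++) {
            System.out.println(real.nextInt() == replica.nextInt()); // prints true five times
        }
    }
}
```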

Depending on the values chosen for a, c and m, the quality of random numbers produced by this method varies between "unbelievably disastrous" and "OK for casual applications". For practical reasons, it is common to do one or both of the following:

  • make a close to a power of 2 (so that the multiplication can be performed by shifting and adding/subtracting [2]); or
  • make m a power of 2, often the register size of the machine (such as 2^32 for a 32-bit machine), so that the modulo is carried out either "for free" or, at worst, via an AND operation rather than an expensive division [3].
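The two shortcuts in the list above can be illustrated in a few lines. This sketch (PowerOfTwoTricks is a hypothetical name) uses the multiplier 65539 = 2^16 + 3 from footnote 2 and reduction modulo 2^16 by masking, and compares each shortcut with ordinary multiplication and Java's % operator.

```java
/**
 * Illustrates multiplying by a constant close to a power of 2 via shift-and-add,
 * and reducing modulo a power of 2 with a bitwise AND instead of a division.
 */
final class PowerOfTwoTricks {
    /** 65539 * x computed as (x << 16) + 3x, since 65539 = 2^16 + 3. */
    static long times65539(long x) {
        // Parentheses matter: in Java, + binds more tightly than <<.
        return (x << 16) + x + x + x;
    }

    /** x mod 2^k for non-negative x, using a mask of the low k bits. */
    static long modPowerOfTwo(long x, int k) {
        return x & ((1L << k) - 1);
    }

    public static void main(String[] args) {
        long x = 123_456_789L;
        System.out.println(times65539(x) == 65539L * x);            // prints: true
        System.out.println(modPowerOfTwo(x, 16) == x % (1L << 16)); // prints: true
    }
}
```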

With or without these constraints, values for the parameters are then generally sought so that:

  • every possible value between 0 and m-1 inclusive is generated before the pattern repeats itself;
  • the numbers generated are as "statistically random" as we can get them (see below).

Since for a given "current seed" value, the "next seed" will always be completely predictable based on that value, and there are only m possible seed values, the series of numbers must repeat after at most m generations. This is called the period of the random number generator. In the case of java.util.Random, m is 2^48 and the other values have indeed been chosen so that the generator has its maximum period. Therefore:

The period of the java.util.Random generator is 2^48.
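The maximum period can be checked on a smaller scale. For a power-of-two modulus, the standard full-period conditions (the Hull–Dobell theorem, which this page does not itself name) require c to be odd and a − 1 to be divisible by 4; java.util.Random's a = 0x5DEECE66D and c = 0xB satisfy both. The sketch below (PeriodDemo is a hypothetical name) keeps those values of a and c but assumes a 16-bit modulus, purely so that the loop finishes instantly, and counts the steps until the starting state comes back.

```java
/**
 * Measures the period of a small LCG with modulus m = 2^bits by iterating
 * until the starting state recurs. The small modulus is an assumption for
 * illustration; java.util.Random itself uses m = 2^48.
 */
final class PeriodDemo {
    static long period(long a, long c, int bits, long seed) {
        long mask = (1L << bits) - 1; // m - 1, with m = 2^bits
        long start = seed & mask;
        long x = start;
        for (long steps = 1; steps <= mask + 1; steps++) {
            x = (a * x + c) & mask;   // (a * x + c) mod 2^bits
            if (x == start) {
                return steps;
            }
        }
        return -1; // start never recurs (cannot happen when a is odd)
    }

    public static void main(String[] args) {
        long a = 0x5DEECE66DL;
        System.out.println(period(a, 0xBL, 16, 1)); // prints: 65536 (full period)
        System.out.println(period(a, 0L, 16, 1));   // prints: 16384 (c = 0 breaks the conditions)
    }
}
```

With c = 0 the generator becomes purely multiplicative, and for a power-of-two modulus such a generator can reach at most a quarter of the possible values, which is why the second line is only 16384.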

In decimal, 2^48 is a few hundred million million (281,474,976,710,656, to be exact). That might sound like enough, and it is for certain applications, but it does mean some quite severe limitations in other cases. For example, consider an application where you pull out a number of 2-integer pairs (and where you use the full range of each integer). One integer has 2^32 possible values, so the number of possible 2-integer pairs is 2^32 × 2^32, or 2^64. But each pair is completely determined by the generator's state just before its first integer is produced, and there are only 2^48 such states. In other words, java.util.Random can produce at most 2^48 of the 2^64 possible pairs (roughly one in every 65,536), so it cannot produce every possible combination. Of course, even a generator that produced "perfect" random numbers with a 2^48 period would have this limitation.
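The same counting argument can be seen on a toy scale. The sketch below (PairCoverage is a hypothetical name) keeps java.util.Random's a and c but assumes an 8-bit modulus, so each output is one of 256 values and there are 256 × 256 = 65536 conceivable consecutive pairs. Because each pair is fixed by the state that produced its first value, only 256 distinct pairs ever appear, however long the generator runs.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Counts how many distinct consecutive pairs (x, next) a small LCG with
 * modulus 2^bits actually produces. The small modulus is an assumption
 * for illustration.
 */
final class PairCoverage {
    static int distinctPairs(long a, long c, int bits, int steps) {
        long mask = (1L << bits) - 1;      // m - 1, with m = 2^bits
        Set<Long> pairs = new HashSet<>();
        long x = 0;                        // arbitrary starting seed
        for (int i = 0; i < steps; i++) {
            long next = (a * x + c) & mask;
            pairs.add((x << bits) | next); // pack the pair (x, next) into one long
            x = next;
        }
        return pairs.size();
    }

    public static void main(String[] args) {
        int seen = distinctPairs(0x5DEECE66DL, 0xBL, 8, 10_000);
        System.out.println(seen + " of " + (256 * 256) + " possible pairs"); // prints: 256 of 65536 possible pairs
    }
}
```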

For some testing or scientific applications, that would be bad enough. But it turns out that with LCGs, things are actually worse:

  • When taking combinations of values, e.g. for coordinates, the resulting pairs, triples, etc. always have a particular mathematical relationship, sometimes described as "falling in the planes".
  • Not all bits are produced with equal randomness: the lower the bit in the number generated, the less "random" it actually is. This can introduce artefacts if you write code that depends on the low bits of the result, such as if (r.nextInt() % 2 == 0) {}. See the section on the randomness of bits with LCGs for an illustration and guidelines on minimising this problem.
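Both weaknesses can be demonstrated directly. The first method below (in a hypothetical class LcgWeaknesses) uses the real java.util.Random: since nextInt() returns bits 16 to 47 of the 48-bit state, its lowest bit is state bit 16, which depends only on the bottom 17 bits of the state, and those cycle with period 2^17. So the sequence of values r.nextInt() & 1 repeats exactly every 131072 calls, far sooner than the generator's 2^48 period. The second method illustrates the "planes" problem with a deliberately bad generator rather than java.util.Random: RANDU, a historical IBM generator that used the multiplier 65539 from footnote 2 with m = 2^31 and c = 0. Because 65539^2 ≡ 6 × 65539 − 9 (mod 2^31), every three consecutive outputs are tied together by one fixed linear equation, which is why its triples fall on a small number of planes. Every LCG has structure of this kind, but well-chosen parameters make it far less pronounced.

```java
import java.util.Random;

final class LcgWeaknesses {
    /**
     * The lowest bit of nextInt() is bit 16 of java.util.Random's 48-bit state.
     * That bit depends only on the state's low 17 bits, which cycle with period 2^17,
     * so the sequence of lowest bits repeats every 2^17 = 131072 calls.
     */
    static boolean lowestBitRepeats(long seed, int samples) {
        final int period = 1 << 17;
        Random r = new Random(seed);
        int[] bits = new int[samples + period];
        for (int i = 0; i < bits.length; i++) {
            bits[i] = r.nextInt() & 1;
        }
        for (int i = 0; i < samples; i++) {
            if (bits[i] != bits[i + period]) {
                return false;
            }
        }
        return true;
    }

    /**
     * RANDU: x' = 65539 * x mod 2^31. Because 65539^2 = 6 * 65539 - 9 (mod 2^31),
     * every triple satisfies x[k+2] = 6 * x[k+1] - 9 * x[k] (mod 2^31).
     */
    static boolean randuTriplesAreLinked(long seed, int triples) {
        final long mask = (1L << 31) - 1; // m - 1, with m = 2^31
        long x0 = (seed & mask) | 1;      // RANDU is used with an odd seed
        long x1 = (65539 * x0) & mask;
        for (int k = 0; k < triples; k++) {
            long x2 = (65539 * x1) & mask;
            // Masking also reduces negative values correctly modulo 2^31.
            if (x2 != ((6 * x1 - 9 * x0) & mask)) {
                return false;
            }
            x0 = x1;
            x1 = x2;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(lowestBitRepeats(42, 100_000));     // prints: true
        System.out.println(randuTriplesAreLinked(1, 100_000)); // prints: true
    }
}
```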

Next...

See the two pages linked to above for more details on the flaws of the LCG method. For a better alternative that is trivial to implement in a few lines of Java, see the XORShift generator. See also:

  • the introduction to random numbers in Java, in which we gave a checklist of dos and don'ts of using java.util.Random;
  • a look at random sampling: the problem of picking x random choices from a total of n, a problem that many people implement inefficiently or wrongly, and whose correct solution turns out to be quite simple.

1. It is generally attributed to Dick Lehmer, who appears to have introduced it formally in a 1948 conference paper.
2. For example, 65539 is 2^16 + 3. So 65539x can be calculated as (x << 16) + x + x + x. Shift and addition/subtraction instructions are generally much faster operations than multiplication.
3. For example, to calculate a number modulo 2^48 (the modulus chosen in java.util.Random), we AND it with (2^48 - 1).


Written by Neil Coffey. Copyright © Javamex UK 2013. All rights reserved.