This invention relates to random number generation in parallel processes, and more particularly to random number generation in massively parallel processing (MPP) systems such as databases.
Random number generators are used in many applications. They use iterative deterministic algorithms for producing a sequence of pseudo-random numbers that approximate a random sequence. It is important that the generators have good randomness properties and that the sequence be uniformly distributed, uncorrelated, reproducible and easily changed by adjusting an initial seed value. Parallel random number generators running on parallel processors in a distributed system, such as an MPP database, in addition should produce the same sequence on different processors, have no correlation between the sequences, and produce good quality random numbers. MPP database systems require good quality random numbers for analytic workloads such as Monte Carlo simulation and for random sampling of physical tables. Obtaining random numbers that satisfy these requirements from parallel computers is difficult, particularly in MPP databases where the number of nodes that process a query is not known in advance, and where communication between processors is impractical.
In a distributed database system, if each segment node initializes with the same seed and the processes pick up the sequence numbers at the same positions, the random numbers will be the same on every segment and therefore not of good quality. On the other hand, if each segment node starts with a different seed value and thus generates a different sequence, the returned values of the random numbers may overlap each other so that the quality is unpredictable, which is unacceptable. It is important to ensure that the segments of the MPP database generate the same sequence but that different segments return the numbers at different positions in that sequence. It might be possible to accomplish this if the segments were able to communicate with one another. However, this is not feasible in an MPP database where low latency is essential and there may be thousands of segments.
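The two situations described above can be illustrated with a minimal sketch. It is offered only as an illustration of the problem and not as the method of the invention. The function names `same_seed_streams` and `leapfrog_streams` are hypothetical, Python's standard `random.Random` stands in for whatever generator a segment would actually use, and the sketch assumes that the number of segments is known to every segment, which, as noted above, is not always the case in an MPP database.

```python
import random


def same_seed_streams(seed, num_segments, count):
    """Each segment seeds identically and reads from position 0.

    Every segment therefore returns exactly the same values.
    """
    streams = []
    for _ in range(num_segments):
        rng = random.Random(seed)  # same seed on every segment
        streams.append([rng.random() for _ in range(count)])
    return streams


def leapfrog_streams(seed, num_segments, count):
    """Each segment walks the same sequence but keeps different positions.

    Segment k keeps positions k, k + num_segments, k + 2 * num_segments, ...
    No communication is needed: a segment only has to know the seed,
    its own index, and (by assumption) the number of segments.
    """
    streams = []
    for segment_id in range(num_segments):
        rng = random.Random(seed)  # same seed, hence the same sequence
        kept = []
        for position in range(num_segments * count):
            value = rng.random()
            if position % num_segments == segment_id:
                kept.append(value)  # this position belongs to this segment
        streams.append(kept)
    return streams


if __name__ == "__main__":
    duplicated = same_seed_streams(seed=42, num_segments=3, count=4)
    print(duplicated[0] == duplicated[1] == duplicated[2])

    disjoint = leapfrog_streams(seed=42, num_segments=3, count=4)
    for segment_id, values in enumerate(disjoint):
        print(segment_id, values)
```

In the second function the values returned by all segments, interleaved by position, reproduce the single sequence defined by the seed, so no position is used twice. The sketch discards the positions a segment does not own, which is wasteful; a generator that can skip ahead directly would avoid that cost.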
It is desirable to provide random number generators for MPP databases that address the foregoing and other known problems of generating quality random numbers in distributed processing systems, and it is to these ends that the present invention is directed.