1. Field
This disclosure is generally related to parallel computing applications, simulation codes, and protocols that use pseudorandom numbers, and more specifically to algorithms and methods for generating pseudorandom numbers.
2. Description of the Related Art
Many important scientific computing applications, business and finance applications, and complex systems modeling and analysis techniques use pseudorandom number generators (“RNGs”). These applications may take advantage of the availability of thousands of computing cores on heterogeneous systems comprising multi-core processors (“CPUs”) and highly parallel general purpose graphics processing units (“GPUs”), provided that suitable parallel pseudorandom number generators (“PRNGs”) are available to simultaneously feed thousands of computing streams with high quality random number (“RN”) streams with low intra- and inter-stream correlations (inter-stream correlations may be referred to herein as “ISCs”).
A parallel or distributed application has a computational task that may be divided into several thousands or millions of subtasks, with each subtask executed by a separate thread or process (henceforth, process). Each process has a distinct ID that is usually logically numbered within the context of the application execution.
For an iterative parallel application, each process may execute some of the iterations. For example, for a large lattice structure simulation, each process may simulate the working of a few of the lattice points. Therefore, processes often cycle through computing and communication modes. In the computing mode, a process may use the available data to perform new calculations needed to make progress toward the solution. In the communication mode, a process may send its data or receive other processes' data.
It is common to use the single-program, multiple-data (SPMD) programming method to code parallel applications, in which each of the processes receives the same computer code but has explicit instructions that specify, based on the process's ID, its portion of the task.
If an SPMD-based parallel application code that uses random numbers is executed, all or some of the processes (spawned for the execution of the application code) request random numbers from the same program locations or contexts.
In some applications, all required processes may be spawned statically at the start of the code execution. In other applications, some of the processes are spawned initially and any additional processes are spawned dynamically by the existing processes based on the application data and the coded algorithm or model. In highly complex simulation codes, the initial processes may need to spawn additional processes, dynamically, during the execution. However, with the SPMD programming method, all processes use the same application code, with the task for each process specified by conditional statements based on the data and the process ID.
In some systems, to distinguish requests for random numbers from different processes, an application is coded such that each process uses an RN stream identifier to explicitly identify a distinct stream allocated to it. The stream allocated to a process may be initialized by a special function call prior to generating or using any RNs from that stream.
A large application code that uses RNs may be executed by dividing the computing task among multiple processes. Typically, each process is allocated at least one distinct RN stream to provide the RNs needed during its computations. To improve randomness and to improve the reproducibility of results, an application may be coded such that each portion of the computing workload, for example, each small subset of the iterations of a large iterative code, may be assigned a distinct RN stream identifier so that each workload may use a distinct RN stream for the necessary RNs in its execution. In such cases, especially for efficiency reasons, each process may be assigned one or more of the computing workloads, and thus, one or more of the distinct RN stream identifiers. Coding an application so that an RN stream is shared by multiple processes is computationally inefficient, makes results hard to reproduce, or both.
The RN streams may be allocated to processes based on the input data and/or computations allocated to them. For example, if a computational loop is partitioned cyclically among p processes, then iteration i may be executed by process i % p. If each iteration is to use a separate RN stream and the number of iterations is smaller than the maximum number of RN streams, then it may be natural to allocate RN streams i, i+p, i+2p, . . . from the set of all RN streams to process i.
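The cyclic allocation described above can be sketched as follows. This is a minimal illustration only; the function names `owner` and `streams_for_process` are hypothetical and do not come from any particular library.

```python
def owner(i, p):
    """Process that executes iteration i under a cyclic partition among p processes."""
    return i % p


def streams_for_process(pid, p, n_iters):
    """RN stream identifiers pid, pid+p, pid+2p, ... (one per iteration owned by pid)."""
    return list(range(pid, n_iters, p))


p, n_iters = 4, 10
allocation = {pid: streams_for_process(pid, p, n_iters) for pid in range(p)}
```

Under these assumptions, every iteration index appears in exactly one process's list, so no RN stream identifier is shared between processes.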
One way to ensure that distinct RN streams are used is to allocate distinct RN stream identifiers and to use a PRNG that ensures that distinct RN stream identifiers result in the initialization of distinct RN streams. For a well-designed PRNG, such streams may have low or undetectable inter-stream correlations, as judged by currently available statistical and other tests.
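As a rough illustration of initializing a stream from a stream identifier, the sketch below derives a per-stream seed by hashing a global seed together with the identifier. The function `init_stream` is hypothetical, and hashing is only an assumed stand-in for a purpose-built parallel PRNG: it makes the streams reproducible and distinct in practice, but it carries no guarantee about inter-stream correlations.

```python
import hashlib
import random


def init_stream(global_seed, stream_id):
    """Return a generator initialized from (global_seed, stream_id)."""
    # Assumption: a SHA-256 digest of the pair is used as the seed material.
    material = f"{global_seed}:{stream_id}".encode()
    digest = hashlib.sha256(material).digest()
    return random.Random(int.from_bytes(digest, "big"))


stream_a = init_stream(global_seed=42, stream_id=0)
stream_b = init_stream(global_seed=42, stream_id=1)
draws_a = [stream_a.random() for _ in range(5)]
draws_b = [stream_b.random() for _ in range(5)]
```

Calling `init_stream` again with the same pair reproduces the same sequence, which is what makes results repeatable across runs.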
If the application requires each process or computational workload to request random numbers from multiple program locations or contexts, then there may be two options. One option is to use the same RN stream for all contexts within a process. The same contexts in two different processes will still use distinct RN streams provided distinct stream identifiers are allocated and initialized for different processes.
A second option is to use multiple distinct streams for multiple contexts in each process, potentially one distinct RN stream for each distinct program context. This second option may be desirable for better randomness properties. In such a case, the application code is explicitly written to manage these multiple streams. If the number of distinct streams needed for an application is not known in advance, the maximum number of streams needed per process is estimated and that many streams are allocated to each process.
If the estimate is too small, then a program error is generated and execution is halted. In this case, the user needs to revise the estimate for the number of streams needed and resubmit the application for execution.
If the estimate is too large, then the program may run out of distinct RN streams for processes spawned after some point. This is especially true when a parallel application that was tuned and run on a large cluster of computers with a large number of processes is later run on an even larger cluster with even more processes, through a simple change in compile-time or runtime options and without application recoding, to take advantage of the additional performance offered by the larger hardware.
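Both failure modes of a fixed per-process estimate can be seen in a small sketch. The class `StreamPool` and its parameters are hypothetical names chosen for illustration; the sketch assumes each process reserves a contiguous block of stream identifiers out of a fixed total.

```python
class StreamPool:
    """Hypothetical per-process allocator of RN stream identifiers."""

    def __init__(self, pid, streams_per_process, total_streams):
        # Each process owns a fixed block of identifiers sized by the estimate.
        self.base = pid * streams_per_process
        self.limit = streams_per_process
        self.used = 0
        if self.base + self.limit > total_streams:
            # Estimate too large: later processes find no identifiers left.
            raise RuntimeError(f"no RN streams left for process {pid}")

    def new_stream_id(self):
        if self.used >= self.limit:
            # Estimate too small: the process needs more streams than reserved.
            raise RuntimeError("per-process RN stream estimate exceeded")
        stream_id = self.base + self.used
        self.used += 1
        return stream_id


pool = StreamPool(pid=2, streams_per_process=3, total_streams=12)
ids = [pool.new_stream_id() for _ in range(3)]
```

With 12 streams in total and 3 reserved per process, only processes 0 through 3 can be served; a fifth process fails at construction, and any process requesting a fourth stream fails at allocation.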
To further control the generation of RN streams, an application may provide a single seed value to a PRNG, typically through a designated master process (usually process 0). The single seed value is typically a 32- or 64-bit number, often an integer, specified by the user as part of the application's input data. By keeping all other input data the same and changing only the seed value, the user can run multiple instances of the same scenario, average the results, and obtain estimates of the simulation error (also called confidence intervals in statistics).
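The replication procedure described above can be sketched with a toy workload. The Monte Carlo estimate of pi used here is only an assumed stand-in for a real simulation, the seed values are arbitrary, and the interval uses a normal approximation (with few replications, a Student-t quantile would be more appropriate).

```python
import math
import random
import statistics


def estimate_pi(seed, n_samples=5000):
    """Toy simulation: fraction of random points falling inside the unit quarter circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n_samples


# Same scenario, same inputs; only the seed changes between replications.
seeds = [11, 23, 37, 41, 59, 67, 73, 89, 97, 101]
estimates = [estimate_pi(s) for s in seeds]
mean = statistics.mean(estimates)
# Approximate 95% confidence half-width under a normal approximation.
half_width = 1.96 * statistics.stdev(estimates) / math.sqrt(len(estimates))
```

The result would be reported as `mean` plus or minus `half_width`; rerunning with the same seed list reproduces the same numbers.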
The quality of the random numbers used may be crucial for obtaining quick and accurate results from simulation-based computations and for robust security protocols and the security keys used in them. If an application code calls for RNs from multiple distinct locations, it may be desirable to use distinct parallel RN streams so that, within a process, multiple calls for RNs from the same location (also called a program context) are satisfied by providing RNs from one specific stream, while calls for RNs from different locations of the program within the same computing iteration are satisfied by providing RNs from different streams. Distinct RN streams across different processes may be ensured by the use of distinct RN stream identifiers to initialize the RN streams. To use distinct RN streams for distinct contexts within a process or computational workload, the application has to be coded specifically to use distinct RN stream identifiers for each such program context. Such an approach may, however, impose an unreasonable burden on the application designer and make revisions to application code, which may change the number of program contexts from which RNs are requested, cumbersome and potentially error-prone.
In some parameterized PRNGs, each process is given one RN stream with an appropriately parameterized seed or iteration function. Two main approaches to designing PRNGs are (a) splitting a sequential RN stream into multiple substreams, with each substream treated as a distinct RN stream for application execution purposes, and (b) parameterization of the initialization (seed) state of an RNG with multiple random number cycles or the parameterization of the iteration function of the initialization of an RNG. The leap-frog technique, which splits a sequential RN stream in an interleaved manner, has received extensive attention: if a sequential stream consisting of x_1, x_2, x_3, . . . needs to be split into k streams, then stream i consists of RNs x_i, x_{k+i}, x_{2k+i}, . . . , 1≦i≦k. However, it is inherently not scalable owing to its initialization cost (a large multiple of k RNs must be generated first to initialize each processor/process) and potentially increased intra-stream correlations.
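The leap-frog split can be illustrated with a naive sketch that simply discards the values belonging to other streams. The function `leapfrog` is a hypothetical name, and Python's standard `random.Random` is assumed as the underlying sequential generator; a practical implementation would use a generator with an efficient skip-ahead rather than generating and discarding values.

```python
import random
from itertools import islice


def leapfrog(seed, i, k):
    """Yield x_i, x_{k+i}, x_{2k+i}, ... of the sequential stream, for 1 <= i <= k."""
    rng = random.Random(seed)
    for _ in range(i - 1):
        rng.random()  # discard x_1 .. x_{i-1}
    while True:
        yield rng.random()
        for _ in range(k - 1):
            rng.random()  # discard the k-1 values that belong to other streams


# Stream 2 of 3: the elements x_2, x_5, x_8, x_11 of the sequential stream.
stream_2 = list(islice(leapfrog(seed=7, i=2, k=3), 4))
```

Each value delivered to a stream costs k generator steps in this naive form, which gives a sense of why the approach scales poorly as k grows.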
The Mersenne twister (MT) is a variant of the feedback shift register-based random number generator. The original generator, MT19937, which generates a single RN stream with a very long cycle of length 2^19937 - 1 (that is, the sequence of RNs repeats after generating this many RNs), is very popular and is widely implemented in various software packages (including the GNU Scientific Library, GSL). SFMT19937, a parallel 128-bit version, and MTGP, a GPU version that is part of the NVIDIA CUDA library, are also available. Using MT to generate multiple parallel RN streams often requires splitting its sequential RN stream. This is largely an ad hoc process, since the maximum number of RNs needed in each segment needs to be estimated. It also may compromise the randomness quality, since segmenting the stream and using the segments changes the correlations among the RNs used. Direct parallelization by changing the parameters of MT is computationally expensive and may not be suitable for dynamic generation of random number streams in a high-performance simulation code.
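Splitting a single MT stream into contiguous segments can be sketched as follows. Python's standard `random` module uses MT19937 as its core generator, so it serves as the sequential stream here; the function `block_substream` and the segment length `block_len` are hypothetical, and the skip-ahead is done naively by generating and discarding values.

```python
import random


def block_substream(seed, j, block_len):
    """Return segment j (0-based) of length block_len from one sequential MT stream."""
    rng = random.Random(seed)
    for _ in range(j * block_len):
        rng.random()  # naive skip-ahead to the start of segment j
    return [rng.random() for _ in range(block_len)]


# block_len is the estimated maximum number of RNs any one process will need.
block_len = 4
segments = [block_substream(seed=5, j=j, block_len=block_len) for j in range(3)]
```

If a process consumed more than `block_len` values, it would begin reusing the values of the next segment, which is why the segment length has to be estimated in advance.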