The subject matter described herein generally relates to parallel processing of a plurality of stochastic simulations, wherein the behavior of each simulation depends upon a set of input parameter values.
Parallel computing involves executing many calculations simultaneously using the principle that large problems can be divided into smaller ones and solved concurrently. One or more computing devices (computers, servers, et cetera) are often employed to carry out parallel computations. Various hardware arrangements support parallelism, for example a multi-core or multi-processor computer having multiple processing elements within a single machine, or clusters of multiple computers arranged to participate in executing parallel computing tasks. The well-known Hadoop software system, for example, allows robust parallel processing over a cluster of thousands of commodity machines.
Stochastic simulation is a numerical technique for predicting or studying the behavior of systems or phenomena that are characterized by randomness. For such systems, the “performance measures” of interest, such as the future price of a stock option or the predicted insulin level of a patient after taking a prescription drug, are described by a probability distribution over the possible values of the performance measure. Simulation is used when this probability distribution is too complex to be computed analytically, and proceeds by repeatedly generating pseudo-random samples from the probability distribution of interest.
The process of generating an individual sample is called a Monte Carlo replication, and may involve a complicated series of computations. The Monte Carlo replications are used to estimate important characteristics of the probability distribution. For example, the expected value of the probability distribution can be estimated by the average of the Monte Carlo samples. In practice, pseudo-random numbers are used to emulate the randomness of a real-world system on a computer. A stream (sequence) of pseudo-random integers, called seeds, is generated according to a deterministic algorithm, but the algorithm is designed so that the sequence of seeds appears random to an observer. For example, the seed sequence will pass standard statistical tests for randomness. The sequence of seeds is usually transformed via simple numerical operations into a sequence of pseudo-random uniform numbers that are then used in the simulation.
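By way of illustration only, the seed sequence, its transformation into uniform numbers, and the averaging of Monte Carlo samples may be sketched as follows. The generator shown is a simple multiplicative congruential generator chosen for brevity; the constants, the function names, and the toy performance measure (the square of a uniform number, whose expected value is 1/3) are assumptions made for this sketch and are not taken from the description above.

```python
# Illustrative sketch only. The generator constants, function names, and the
# toy performance measure are assumptions chosen for this example.

M = 2**31 - 1   # modulus of a multiplicative congruential generator
A = 16807       # multiplier

def next_seed(seed):
    """Advance the deterministic seed sequence by one step."""
    return (A * seed) % M

def uniforms(seed, n):
    """Yield n pseudo-random uniforms in (0, 1) by scaling successive seeds."""
    for _ in range(n):
        seed = next_seed(seed)
        yield seed / M

def replication(u):
    """One Monte Carlo replication of a toy performance measure: U squared."""
    return u * u

def estimate_mean(seed, n):
    """Estimate the expected value as the average of n Monte Carlo samples."""
    samples = [replication(u) for u in uniforms(seed, n)]
    return sum(samples) / n

estimate = estimate_mean(seed=12345, n=100_000)
print(estimate)  # expected to lie close to 1/3 for this toy measure
```

Because the seed sequence is deterministic, repeating the call with the same initial seed reproduces the same estimate, which is the property that makes pseudo-random simulation runs repeatable.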
Exploiting the fact that different Monte Carlo replications can be computed independently of each other, traditional parallel simulation techniques attempt to speed up simulation processing by executing the Monte Carlo replications for the system in parallel, and sometimes also by decomposing the system to be simulated into disjoint components that can be simulated in parallel. In this approach, the degree of parallelism is equal to the number of available processors, which can range from tens to thousands. In general, the behavior of a simulation depends on a plurality of input parameters; in the traditional parallel setting, a set of parameters is associated with each parallel processor. The amount of “seeding” information used to provide a stream of pseudo-random numbers at a processor may be large.
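The traditional arrangement described above, in which independent Monte Carlo replications are distributed across processors, may be sketched as follows. This is a minimal sketch under stated assumptions: a thread pool stands in for the parallel processors, each worker is given its own pseudo-random stream by seeding Python's standard generator with a distinct integer, and the names run_replications and parallel_estimate, the parameter names mu and sigma, and the toy normally distributed performance measure are all hypothetical. Seeding with consecutive integers is used only for simplicity and is not a guarantee of statistically independent streams.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_replications(task):
    """Run one worker's batch of replications; return (sum of samples, count)."""
    seed, params, n = task
    rng = random.Random(seed)  # this worker's own pseudo-random stream
    mu, sigma = params["mu"], params["sigma"]
    total = 0.0
    for _ in range(n):
        total += rng.gauss(mu, sigma)  # toy replication (assumption)
    return total, n

def parallel_estimate(params, n_workers, n_per_worker, base_seed=0):
    """Split the replications across workers and pool the results into a mean."""
    tasks = [(base_seed + i, params, n_per_worker) for i in range(n_workers)]
    # Threads stand in for processors here; a cluster would ship tasks to nodes.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(run_replications, tasks))
    total = sum(s for s, _ in results)
    count = sum(c for _, c in results)
    return total / count

estimate = parallel_estimate({"mu": 2.0, "sigma": 1.0}, n_workers=4, n_per_worker=5000)
print(estimate)  # expected to lie close to mu = 2.0
```

In this sketch each task carries only a single integer seed and a small parameter set. Generators with a larger internal state require correspondingly more seeding information to be supplied to each processor.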