1. Field of the Invention
The present invention relates to processes for generating random numbers using a graphics processing unit (GPU).
2. Description of Related Art
The GPU is a device specialized for carrying out image processing, such as for computer graphics (CG), at high speed, and it is usually integrated into a computer device, such as a personal computer or a workstation, for displaying images. The GPU has theoretical performance far exceeding that of a general-purpose central processing unit (CPU) and is inexpensive, and it is thus employed for scientific computing and the like as well as for its original image processing purpose (refer to Japanese Patent Application Laid-open No. 2009-86202). A computing method that employs the GPU is referred to as GPU computing.
FIG. 4 is a block diagram of a computing device provided with a GPU. As illustrated in FIG. 4, the computing device includes a CPU 1, a main memory 2, a GPU 3, a GPU memory 4, an input device 5 including a keyboard and a mouse, a display device 6 such as a liquid crystal display, and a bus 7 for connecting those components with each other. The GPU 3 includes a plurality of blocks each constructed of computing units referred to as cores (not shown). The cores in the same block can share data at high speed. On the other hand, sharing data between a core in one block and a core in another block must be realized via the GPU memory 4, which is a global memory, resulting in relatively low speed. Moreover, the GPU memory 4 has a variable data storage area, and is directly accessed by the GPU 3 for reading and writing data. Although the CPU 1 cannot directly access the GPU memory 4, the CPU 1 can access the GPU memory 4 via the GPU 3.
The GPU has a larger number of cores than the CPU and therefore has higher computing performance. On the other hand, the cores in the GPU can carry out only the same arithmetic operation at the same time, so a stricter restriction is imposed on parallelism than in the CPU. Problems for which the GPU performs well are accordingly limited. Moreover, the architecture of the GPU differs greatly from that of the CPU, and program code intended for the CPU, if applied directly, cannot sufficiently utilize the performance of the GPU. It is thus necessary to rewrite the program code intended for the CPU according to an algorithm and a processing sequence suited to the GPU.
The Monte Carlo method, which is used for analyzing probabilistic phenomena and the like, is one of the fields in which the GPU best exhibits its performance. Executing the Monte Carlo method requires a large quantity of random numbers, and the quality of those random numbers affects the execution results. A random number generator that can generate high-quality random number sequences at high speed is therefore necessary. Moreover, while random numbers generated by a random number generator are usually uniform random numbers, random numbers used for the Monte Carlo method need to follow a distribution suited to the specific application. The uniform random numbers generated by the random number generator must therefore be converted into random numbers following the proper distribution.
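As a minimal sketch of the conversion mentioned above, a uniform random number on [0, 1) can be mapped to another distribution by the inverse transform method. The exponential distribution, the function name `uniform_to_exponential`, and the parameter `rate` are illustrative assumptions; the text does not specify a target distribution or a conversion method.

```python
import math


def uniform_to_exponential(u, rate=1.0):
    """Map a uniform sample u in [0, 1) to an exponential sample.

    Inverse transform: if U is uniform on [0, 1), then -ln(1 - U) / rate
    follows an exponential distribution with the given rate.
    The choice of distribution is an assumption made for illustration.
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    if rate <= 0.0:
        raise ValueError("rate must be positive")
    # log1p(-u) computes ln(1 - u) accurately when u is close to 0.
    return -math.log1p(-u) / rate
```

Other target distributions would need a different mapping, and some mappings consume more than one uniform input per output.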
The Mersenne Twister random number generator, which generates high-quality random numbers at high speed, was developed and published in the period from 1996 to 1998 (refer to M. Matsumoto and T. Nishimura, “Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator”, ACM Trans. on Modeling and Computer Simulation, January, 1998, Vol. 8, No. 1, pp. 3-30). A typical example thereof is a standard code in the 2002 version written in the C language for the CPU (hereinafter, referred to as mt19937ar). In addition, the Mersenne Twister for Graphic Processors (MTGP) written for the GPU was published in 2009. Refer to M. Matsumoto, “Mersenne Twister for Graphic Processors (MTGP): a new variant of Mersenne Twister”, on the Internet at URL: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MTGP/index-jp.html for more information. However, the processing steps of the MTGP differ from those of mt19937ar for the CPU. As a result, a random number sequence generated according to the MTGP is usually different from a random number sequence generated by mt19937ar.
In the MTGP described in “Mersenne Twister for Graphic Processors (MTGP): a new variant of Mersenne Twister”, a plurality of Mersenne Twister random number generators with mutually different parameters are provided, and the different random number generators are respectively executed in different blocks of the GPU. In other words, one block corresponds to one random number generator. Moreover, the MTGP exploits the high speed of communication within a block to parallelize the random number generator in that block.
FIG. 5 schematically illustrates a configuration of the GPU 3 when random numbers are generated according to the MTGP. As illustrated in FIG. 5, the GPU 3 includes k blocks, namely first to k-th blocks. Each block includes a plurality of cores (not shown). The plurality of cores are responsible for execution of first to n-th threads. The thread herein means a flow of processing in a process. It should be noted that one core may be responsible for one thread, or for two or more threads. Random number generators MT1 to MTk are respectively associated with the first to k-th blocks. In other words, the first to k-th blocks respectively correspond to the random number generators MT1 to MTk. Processing of each of the random number generators is executed in parallel by the n threads in the corresponding block.
FIG. 6 is a flowchart of random number generation processing carried out by each of the random number generators MT1 to MTk. Not only the random number generation processing in the Mersenne Twister method but also the random number generation processing in many other methods includes update processing of updating vectors referred to as state vectors, tempering processing of converting the updated state vectors into integer random numbers having favorable properties, and conversion processing of converting the integer random numbers into random numbers having another distribution. First, in Step S51, when the random number generation processing starts, the GPU 3 makes a preparation such as reading a program for the random number generation processing from the GPU memory 4. In Step S52, the value of a variable M is assigned to a variable N. This variable M along with a variable L, which is described later, is determined by a form of the execution of the Mersenne Twister method. Then, the processing proceeds to Step S53, and the update processing of the state vectors is carried out. Specifically, state bits Ri (i is a natural number) are updated using recurrence equations described below.
$$
\begin{aligned}
R_{N} &= F\left(R_{N-M},\; R_{N-M+1},\; R_{N-L}\right) \\
R_{N+1} &= F\left(R_{N-M+1},\; R_{N-M+2},\; R_{N-L+1}\right) \\
&\;\;\vdots \\
R_{N+n-1} &= F\left(R_{N-M+n-1},\; R_{N-M+n},\; R_{N-L+n-1}\right)
\end{aligned}
\qquad \text{(Equation 1)}
$$
where the state bit Ri forms a variable in K bits, F is a function including bitwise operations and the four arithmetic operations, and L and M are positive integers satisfying a relationship L<M. Those L and M are determined by the form of the execution of the Mersenne Twister method.
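The update of Equation 1 can be sketched as follows for one concrete case. The sketch assumes the MT19937 parameters, which in the notation above correspond to M = 624, L = 227 and K = 32; this mapping, the seeding routine, and the names `init_state`, `F` and `update_lanes` are assumptions for illustration and are not taken from the MTGP code. The list comprehension reads all inputs before any output is appended, which imitates n threads evaluating the n equations at the same time.

```python
# Assumed MT19937 instance of Equation 1 (not the MTGP parameters).
M, L = 624, 227
UPPER, LOWER, MATRIX_A = 0x80000000, 0x7FFFFFFF, 0x9908B0DF


def init_state(seed):
    """mt19937ar-style seeding; returns the initial words R[0] .. R[M-1]."""
    R = [seed & 0xFFFFFFFF]
    for i in range(1, M):
        prev = R[-1]
        R.append((1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF)
    return R


def F(a, b, c):
    """MT19937 recurrence with a = R[N-M], b = R[N-M+1], c = R[N-L]."""
    y = (a & UPPER) | (b & LOWER)      # top bit of a, low 31 bits of b
    twisted = (y >> 1) ^ (MATRIX_A if y & 1 else 0)
    return c ^ twisted


def update_lanes(R, N, n):
    """Append R[N] .. R[N+n-1] to R, which must hold R[0] .. R[N-1].

    Lane t needs R[N-L+t], so the lanes are independent of each other
    only while n <= L; a larger n would read a word that another lane
    is still computing.
    """
    if len(R) != N or N < M:
        raise ValueError("R must hold exactly R[0] .. R[N-1] with N >= M")
    if not 1 <= n <= L:
        raise ValueError("n must satisfy 1 <= n <= L")
    new = [F(R[N - M + t], R[N - M + t + 1], R[N - L + t]) for t in range(n)]
    R.extend(new)
    return new
```

Under these assumptions, one call with n lanes yields the same state words as n consecutive calls with a single lane, which is why the lanes can be handed to separate threads.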
In Step S54, the tempering processing is carried out. Specifically, the state bits Ri obtained in Step S53 are converted into integer values Si according to the following equations.
$$
\begin{aligned}
S_{N} &= G\left(R_{N}\right) \\
S_{N+1} &= G\left(R_{N+1}\right) \\
&\;\;\vdots \\
S_{N+n-1} &= G\left(R_{N+n-1}\right)
\end{aligned}
\qquad \text{(Equation 2)}
$$
where G is a function including bitwise operations.
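As one concrete instance of the function G, the tempering used by the 32-bit MT19937 generator can be written as below. The shift amounts and masks are those of MT19937; whether a given MTGP parameter set uses the same tempering is not stated in the text, so this is a sketch under that assumption. The names `temper` and `temper_lanes` are illustrative.

```python
def temper(r):
    """MT19937 tempering: map a 32-bit state word to a 32-bit integer."""
    y = r & 0xFFFFFFFF
    y ^= y >> 11
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    y ^= y >> 18
    return y


def temper_lanes(R, N, n):
    """Equation 2: S[N+t] = G(R[N+t]) for t = 0 .. n-1.

    Each output depends on exactly one state word, so the n lanes
    need no communication with each other.
    """
    return [temper(R[N + t]) for t in range(n)]
```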
Step S55, in which the conversion processing is carried out, is not an essential step and is carried out only as necessary. Specifically, the integer values Si obtained in Step S54 are converted into uniform random numbers Ti according to the following equations, and the uniform random numbers Ti are stored in the GPU memory 4.
$$
\begin{aligned}
\mathrm{Mem}(N) &= H\left(S_{N}\right) \\
\mathrm{Mem}(N+1) &= H\left(S_{N+1}\right) \\
&\;\;\vdots \\
\mathrm{Mem}(N+n-1) &= H\left(S_{N+n-1}\right)
\end{aligned}
\qquad \text{(Equation 3)}
$$
where H is a function for the conversion into the uniform random number. A plurality of inputs to the function may be necessary. Moreover, Mem represents the operation of storing in the GPU memory 4. Thus, Step S55 is a step carried out when it is necessary to convert the integer values Si obtained in Step S54 into the uniform random numbers Ti.
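Two possible forms of the function H are sketched below, assuming 32-bit integer values. The first takes a single input; the second combines two inputs into a 53-bit result, which illustrates why a plurality of inputs may be necessary. Both follow the arithmetic of the `genrand_real2` and `genrand_res53` routines of mt19937ar; the names `to_unit_interval`, `to_unit_interval_53` and `convert_lanes`, and the use of a dictionary to stand in for the GPU memory 4, are assumptions for illustration.

```python
def to_unit_interval(s):
    """H with one input: map a 32-bit integer to a float in [0, 1)."""
    return (s & 0xFFFFFFFF) * (1.0 / 4294967296.0)


def to_unit_interval_53(s_a, s_b):
    """H with two inputs: build a float in [0, 1) with 53 random bits."""
    a = (s_a & 0xFFFFFFFF) >> 5        # keep the upper 27 bits
    b = (s_b & 0xFFFFFFFF) >> 6        # keep the upper 26 bits
    return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0)


def convert_lanes(S, N, n, mem):
    """Equation 3: mem[N+t] = H(S[N+t]) for t = 0 .. n-1.

    S is indexed by the same absolute index as in the equations, and
    mem is a dict used here in place of the GPU memory 4.
    """
    for t in range(n):
        mem[N + t] = to_unit_interval(S[N + t])
    return mem
```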
In Step S56, each of the blocks determines whether the required number of random numbers have been generated. If the required number of random numbers have not been generated, the processing proceeds to Step S57, in which the sum of the number n of parallel executions and the variable N is assigned to the variable N, and the processing of Steps S53 to S55 is then carried out again. Once the required number of random numbers have been generated, the random number generation processing ends in Step S58.
Each of the random number generators MT1 to MTk carries out Steps S51 to S58 as described above. Moreover, the processing in Steps S53 to S57 is carried out in parallel by the n threads as illustrated in FIG. 6.
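The flow of Steps S52 to S58 for a single generator can be put together as in the following sketch. It again assumes the MT19937 parameters (M = 624, L = 227) and mt19937ar-style seeding, neither of which is specified by the flowchart itself, and it keeps the whole history of state words in a growing list for readability, where a practical implementation would presumably keep only the most recent M words. The n lanes of each pass are evaluated by list comprehensions that stand in for the n threads.

```python
M, L = 624, 227        # assumed MT19937 values for M and L of Equation 1
UPPER, LOWER, MATRIX_A = 0x80000000, 0x7FFFFFFF, 0x9908B0DF


def init_state(seed):
    """mt19937ar-style seeding; returns R[0] .. R[M-1]."""
    R = [seed & 0xFFFFFFFF]
    for i in range(1, M):
        R.append((1812433253 * (R[-1] ^ (R[-1] >> 30)) + i) & 0xFFFFFFFF)
    return R


def F(a, b, c):
    """Update function of Equation 1 (MT19937 recurrence)."""
    y = (a & UPPER) | (b & LOWER)
    return c ^ (y >> 1) ^ (MATRIX_A if y & 1 else 0)


def G(r):
    """Tempering function of Equation 2 (MT19937 tempering)."""
    y = r ^ (r >> 11)
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    return y ^ (y >> 18)


def H(s):
    """Conversion function of Equation 3: 32-bit integer to [0, 1)."""
    return s * (1.0 / 4294967296.0)


def generate(seed, count, n, convert=True):
    """Run Steps S52 to S58 with n lanes per pass and return count values."""
    if not 1 <= n <= L:
        raise ValueError("n must satisfy 1 <= n <= L")
    R = init_state(seed)
    mem = []                                   # stands in for GPU memory 4
    N = M                                      # Step S52
    while len(mem) < count:                    # Step S56
        new = [F(R[N - M + t], R[N - M + t + 1], R[N - L + t])
               for t in range(n)]              # Step S53
        R.extend(new)
        S = [G(r) for r in new]                # Step S54
        mem.extend(H(s) for s in S) if convert else mem.extend(S)  # Step S55
        N += n                                 # Step S57
    return mem[:count]                         # Step S58
```

Under these assumptions the output does not depend on n: `generate(5489, 1000, 1, convert=False)` and `generate(5489, 1000, 200, convert=False)` return the same list, because the lanes only split the work of a single generator.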
NVIDIA Corporation in the U.S., which is a developer of GPUs, has published Mersenne Twister random number generators as a sample of the parallelization of random number generators (refer to NVIDIA Corporation in the U.S., “CUDA ZONE”, on the Internet at http://www.nvidia.co.jp/object/cuda_get_samples_jp.html). In the NVIDIA form, one random number generator is assigned to each thread, so that the random number generators generate random numbers in parallel. FIG. 7 schematically illustrates a configuration of a GPU when the random numbers are generated according to the NVIDIA form. As illustrated in FIG. 7, first to m-th threads in a first block are respectively responsible for first to m-th random number generators MT1 to MTm. First to m-th threads in a second block are respectively responsible for (m+1)th to 2m-th random number generators MTm+1 to MT2m. Similarly, first to m-th threads in the k-th block are respectively responsible for ((k−1)m+1)th to km-th random number generators MT(k−1)m+1 to MTkm. Thus, one thread corresponds to one random number generator. The processing in each thread is the same as the ordinary Mersenne Twister algorithm.
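The two assignments of generators to threads, that of FIG. 5 and that of FIG. 7, differ only in an index calculation, as the following sketch shows. Block and thread numbers are 1-based to match the text; the function names are hypothetical and are not taken from the MTGP code or from the NVIDIA sample.

```python
def generator_for_thread_fig5(block, thread):
    """FIG. 5 layout: all threads of block b cooperate on generator MT_b."""
    if block < 1 or thread < 1:
        raise ValueError("block and thread are 1-based")
    return block


def generator_for_thread_fig7(block, thread, m):
    """FIG. 7 layout: thread t of block b owns generator MT_((b-1)m + t)."""
    if block < 1 or not 1 <= thread <= m:
        raise ValueError("block is 1-based and thread must lie in 1 .. m")
    return (block - 1) * m + thread


def generator_count(k, m, per_thread):
    """Number of distinct generators run by k blocks of m threads each."""
    return k * m if per_thread else k
```

With k blocks, the FIG. 5 layout runs k generators while the FIG. 7 layout runs km generators, each of which advances serially inside its own thread.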
The existing MTGP achieves parallelism by running the plurality of random number generators in parallel as illustrated in FIG. 5. However, the existing MTGP generates only a random number sequence in which random numbers obtained from the plurality of random number generators are mixed, and this sequence is substantially different from a random number sequence obtained by a single random number generator. In other words, mt19937ar, which is serial processing, and the MTGP, which is parallel processing, yield different random number sequences, and reproducibility of a result of executing the Monte Carlo method is lost as a result. This reproducibility is indispensable when carrying out backtesting or examining the influence of changed computation conditions in fields such as financial engineering. If a single random number generator is employed, the reproducibility in the MTGP can be retained. However, this means that only one block of the GPU is used, and computing resources in the other blocks are thus wasted. The same holds true for the NVIDIA form illustrated in FIG. 7. Moreover, there is a problem in that the execution speed of each individual random number generator is low.
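The loss of reproducibility can be illustrated with a small experiment. The sketch below uses Python's standard `random.Random`, which is an MT19937 generator, as a stand-in for each generator. Giving each generator a different seed stands in for the different parameter sets of the MTGP, and the round-robin interleaving order is likewise an assumption, since the actual order in which blocks deliver their outputs is not described in the text.

```python
import random


def serial_sequence(seed, count):
    """Outputs of one generator, as in serial processing."""
    g = random.Random(seed)
    return [g.getrandbits(32) for _ in range(count)]


def mixed_sequence(seeds, count):
    """Outputs of several generators, interleaved round-robin.

    Each seed stands in for one block's generator (assumption).
    """
    gens = [random.Random(s) for s in seeds]
    out = []
    while len(out) < count:
        for g in gens:
            out.append(g.getrandbits(32))
    return out[:count]
```

With one generator, `mixed_sequence([1], 16)` equals `serial_sequence(1, 16)`. With four generators, only every fourth element of the mixed sequence comes from the generator seeded with 1, so the mixed sequence as a whole no longer matches the serial one.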