One technique for enhancing the efficiency of information processing by computers is distributed processing. Distributed processing is a parallel processing technique in which jobs are executed by a plurality of machines. Techniques for sharing processing among a plurality of machines include, in particular, those referred to as a “multiplexing system” and “grid computing”.
Among them, the multiplexing system is a technique intended to protect data that has already been processed, or to continue a service being provided, in the event of trouble such as a machine failure. Therefore, in the multiplexing system, a plurality of machines execute the same processing.
On the other hand, grid computing is typically a technique in which a plurality of computers and storage media are connected together via a network and handled virtually as a single large-scale, high-performance computer. For example, by allowing a plurality of computers to share and execute arithmetic processing that requires a large amount of calculation, arithmetic results can be obtained considerably more quickly.
Incidentally, to share arithmetic processing among a plurality of computers, a function of assigning jobs to machines is necessary. Such a function is implemented by a tool generally referred to as a “load balancer” or “load-sharing (load-distribution) software”.
Next, general usage of the load balancer in grid computing will be described. FIG. 7 is a schematic configuration diagram of a conventional information processing system constructed based on the grid computing technique. As shown in FIG. 7, the information processing system includes a client 110, a master node 120, and four nodes 130a, 130b, 130c, and 130d. These machines 110, 120, 130a, 130b, 130c, and 130d are connected together via a network. The client 110 is the requester of a large-scale arithmetic operation. For example, it is assumed that the client 110 requests the master node 120 to carry out a large-scale arithmetic operation consisting of 1,000 jobs. Here, a “job” is a unit of calculation. For example, a job is to obtain “z” by the calculation “z=x+y”. In this case, it is assumed that n sets of variables are substituted for x and y. That is, within a single job, the number of times the calculation is carried out depends on the number of sets of variables to be substituted. Accordingly, although the number of jobs is 1,000, the total number of calculations is dozens or hundreds of times greater than that number.
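The relationship between the number of jobs and the total number of calculations can be illustrated with a minimal sketch. The `Job` class, its field names, and the value n = 50 are assumptions introduced here for illustration only; they do not appear in the system described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    """One unit of calculation: z = x + y, evaluated for n sets of (x, y)."""
    job_id: int
    inputs: List[Tuple[float, float]]  # the n sets of variables (hypothetical layout)

    def run(self) -> List[float]:
        # One calculation is carried out per set of variables substituted for x and y.
        return [x + y for x, y in self.inputs]


def total_calculations(jobs: List[Job]) -> int:
    # Total calculations = sum over all jobs of the number of variable sets.
    return sum(len(job.inputs) for job in jobs)


# Assumed example: 1,000 jobs, each with n = 50 sets of variables.
jobs = [Job(job_id=i, inputs=[(float(i), float(k)) for k in range(50)])
        for i in range(1000)]
print(len(jobs), total_calculations(jobs))
```

With these assumed values, the number of calculations is 50 times the number of jobs, which is consistent with the "dozens or hundreds of times" relationship stated above.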
Here, the large-scale arithmetic operation, which the client 110 requests the master node 120 to carry out, is stored in a predetermined storage device or the like. Data for the large-scale arithmetic operation is originally transmitted from another computer connected to the client 110 via a network, or inputted by a person in charge using an input device of the client 110. At this time, for example, the data for the large-scale arithmetic operation may be transmitted or inputted after being divided into a predetermined number (e.g., 1,000) of jobs, or transmitted or inputted without being so divided. In the latter case, the client 110 divides the transmitted or inputted data for the large-scale arithmetic operation into the predetermined number of jobs in accordance with predetermined rules. Hereinafter, unless otherwise specified, descriptions will be given with respect to the case where the data for the large-scale arithmetic operation is transmitted or inputted to the client 110 after being divided into 1,000 jobs.
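For the case where the client 110 itself divides undivided data into the predetermined number of jobs, the division might be sketched as follows. The "predetermined rules" are not specified above, so this sketch assumes one simple rule, namely an even split into contiguous chunks; the function name `split_into_jobs` is likewise hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_jobs(data: Sequence[T], num_jobs: int) -> List[List[T]]:
    """Divide data into num_jobs contiguous chunks of nearly equal size.

    Assumed rule (not taken from the description): the first `extra`
    chunks receive one additional item so that sizes differ by at most one.
    """
    if num_jobs <= 0:
        raise ValueError("num_jobs must be positive")
    base, extra = divmod(len(data), num_jobs)
    jobs: List[List[T]] = []
    start = 0
    for i in range(num_jobs):
        size = base + (1 if i < extra else 0)
        jobs.append(list(data[start:start + size]))
        start += size
    return jobs


# Assumed example: 2,500 sets of variables divided into 1,000 jobs.
variable_sets = [(float(k), float(k + 1)) for k in range(2500)]
divided = split_into_jobs(variable_sets, 1000)
print(len(divided), len(divided[0]), len(divided[-1]))
```

Any other rule, such as dividing by estimated calculation time, could be substituted without changing the rest of the flow.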
The master node 120 is a computer responsible for the load balancing function, and performs the process of assigning jobs to the nodes 130a, 130b, 130c, and 130d. As for the job assignment, some techniques have been proposed, in which a suitable number and size of jobs are transmitted to each node in accordance with, for example, performance and load status of the node (see, for example, patent literature 1 and patent literature 2). In addition, the nodes 130a, 130b, 130c, and 130d are computers for carrying out arithmetic processing of the jobs assigned by the master node 120.
The client 110 first receives the data for the large-scale arithmetic operation consisting of 1,000 jobs, and then transmits the 1,000 jobs to the master node 120. Next, the master node 120 assigns the received 1,000 jobs to the nodes 130a, 130b, 130c, and 130d. The nodes 130a, 130b, 130c, and 130d carry out arithmetic processing of the jobs transmitted from the master node 120, and upon completion of the processing, they report to the master node 120 that the jobs have been completed. Upon receipt of such a report from any node, if there is any unprocessed job that has not yet been assigned, the master node 120 transmits the job to that node. As such, the master node 120 repeats both the process of transmitting any unprocessed job to each node, and the process of receiving a report that the job has been completed, thereby causing the four nodes 130a, 130b, 130c, and 130d to execute arithmetic processing of all the jobs.
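The repeated cycle of transmitting an unprocessed job and receiving a completion report can be sketched as a small simulation. This is an illustration under stated assumptions, not the implementation of any particular load balancer: the function `dispatch`, the `cost` callback giving a hypothetical processing time per job and node, and the choice to start with one job per node are all introduced here for illustration.

```python
import heapq
from collections import deque
from typing import Callable, Dict, List


def dispatch(jobs: List[int], node_ids: List[str],
             cost: Callable[[int, str], float]) -> Dict[str, List[int]]:
    """Simulate a master node handing out jobs as completion reports arrive."""
    pending = deque(jobs)                     # unprocessed, not yet assigned
    running: list = []                        # heap of (finish_time, node_id, job)
    assigned: Dict[str, List[int]] = {n: [] for n in node_ids}

    # Assumption: the master initially transmits one job to each node.
    for node in node_ids:
        if pending:
            job = pending.popleft()
            heapq.heappush(running, (cost(job, node), node, job))
            assigned[node].append(job)

    # Each pop models a completion report from the node that finishes next.
    while running:
        now, node, _finished = heapq.heappop(running)
        if pending:
            # An unassigned job remains, so transmit it to the reporting node.
            nxt = pending.popleft()
            heapq.heappush(running, (now + cost(nxt, node), node, nxt))
            assigned[node].append(nxt)
    return assigned


# Assumed example: node 130d takes twice as long per job as the others.
nodes = ["130a", "130b", "130c", "130d"]
result = dispatch(list(range(1000)), nodes,
                  lambda job, node: 2.0 if node == "130d" else 1.0)
print({node: len(js) for node, js in result.items()})
```

Because a node receives its next job only when it reports completion, a slower node in this sketch naturally ends up executing fewer jobs than a faster one.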
In addition, the master node 120 simply implements the function of efficiently assigning the jobs requested by the client 110 to the nodes 130a, 130b, 130c, and 130d; it does not perform any data processing on the results of arithmetic processing by the nodes 130a, 130b, 130c, and 130d. The underlying premise is that the results of arithmetic processing by the nodes 130a, 130b, 130c, and 130d are ultimately aggregated in the client 110, which is the requester of the large-scale arithmetic operation. Therefore, for example, when a result of arithmetic processing is returned from any of the nodes 130a, 130b, 130c, and 130d, the master node 120 is required to transmit that result to the client 110.

Patent Literature 1: Japanese Unexamined Patent Application Publication No. H07-219907
Patent Literature 2: Japanese Unexamined Patent Application Publication No. 2002-269062