1. Technical Field
The present invention relates to techniques for distributed processing of the tasks of a job and, more specifically, to a method for autonomously dividing and processing tasks when a job is executed, a system for implementing the method, and a computer program for implementing the method via a computer.
2. Background Art
In distributed processing such as grid computing, the allocation of jobs (tasks) to computer resources is an important determinant of the performance (for example, completion time) of the jobs, the capacity utilization of the resources, and the like. A method is disclosed in B. Urgaonkar, P. Shenoy and T. Roscoe, “Resource Overbooking and Application Profiling in Shared Hosting Platforms”, in Proceedings of the Fifth Symposium on Operating Systems Design and Implementation (OSDI), December 2002, in which the operating characteristics of individual jobs are observed in advance, profiles represented by a few parameters (an average CPU usage, a burst rate of CPU usage, and the like) are created, and the profiles are used for allocation planning. The plan is carried out by controlling the QoS (Quality of Service) of resources using a special driver and the like during job execution. However, critical problems with this method remain unresolved. For example, a first-time job for which no profile exists cannot be handled; it is difficult to accurately forecast the status in which a plurality of jobs share resources, even when only a small number of profile parameters are used; and a special process is necessary to control the QoS of resources.
On the other hand, a method for dynamically relocating a running job to another resource environment is disclosed in Paul Ruth, Junghwan Rhee, Dongyan Xu, Rick Kennell, and Sebastien Goasguen, “Autonomic Live Adaptation of Virtual Computational Environments in a Multi-Domain Infrastructure”, International Conference on Autonomic Computing (ICAC) 2006. Since the whole job, which is divided into tasks, is transferred to another resource environment without any modification of the job in the application layer, this method can be applied to an application in which an execution schedule is dynamically changed in response to the status of a running job. However, since the whole job is transferred as a unit without changing the structure of its tasks, the method assumes that a resource environment highly suitable as a transfer destination is available. Thus, in a resource environment that operates at a high usage ratio, assigning a transfer destination may be difficult. Moreover, the quality of job execution after the transfer is not guaranteed.
Moreover, a dynamic load distribution parallel computer system is disclosed in Japanese Unexamined Patent Application Publication No. 9-160884, in which a table that periodically stores the task loads of individual nodes is provided in a shared memory commonly accessible to the individual nodes. When it is determined that tasks generated in one node cannot be processed within the time limit, that node refers to the shared memory to locate another node whose task load is lower than its own, and requests the other node to process, in its place, the tasks that it cannot process. Thus, when the number of tasks to be processed increases in any of the nodes, the increase can be handled dynamically. However, the task loads of all the nodes must always be checked and stored in the shared memory, and it is not guaranteed that a node referring to the shared memory will always locate another node whose task load is lower than its own. The node may therefore be unable to request another node to process the tasks in its place.
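For illustration only, the shared-table scheme described above may be sketched as follows. This is a minimal sketch and not the implementation of the cited publication: the names SharedLoadTable, find_lighter_node, and offload are hypothetical, the shared memory is modeled as an in-process dictionary, and the time limit is approximated by an assumed per-node capacity, namely the number of tasks the node can finish in time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SharedLoadTable:
    """Hypothetical stand-in for the load table held in shared memory."""
    loads: Dict[str, int] = field(default_factory=dict)

    def publish(self, node_id: str, load: int) -> None:
        # Each node periodically records its current task load.
        self.loads[node_id] = load

    def find_lighter_node(self, node_id: str) -> Optional[str]:
        # Return the least-loaded node whose load is strictly lower than
        # the caller's own load, or None if no such node exists.
        own = self.loads[node_id]
        lighter = {n: l for n, l in self.loads.items()
                   if n != node_id and l < own}
        if not lighter:
            return None
        return min(lighter, key=lighter.get)


def offload(table: SharedLoadTable, node_id: str, pending: List[str],
            capacity: int) -> Tuple[List[str], List[str], Optional[str]]:
    """Split pending tasks into (kept, delegated, target node).

    `capacity` is an assumption of this sketch: the number of tasks the
    node can complete within the time limit.
    """
    if len(pending) <= capacity:
        return pending, [], None
    target = table.find_lighter_node(node_id)
    if target is None:
        # The drawback noted above: no lighter node can be located,
        # so the excess tasks cannot be delegated.
        return pending, [], None
    return pending[:capacity], pending[capacity:], target
```

In this sketch, a node that finds no lighter entry in the table keeps all of its tasks, which corresponds to the case in which the request to another node cannot be made.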
Moreover, a job scheduling management method is disclosed in Japanese Unexamined Patent Application Publication No. 2005-31771, in which the operational status of computers to which jobs are assigned is monitored by a management server. When the operational status does not satisfy prescribed conditions, for example, job completion time, uncompleted jobs are detected, and, on the basis of information on the resources necessary to execute the uncompleted jobs, another computer that can execute them is extracted and the uncompleted jobs are assigned to it. Although job scheduling can be performed in a manner that depends on the operational status of resources, information on the plurality of computers that execute jobs, such as the content of job execution, the resources necessary for job execution, and the available resources, must always be centrally managed by the management server, and job scheduling must be performed solely by the management server.
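For illustration only, the centrally managed rescheduling described above may be sketched as follows. This is a minimal sketch under stated assumptions and not the method of the cited publication: the types Computer and Job, the function reschedule, and the choice of CPU count and memory as the managed resource information are all hypothetical, and the prescribed condition is assumed to be a completion deadline compared against a projected finish time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Computer:
    name: str
    free_cpus: int        # assumed resource metric
    free_memory_mb: int   # assumed resource metric


@dataclass
class Job:
    name: str
    cpus: int
    memory_mb: int
    deadline_s: float
    projected_finish_s: float
    host: str


def reschedule(jobs: List[Job],
               computers: Dict[str, Computer]) -> Dict[str, Optional[str]]:
    """Return {job name: new host, or None} for jobs projected to be late.

    All state lives in the management server, which is the centralization
    drawback noted above. Releasing resources on the original host is not
    modeled in this sketch.
    """
    moves: Dict[str, Optional[str]] = {}
    for job in jobs:
        if job.projected_finish_s <= job.deadline_s:
            continue  # the prescribed condition is satisfied
        # Extract the first other computer with enough free resources.
        target = next(
            (c for c in computers.values()
             if c.name != job.host
             and c.free_cpus >= job.cpus
             and c.free_memory_mb >= job.memory_mb),
            None,
        )
        if target is None:
            moves[job.name] = None
            continue
        # Reserve the resources on the target and record the new assignment.
        target.free_cpus -= job.cpus
        target.free_memory_mb -= job.memory_mb
        job.host = target.name
        moves[job.name] = target.name
    return moves
```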