1. Field of the Invention
The present invention is generally directed to an improved data processing system. In particular, the present invention is directed to an improved grid computing system in which grid jobs are scheduled in accordance with multiple dimensions of dynamic load factors.
2. Description of the Related Art
In the 1990s, computer scientists began exploring the design and development of a computer infrastructure, referred to as the computational grid, whose design was modeled on the existing electrical power grid. Grid computing was initially designed for large-scale, resource-intensive scientific applications, such as the Search for Extraterrestrial Intelligence (SETI) program's computing grid, that require more resources than a small number of computing devices in a single administrative domain can provide. Since then, grid computing has grown in popularity as a general mechanism for handling computing tasks.
A computational grid enables computer resources from geographically distributed computing devices to be shared and aggregated in order to solve large-scale, resource-intensive problems. A computational grid may also be referred to simply as a “grid.” Building a grid requires both low level and high level services. The low level services include security, information, directory, and resource management services. The high level services include tools for application development, resource management, resource scheduling, and the like. Among these services, resource management and scheduling tend to be the most difficult to perform optimally.
Known grid computing systems, such as LEGION, DATA SYNAPSE, PLATFORM COMPUTING, GRID MP™ from UNITED DEVICES, BERKELEY OPEN INFRASTRUCTURE FOR NETWORK COMPUTING (BOINC), PBS PRO™ Grid from ALTAIR, the GLOBUS® TOOLKIT (available from Argonne National Laboratory, Chicago, Ill.), and the OPEN GRID SERVICES ARCHITECTURE (OGSA), perform resource management and scheduling based primarily on the processor load(s) of the various nodes, i.e., computing devices, in the computing grid, with certain non-dynamic prerequisite factors also taken into account to determine which nodes may be utilized in the computing grid. Thus, if a node meets all of the non-dynamic prerequisite factors and its current processor load is below a predetermined threshold, grid jobs may be scheduled to run on that node. If the node's processor load is above the predetermined threshold, the node is no longer a candidate to run grid jobs until its processor load again falls below the threshold.
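The candidate-selection rule described above for known systems can be sketched as follows. This is a minimal illustration, not the implementation of any of the named products: the `Node` fields, the `is_candidate` and `candidate_nodes` helpers, and the 0.75 value of `CPU_LOAD_THRESHOLD` are all hypothetical, since the text only refers to "non-dynamic prerequisite factors" and "a predetermined threshold."

```python
from dataclasses import dataclass

# Hypothetical value; the text specifies only "a predetermined threshold".
CPU_LOAD_THRESHOLD = 0.75


@dataclass
class Node:
    name: str
    meets_prerequisites: bool  # all non-dynamic prerequisite factors satisfied
    cpu_load: float            # current processor utilization, 0.0 to 1.0


def is_candidate(node: Node, threshold: float = CPU_LOAD_THRESHOLD) -> bool:
    """Processor load is the only dynamic factor consulted; network traffic is ignored."""
    return node.meets_prerequisites and node.cpu_load < threshold


def candidate_nodes(nodes, threshold: float = CPU_LOAD_THRESHOLD):
    """Return the names of nodes on which grid jobs may currently be scheduled."""
    return [n.name for n in nodes if is_candidate(n, threshold)]


if __name__ == "__main__":
    nodes = [
        Node("a", True, 0.30),   # prerequisites met, load below threshold
        Node("b", True, 0.90),   # load above threshold: excluded until it drops
        Node("c", False, 0.10),  # fails a non-dynamic prerequisite
    ]
    print(candidate_nodes(nodes))  # ['a']
```

Because `is_candidate` re-evaluates the load on each call, a node such as `b` becomes eligible again as soon as its load falls below the threshold. Nothing in the check considers how much network traffic a grid job would generate, so a node whose network link is already saturated can still be selected.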
Because known grid computing systems consider only processor load as a dynamic factor when scheduling jobs, and fail to consider the network traffic that the grid jobs may create, scheduling is often sub-optimal. Consequently, grid jobs, which are intended to run unobtrusively with respect to the regular functioning of the nodes, may adversely affect the existing loads on those nodes.
Because of the sub-optimal scheduling that results from relying solely on processor load, many scientific and commercial enterprises are reluctant to use grid computing for fear of its negative impact on their existing information technology infrastructures. First, these enterprises are uncertain how much grid activity may disrupt their existing workloads. Second, they are hesitant to use the computing grid for mission-critical projects because they cannot quantify the grid capacity needed to complete the grid jobs associated with a grid project within a required time span.
These problems with existing grid computing systems stem from the fact that their resource management and scheduling do not take into account the amount of network traffic required to perform grid jobs or the effect that this traffic may have on the existing loads of nodes in the grid. Network traffic may degrade both the performance of the existing workloads on the nodes in a grid and the performance of the grid jobs themselves.