1. Field of the Invention
The present invention relates generally to an improved data processing system for grid scheduling and, more specifically, to a computer implemented method, an apparatus, and a computer program product for scalable scheduling of tasks in heterogeneous systems.
2. Description of the Related Art
Grid scheduling is an optimization process for a target function, such as minimizing the response time of an application or user, or maximizing system utilization. However, achieving an optimal solution for the target function is NP-complete in general. Therefore, many different heuristics have been proposed and developed, such as Min-min, Max-min, Seg-min-min, and Dynamic Selection. Min-min and Max-min are well-known scheduling heuristics. The two methods were first introduced by Ibarra et al., and have been widely used. Seg-min-min is a modified Min-min heuristic.
The Min-min heuristic initializes a set “T”, for example “T={T1, T2 . . . Tn}”, to contain all unscheduled tasks. While “T” is not empty, for each task in “T”, the method calculates the Estimated Completion Time (ECT) of the task on each available machine. The Minimum Estimated Completion Time (MECT) of the task is the smallest of these values, and the machine that yields it is the best machine for the task. The method then selects the task with the overall smallest minimum estimated completion time and assigns the selected task to its best machine. The selected task, “Ti”, is then deleted from the set of tasks. The estimates for the remaining tasks are then updated to reflect the current status of the machines. The process repeats until all tasks in the set “T” are scheduled. The logic behind the Min-min algorithm presumes that if each task is assigned to its optimal machine, the overall response time will be minimal.
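By way of illustration only, the Min-min procedure described above may be sketched in Python as follows. The function name min_min, the exec_time matrix (estimated run time of each task on each machine), and the ready list (time at which each machine becomes free) are assumptions introduced for this sketch. The estimated completion time is taken here to be the machine ready time plus the estimated run time, which is one common choice rather than a definition given in the text.

```python
from typing import Dict, List, Tuple


def min_min(exec_time: List[List[float]],
            ready: List[float]) -> Tuple[Dict[int, int], List[float]]:
    """Illustrative Min-min sketch (hypothetical names and inputs).

    exec_time[t][m]: estimated run time of task t on machine m.
    ready[m]: time at which machine m becomes free.
    Returns the task-to-machine assignment and the final ready times.
    """
    ready = list(ready)                      # do not mutate the caller's list
    unscheduled = set(range(len(exec_time)))
    assignment: Dict[int, int] = {}

    while unscheduled:
        best = None                          # (MECT, task, machine)
        for t in sorted(unscheduled):
            # ECT of task t on every machine; the minimum is its MECT.
            mect, m = min((ready[m] + exec_time[t][m], m)
                          for m in range(len(ready)))
            # Keep the task with the overall smallest MECT.
            if best is None or mect < best[0]:
                best = (mect, t, m)

        mect, t, m = best
        assignment[t] = m                    # assign the task to its best machine
        ready[m] = mect                      # update the machine status
        unscheduled.remove(t)                # delete the task from the set

    return assignment, ready
```

For example, with exec_time = [[3, 5], [2, 4], [6, 1]] and two idle machines (ready = [0, 0]), this sketch assigns tasks 0 and 1 to machine 0 and task 2 to machine 1. Each pass over the set recomputes every estimate, so the work grows with the square of the number of tasks multiplied by the number of machines.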
The Max-min heuristic is similar to the Min-min heuristic. The difference is that after calculating the minimum estimated completion time for each task in the set, the Max-min heuristic selects the task with the overall largest minimum estimated completion time and assigns it to the machine that yields that completion time. The Max-min heuristic assumes that if the tasks with the largest costs are assigned first, more tasks can be executed in parallel, which should lead to improved efficiency. The Max-min algorithm can be obtained from the Min-min algorithm by replacing the selection of the overall “smallest” minimum estimated completion time with a selection of the overall “largest” minimum estimated completion time.
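By way of illustration only, the Max-min selection rule may be sketched in Python as follows. The function name max_min and the exec_time and ready inputs are assumptions introduced for this sketch, and the estimated completion time is again assumed to be the machine ready time plus the estimated run time. The only change relative to a Min-min loop is the comparison that picks the task.

```python
from typing import Dict, List, Tuple


def max_min(exec_time: List[List[float]],
            ready: List[float]) -> Tuple[Dict[int, int], List[float]]:
    """Illustrative Max-min sketch (hypothetical names and inputs).

    exec_time[t][m]: estimated run time of task t on machine m.
    ready[m]: time at which machine m becomes free.
    """
    ready = list(ready)
    unscheduled = set(range(len(exec_time)))
    assignment: Dict[int, int] = {}

    while unscheduled:
        chosen = None                        # (MECT, task, machine)
        for t in sorted(unscheduled):
            # MECT of task t: smallest completion time over all machines.
            mect, m = min((ready[m] + exec_time[t][m], m)
                          for m in range(len(ready)))
            # Max-min: keep the task whose MECT is the LARGEST.
            if chosen is None or mect > chosen[0]:
                chosen = (mect, t, m)

        mect, t, m = chosen
        assignment[t] = m
        ready[m] = mect
        unscheduled.remove(t)

    return assignment, ready
```

For example, with exec_time = [[3, 5], [2, 4], [6, 1]] and ready = [0, 0], this sketch schedules task 0 first on machine 0 and then places tasks 1 and 2 on machine 1.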
Because the Min-min algorithm schedules small tasks first and leaves the large tasks until the end, the Min-min method tends to make the workload unbalanced. The Segmented Min-min (Seg-min-min) algorithm first sorts the set of tasks, and then partitions them into segments of equal size. The segments of larger tasks are scheduled before the segments of smaller ones. In each segment, tasks are scheduled using the Min-min algorithm. Thus, the Seg-min-min combines the ideas of Min-min and Max-min. The scheduling time refers to the one-round time to assign all available tasks to all available machines, for example, the duration from when the scheduler starts to schedule the first task until the scheduler finishes scheduling the last task.
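By way of illustration only, the Seg-min-min steps described above may be sketched in Python as follows. The names seg_min_min and num_segments are assumptions introduced for this sketch. The text does not state the sort key, so the sketch assumes that tasks are ordered by their average estimated run time across machines; other keys, such as the minimum or maximum, could be substituted.

```python
from typing import Dict, List, Tuple


def seg_min_min(exec_time: List[List[float]], ready: List[float],
                num_segments: int) -> Tuple[Dict[int, int], List[float]]:
    """Illustrative Seg-min-min sketch (hypothetical names and inputs).

    exec_time[t][m]: estimated run time of task t on machine m.
    ready[m]: time at which machine m becomes free.
    num_segments: number of segments to partition the sorted tasks into.
    """
    ready = list(ready)
    assignment: Dict[int, int] = {}

    # Assumption: sort by average run time, largest tasks first.
    order = sorted(range(len(exec_time)),
                   key=lambda t: sum(exec_time[t]) / len(exec_time[t]),
                   reverse=True)

    # Equal-size segments (the last one may be shorter).
    seg_size = max(1, -(-len(order) // num_segments))   # ceiling division

    for start in range(0, len(order), seg_size):
        segment = order[start:start + seg_size]
        # Plain Min-min restricted to the tasks of this segment.
        while segment:
            best = None                      # (MECT, task, machine)
            for t in segment:
                mect, m = min((ready[m] + exec_time[t][m], m)
                              for m in range(len(ready)))
                if best is None or mect < best[0]:
                    best = (mect, t, m)
            mect, t, m = best
            assignment[t] = m
            ready[m] = mect
            segment.remove(t)

    return assignment, ready
```

For example, with exec_time = [[3, 5], [2, 4], [6, 1], [8, 7]], two idle machines, and num_segments = 2, the larger tasks 3 and 0 form the first segment and are placed before tasks 2 and 1.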
When the scheduling time is longer than the execution time of a task, machines finish running tasks but cannot receive new tasks to run. This wastes processing resources and degrades system performance. Usually, task and machine information is stored in a database. During scheduling, the task and machine information must be loaded into memory. When the number of tasks increases, the scheduling time and the memory requirements increase as well, and the memory requirements may exceed available memory. Therefore, the known scheduling algorithms are not scalable and are thus inappropriate for large real-world systems.