The present invention relates to computer technology, and more specifically, to a method and apparatus for scheduling and executing tasks in a distributed system.
In a distributed system, a job is executed by multiple node devices. A job is generally divided into multiple tasks to be executed on various nodes in parallel. Correspondingly, the resources on a node that are available for executing tasks are generally divided logically into a number of identical resource units (also called “slots”), and each free resource unit can be used to execute one task.
Commonly, resource units are divided and fixed before the distributed system begins operation. However, resource units divided in this way may not be suitable for the various tasks to be executed. For example, the resource units may be “too large” for some of the tasks, so that a part of each such resource unit remains idle while these tasks execute, thereby lowering the resource utilization on the node.
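By way of illustration only, the effect of fixed, identical resource units on utilization can be sketched as follows. The sketch assumes that memory is the only resource considered and that each task occupies exactly one resource unit; the names Node and utilization and all numeric values are hypothetical and are not taken from any particular system.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Node:
    """Hypothetical node whose memory is split into identical slots."""
    total_memory_mb: int
    num_slots: int

    @property
    def slot_size_mb(self) -> int:
        # Identical resource units: the node's memory is divided evenly.
        return self.total_memory_mb // self.num_slots


def utilization(node: Node, task_demands_mb: Sequence[int]) -> float:
    """Fraction of node memory actually used when each task holds one slot."""
    if len(task_demands_mb) > node.num_slots:
        raise ValueError("more tasks than free slots")
    if any(demand > node.slot_size_mb for demand in task_demands_mb):
        raise ValueError("a task does not fit in one slot")
    # Whatever a task does not use inside its slot stays idle.
    return sum(task_demands_mb) / node.total_memory_mb


node = Node(total_memory_mb=8192, num_slots=4)
demands = [512, 512, 1024, 2048]  # assumed per-task memory demands
print(node.slot_size_mb)           # 2048
print(utilization(node, demands))  # 0.5
```

In this assumed configuration all four resource units are occupied, yet only half of the node's memory is in use, because three of the four tasks are smaller than the unit assigned to them.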
It has been proposed to dynamically adjust the number of divided resource units (i.e., to adjust the size of the resource units) according to the state of resource utilization on the node. However, the lag introduced by this process (e.g., the time from measuring resource utilization on the node to completing the re-division of resource units) will usually degrade the performance of the distributed system, and may even be unacceptable for certain jobs.