1. Technical Field
This invention relates to a method and system for dynamically scheduling programs for execution on one or more nodes.
2. Description of the Prior Art
A directed acyclic graph (DAG) includes a set of nodes connected by a set of edges. Each node represents a task, and the weight of the node is the execution time of the task. Each edge represents a message transferred from one node to another node, with its weight being the transmission time of the message.
Scheduling programs for execution on processors is a crucial component of a parallel processing system. There are generally two categories of prior art schedulers using DAGs: centralized and decentralized (not shown). An example of a centralized scheduler (10) is shown in FIG. 1 to include a scheduler (30) and a plurality of program execution nodes (12), (14), (16), (18), and (20). The nodes (12), (14), (16), (18), and (20) communicate with each other and the scheduler (30) across a network. In the centralized scheduler (10), an execution request for a program is made to the scheduler (30), which assigns the program to one of the nodes (12), (14), (16), (18), or (20) in accordance with a state of each node.
An example of a routine implemented with a centralized scheduler is a first-in, first-out (FIFO) routine, in which each program is assigned to a processor in the order in which it was placed in the queue. Problems with FIFO arise when a program in the queue depends upon execution of another program. The FIFO routine does not support scheduling a dependent program based upon execution of a prior program. For example, two programs are provided with an execution dependency such that the first program requires a first data input and generates a second data output, and the second program is dependent upon the second data output from the first program execution and generates a third data output.
If the scheduler assigning the programs to one or more processors is running a FIFO routine and the two programs are assigned to execute on two different nodes, the second data output from the first program execution will reside on a different node than the one executing the second program. The second data output will need to be transferred from the node that executed the first program and produced the second data output to the node to which the second program has been assigned for execution. The process of transferring data between nodes consumes resources of both nodes associated with data encryption and decryption. Accordingly, the centralized scheduler results in a decreased utilization of both the first and second processors respectively executing the first and second programs.
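The two-program example above can be illustrated with a minimal sketch. It is not taken from any prior art implementation: the program names `P1` and `P2`, the weights, the node labels, and the round-robin choice of node are all assumptions made for illustration. It models the dependency as a weighted DAG and shows that a FIFO assignment that ignores dependencies incurs the inter-node transfer cost, while placing both programs on one node does not.

```python
from collections import deque

# Hypothetical workload (all values are illustrative assumptions).
# Node weight = execution time of the task.
exec_time = {"P1": 4, "P2": 3}
# Edge weight = transmission time of the message; P2 consumes P1's output.
edges = {("P1", "P2"): 2}


def fifo_assign(queue, nodes):
    """Assign programs to nodes strictly in queue order.

    Nodes are picked round-robin (an assumption); dependencies between
    programs are deliberately ignored, as in a plain FIFO routine.
    """
    pending = deque(queue)
    placement = {}
    slot = 0
    while pending:
        program = pending.popleft()
        placement[program] = nodes[slot % len(nodes)]
        slot += 1
    return placement


def transfer_cost(placement, edges):
    """Sum the weights of edges whose endpoints run on different nodes."""
    return sum(
        weight
        for (src, dst), weight in edges.items()
        if placement[src] != placement[dst]
    )


placement = fifo_assign(["P1", "P2"], ["node12", "node14"])
cost = transfer_cost(placement, edges)
print(placement, cost)

# Placing the dependent program on the same node avoids the transfer.
colocated = {"P1": "node12", "P2": "node12"}
print(transfer_cost(colocated, edges))
```

In this sketch the FIFO placement splits the two programs across nodes and pays the edge weight of 2, whereas the co-located placement pays nothing for data transfer.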
In the decentralized scheduler, a plurality of independent schedulers is provided. The benefit associated with the decentralized scheduler is scalability in a multinode system. However, the negative aspect of the decentralized scheduler is the complexity of control and communication among the schedulers needed to efficiently allocate resources in a sequential manner and thereby reduce the operation and transmission costs associated with transferring data across nodes for execution of dependent programs. Accordingly, there is an increased communication cost associated with a decentralized scheduler.
There is therefore a need for a method and system that efficiently assign resources in response to a plurality of execution requests for a set of programs having execution dependencies, while dynamically accounting for the costs associated with data transfer and processing.