The present invention generally relates to workflow processing and, more particularly, to optimizing workflow execution by dispatching tasks thereof to nodes of a grid computing system.
Two emerging technologies that allow efficient use of computing resources, for example, within a collaborative environment, are workflow processing and grid computing. Workflow processing technologies typically provide application integration capabilities, enabling a series of separate software components to be run in a defined sequence, thereby facilitating construction of larger solutions from individual software components. Workflow implementations are typically driven from a workflow definition that identifies the sequence of tasks to perform and the data flows from one task to another. Some workflow environments may be configured to optimize overall workflow execution, for example, by running certain tasks in parallel, typically as multiple processing threads on the same computing node that runs the overall workflow application.
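By way of illustration only, the following minimal Python sketch shows one way a workflow definition of the kind described above might be represented and executed, with independent tasks run as parallel threads on a single computing node. The task names, the `WORKFLOW` mapping, and the `run_task` and `run_workflow` functions are hypothetical and are not drawn from any particular workflow product.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical workflow definition: each task lists the upstream tasks
# whose output it consumes (the data flows into that task).
WORKFLOW = {
    "extract": [],
    "clean": ["extract"],
    "stats": ["clean"],
    "plot": ["clean"],
    "report": ["stats", "plot"],
}


def run_task(name, inputs):
    """Stand-in for a real software component; returns a labelled result."""
    return f"{name}({', '.join(inputs)})"


def run_workflow(workflow, max_threads=4):
    """Run tasks in dependency order; tasks that are ready together run in parallel."""
    results = {}   # task name -> output of that task
    waves = []     # groups of tasks that were started together
    pending = dict(workflow)
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        while pending:
            # A task is ready once every upstream task has produced a result.
            ready = sorted(
                task for task, deps in pending.items()
                if all(dep in results for dep in deps)
            )
            if not ready:
                raise ValueError("workflow has a cycle or an unknown dependency")
            # Submit all ready tasks as threads on this node.
            futures = {
                task: pool.submit(run_task, task, [results[d] for d in pending[task]])
                for task in ready
            }
            # Wait for the whole group before looking for newly ready tasks.
            for task, future in futures.items():
                results[task] = future.result()
                del pending[task]
            waves.append(ready)
    return results, waves
```

In this sketch, calling `run_workflow(WORKFLOW)` runs the "stats" and "plot" tasks concurrently because both depend only on "clean". All threads nonetheless share the resources of the single node on which the workflow application runs.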
Grid computing is an architecture and corresponding infrastructure that is based on the concept of a pool of compute resources that can be applied dynamically to service requests for those resources from various entities. The pooled resources may include specialized processing resources contained on high-end servers or more ordinary processing resources contained on relatively low-end individual workstations. In any case, by pooling the resources together, requesting entities with access to the grid are presented, in effect, with one large virtual computing resource they may utilize to run tasks.
A typical grid infrastructure involves a task dispatch component that identifies a compute resource from the pool of resources to service the next task to be performed. Current dispatch algorithms typically focus on matching a task to a node based on either resource requirements of the task or available runtime resources of the node. As an example, if a task can only run under the Linux® operating system (Linux is a trademark of Linus Torvalds in the United States, other countries, or both), the task thus must be dispatched to a node running the Linux® operating system. As another example, current workloads of individual nodes in the compute resource pool may be monitored and tasks may be assigned to nodes that are less busy. The typical grid computing solution has the task dispatch component working off a queue of tasks to be performed, selecting a task to be performed, dispatching the selected task to a node in the compute resource pool and then processing the next task in the queue.
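For purposes of illustration, the conventional queue-driven dispatch described above may be sketched in Python as follows. The `Node` and `Task` classes, their fields, and the `dispatch_all` function are hypothetical. The sketch also combines both matching criteria (operating system requirement and current node workload) in a single pass, which is an assumption made for brevity rather than a description of any particular grid product.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    os: str
    load: int = 0  # number of tasks currently assigned (assumed workload metric)


@dataclass
class Task:
    name: str
    required_os: Optional[str] = None  # None means the task can run on any OS


def dispatch_all(tasks, nodes):
    """Work off a FIFO queue, sending each task to the least busy eligible node."""
    queue = deque(tasks)
    assignments = []
    while queue:
        task = queue.popleft()
        # Match on the task's resource requirement (here, operating system only).
        eligible = [n for n in nodes if task.required_os in (None, n.os)]
        if not eligible:
            assignments.append((task.name, None))  # no suitable node in the pool
            continue
        # Among eligible nodes, prefer the one that is currently least busy.
        target = min(eligible, key=lambda n: n.load)
        target.load += 1
        assignments.append((task.name, target.name))
    return assignments
```

Each task in this sketch is considered in isolation. The dispatcher has no knowledge of which tasks belong to the same workflow or of how they relate to one another.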
It may be possible to utilize a grid computing infrastructure for execution of workflow-oriented solutions, for example, by integrating a workflow processing engine with a grid task dispatcher. Using the typical dispatch algorithm described above, as tasks become ready to execute within the workflow, they would be submitted to the grid task dispatcher and routed to the next available node for execution. While this model may enable the use of a grid computing environment for workflow execution, it may lead to sub-optimal workflow processing because it fails to take into account a number of considerations, such as the variety of compute needs of different tasks within a given workflow, the differences in compute resources among the various nodes in the grid environment, and possible synergistic relationships or contention that may exist between sequential or parallel tasks in the workflow.
Accordingly, there is a need for techniques to optimize workflow execution within a grid computing infrastructure.