With the development of computer hardware and software technology, computer clusters have emerged to provide data computing capability more efficiently than merely improving the computing performance of individual computers. Based on distributed computing technology, one or more jobs may be divided into multiple tasks that can be executed in parallel, and these tasks may be allocated to one or more processing units (e.g., processor cores) at multiple computing nodes of a distributed computing system. The performance of distributed computing technology depends, to a great extent, on how these tasks are scheduled and managed. Task scheduling and management may be implemented by transmitting various types of control data among the respective tasks.
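The divide-and-dispatch model described above can be sketched in a few lines. This is only an illustration under stated assumptions: the job, its tasks, and the helper names are hypothetical, and a Python thread pool stands in for the processing units of a real distributed computing system.

```python
from concurrent.futures import ThreadPoolExecutor

def run_task(chunk):
    # One parallel executable task: process a portion of the job's input.
    return sum(chunk)

def run_job(data, num_workers=4):
    # Divide the job into multiple tasks, one per slice of the input.
    chunks = [data[i::num_workers] for i in range(num_workers)]
    # Allocate the tasks to processing units (here, threads in a pool
    # standing in for processor cores at computing nodes).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(run_task, chunks))
    # Combine the partial results into the job's final result.
    return sum(partial_results)

print(run_job(list(range(100))))  # → 4950
```

In a real system the scheduler would also exchange control data among tasks (placement, progress, failure notices), which the pool hides here.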
So far, providers of distributed computing technology have developed various basic function libraries capable of supporting distributed computing, in which various basic functions for scheduling and managing parallel tasks are defined. Therefore, independent software vendors (ISVs) in each industry do not have to re-develop basic functionality supporting distributed computing; instead, they may develop applications suitable for their industries by invoking functions in these basic function libraries. For example, a software vendor in the weather forecasting field may develop weather forecasting applications based on a basic function library, and a software vendor in the data mining field may develop data analysis applications based on the same library.
Typically, the complexity of existing distributed computing systems requires mutual cooperation among multiple applications to accomplish a computing job. Sub-jobs of a large computing job have mutual dependencies and follow a chronological sequence; together they form a workflow in which multiple applications from one or more independent software vendors might be involved. However, a user does not have the source code of these applications but only their executable code; therefore, the user can only run these applications by invoking the executable code, which prevents further optimization of the overall performance across the respective applications.
Usually each application comprises tasks associated with task management and scheduling. For example, a task Allocate may allocate various resources to the multiple tasks comprised in an application when the application starts running, and a task Release may release all allocated resources when the application finishes. Suppose applications App-A and App-B are executed serially; then the following phenomenon might occur: resources that have just been released by the release task Release-A of application App-A are re-allocated to application App-B by the allocate task Allocate-B of application App-B. Note that tasks such as resource allocation and release do not directly contribute to the computation of an application but merely assist in its execution, so the ratio of the execution time of management and scheduling tasks to the total execution time of the application becomes an important factor affecting the operation efficiency of the application.
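The release-then-reallocate phenomenon above can be sketched as follows. This is a toy model under stated assumptions: the ResourcePool class, the simulated costs, and the run_app helper are hypothetical; only the names App-A, App-B, Allocate, and Release come from the example in the text.

```python
import time

class ResourcePool:
    """Toy stand-in for cluster resources managed by Allocate/Release tasks."""
    def __init__(self, size):
        self.free = set(range(size))

    def allocate(self, n):
        # Allocate task: claim n resources (sleep simulates scheduling cost).
        time.sleep(0.01)
        return {self.free.pop() for _ in range(n)}

    def release(self, granted):
        # Release task: return all resources at the end of the application run.
        time.sleep(0.01)
        self.free |= granted

def run_app(name, pool, n):
    held = pool.allocate(n)   # e.g. Allocate-A / Allocate-B
    # ... computation that actually contributes to the job would go here ...
    pool.release(held)        # e.g. Release-A / Release-B

pool = ResourcePool(8)
start = time.perf_counter()
run_app("App-A", pool, 8)     # Release-A frees all eight resources ...
run_app("App-B", pool, 8)     # ... which Allocate-B immediately reclaims.
overhead = time.perf_counter() - start
# With no real computation, all measured time is management overhead.
```

The sketch makes the inefficiency concrete: Release-A and Allocate-B together do redundant work on exactly the same resources, and the shorter the useful computation, the larger the share of runtime this overhead consumes.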
The time for allocating and releasing resources increases as the number of computing nodes in a distributed computing system increases. With the development of distributed computing systems, the order of magnitude of computing nodes has grown from dozens to hundreds or even more, with the result that the operation efficiency of jobs in distributed computing systems tends to decrease to some extent. Improving the operation efficiency of distributed computing systems has therefore become a hot research issue.