1. Technical Field
The present invention relates generally to scheduling work in a stream-based distributed computer system, and more particularly, to systems and methods for deciding which jobs to perform, which templates from those jobs to select, and how to choose candidate processing nodes for the processing elements in those jobs in a manner which optimizes the importance of the work in the system, results in good network utilization, meets a variety of practical constraints, and is robust with respect to dynamic changes in the system over time.
2. Description of the Related Art
Distributed computer systems designed specifically to handle very large-scale stream processing jobs are in their infancy. Several early examples augment relational databases with streaming operations. Distributed stream processing systems are likely to become very common in the relatively near future, and are expected to be employed in highly scalable distributed computer systems to handle complex jobs involving enormous quantities of streaming data.
In particular, systems including tens of thousands of processing nodes able to concurrently support hundreds of thousands of incoming and derived streams may be employed. These systems may have storage subsystems with a capacity of multiple petabytes.
Even at these sizes, streaming systems are expected to be essentially swamped at almost all times. Processors will be nearly fully utilized, and the offered load (in terms of jobs) will far exceed the prodigious processing power capabilities of the systems, and the storage subsystems will be virtually full. Such goals make the design of future systems enormously challenging.
Focusing on the scheduling of work in such a streaming system, it is clear that an effective optimization method is needed to use the system properly. Consider the complexity of the scheduling problem as follows.
Referring to FIG. 1, a conceptual system is depicted for scheduling typical jobs. Each job 1-9 includes one or more alternative directed graphs 12 with nodes 14 and directed arcs 16. For example, job 8 has two alternative implementations, called templates. The nodes correspond to tasks (which may be called processing elements, or PEs), interconnected by directed arcs (streams). The streams may be either primal (incoming) or derived (produced by the PEs). The jobs themselves may be interconnected in complex ways by means of derived streams. For example, jobs 2, 3 and 8 are connected.
Referring to FIG. 2, a typical distributed computer system 11 is shown. Processing nodes 13 (or PNs) are interconnected by a network 19.
One problem includes the scheduling of work in a stream-oriented computer system in a manner which maximizes the overall importance of the work performed. The streams serve as a transport mechanism between the various processing elements doing the work in the system. These connections can be arbitrarily complex. The system is typically overloaded and can include many processing nodes. Importance of the various work items can change frequently and dramatically. Processing elements may perform continual and other, more traditional work as well. There are no known solutions to this problem.