1. Technical Field
The present invention relates generally to scheduling non-continual work in a stream-based distributed computer system, and more particularly, to systems and methods for resource allocation to provision for non-continual jobs, for deciding which non-continual jobs to perform, for deciding when to perform these jobs, for determining how much processing to allocate to each of the selected jobs, and for deciding how to choose candidate processing nodes for the processing elements in those jobs in a manner which minimizes a penalty of the non-continual work in the system, results in good network utilization, meets a variety of practical constraints, and is robust with respect to dynamic changes in the system over time.
2. Description of the Related Art
Distributed computer systems designed specifically to handle very large-scale stream processing jobs are in their infancy. Several early examples augment relational databases with streaming operations. Distributed stream processing systems are likely to become very common in the relatively near future, and are expected to be employed in highly scalable distributed computer systems to handle complex jobs involving enormous quantities of streaming data.
In particular, systems including tens of thousands of processing nodes able to concurrently support hundreds of thousands of incoming and derived streams may be employed. These systems may have storage subsystems with a capacity of multiple petabytes. Even at these sizes, streaming systems are expected to be essentially swamped at almost all times. Processors will be nearly fully utilized, the offered load (in terms of jobs) will far exceed the prodigious processing capabilities of the systems, and the storage subsystems will be virtually full. Such conditions make the design of future systems enormously challenging.
Any stream-oriented system will have a reasonable amount of non-continual work, and this work will need to be scheduled in parallel with the continual (streaming) jobs. Examples of such non-continual jobs include, but are not limited to, maintenance tasks, performance optimization tasks, and other traditional work. Focusing on the scheduling of non-continual work in such a streaming system, it is clear that an effective optimization method is needed to use the system properly.
Consider the complexity of the scheduling problem as follows. Referring to FIG. 1, a conceptual system is depicted for scheduling typical jobs. Each job 1-6 includes directed graphs 12 with nodes 14 and directed arcs 16. The nodes 14 correspond to tasks (which may be called processing elements, or PEs), interconnected by directed arcs 16 (which represent the precedence constraints among the PEs). Assume that the jobs themselves are not interconnected; this can be done without loss of generality by aggregating jobs.
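By way of illustration only, the job structure of FIG. 1 may be sketched in Python as a directed graph in which each processing element is mapped to the set of processing elements that must precede it. The PE names (pe_a through pe_d) and the helper functions below are hypothetical and do not appear in the figures; the sketch assumes Python 3.9 or later for the standard graphlib module, and it shows only how precedence constraints restrict the order of execution, not any particular scheduling method.

```python
from graphlib import TopologicalSorter

# Hypothetical job: each PE maps to the set of PEs that must finish before it.
# Each (predecessor -> PE) pair corresponds to one directed arc in the job graph.
job_predecessors = {
    "pe_a": set(),
    "pe_b": {"pe_a"},
    "pe_c": {"pe_a"},
    "pe_d": {"pe_b", "pe_c"},
}


def precedence_order(predecessors):
    """Return one ordering of the PEs that respects every directed arc."""
    return list(TopologicalSorter(predecessors).static_order())


def respects_precedence(order, predecessors):
    """Check that every PE appears after all of its predecessors."""
    position = {pe: index for index, pe in enumerate(order)}
    return all(
        position[pred] < position[pe]
        for pe, preds in predecessors.items()
        for pred in preds
    )


order = precedence_order(job_predecessors)
print(order, respects_precedence(order, job_predecessors))
```

Any ordering returned by precedence_order places pe_a first and pe_d last in this example, while pe_b and pe_c may appear in either order. That freedom is one small instance of the choices a scheduler must make for each job.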
Referring to FIG. 2, a typical distributed computer system 11 is shown. Processing nodes 13 (or PNs) are interconnected by a network 19. One problem includes the scheduling of non-continual work in a stream-oriented computer system in a manner which minimizes the penalty of the non-continual work performed. The problem also includes the process of deciding how much of the system's resources to allocate to non-continual work, as compared to the streaming work. The various processing elements do the work in the system, and may have arbitrarily complex precedence constraints among themselves. The system is typically overloaded and can include many processing nodes. The importance of the various work items can change frequently and dramatically. There are no known solutions to this problem.