1. Field of the Invention
Exemplary embodiments of the present invention relate to scheduling work in a stream-based distributed computer system, and more particularly, to mechanisms for providing useful input to a scheduler to employ in deciding how to allocate resources between processing elements in such a system.
2. Description of Background
Traditional processing units followed a sequential execution paradigm, meaning they conceptually performed only one operation at a time. As the computing needs of the world evolved, the amount of data to be managed increased very quickly, and the sequential programming model could not cope with the increased need for processing power. In response, distributed computing developed to provide for a method of parallel processing, in which different parts of a program run simultaneously on two or more computers that are communicating with each other over a network with the objective of running a program in less time.
Distributed computer systems designed specifically to handle very large-scale stream processing jobs operate by collecting, processing, and aggregating data across large numbers of real-time streams from multiple producers to multiple consumers through the use of in-network processing operators. While distributed stream processing systems are in their infancy, they are likely to become far more common in the relatively near future because of the high efficiency they provide, and are expected to be employed in highly scalable distributed computer systems to handle complex jobs involving enormous quantities of streaming data. In particular, stream processing systems scaling to tens of thousands of processing nodes that are able to concurrently support hundreds of thousands of incoming and derived streams may be employed. Such scalability can make the design of future systems highly challenging.
Rather than being a single file or discrete chunk, a stream is a continuous data feed, such as a stock market ticker or a television channel, whose elements can be operated on in parallel. Given a set of input and output data (streams), stream processing systems are generally configured as a series of compute-intensive operations to be applied to each element in the stream, where local on-chip memory is reused for input and output streams to minimize external memory bandwidth. Data is gathered from memory into an input stream, operated on in the stream, and then scattered from the stream as output back into memory. Such processing systems typically consist of multiple instances of two basic types of entities: (1) the streams themselves, which may be either primal (meaning they come from outside the system) or derived (meaning that they are produced by software within the system), and (2) the processing elements or executables, which are the software entities that receive, process, and publish stream data, possibly filter or annotate the stream data, and communicate with other software entities. Implementations of such systems consist of many interconnected processing elements that take streams as input and output other streams (and, in some cases, other, non-stream data).
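The two basic entity types described above can be pictured with a minimal data model. The following is an illustrative sketch only: the class names `Stream` and `ProcessingElement`, the `primal` flag, and the convention that an operation returns `None` to filter an element out are hypothetical choices made for this example, not structures defined by the system described here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class Stream:
    name: str
    primal: bool  # True if the stream comes from outside the system, False if derived


@dataclass
class ProcessingElement:
    name: str
    inputs: List[Stream]
    outputs: List[Stream]
    # Hypothetical convention: applied to each stream element; returning None filters it out.
    operation: Callable[[object], Optional[object]]

    def process(self, elements: Iterable[object]) -> Iterator[object]:
        # Handle elements one at a time as they arrive rather than as a complete file.
        for element in elements:
            result = self.operation(element)
            if result is not None:
                yield result


# Hypothetical example: a PE that derives a stream of large trades from a primal ticker stream.
ticker = Stream("ticker", primal=True)
large_trades = Stream("large_trades", primal=False)
large_trade_filter = ProcessingElement(
    "large_trade_filter", [ticker], [large_trades],
    lambda trade: trade if trade[1] >= 1000 else None,
)
print(list(large_trade_filter.process([("IBM", 500), ("IBM", 1500)])))  # [('IBM', 1500)]
```

Because `process` is a generator, elements are consumed and emitted incrementally, which mirrors the continuous-feed nature of a stream rather than a batch file.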
One useful method of implementing such a system is to employ processing elements that are fairly simple components individually but can be connected in appropriate ways to build complex applications. A feature of this approach is that a single processing element can be designed and implemented for reuse by many different applications and jobs, which enables a software engineer to create new applications from these reusable parts. For example, a processing element that is configured to search text for a given pattern might also be implemented within an application for making predictions about a sector of the economy or for watching for news stories on a potential pandemic.
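The reuse described above can be illustrated by building one generic pattern-search component and composing it into two unrelated applications. This is a sketch under simplifying assumptions: the helper names `make_pattern_search` and `compose`, the search patterns, and the sample headlines are all hypothetical, and each "application" is reduced to a simple function pipeline.

```python
import re
from typing import Callable, Iterable, List


def make_pattern_search(pattern: str) -> Callable[[Iterable[str]], List[str]]:
    """Build a reusable text-search component configured only by its pattern argument."""
    compiled = re.compile(pattern, re.IGNORECASE)

    def search(texts: Iterable[str]) -> List[str]:
        # Pass through only the texts that contain the pattern.
        return [t for t in texts if compiled.search(t)]

    return search


def compose(*stages):
    """Chain components so that each stage consumes the previous stage's output."""
    def pipeline(data):
        for stage in stages:
            data = stage(data)
        return data
    return pipeline


# Two hypothetical applications assembled from the same search component.
economy_app = compose(make_pattern_search(r"\bsemiconductor"), lambda hits: len(hits))
pandemic_app = compose(make_pattern_search(r"\boutbreak\b"), lambda hits: [h.upper() for h in hits])

news = ["Semiconductor orders rise", "Flu outbreak reported in region", "Local team wins"]
print(economy_app(news))   # 1
print(pandemic_app(news))  # ['FLU OUTBREAK REPORTED IN REGION']
```

Only the pattern argument differs between the two uses; the component itself is unchanged, which is what makes it reusable across applications.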
Because the data arrives in streams in stream processing systems, the amount of resources necessary for a given processing element depends on its incoming stream rates (that is, the size of the data transferred over a communication link per unit of time), which in turn depends on the incoming stream rates of further upstream processing elements and, ultimately, on the rate of the data entering the system. Because stream rates and availability of primal streams can vary tremendously, the assignment of a predetermined, fixed processing power goal, expressed in terms of millions of instructions per second or MIPS, to a given processing element is unlikely to produce optimal results.
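The dependence of a processing element's demand on upstream rates can be made concrete with a linear pipeline. The sketch below assumes, purely for illustration, that each processing element has a fixed selectivity (ratio of output rate to input rate) and a fixed MIPS cost per unit of input rate; the function name `mips_demand` and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def mips_demand(primal_rate: float, chain: List[Tuple[str, float, float]]) -> Dict[str, float]:
    """Estimate the processing demand of each PE in a linear pipeline.

    primal_rate: rate of the primal stream entering the pipeline (e.g., KB/s).
    chain: (name, selectivity, mips_per_unit_rate) for each PE, most upstream first.
    Assumes demand is proportional to a PE's incoming rate.
    """
    demand: Dict[str, float] = {}
    rate = primal_rate
    for name, selectivity, mips_per_unit in chain:
        demand[name] = mips_per_unit * rate  # demand scales with this PE's incoming rate
        rate *= selectivity                  # this PE's output becomes the next PE's input
    return demand


pipeline = [("parse", 1.0, 0.5), ("filter", 0.25, 0.2), ("classify", 1.0, 4.0)]
print(mips_demand(100.0, pipeline)["classify"])  # 100.0
print(mips_demand(400.0, pipeline)["classify"])  # 400.0
```

When the primal rate quadruples, so does the demand of the downstream `classify` element, so any fixed MIPS goal chosen for it in advance would be either wasteful or insufficient.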
A major challenge in developing distributed stream processing systems is the programming difficulty. The streams serve as a transport mechanism between the various processing elements doing the work in the system. Particularly in large-scale systems, these connections can be arbitrarily complex, and the system is typically overloaded. The importance of the various work items can change frequently and dramatically. Therefore, the scheduling of such a system so as to maximize the overall importance of the work performed is a particularly challenging task.
Scheduling refers to the manner in which processes are assigned priorities in a priority queue, and the assignments are performed by a software entity known as a scheduler. A scheduler will generally be implemented to maximize the importance of all work in the system, subject to a large number of constraints of varying importance. To that end, one primary goal for a given scheduler is to fractionally allocate resources to the processing elements so that they can process the incoming data. A second primary goal is to balance the flow of processing between the various processing elements to prevent a producing processing element from flooding or starving a consuming processing element. A third goal for a given scheduler is to allocate the total processing resources in the system amongst the various processing elements that will be run so as to optimize the importance of the total processing performed. The scheduling process yields as output a schedule of the processes that represents a precedence relationship among the processes.
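As a rough illustration of the third goal only, the sketch below fractionally divides a fixed processing capacity among processing elements using a greedy fractional-knapsack rule. This is a swapped-in textbook technique, not the method of any scheduler referenced herein: it assumes each element contributes a constant importance per unit of processing power up to a maximum useful amount, and it does not model the flow-balancing goal. The function name `allocate` and all values are hypothetical.

```python
from typing import Dict, Tuple


def allocate(capacity: float, demands: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Greedy fractional allocation of processing capacity.

    demands maps each PE to (importance_per_unit, max_useful_units). Under the assumption
    that importance is linear in allocated units, serving PEs in decreasing order of
    importance per unit maximizes total importance.
    """
    allocation = {pe: 0.0 for pe in demands}
    remaining = capacity
    for pe, (_, max_units) in sorted(demands.items(), key=lambda kv: kv[1][0], reverse=True):
        if remaining <= 0:
            break
        grant = min(max_units, remaining)  # a PE may receive only part of what it could use
        allocation[pe] = grant
        remaining -= grant
    return allocation


print(allocate(100.0, {"classifier": (3.0, 60.0), "filter": (2.0, 30.0), "join": (1.0, 50.0)}))
# {'classifier': 60.0, 'filter': 30.0, 'join': 10.0}
```

The least important element receives only a fraction of its useful demand, which is the sense in which an overloaded system must trade off work by importance.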
Exemplary schedulers for distributed stream processing systems are provided in the following commonly-assigned U.S. patent applications, the contents of each of which are incorporated herein in their entirety by reference thereto: U.S. patent application Ser. No. 11/374,192, entitled “Method and Apparatus for Scheduling Work in a Stream-Oriented Computer System”; U.S. patent application Ser. No. 11/374,643, entitled “Method and Apparatus for Assigning Candidate Processing Nodes in a Stream-Oriented Computer System”; U.S. patent application Ser. No. 11/374,399, entitled “Method and Apparatus for Assigning Fractional Processing Nodes to Work in a Stream-Oriented Computer System”; and U.S. patent application Ser. No. 11/204,726, entitled “Method and Apparatus for Assigning Candidate Nodes in a Penalty-Based Computer system.”
To produce an optimal allocation of resources in a large-scale, distributed stream processing system, it is important to provide accurate predictions of the effects of various resource allocations on the individual processing elements. This task becomes more complicated when the processing elements are generic components that may be reused in many different contexts. Accordingly, it is desirable to provide a way to predict resource usage in a distributed stream processing system that allows for flexible, effective scheduling.
A major development described within U.S. patent application Ser. No. 11/374,399, which is incorporated herein by reference above, is the ability to decouple the main input for a scheduler into well-defined atomic components by employing a so-called “resource function” for each particular stream (or, less formally, for the processing element that produces the particular stream). Resource functions can be implemented to automatically estimate the stream rates of various streams under consideration and thereby estimate the network traffic. While each exemplary resource function described in the referenced patent application is relatively simple, the iterative composition of these resource functions can allow the scheduler to appropriately “traverse” a directed workflow graph that expresses an application in terms of many different processing elements that consume and produce streams through input and output ports, respectively. By utilizing the network traffic estimates provided by the resource functions, the scheduler is able to determine how to optimally “overlay” the directed workflow graphs for the processing elements onto the traffic network graph to thereby control or minimize the total amount of network traffic and prevent consuming processing elements from being flooded, which can result in network congestion, latency, packet loss, and poor performance. The scheduler will thereby decide which processing elements will share nodes to provide for effective bandwidth management.
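The iterative composition of resource functions and its use in comparing placements can be sketched as follows. This is a simplified illustration, not the resource functions of the referenced application: it assumes each processing element produces exactly one stream, models a resource function as a mapping from input stream rates to an output rate, and counts network traffic as the sum of rates of streams whose producer and consumer are placed on different nodes. The names `estimate_rates` and `network_traffic` and all example values are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Hypothetical model: maps the rates of a PE's input streams to the rate of the stream it produces.
ResourceFunction = Callable[[List[float]], float]


def estimate_rates(primal: Dict[str, float], upstream: Dict[str, List[str]],
                   resource_fns: Dict[str, ResourceFunction]) -> Dict[str, float]:
    """Compose resource functions in topological order of the workflow graph."""
    rates = dict(primal)
    for node in TopologicalSorter(upstream).static_order():
        if node not in rates:
            # Feed the already-estimated upstream rates into this stream's resource function.
            rates[node] = resource_fns[node]([rates[u] for u in upstream.get(node, [])])
    return rates


def network_traffic(rates: Dict[str, float], upstream: Dict[str, List[str]],
                    placement: Dict[str, str]) -> float:
    """Sum the rates of streams whose producer and consumer run on different nodes."""
    return sum(rates[producer]
               for consumer, producers in upstream.items()
               for producer in producers
               if placement[producer] != placement[consumer])


upstream = {"filter": ["news"], "join": ["filter", "ticker"]}
fns = {"filter": lambda r: 0.5 * r[0], "join": lambda r: min(r)}
rates = estimate_rates({"news": 100.0, "ticker": 40.0}, upstream, fns)
colocated = {"news": "n1", "ticker": "n2", "filter": "n1", "join": "n1"}
split = {"news": "n1", "ticker": "n2", "filter": "n1", "join": "n2"}
print(network_traffic(rates, upstream, colocated))  # 40.0
print(network_traffic(rates, upstream, split))      # 50.0
```

Under these assumptions, placing `join` on the same node as `filter` sends only the smaller `ticker` stream across the network, so a scheduler comparing the two overlays would prefer the first.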
A simple approach for providing such predictions might be to implement, for each processing element or executable, a single resource function that is directly associated with the processing element. Such an approach, however, has a number of problems. First, it requires that data be gathered on each processing element, and such data may not always be available. Additionally, different processing elements will be of different types such as, for example, a classifier, a filter, a join, etc., and may have different arguments that significantly affect their performance. Furthermore, an intelligent resource function learner for a processing element should take into account the flow specification with which the processing element is run. A flow specification describes an externally observable flow of data streaming through a processing element's ports or parameters. Such logical flows may be realized through ports and connections of different data types and a combination of data, event, and event data ports. Flow specifications represent flow sources (streams originating within a processing element), flow sinks (streams ending within a component), and flow paths (stream paths through a component from its incoming ports to its outgoing ports). Flow specifications can provide expected and actual values for flow-related properties (for example, latency).
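The three kinds of flow specification, together with expected and actual values for a flow-related property such as latency, can be captured in a small record type. The sketch below is hypothetical: the names `FlowKind`, `FlowSpec`, and `latency_overrun_ms`, the port names, and the rule that sources have only an outgoing port, sinks only an incoming port, and paths both, are illustrative assumptions derived from the definitions above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FlowKind(Enum):
    SOURCE = "source"  # stream originates within the processing element
    SINK = "sink"      # stream ends within the processing element
    PATH = "path"      # stream passes from an incoming port to an outgoing port


@dataclass(frozen=True)
class FlowSpec:
    name: str
    kind: FlowKind
    in_port: Optional[str] = None
    out_port: Optional[str] = None
    expected_latency_ms: Optional[float] = None
    actual_latency_ms: Optional[float] = None

    def __post_init__(self) -> None:
        # Assumed rule: (needs incoming port, needs outgoing port) for each kind of flow.
        required = {FlowKind.SOURCE: (False, True),
                    FlowKind.SINK: (True, False),
                    FlowKind.PATH: (True, True)}
        needs_in, needs_out = required[self.kind]
        if needs_in != (self.in_port is not None) or needs_out != (self.out_port is not None):
            raise ValueError(f"{self.kind.value} flow {self.name!r} has the wrong ports")

    def latency_overrun_ms(self) -> Optional[float]:
        """Actual minus expected latency, or None if either value is unknown."""
        if self.expected_latency_ms is None or self.actual_latency_ms is None:
            return None
        return self.actual_latency_ms - self.expected_latency_ms


tokenize = FlowSpec("tokenize", FlowKind.PATH, "text_in", "tokens_out", 5.0, 7.5)
print(tokenize.latency_overrun_ms())  # 2.5
```

A resource function learner could, under this assumption, condition its estimates on such records rather than on the processing element alone.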