1. Technical Field
This disclosure generally relates to computer systems, and more specifically relates to the use of operator graphs in computer systems.
2. Background Art
In known database systems, data must be stored and indexed before it can be queried or otherwise processed. However, recent advances in stream-based processing provide a new paradigm that allows data to be queried “in flight,” before the data is stored in a database file. Stream-based processing is well suited to a distributed computing environment, which may include parallel processing on many different nodes. Because stream-based processing can require significant resources, implementing it in a distributed computing environment allows a programmer to determine which resources are allocated to the different portions of the stream-based processing solution. This allocation of resources is static: the programmer manually analyzes the stream-based processing solution at programming time, then allocates resources in the distributed computing environment to its different portions according to the programmer's estimates of which resources are needed where. In the prior art, the allocation of resources in a stream-based processing solution does not change unless the programmer manually changes it.
Operator graphs have been developed to represent the function of stream-based processing solutions. An operator graph is a set of processing elements, known as operators, linked together to form a larger job. In an operator graph, streams of data flow from one operator to the next: each operator ingests a stream and may output a stream to a subsequent operator. An operator graph thus allows operators to be fused together to form a larger processing element, much as many procedures form a larger program. An operator graph also provides a graphical tool that helps users understand how a stream is being processed.
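To make the structure of an operator graph concrete, the following is a minimal, single-process sketch of operators linked into a chain, assuming tuple-at-a-time processing with each tuple represented as a Python dictionary. The names `Operator`, `link`, `process`, and `run` are hypothetical and chosen for illustration only; a real stream-processing system would distribute operators across nodes rather than call them directly.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Operator:
    """Hypothetical processing element: ingests tuples and may emit tuples downstream."""
    name: str
    # Maps one input tuple to zero or more output tuples (assumed signature).
    fn: Callable[[dict], List[dict]]
    downstream: List["Operator"] = field(default_factory=list)

    def link(self, other: "Operator") -> "Operator":
        """Connect this operator's output stream to another operator's input."""
        self.downstream.append(other)
        return other  # returning `other` allows chained a.link(b).link(c)

    def process(self, item: dict, sink: List[dict]) -> None:
        """Apply this operator, then pass each output tuple to every downstream operator."""
        outputs = self.fn(item)
        if not self.downstream:
            # Terminal operator: collect its output stream.
            sink.extend(outputs)
        for out in outputs:
            for nxt in self.downstream:
                nxt.process(out, sink)


def run(source: Operator, stream: List[dict]) -> List[dict]:
    """Feed each tuple of an input stream into the first operator of the graph."""
    sink: List[dict] = []
    for item in stream:
        source.process(item, sink)
    return sink


# A three-operator job: parse -> filter -> double (illustrative only).
parse = Operator("parse", lambda t: [{"value": int(t["raw"])}])
keep_positive = Operator("filter", lambda t: [t] if t["value"] > 0 else [])
double = Operator("double", lambda t: [{"value": t["value"] * 2}])
parse.link(keep_positive).link(double)

result = run(parse, [{"raw": "3"}, {"raw": "-1"}, {"raw": "5"}])
```

Here `result` is `[{"value": 6}, {"value": 10}]`: the filter operator drops the negative tuple, so it never reaches the downstream operator. Each operator sees only its own input stream, which is the property that lets operators be fused or placed independently.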
Because streams flow from one operator to the next in an operator graph, a slowdown in one operator can affect many operators upstream. Processing streams of data can therefore produce bottlenecks, where a slowdown in one part of the operator graph negatively impacts many other parts of the graph. In the prior art, the only solution is for the programmer to go back to the drawing board and redesign the program at programming time to address known problems. Moreover, fixing one problem can cause other, previously undetected bottlenecks to surface. A programmer cannot adequately anticipate all potential problems at programming time, and cannot do anything about transient problems that arise in real time due to changing conditions.
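The upstream effect of a single slow operator can be illustrated with a toy discrete-time model. This sketch assumes a linear chain of operators, each with a fixed per-tick processing rate and a bounded input buffer, and assumes that an operator stops emitting when its downstream buffer is full (backpressure). The function name `simulate` and all parameter values are hypothetical and are not taken from any particular stream-processing product.

```python
from typing import List, Tuple


def simulate(rates: List[int], buffer_size: int, source_rate: int,
             ticks: int) -> Tuple[int, List[int]]:
    """Toy model of a linear operator chain with bounded buffers.

    rates[i] is how many items operator i can process per tick (assumed).
    Returns (items delivered by the last operator, final input-queue lengths).
    """
    queues = [0] * len(rates)  # queues[i] = items waiting at operator i's input
    delivered = 0
    for _ in range(ticks):
        # Walk the chain from the end backwards so that space freed downstream
        # is visible to upstream operators within the same tick.
        for i in reversed(range(len(rates))):
            n = min(rates[i], queues[i])
            if i + 1 < len(rates):
                # Backpressure: emit only what the downstream buffer can hold.
                n = min(n, buffer_size - queues[i + 1])
                queues[i + 1] += n
            else:
                delivered += n
            queues[i] -= n
        # The source emits into the first buffer, subject to the same bound.
        queues[0] += min(source_rate, buffer_size - queues[0])
    return delivered, queues


healthy = simulate([5, 5, 5], buffer_size=10, source_rate=5, ticks=100)
slow_middle = simulate([5, 1, 5], buffer_size=10, source_rate=5, ticks=100)
```

With every operator able to handle 5 items per tick, `healthy` is `(485, [5, 5, 5])`. Slowing only the middle operator to 1 item per tick gives `(97, [10, 10, 1])`: both buffers upstream of the slow operator fill to capacity, the source is throttled, and the whole chain runs at the slow operator's rate. In this model, a fixed allocation chosen at programming time has no way to react once the slowdown appears.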