The present invention relates to debugging, and more specifically, to identifying minimal operator subsets in a distributed streams application for debugging purposes.
In a streams processing environment, multiple nodes in a computing cluster execute a distributed application. The distributed application retrieves a stream of input data from a variety of data sources and analyzes the stream. A stream is composed of data units called “tuples,” each of which is a list of values. Further, the distributed application includes processing elements that are distributed across the cluster nodes. Each processing element includes one or more operators configured to perform a specified task associated with a tuple. Each processing element receives one or more tuples as input and processes the tuples through the operators. Once the processing is complete, the processing element may output one or more resulting tuples to another processing element, which in turn performs a specified task on those tuples, and so on.
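By way of illustration only, the flow of tuples through chained processing elements may be sketched as follows. This is a minimal single-process sketch, not an implementation of any particular streams platform. The names ProcessingElement, drop_negative, and double_value are hypothetical and are introduced here only for the example.

```python
from typing import Callable, Iterable, List, Tuple

# A tuple in the stream is modeled as a plain Python tuple of values.
StreamTuple = Tuple
# Hypothetical operator signature: one input tuple yields zero or more output tuples.
Operator = Callable[[StreamTuple], List[StreamTuple]]


class ProcessingElement:
    """Hypothetical processing element holding one or more operators."""

    def __init__(self, name: str, operators: List[Operator]) -> None:
        self.name = name
        self.operators = operators

    def process(self, tuples: Iterable[StreamTuple]) -> List[StreamTuple]:
        # Pass every tuple through each operator in order.
        current = list(tuples)
        for operator in self.operators:
            produced: List[StreamTuple] = []
            for item in current:
                produced.extend(operator(item))
            current = produced
        return current


def drop_negative(item: StreamTuple) -> List[StreamTuple]:
    # Filtering operator: discard tuples whose second value is negative.
    return [item] if item[1] >= 0 else []


def double_value(item: StreamTuple) -> List[StreamTuple]:
    # Transforming operator: double the second value of the tuple.
    return [(item[0], item[1] * 2)]


# Two processing elements chained so the output of one feeds the next.
first_element = ProcessingElement("filter", [drop_negative])
second_element = ProcessingElement("scale", [double_value])

input_stream = [("a", 1), ("b", -2), ("c", 3)]
result = second_element.process(first_element.process(input_stream))
```

In an actual cluster the two processing elements would typically run on different nodes and exchange tuples over the network. The direct function call above stands in for that transport.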
A developer may design an operator graph using an integrated development environment (IDE) tool. The operator graph specifies a desired configuration of processing elements in the streams processing environment. Using the operator graph, the developer may define functions for each processing element to perform. The functions can specify a given task to perform and a destination processing element for tuple output. Further, the IDE tool may provide a debugger that allows the developer to ensure that the distributed application executes in the streams processing environment as specified.
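As a hedged illustration, an operator graph of the kind described above may be represented as a mapping from each processing element to its task and its destination. The sketch below assumes a simple linear graph in which each element has at most one destination. The names GraphNode, run_graph, parse_task, and filter_task are hypothetical and do not correspond to any particular IDE tool or streams runtime.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple

StreamTuple = Tuple
Task = Callable[[List[StreamTuple]], List[StreamTuple]]


class GraphNode(NamedTuple):
    """Hypothetical graph entry: the task to perform and where output goes."""
    task: Task
    destination: Optional[str]  # None marks the end of the graph


def run_graph(graph: Dict[str, GraphNode], source: str,
              tuples: List[StreamTuple]) -> List[StreamTuple]:
    # Follow destination links from the source, applying each task in turn.
    visited = set()
    current: Optional[str] = source
    while current is not None:
        if current in visited:
            raise ValueError(f"cycle detected at {current!r}")
        visited.add(current)
        node = graph[current]
        tuples = node.task(tuples)
        current = node.destination
    return tuples


def parse_task(tuples: List[StreamTuple]) -> List[StreamTuple]:
    # Convert the first field to an integer, skipping non-numeric tuples.
    return [(int(t[0]),) for t in tuples if t[0].isdigit()]


def filter_task(tuples: List[StreamTuple]) -> List[StreamTuple]:
    # Keep only tuples whose first field exceeds 10.
    return [t for t in tuples if t[0] > 10]


operator_graph = {
    "parse": GraphNode(parse_task, "filter"),
    "filter": GraphNode(filter_task, "sink"),
    "sink": GraphNode(lambda ts: ts, None),
}

graph_output = run_graph(operator_graph, "parse", [("7",), ("x",), ("12",)])
```

A graph with fan-out or fan-in would need a list of destinations per node and a traversal that tracks pending inputs. The single-destination form is used here only to keep the example short.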