1. Field of the Invention
This invention relates to complex data stream processing in a computer system environment. More specifically, the invention relates to the application of logical operators and to the ordering of those operators so as to optimize the performance of data processing.
2. Background of the Invention
In a data streaming environment, massive amounts of data are constantly written to the storage subsystem. The volume of data is growing rapidly, and the majority of it is unstructured information. This data may contain complex information, such as chemical, gene, protein, bio, or nano diagrams, sketches, or images, all of which may be contained in data streams. It is difficult for a computer system to efficiently and accurately extract and analyze structures from data streams using existing techniques. It is also challenging to maintain the required software using conventional techniques.
Distributed computer systems designed to handle large-scale data stream processing are evolving. Known techniques for handling data stream processing, however, are only successful if the data is uniform and well formatted. Real data is ‘noisy’ and requires extra effort to remove the noise. In one embodiment, noise refers to irrelevant or meaningless data. A noisy data stream presents a significant challenge when the data must be cleansed, corrected for errors, or corrected by interpolation for missing data. One manner of processing data employs one or more logical operators in the form of Boolean combinations of simple filters for data stream processing. The logical operator(s) process data chunks from an input stream and either pass each chunk on to an output data stream or reject it by passing either nothing or an indicator of rejection to the output stream.
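By way of illustration only, a Boolean combination of simple filters applied to a stream of data chunks may be sketched as follows. The sketch assumes that each chunk is a text string and that a rejected chunk is simply omitted from the output stream; the names `and_`, `or_`, `not_`, `apply_filter`, `is_nonempty`, and `is_numeric` are hypothetical and do not appear in any known implementation.

```python
from typing import Callable, Iterable, Iterator

# A simple filter maps one data chunk to True (pass) or False (reject).
Filter = Callable[[str], bool]


def and_(*filters: Filter) -> Filter:
    """Pass a chunk only if every component filter passes it."""
    return lambda chunk: all(f(chunk) for f in filters)


def or_(*filters: Filter) -> Filter:
    """Pass a chunk if at least one component filter passes it."""
    return lambda chunk: any(f(chunk) for f in filters)


def not_(f: Filter) -> Filter:
    """Pass a chunk only if the component filter rejects it."""
    return lambda chunk: not f(chunk)


def apply_filter(stream: Iterable[str], f: Filter) -> Iterator[str]:
    """Forward passing chunks to the output stream; rejected chunks emit nothing."""
    for chunk in stream:
        if f(chunk):
            yield chunk


# Hypothetical simple filters, chosen only for illustration.
def is_nonempty(chunk: str) -> bool:
    return bool(chunk.strip())


def is_numeric(chunk: str) -> bool:
    return chunk.strip().isdigit()


# Boolean combination: non-empty AND NOT numeric.
combined = and_(is_nonempty, not_(is_numeric))
result = list(apply_filter(["abc", "", "42", "x1"], combined))
```

In this sketch the empty chunk and the chunk "42" are rejected, while "abc" and "x1" are passed to the output stream. Emitting an indicator of rejection instead of nothing would require only that `apply_filter` yield a sentinel value for each rejected chunk.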
Boolean combination filtering can be used in various data intensive applications. There are two ways to compute a filter with Boolean combinations: processing multiple components of the filter concurrently, or processing multiple components of the filter consecutively. In order to optimize the performance of a filter that is a combination of other filters, the programmer typically must guess both the time that will be required to compute each component filter and the likelihood that each component filter will pass a given chunk of data. Accordingly, there is a need to mitigate or eliminate the human guesswork associated with the process by which the application order of component filters is determined.
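To show why the two guessed quantities matter, the following sketch computes the expected per-chunk cost of applying component filters consecutively in a conjunction (AND) that stops at the first rejection. It is an illustration under stated assumptions and not a description of any particular technique: the component filters are assumed to pass chunks independently of one another, the cost and pass probability of each filter are assumed to be known, and the names `FilterStats`, `expected_and_cost`, and `order_for_and` are hypothetical. Under these assumptions, sorting by cost divided by rejection probability is a well-known ordering heuristic for short-circuit conjunctions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class FilterStats:
    name: str
    cost: float       # assumed time to evaluate the filter on one chunk
    pass_prob: float  # assumed probability that the filter passes a chunk


def expected_and_cost(order: Sequence[FilterStats]) -> float:
    """Expected per-chunk cost of a consecutive AND that stops at the first rejection."""
    total = 0.0
    reach = 1.0  # probability that a chunk reaches the current filter
    for f in order:
        total += reach * f.cost
        reach *= f.pass_prob  # only passing chunks continue (independence assumed)
    return total


def order_for_and(filters: Sequence[FilterStats]) -> List[FilterStats]:
    """Sort by cost / rejection probability, ascending: cheap, selective filters first."""
    def rank(f: FilterStats) -> float:
        reject = 1.0 - f.pass_prob
        return float("inf") if reject <= 0.0 else f.cost / reject
    return sorted(filters, key=rank)


# Hypothetical statistics, chosen only for illustration.
stats = [
    FilterStats("slow_selective", cost=10.0, pass_prob=0.1),
    FilterStats("fast_permissive", cost=1.0, pass_prob=0.9),
    FilterStats("fast_selective", cost=1.0, pass_prob=0.2),
]

ordered = order_for_and(stats)
cost_as_given = expected_and_cost(stats)
cost_ordered = expected_and_cost(ordered)
```

With these illustrative figures, the reordered sequence places the fast, selective filter first and has a markedly lower expected cost than the sequence as given. The result is only as good as the cost and probability estimates supplied, which is precisely where the guesswork described above enters.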