Prior approaches to sharing computation for multiple aggregation queries over data streams have used common sub-expression analysis.
A need exists for the identification and maintenance of additional phantoms, especially for the Gigascope architecture. A need also exists for a principled approach to the optimized evaluation of multiple aggregation queries, which are very common in data stream management systems. Gigascope currently evaluates multiple aggregation queries independently, with no shared computation. The key difficulty lies in identifying the specific phantoms to maintain. Choosing the wrong phantoms to maintain would result in additional work with no corresponding benefit.
Historically, databases have stored large amounts of data in collections of tables, each of which is a set of records. Using query languages such as SQL, information can be combined from multiple tables. More recently, the volume of data that can be collected, such as IP (Internet Protocol) data, sensor data, or other types of data, has become so large that the data cannot all be stored, yet one still wants to be able to compute the results of a query over the data.
As an example, consider IP data at the packet level. Each packet was sent at a particular time, from a particular source IP address, and to a particular destination IP address. One user may be interested in finding out how many packets came from a source IP during a specific time interval. Another user may be interested in how many packets were sent to a destination IP during the same interval. The differences between these queries lie in which combinations of fields they want the information reported on, such as source IP, destination IP, and the like.
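The packet-level example can be illustrated with a minimal sketch, offered for illustration only and not as the Gigascope implementation. The names `Packet`, `count_by`, and the sample addresses and timestamps are hypothetical. Each call scans the packets separately, which mirrors independent evaluation with no shared computation.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Sequence


class Packet(NamedTuple):
    # Hypothetical record layout: the three fields named in the example.
    timestamp: float
    src_ip: str
    dst_ip: str


def count_by(packets: Iterable[Packet], fields: Sequence[str],
             start: float, end: float) -> Counter:
    """Count packets with start <= timestamp < end, grouped by `fields`."""
    counts: Counter = Counter()
    for p in packets:
        if start <= p.timestamp < end:
            # The grouping key is the tuple of the requested field values.
            key = tuple(getattr(p, f) for f in fields)
            counts[key] += 1
    return counts


# Illustrative data only (assumed values, not measurements).
packets = [
    Packet(1.0, "10.0.0.1", "10.0.0.9"),
    Packet(2.0, "10.0.0.1", "10.0.0.8"),
    Packet(3.0, "10.0.0.2", "10.0.0.9"),
    Packet(12.0, "10.0.0.1", "10.0.0.9"),  # falls outside the interval below
]

# Three aggregation queries that differ only in their grouping fields.
by_src = count_by(packets, ("src_ip",), 0.0, 10.0)
by_dst = count_by(packets, ("dst_ip",), 0.0, 10.0)
by_pair = count_by(packets, ("src_ip", "dst_ip"), 0.0, 10.0)
```

In this sketch the three queries share the same input and time interval and differ only in the `fields` argument, which is the kind of overlap that motivates sharing computation among them.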
The present embodiments meet these needs.
The present embodiments are detailed below with reference to the listed Figures.