A large number of continual range queries can be issued against a rapid data stream in order to monitor various activities and conditions. For example, in a financial stream application, various continual range queries can be created to monitor the prices and volumes of stocks and bonds. In a sensor network application, continual range queries can be used to monitor temperature, humidity, traffic flow and many other readings.
Note that because these monitoring queries are evaluated repeatedly and continually against the incoming data stream, they are called continual queries. They are in contrast to regular queries that are usually evaluated only once.
As the data stream flows at an increasingly rapid rate, the processing of continual range queries becomes more difficult, if not impossible, because the processing power of the central processing unit (CPU) of the computing system doing the monitoring quickly becomes the limiting factor. Data items may have to be dropped without processing; that is, some of the workload is shed. However, it is more desirable that a system process as many continual queries as possible against a stream that may be rapid. Hence, it is important that only the potentially relevant queries are evaluated against each data item in the stream.
One approach to quickly identifying relevant queries for processing is to use a query index. Each data point in an incoming stream is used to search the query index to find the range queries containing the data point. This is referred to as the stabbing query problem, i.e., finding the range queries that are stabbed by a data point. Though conceptually simple, it is quite challenging to design an effective two-dimensional range query index in a stream environment, especially if the stream flows rapidly. The range query index is preferably main memory-based and it must have two important properties: low storage cost and fast search time. Low storage cost is important so that the entire query index can be loaded into main memory. As a result, potential performance degradation due to paging can be avoided during index search operations. Fast search time is critical so that the system can handle a rapid stream.
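For illustration only, the stabbing query problem can be stated as a brute-force scan over all registered queries, which is the baseline a query index is meant to improve upon. The following is a minimal sketch and is not taken from any cited reference; the names `RangeQuery` and `stab` are hypothetical, and query regions are assumed to be closed, axis-aligned rectangles.

```python
from typing import List, NamedTuple


class RangeQuery(NamedTuple):
    """Hypothetical two-dimensional range query (closed rectangle)."""
    qid: int
    x_lo: float
    y_lo: float
    x_hi: float
    y_hi: float


def stab(queries: List[RangeQuery], x: float, y: float) -> List[int]:
    """Return the IDs of all queries whose rectangle contains (x, y).

    This is a linear scan: every query is tested against every data
    point, so the cost per data point grows with the number of queries.
    """
    return [
        q.qid
        for q in queries
        if q.x_lo <= x <= q.x_hi and q.y_lo <= y <= q.y_hi
    ]
```

Because the scan touches every query for every incoming data point, its cost becomes prohibitive as the number of queries or the stream rate grows, which is why an index with fast search time is needed.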
Range queries are generally difficult to index. Though existing spatial indexes, such as R-trees (see, e.g., A. Guttman, “R-trees: A Dynamic Index Structure for Spatial Searching,” Proceedings of ACM SIGMOD International Conference on Management of Data, 1984, the disclosure of which is incorporated by reference herein), can be used to index range queries, most of them are disk-based approaches. Hence, they are generally not suitable for a stream environment where a main memory-based approach is preferable for fast search performance.
A main memory-based query index, called a VCR-based query index, has recently been proposed for fast event matching (see the U.S. patent application identified by Ser. No. 10/671,938, filed on Sep. 29, 2003, and entitled “System and Method for Monitoring Events Against Continual Range Queries,” the disclosure of which is incorporated by reference herein). A set of predefined virtual construct rectangles, or VCRs, is used to indirectly pre-compute search results. Range queries are first decomposed into one or more VCRs. Each VCR has a unique identifier (ID) and an associated query ID list storing the IDs of queries that use it in their decompositions. A search is conducted indirectly via the VCRs by identifying the covering VCRs for a given data point. Even though it is a main memory-based approach, such a VCR-based query index was not specifically designed for stream processing. The number of VCRs covering a data point can be rather high, degrading search performance.
The VCR-based query index belongs to a class of main memory-based indexes built on predefined virtual constructs (VCs). VCs are used to decompose a range query. Each VC is associated with a query ID list, storing the IDs of the queries covering that VC. For each incoming data point, a search is conducted by computing the VCs that cover said data point.
Existing VC-based query indexes can be divided into two categories based on the VC size: fixed-sized and variable-sized. The VCR-based approach is variable-sized, but the number of covering VCs can be large and it is not adaptive. There are two fixed-sized approaches. One uses unit-sized grid cells and the other uses grid cells of size L×L, where L>1 (see “Efficient Evaluation of Continuous Range Queries on Moving Objects,” Proceedings of International Conference on Database and Expert Systems Applications, 2002, the disclosure of which is incorporated by reference herein). The unit-sized grid cells are problematic since the number of VCs needed to decompose a query can be high, resulting in high storage cost. The grid cells of size L×L, where L>1, are problematic because a range query can partially intersect a grid cell, causing ambiguity as to whether or not the range query really covers a data point. Moreover, the grid cell approach is not adaptive to changes in the distributions of query sizes and query positions.
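The trade-off between the two fixed-sized approaches can be made concrete with a small sketch. The code below is an illustrative assumption, not the implementation of the cited reference: the function names are hypothetical, coordinates are assumed to be non-negative integers, and each query is assumed to be a half-open rectangle [x_lo, x_hi) × [y_lo, y_hi). Each grid cell plays the role of a VC with an associated query ID list.

```python
from collections import defaultdict


def build_grid_index(queries, cell):
    """Map each grid cell (cx, cy) to the IDs of queries overlapping it.

    queries: dict of qid -> (x_lo, y_lo, x_hi, y_hi), half-open, integer.
    cell:    side length L of each square grid cell.
    """
    index = defaultdict(list)
    for qid, (x_lo, y_lo, x_hi, y_hi) in queries.items():
        for cx in range(x_lo // cell, (x_hi - 1) // cell + 1):
            for cy in range(y_lo // cell, (y_hi - 1) // cell + 1):
                index[(cx, cy)].append(qid)
    return index


def storage_cost(index):
    """Total number of query IDs stored across all cells."""
    return sum(len(ids) for ids in index.values())


def candidates(index, cell, x, y):
    """Query IDs listed in the single cell that covers point (x, y)."""
    return list(index.get((x // cell, y // cell), []))


def search(index, queries, cell, x, y):
    """Resolve the partial-overlap ambiguity with an exact containment test."""
    hits = []
    for qid in candidates(index, cell, x, y):
        x_lo, y_lo, x_hi, y_hi = queries[qid]
        if x_lo <= x < x_hi and y_lo <= y < y_hi:
            hits.append(qid)
    return hits
```

Under these assumptions, a single query covering [2, 6) × [2, 6) needs 16 index entries with unit-sized cells but only 4 with L = 4. With L = 4, however, the point (0, 0) falls in a cell that lists the query even though the query does not contain the point, so an additional containment test is required for every candidate.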
Hence, a need is recognized to have a new and more effective main memory-based two-dimensional range query index for efficient stream processing.
Furthermore, with the advances in mobile computing and location-sensing technologies, location-aware services and applications have become possible. Such applications can be used to deliver relevant, timely and engaging content and information to targeted customers. For example, a retail store in a shopping mall can send timely electronic coupons (e-coupons) to the personal digital assistants (PDAs) or cell-phones of potential customers who are close to the store.
To provide location-aware services and applications, one must first know where moving objects are currently located. A set of continual range queries, each defining a geographical region of interest, can be repeatedly re-evaluated to locate moving objects. For example, we can place a square or a circle around the location of a hotel, an apartment building, or a subway exit. By periodically re-evaluating a continual query defined by the square or circle, we can locate the moving objects that are currently located within the square or circle.
It is thus evident that efficient processing of a set of continual range queries over moving objects is critically important for providing location-aware services and applications.
Query indexing has been used to speed up the processing of continual static range queries over moving objects. By “static” here, it is meant that the regions of continual range queries remain stationary. With query indexing, periodically, each object position is used to search the query index to find all the range queries that contain the object. Once the containing range queries are identified, the object identifier (ID) is inserted into the results associated with the identified queries. After every object position is searched against the query index, the most up-to-date results for all the continual range queries are available.
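One periodic evaluation cycle as described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than a disclosed implementation: the function name `evaluate_cycle` is hypothetical, and the query index is abstracted as a caller-supplied function `find_containing_queries(x, y)` that returns the IDs of the range queries containing a position.

```python
from collections import defaultdict


def evaluate_cycle(positions, find_containing_queries):
    """Run one periodic evaluation of all continual range queries.

    positions: dict of object ID -> (x, y) current position.
    find_containing_queries: callable (x, y) -> iterable of query IDs;
        stands in for the search operation of the query index.
    Returns a dict of query ID -> set of IDs of objects inside that query.
    """
    results = defaultdict(set)
    for object_id, (x, y) in positions.items():
        # Search the query index once per object position.
        for query_id in find_containing_queries(x, y):
            # Insert the object ID into the result of each containing query.
            results[query_id].add(object_id)
    return results
```

The cost of a cycle is one index search per object, so the search time of the index directly determines how frequently the results can be refreshed.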
With query indexing, it is paramount that the time taken to perform periodic query evaluation be as brief as possible.
In the U.S. patent application identified by Ser. No. 10/671,932, filed on Sep. 29, 2003, and entitled “Method and Structure for Monitoring Moving Objects,” the disclosure of which is incorporated by reference herein, a shingle-based query indexing approach was disclosed for processing of continual range queries over moving objects. A shingle may be defined as a digital representation of a tile-like object laid to cover a digital representation of an area (e.g., a geographical area), without necessarily being laid in overlapping rows. Shingles are predefined virtual construct rectangles (VCRs). They are used to decompose query regions and to store indirectly pre-computed search results. However, shingles defined in such an approach may be redundant, slowing down each index search operation and increasing the query processing time.
Hence, a need is recognized to have new and more effective techniques for processing of continual static range queries over moving objects for providing location-aware services and applications.