1. Field of the Invention
The present invention relates to unpacking of data in a data stream management system, and more particularly, to a method for optimizing field unpacking.
2. Brief Description of the Related Art
The present invention relates to obtaining information from a data structure. A data structure includes data streams, databases and any other type of data structure. A stream in a data stream management system includes a collection of packets, and a table in a database includes a collection of records. Each packet or record includes a collection of named fields. For ease of understanding and reading, this application will refer to streams and packets, but such references to streams and packets are not limited to a data stream management system and relate equally to other types of structures, such as a database.
A task related to the present invention is evaluation of a collection of queries on a stream of data. Each query in the collection of queries references one or more fields from the collection of named fields. The evaluation of each query requires obtaining information contained in the various fields of a data stream, referred to as unpacking the data.
For example, if a query Q1 is “select a and b from S where c=3”, then Q1 references the fields named a, b, and c of the packets of stream S. Suppose further that there is another query, Q2, in the collection of queries to be evaluated over S, and that Q2 is “select c and d where e<7”. Then Q2 references the fields c, d, and e. For this example, the collection of fields referenced by any query in the collection of queries to be evaluated over S is a, b, c, d, and e. All of the queries in the collection of queries will be evaluated over S at the same time, one packet at a time. The procedure is presumably to retrieve the next packet of the stream, and then have each query in the collection of queries process the packet.
In general, two evaluation strategies are commonly used to unpack packets of data in a data stream management system: eager unpacking and lazy unpacking. In eager unpacking, after retrieving the next packet from the stream, we prepare it by unpacking all of the fields referenced by any query in the collection of queries that we are evaluating over the stream. In the example above, the collection of queries consists of Q1 and Q2, and the collection of fields consists of a, b, c, d, and e.
Note that the packets in the stream might have additional fields, for example, f, g, and h. Because no query references these fields, they are not unpacked. In eager unpacking, the procedure is to iterate over all packets in the stream. Upon getting the next packet, fields a, b, c, d, and e are first unpacked, then Q1 is evaluated using the unpacked values of a, b, and c. After that, Q2 is evaluated using the unpacked values of c, d, and e.
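The eager procedure described above can be sketched as follows. This is a minimal illustration only, under stated assumptions: a packet is modeled as a dictionary mapping field names to packed bytes, each field is assumed to hold a 4-byte big-endian integer, and the names unpack_field, make_packet, eager_evaluate, and unpack_count are hypothetical helpers introduced for this example.

```python
# Hypothetical packet model: a dict mapping field name -> packed bytes.
# unpack_field stands in for a real field decoder (assumption).
unpack_count = 0  # counts decode operations, for illustration only


def unpack_field(packet, name):
    """Decode one field, assumed here to be a big-endian integer."""
    global unpack_count
    unpack_count += 1
    return int.from_bytes(packet[name], "big")


def q1(values):
    # Q1: select a and b from S where c=3
    return (values["a"], values["b"]) if values["c"] == 3 else None


def q2(values):
    # Q2: select c and d where e<7
    return (values["c"], values["d"]) if values["e"] < 7 else None


# Union of the fields referenced by Q1 and Q2.
REFERENCED_FIELDS = ("a", "b", "c", "d", "e")


def eager_evaluate(stream):
    results = []
    for packet in stream:
        # Unpack every referenced field up front; f, g, and h stay packed.
        values = {n: unpack_field(packet, n) for n in REFERENCED_FIELDS}
        results.append((q1(values), q2(values)))
    return results


def make_packet(**fields):
    """Build a packed packet from integer field values."""
    return {n: v.to_bytes(4, "big") for n, v in fields.items()}


stream = [
    make_packet(a=1, b=2, c=3, d=4, e=5, f=0, g=0, h=0),
    make_packet(a=1, b=2, c=4, d=4, e=9, f=0, g=0, h=0),
]
eager_results = eager_evaluate(stream)
```

In this sketch, five fields are unpacked for every packet, including the second packet, for which neither query produces output.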
In lazy unpacking, the unpacking of fields is deferred as late as possible. That means the unpacking occurs on an on-demand basis. The evaluation procedure consists of (1) iterating over the stream, (2) getting the next packet, (3) evaluating Q1 on the packet received, and (4) evaluating Q2 on the packet received. In the example above, Q1 is evaluated as follows: (1) unpack c; (2) if c=3, then first unpack a and second unpack b; and (3) output the packet (a, b). The benefit of lazy unpacking is that if c=4, we do not waste time unpacking fields a and b.
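The lazy evaluation of Q1 described above can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the packet is again modeled as a dictionary of packed 4-byte big-endian integers, and unpack_field, make_packet, lazy_q1, and unpack_log are hypothetical names introduced for this example.

```python
unpack_log = []  # records each field decode, for illustration only


def unpack_field(packet, name):
    """Decode one field on demand (assumed big-endian integer)."""
    unpack_log.append(name)
    return int.from_bytes(packet[name], "big")


def lazy_q1(packet):
    # Q1: select a and b from S where c=3
    c = unpack_field(packet, "c")      # (1) unpack c
    if c != 3:
        return None                    # a and b are never unpacked
    a = unpack_field(packet, "a")      # (2) unpack a, then b
    b = unpack_field(packet, "b")
    return (a, b)                      # (3) output (a, b)


def make_packet(**fields):
    """Build a packed packet from integer field values."""
    return {n: v.to_bytes(4, "big") for n, v in fields.items()}


# A packet with c=4 fails the predicate after a single decode.
miss = lazy_q1(make_packet(a=1, b=2, c=4))
log_after_miss = list(unpack_log)

# A packet with c=3 passes, so a and b are unpacked as well.
hit = lazy_q1(make_packet(a=1, b=2, c=3))
```

For the packet with c=4, only field c is decoded, which is the saving that lazy unpacking is intended to provide.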
The following is an additional example of lazy unpacking. Suppose there is a packet such that c=3 and e<7. For Q1, fields c, a, and b will be unpacked. For Q2, the program requires: (1) e to be unpacked; (2) if e<7, then first unpack c and second unpack d; and (3) output the packet (c, d). In this example, Q1 has already unpacked c, so it is a waste of time for Q2 to unpack c again. Therefore, in lazy unpacking, we modify our unpacking procedures to test whether a field has already been unpacked by another query and, if so, to use the already unpacked value.
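The test for an already unpacked field can be sketched with a per-packet cache. This is one possible arrangement, offered only as an illustration: the LazyPacket class, its get method, and the 4-byte big-endian field encoding are assumptions made for this example and are not taken from any particular system.

```python
class LazyPacket:
    """Hypothetical wrapper that unpacks each field at most once."""

    def __init__(self, raw):
        self._raw = raw        # field name -> packed bytes
        self._cache = {}       # field name -> unpacked value
        self.unpack_log = []   # order of actual decodes, for illustration

    def get(self, name):
        # Test whether the field was already unpacked by an earlier query.
        if name not in self._cache:
            self.unpack_log.append(name)
            self._cache[name] = int.from_bytes(self._raw[name], "big")
        return self._cache[name]


def lazy_q1(p):
    # Q1: select a and b from S where c=3
    return (p.get("a"), p.get("b")) if p.get("c") == 3 else None


def lazy_q2(p):
    # Q2: select c and d where e<7
    return (p.get("c"), p.get("d")) if p.get("e") < 7 else None


fields = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
raw = {n: v.to_bytes(4, "big") for n, v in fields.items()}
packet = LazyPacket(raw)
r1 = lazy_q1(packet)
r2 = lazy_q2(packet)
```

In this sketch, field c appears only once in the decode log even though both queries read it; Q2 reuses the value that Q1 unpacked. The membership test in get is executed on every field access, whether or not the field has been unpacked before.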
In view of the foregoing, it would be advantageous to provide a method for obtaining specific data, whereby the data is obtained quickly and only the necessary fields in the packets are required to be unpacked.