It is common to have to store and then access a list of variable length structures, and the elements may need to be accessed in both forward and reverse directions. One example where this is needed is in compressed database indexes.
A database index stores an efficient mapping from a key to a list of row identifiers (RIDs). In compressed indexes, these lists of RIDs are often encoded using variable length encoding schemes, in which the encoded length of a RID is a function of its value. The value could be the original RID value, or a value obtained by applying delta encoding and/or another compression scheme to the original RID value. As is known, delta encoding is a technique for storing or transmitting data in the form of differences between sequential data values, rather than the complete set of data values. The differences are referred to as “delta encoded integers” or, more simply, “deltas.” Delta encoded lists of RIDs may then be further compressed using a plurality of other compression methods. One exemplary method is dictionary-based compression, where common bit patterns in the deltas are replaced with a short codeword.
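As a minimal sketch of the delta encoding described above, the following Python fragment converts a sorted list of RIDs into deltas and back. The function names and the sample RID values are hypothetical and chosen only for illustration; the first RID is stored as its difference from zero, which is an assumption and not something the description above prescribes.

```python
def delta_encode(rids):
    """Replace each RID with its difference from the previous RID.

    The first RID is kept as-is (its difference from 0), an assumption
    made for this sketch.
    """
    deltas = []
    prev = 0
    for rid in rids:
        deltas.append(rid - prev)
        prev = rid
    return deltas


def delta_decode(deltas):
    """Rebuild the original RIDs by accumulating the deltas."""
    rids = []
    total = 0
    for d in deltas:
        total += d
        rids.append(total)
    return rids


rids = [1000, 1003, 1004, 1010, 1500]   # hypothetical sorted RID list
deltas = delta_encode(rids)             # [1000, 3, 1, 6, 490]
restored = delta_decode(deltas)         # same values as rids
```

Because most deltas are much smaller than the RIDs themselves, a variable length encoding applied to the deltas would typically need fewer bytes than one applied to the original values.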
While accessing data via these compressed indexes, one needs to be able to traverse the keys and RID lists in forward and reverse orders. Consider an index on a table column called ship_date. One can expect that many items were shipped on the same date and therefore have the same ship_date. In such a case, the index would have the key (e.g., ship_date=01/01/2008) followed by a list containing the record identifiers of those records which have this value. Immediately following would be another key and its RID list, with the keys appearing in increasing logical order.
A user might want to know how many items were shipped on each date of 2008, in increasing or decreasing order of the dates. The former (i.e., increasing order) could be easily answered by a traversal of the index in the forward direction while picking up the count of records for each key in that range. For the latter (i.e., decreasing order), one could traverse the index in the reverse direction and answer the query. Thus, the ability to traverse the keys and RID lists in both forward and reverse directions is very important for query processing.
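The query described above can be sketched as follows, assuming for illustration that the index is held in memory as a list of (key, RID list) pairs already sorted by key. The layout, the name counts_per_key, and the sample data are all hypothetical; a real compressed index would be traversed over its encoded bytes rather than over Python lists.

```python
from datetime import date

# Hypothetical in-memory index: keys in increasing order, each with its RID list.
index = [
    (date(2008, 1, 1), [101, 102, 107]),
    (date(2008, 1, 2), [103]),
    (date(2008, 1, 3), [104, 105]),
]


def counts_per_key(index, descending=False):
    """Return (key, number of RIDs) pairs by walking the index once.

    A forward walk yields increasing key order; a reverse walk yields
    decreasing key order without any separate sorting step.
    """
    entries = reversed(index) if descending else index
    return [(key, len(rids)) for key, rids in entries]


forward = counts_per_key(index)
backward = counts_per_key(index, descending=True)
```

The point of the sketch is that the decreasing-order answer comes directly from a reverse traversal, which is only possible if the stored lists can be read backward.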
Apart from the variable length record identifiers (RIDs), the list could also contain information describing the state of the record. This could be in data structures which may be of fixed length and are called RIDFlags for this discussion. The RIDFlags could follow each RID in the list.
The conventional form of variable length encoding breaks the structure into bytes (or blocks) and uses a bit in every byte (or block) to indicate whether that byte (or block) is the final one or a continuation. While reading the data, this bit is used to put the bytes (or blocks) together and form the complete variable length data item. This is easy to do when one is traversing the list in a forward direction but becomes difficult in a reverse direction. It becomes impossible when the variable length items (such as RIDs) are intermixed with other fixed length items (such as RIDFlags), because the reverse scan would not be able to distinguish a RIDFlag from an encoded byte (or block) of a RID.
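The difficulty described above can be illustrated with a small Python sketch. It assumes one common form of continuation-bit encoding: seven payload bits per byte, least significant group first, with the high bit set on every byte except the last. It also assumes a one-byte RIDFlag following each RID. These layout details and all function names are assumptions made for illustration only.

```python
def encode_varint(value):
    """Encode a non-negative integer, 7 bits per byte, high bit = 'more follows'."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)   # continuation byte
        else:
            out.append(low7)          # final byte
            return bytes(out)


def decode_varint(buf, pos):
    """Decode one integer starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def encode_list(entries):
    """Encode (rid, flag) pairs as a varint RID followed by a one-byte flag."""
    out = bytearray()
    for rid, flag in entries:
        out += encode_varint(rid)
        out.append(flag)
    return bytes(out)


def decode_forward(buf):
    """A forward scan is unambiguous: each RID ends at its first byte with a clear high bit."""
    entries = []
    pos = 0
    while pos < len(buf):
        rid, pos = decode_varint(buf, pos)
        entries.append((rid, buf[pos]))
        pos += 1
    return entries


two_entries = encode_list([(5, 0x80), (7, 0x01)])   # bytes 05 80 07 01
one_entry = encode_list([(896, 0x01)])              # bytes 80 07 01
```

Here the last three bytes of two_entries are identical to the whole of one_entry. A scan starting from the end reads 01, then 07, then 80, and at that point cannot tell whether 80 is the RIDFlag of an earlier entry or a continuation byte belonging to RID 896, whereas decode_forward recovers both lists without difficulty.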