Delta encoding is a technique for storing or transmitting data in the form of differences between sequential data values, rather than the complete set of data values. The differences are referred to as “delta encoded integers” or, more simply, “deltas.” In general, the difference between two data values is the information required to obtain one value from the other.
Lists of deltas have wide applicability in many applications. One particular application in which such lists are used is compressed database indexes. A non-unique database index stores an efficient mapping between a key to a list of row identifiers (RIDs). In compressed indexes, these lists of RIDs are often first encoded using delta encoding, namely, each RID (except the first one) in the list is encoded as the difference from the previous RID.
These delta encoded lists of RIDs are then further compressed using a plurality of compression methods. One exemplary method is dictionary-based compression, where common bit patterns in the deltas are replaced with a short codeword.
An important property in compressed database indexes is that deletion of a RID in a RID list should not result in an expansion in the amount of space required to store the compressed RID list. A compression method and/or a compressed index that exhibits this property is said to be “delete-safe.” The delete-safe property is critical, because if the index occupies all the free space on disk, the user still expects a row deletion operation to succeed. Without the delete-safe property, the row deletion operation could fail, because the index after the delete requires more storage space on disk than before the delete operation.
However, one problem with choosing a dictionary-based method arbitrarily to compress a list of delta encoded integers is that the resultant index may not be delete-safe.