Some data processing applications (e.g., search engines) work extensively with variable-length data (e.g., variable-length integers). To conserve space and/or increase throughput, variable-length data can be encoded into a compressed format that represents the data in fewer bytes than would ordinarily be used to store it. For example, an integer value may be associated with a 32-bit integer data type. However, if the actual value of the integer is in the range of 0 to 255, the value can be represented more compactly by 8 bits, or a single byte, resulting in a savings of 24 bits, or three bytes.
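The arithmetic in the example above can be illustrated with a minimal sketch. The helper name `min_bytes` and the constant `FIXED_WIDTH_BYTES` are hypothetical and are introduced only for illustration; the sketch assumes non-negative integers and whole-byte granularity.

```python
FIXED_WIDTH_BYTES = 4  # assumption: values are nominally stored as 32-bit integers


def min_bytes(value):
    """Return the fewest whole bytes that can hold a non-negative integer."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # bit_length() is 0 for the value 0, so at least one byte is always reported.
    return max(1, (value.bit_length() + 7) // 8)


for v in (0, 255, 256, 65535, 2**32 - 1):
    needed = min_bytes(v)
    saved = FIXED_WIDTH_BYTES - needed
    print(f"value={v}: {needed} byte(s) needed, {saved} byte(s) saved")
```

For any value in the range of 0 to 255, the helper reports one byte, which corresponds to the three-byte savings described above.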
The encoding of variable-length data into a compressed format typically requires storing and maintaining additional information for use in decoding, such as data indicating the number of bits or bytes used to represent a compressed integer value. The management of such “bookkeeping” information typically requires additional overhead bits for use during decoding. For example, some conventional encoders add a “continuation bit” to each byte used to represent a compressed integer value to assist the decoder in identifying boundaries between consecutive compressed integer values. Although effective, such encoding techniques typically require several operations (e.g., Boolean, shift, and branch operations) to unpack or decompress each integer value, which can slow down the decoding process and degrade overall system performance. Such degradation is especially problematic in applications that perform large-scale processing of compressed data, such as information retrieval systems.
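A conventional continuation-bit scheme of the kind described above can be sketched as follows. This is an illustrative sketch only and is not taken from any particular encoder: it assumes non-negative integers, seven payload bits per byte with the least-significant group stored first, and a high bit that is set when another byte follows. The function names `encode_varint` and `decode_varints` are hypothetical.

```python
def encode_varint(value):
    """Encode one non-negative integer using 7 payload bits per byte."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # Boolean operation: take the low 7 bits
        value >>= 7                  # shift operation: drop the bits just taken
        if value:                    # branch: more bytes follow
            out.append(low7 | 0x80)  # set the continuation bit
        else:
            out.append(low7)         # final byte: continuation bit clear
            return bytes(out)


def decode_varints(data):
    """Decode a byte string holding consecutive continuation-bit integers."""
    values = []
    current = 0
    shift = 0
    for byte in data:
        current |= (byte & 0x7F) << shift  # mask, shift, and merge the payload
        if byte & 0x80:                    # branch on the continuation bit
            shift += 7
        else:
            values.append(current)         # boundary between values reached
            current = 0
            shift = 0
    if shift:
        raise ValueError("input ends in the middle of a value")
    return values


encoded = b"".join(encode_varint(v) for v in (5, 300, 70000))
print(encoded.hex(), decode_varints(encoded))
```

Even in this small sketch, the decoder performs a mask, a shift, a merge, and a conditional branch for every input byte, which illustrates the per-byte decoding overhead noted above.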
Accordingly, what is needed is a system and method for efficiently encoding and decoding variable-length data.