Integer compression is essential in numerous systems, including communication, multimedia, and information retrieval systems, computer networks, and VLSI intra-connect and interconnect networks. In many cases where digital data is stored or transmitted, integer compression can be used to reduce the bandwidth consumed and/or the memory required to manage the data. Major applications include compression of network, image, video, audio, speech, and fax data in products as varied as file compression utilities, cell phones, online backup services, and storage media. Additionally, integer compression is useful for efficient retrieval of information from the Internet.
In 1951, Huffman developed a uniquely decodable (UD) method for lossless compression of information consisting of finite-length symbols from a finite alphabet with a known probability distribution. Coding techniques such as those developed by Elias, Zeckendorf (Fibonacci coding), and Golomb subsequently extended lossless compression to unbounded integers, that is, integers with no predetermined bit length. These innovations prompted extensive algorithmic exploration in the 1970s and early 1980s, resulting in the development of a series of ground-breaking compression algorithms and systems (LZ77, LZ78, LZW, etc.).
Most compression techniques have two variations: a static approach and a dynamic approach. A static compression technique requires that the encoder and the decoder share a predictive model. When the data to be sent matches the output of the model, the encoder can usually transmit the data at a lower information cost by transmitting the output of the model. Static methods for string compression include Huffman codes, Tunstall codes, and static dictionaries. Static methods for unbounded integer compression include the Elias family of codes, Fibonacci coding, Golomb coding, and other methods.
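As an illustration of the static integer codes named above, the following is a minimal Python sketch of Elias gamma coding and Fibonacci coding for positive integers. The function names and the use of character strings of '0' and '1' to stand for bits are assumptions made for readability and are not taken from any particular system.

```python
def elias_gamma_encode(n: int) -> str:
    """Elias gamma code of a positive integer, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias gamma is defined for integers >= 1")
    binary = bin(n)[2:]                       # binary form, starts with '1'
    return "0" * (len(binary) - 1) + binary   # unary length prefix + binary


def elias_gamma_decode_stream(bits: str) -> list[int]:
    """Decode a concatenation of Elias gamma codewords."""
    values = []
    pos = 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":               # count the unary prefix
            zeros += 1
            pos += 1
        values.append(int(bits[pos:pos + zeros + 1], 2))
        pos += zeros + 1
    return values


def fibonacci_encode(n: int) -> str:
    """Fibonacci (Zeckendorf) code of a positive integer, ending in '11'."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for integers >= 1")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()                                # drop the first term greater than n
    bits = ["0"] * len(fibs)
    remaining = n
    for i in reversed(range(len(fibs))):      # greedy Zeckendorf decomposition
        if fibs[i] <= remaining:
            bits[i] = "1"
            remaining -= fibs[i]
    return "".join(bits) + "1"                # the extra '1' forms the '11' terminator


if __name__ == "__main__":
    stream = "".join(elias_gamma_encode(n) for n in [1, 2, 5, 9])
    print(stream, elias_gamma_decode_stream(stream))
    print(fibonacci_encode(11))
```

Both codes are static in the sense used here: the codeword for a given integer is fixed in advance and does not change as data is processed, so the encoder and the decoder need no exchange beyond agreeing on the code.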
Dynamic compression does not need a shared predictive model, but instead requires the encoder and the decoder to share a meta-model (method) defining how each will alter its model in response to the actual data. Consequently, with dynamic compression, no initial model need be shared. Well-known and extensively used dynamic lossless compression algorithms include dynamic Huffman coding, dynamic Tunstall coding, dynamic arithmetic coding, and the dynamic dictionary methods derived by applying and extending the Lempel and Ziv (LZ) algorithms known as LZ77 and LZ78. Many of these algorithms are used in communication and information processing, as well as in multimedia systems and devices. A dynamic encoding and decoding process might involve both items that have not yet been encountered by the encoder (and the decoder) and items that have already been encountered.
At any given point in the dynamic process, the encoder and the decoder might maintain a list of all the items that have been encountered so far, often referred to as the already transmitted (AT) list. When a new item arrives, the encoder (decoder) may check whether the item is already in the AT list by searching the list. By analogy with caching systems, we refer to the event where the search is successful and the item is in the AT list as a “hit,” and to the case where the new item is not in the AT list as a “miss.” Different methods may distinguish between these two events (hit and miss) via a flag bit or a flag string. Alternatively, the distinction may be denoted by an exception code. Additionally, several of the encoding and decoding processes may need to estimate the probability of occurrence of specific items. This may be done by maintaining a counter. For example, the counter N(a) may be used to count the number of occurrences of the integer a in a given time interval. In this case, N(a) may be used to estimate the probability of occurrence of a. A plurality of counters, each dedicated to counting occurrences of a specific integer, may be used to estimate the probability distribution function (PDF) of the items that belong to a specific information source. Nevertheless, other methods for estimating the PDF may be used. In some cases, information about the PDF is available prior to the encoding and can be exploited in the encoding process.
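The bookkeeping described above can be sketched in Python as follows. This is only an illustrative arrangement under several assumptions that are not taken from the text: a flag bit of '1' marks a hit and '0' marks a miss, a hit is coded as a fixed-width index into the AT list, a miss is coded with the Elias gamma code of the integer (so inputs must be positive), and the names `encode`, `decode`, `index_width`, and `estimate_probability` are hypothetical.

```python
from collections import Counter


def gamma(n: int) -> str:
    """Elias gamma code of a positive integer (used here for misses)."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def index_width(size: int) -> int:
    """Bits needed to address `size` AT entries (0 bits when size == 1)."""
    return (size - 1).bit_length()


def encode(values):
    at = []                 # already transmitted (AT) list, in arrival order
    counts = Counter()      # N(a): occurrences of each integer a
    out = []
    for a in values:
        if a in at:         # hit: flag '1' followed by the AT index
            width = index_width(len(at))
            idx = at.index(a)
            out.append("1" + (format(idx, f"0{width}b") if width else ""))
        else:               # miss: flag '0' followed by the integer itself
            out.append("0" + gamma(a))
            at.append(a)
        counts[a] += 1
    return "".join(out), counts


def decode(bits):
    at, counts, values = [], Counter(), []
    pos = 0
    while pos < len(bits):
        flag = bits[pos]
        pos += 1
        if flag == "1":     # hit: read the index, look the item up in AT
            width = index_width(len(at))
            idx = int(bits[pos:pos + width], 2) if width else 0
            pos += width
            a = at[idx]
        else:               # miss: read a gamma codeword, extend AT
            zeros = 0
            while bits[pos] == "0":
                zeros += 1
                pos += 1
            a = int(bits[pos:pos + zeros + 1], 2)
            pos += zeros + 1
            at.append(a)
        counts[a] += 1
        values.append(a)
    return values, counts


def estimate_probability(counts: Counter, a: int) -> float:
    """Relative-frequency estimate of the probability of a, from N(a)."""
    total = sum(counts.values())
    return counts[a] / total if total else 0.0


if __name__ == "__main__":
    data = [5, 3, 5, 5, 9, 3]
    bits, n_enc = encode(data)
    decoded, n_dec = decode(bits)
    print(bits, decoded, estimate_probability(n_dec, 5))
```

Because the decoder updates its AT list and counters with exactly the same rule as the encoder, both sides hold identical state after every item, which is what allows the index width and any probability estimates to be derived without side information.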
Despite the relative success of these known methods, there is still a need for improved methods of lossless integer compression that can be used to extend and improve several existing dynamic lossless data compression systems.
In most applications, the input integers are bounded. Nevertheless, where there is no prior knowledge concerning the magnitude of the input integers, one may assume that they are represented using some kind of comma system that enables the boundaries of the input integers to be identified. Other uniquely decodable representations may also be considered. Hence, a stream of unbounded integers may be represented using a uniquely decodable variable-length code. In some cases, the Elias delta code of the integers may be utilized as a uniquely decodable (UD) infinite alphabet on which the methods operate. Alternatively, the Elias delta code may be generated as a part of the integer encoding and decoding process.
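For concreteness, the Elias delta code mentioned above can be sketched in Python as follows. The code for a positive integer n writes the bit length of n in Elias gamma form and then appends the bits of n that follow its leading 1. The function names and the string-of-bits representation are assumptions made for illustration only.

```python
def elias_gamma_encode(n: int) -> str:
    """Elias gamma code: unary length prefix followed by the binary form."""
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def elias_delta_encode(n: int) -> str:
    """Elias delta code of a positive integer, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias delta is defined for integers >= 1")
    binary = bin(n)[2:]
    # Gamma-code the bit length, then append n without its leading '1'.
    return elias_gamma_encode(len(binary)) + binary[1:]


def elias_delta_decode_stream(bits: str) -> list[int]:
    """Decode a concatenation of Elias delta codewords."""
    values = []
    pos = 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":                # unary prefix of the gamma part
            zeros += 1
            pos += 1
        length = int(bits[pos:pos + zeros + 1], 2)   # bit length of the integer
        pos += zeros + 1
        rest = bits[pos:pos + length - 1]      # bits after the implicit leading '1'
        pos += length - 1
        values.append(int("1" + rest, 2))
    return values


if __name__ == "__main__":
    numbers = [1, 2, 10, 17, 1000000]
    stream = "".join(elias_delta_encode(n) for n in numbers)
    print(stream)
    print(elias_delta_decode_stream(stream))
```

Since no codeword is a prefix of another, the decoder can locate integer boundaries in the concatenated stream without separators, which is the property that lets such a code stand in for a comma system.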