The invention relates to compression techniques that operate on sets of integers so that they may be stored or transmitted efficiently.
Two known techniques for compressing sets of integers are the "array" and "interval" methods. The array method represents the set of integers as an array or list with one element for each integer. Strictly speaking, the array method is not a compression technique at all, since the size of the output is identical to that of the input data set. The interval method views the input data set as a set of intervals or ranges; even isolated integers ("singletons") are treated as intervals in which the low bound equals the high bound. The interval method encodes the set as a sequence of intervals, storing the low and high bounds for each interval.
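The two representations described above can be illustrated with a minimal Python sketch. The function names (array_encode, interval_encode, interval_decode) are hypothetical and chosen only for illustration; the sketch assumes the input is a finite collection of integers and treats each maximal run of consecutive integers as one interval.

```python
def array_encode(values):
    """Array method: one output element per integer in the set."""
    return sorted(set(values))


def interval_encode(values):
    """Interval method: one (low, high) pair per run of consecutive integers.

    A singleton is stored as an interval whose low bound equals its high bound.
    """
    intervals = []
    for v in sorted(set(values)):
        if intervals and v == intervals[-1][1] + 1:
            # v extends the current run, so raise its high bound.
            intervals[-1] = (intervals[-1][0], v)
        else:
            # v starts a new run (possibly a singleton).
            intervals.append((v, v))
    return intervals


def interval_decode(intervals):
    """Expand a list of (low, high) pairs back into the sorted integers."""
    return [v for low, high in intervals for v in range(low, high + 1)]


example = [1, 2, 3, 4, 5, 9, 20, 21, 22]
encoded = interval_encode(example)  # [(1, 5), (9, 9), (20, 22)]
```

In this example the array form holds nine integers, while the interval form holds three intervals (six bounds), with the singleton 9 stored as the pair (9, 9).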
The interval method offers dramatic efficiency gains over the array method when the input data set consists of large intervals. However, the interval method breaks down when there are gaps, even small ones, in the sequence of integers. In fact, the interval method is less efficient than the array method when more than half of the set consists of singletons, because each singleton is stored as two bounds rather than as a single integer.
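The trade-off described above can be made concrete by counting how many integers each method writes to its output. The following is a rough sketch under the assumption that cost is measured in stored integers (one per element for the array method, two per interval for the interval method); the helper names are hypothetical.

```python
def count_intervals(values):
    """Count the maximal runs of consecutive integers in the set."""
    runs = 0
    previous = None
    for v in sorted(set(values)):
        if previous is None or v != previous + 1:
            runs += 1  # a gap (or the first element) starts a new interval
        previous = v
    return runs


def array_cost(values):
    """Integers stored by the array method: one per element."""
    return len(set(values))


def interval_cost(values):
    """Integers stored by the interval method: two bounds per interval."""
    return 2 * count_intervals(values)


dense = range(1, 1001)        # one large interval
evens = range(0, 1000, 2)     # 500 singletons separated by gaps of one
mixed = [1, 3, 5, 7, 9, 10]   # four singletons and one two-element interval

print(array_cost(dense), interval_cost(dense))  # 1000 2
print(array_cost(evens), interval_cost(evens))  # 500 1000
print(array_cost(mixed), interval_cost(mixed))  # 6 10
```

In the last case four of the six integers are singletons, so the interval method stores ten integers where the array method stores six.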
Both the array and interval methods become less efficient as the integer byte representation gets larger. Because they place into the compressed data stream integers of the same size as the integers in the input data set, the size in bytes of the compressed data stream grows linearly with both the number of bytes in the integer representation and the number of integers or intervals. Even where the integers do not always make use of their full size, these two methods are unable to capitalize on that circumstance.
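The linear growth described above can be sketched by serializing both representations at several fixed integer widths. This is an illustrative sketch only: it assumes non-negative integers written as fixed-width big-endian fields, and the function names are hypothetical.

```python
def array_bytes(values, width):
    """Serialize the array form, using `width` bytes for every integer."""
    return b"".join(v.to_bytes(width, "big") for v in sorted(set(values)))


def interval_bytes(intervals, width):
    """Serialize (low, high) pairs, using `width` bytes for every bound."""
    return b"".join(
        bound.to_bytes(width, "big")
        for low, high in intervals
        for bound in (low, high)
    )


values = [3, 4, 5, 200]            # every value would fit in a single byte
intervals = [(3, 5), (200, 200)]   # the same set in interval form

for width in (2, 4, 8):
    # Output size is (number of stored integers) * width for both methods.
    print(width, len(array_bytes(values, width)), len(interval_bytes(intervals, width)))
# 2 8 8
# 4 16 16
# 8 32 32
```

Although every value here fits in one byte, each stored integer still occupies the full width, so doubling the width doubles the output size for both methods.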