Digital data, such as digital audio signals, digital images or digital video, are often encoded to enable efficient storage or transmission. Two fundamentally different approaches in digital data coding are lossless coding and lossy coding. Lossless coding allows for the exact reconstruction of the digital data by the decoder. In contrast, lossy coding introduces irrecoverable errors in the decoded digital data, at the same time enabling more efficient compression. Similar to lossless coding, lossy coding includes a lossless compression, but only for the relevant information in the digital data set, whereas the irrelevant information is discarded. The lossless encoding method, or compression method, defined by the invention disclosed herein can be used for both lossless and lossy digital data coders.
An important application of digital data coding is audio coding of real-time voice communications over packet networks. Here, lossy coding is typically preferred since it results in lower bit rates than lossless coding. In this field of application, the codec is typically optimized for speech signals, combining good speech quality with high coding efficiency. For pleasant conversation using such a codec, it is important that the latency in the communications link is kept to a minimum, which requires that the coding and packetization introduce very little delay. The latter can only be achieved by sending out packets at short intervals, such as once every 10 or 20 milliseconds. Another important property of a codec for voice over packet networks is robustness against packet losses, because in many types of networks complete packets may get lost or become severely delayed. This robustness may be provided by minimizing the dependency of the decoder on previously decoded packets. Robustness against bit errors within a packet, on the other hand, is typically not required, as most packet networks provide error detection and correction. Computational complexity also needs to be kept to a minimum, depending on the hardware that runs the audio codec.
An example of a lossy audio coder is described in “Perceptual Audio Coding Using Adaptive Pre- and Post-Filters and Lossless Compression”, IEEE Transactions on Speech and Audio Processing, Vol. 10, No. 6, September 2002, by G. D. T. Schuller et al. This audio coder incorporates a lossless compression method to encode the information that is considered relevant. This relevant information is obtained by pre-filtering the audio signal and then quantizing the result. The lossless encoding of the quantization indices is done with the aid of a backward adaptive prediction filter, which makes a prediction of the value of each quantization index based on previously encoded quantization indices. Because the difference between the actual and predicted indices has a smaller spread than the quantization indices themselves, the indices can be encoded more efficiently. Such a backward adaptive prediction filter is, however, not well suited for use with short packets of just a few tens of milliseconds. The reason is that when packets are lost, the prediction filter will not be in the correct state for the next packet and the lossless decoder will consequently give erroneous results. This could be resolved by resetting the prediction filter for each new packet, but that would severely reduce coding efficiency. To overcome this problem, the method described in the current invention uses lossless encoding and decoding based on forward adaptive modeling, where each packet is encoded independently of previously encoded packets.
An example of a lossless audio coder is described in “Lossless Transform Coding of Audio Signals”, proceedings of the 102nd AES Convention, Munich, 1997, by T. Liebchen, M. Purat and P. Noll. This coder uses a Discrete Cosine Transform to convert a block of time samples into a block of frequency coefficients. These frequency coefficients are quantized and the quantization indices are then losslessly encoded. For this purpose, the frequency coefficients are grouped per 32 adjacent coefficients, and it is observed that the coefficients in each group have an almost Laplacian distribution, and can thus be efficiently encoded using Rice coding. For each group, the Rice code that best matches the distribution within the group is chosen. This scheme has several shortcomings, however. First of all, Rice codes only exist for discrete values of the standard deviation (spread) of the Laplacian distribution. Second, the method assumes that the statistics are constant over the group of 32 coefficients and change abruptly at the boundary between two groups, whereas the real standard deviation will fluctuate from coefficient to coefficient. Finally, Rice codes work well for Laplacian distributed coefficients, but in reality the frequency coefficients are not exactly Laplacian distributed. For all of these reasons, there can be a substantial mismatch between the Rice code and the distribution of the coefficients, resulting in a higher bit rate. The method described in accordance with the present invention overcomes each of these limitations.
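The per-group choice of a Rice code described above can be illustrated with a minimal sketch. This is a generic Rice coder, not the implementation of the cited coder: the function names, the zigzag mapping of signed values to non-negative integers, and the exhaustive search over the parameter k are assumptions made for illustration only.

```python
def zigzag(v):
    """Map a signed integer to a non-negative one: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * v if v >= 0 else -2 * v - 1

def rice_encode(n, k):
    """Rice code of non-negative integer n with parameter k, as a bit string.

    The quotient n >> k is written in unary (ones terminated by a zero),
    followed by the k low-order bits of n in binary.
    """
    q, r = n >> k, n & ((1 << k) - 1)
    unary = "1" * q + "0"
    binary = format(r, "b").zfill(k) if k > 0 else ""
    return unary + binary

def group_cost_bits(group, k):
    """Total number of bits needed to Rice-code a group of signed values."""
    return sum((zigzag(v) >> k) + 1 + k for v in group)

def best_rice_parameter(group, max_k=15):
    """Pick the parameter k (only integer values exist) with the lowest cost."""
    return min(range(max_k + 1), key=lambda k: group_cost_bits(group, k))

# Hypothetical groups: one with a small spread, one with a large spread.
narrow_group = [0, 1, -1, 2, 0, -2, 1, 0]
wide_group = [40, -35, 12, -60, 25]
k_narrow = best_rice_parameter(narrow_group)
k_wide = best_rice_parameter(wide_group)
```

Because k can only take integer values, the spread of the matching Laplacian distribution can only be selected in coarse steps, which is the first shortcoming noted above.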
Arithmetic coding is an efficient scheme for lossless coding, or lossless compression, of a sequence of symbols. For a block of data of arbitrary size, the code length will lie within a few bits of the self-information of the data to be encoded. The use of probability models for the source code alphabet, rather than pre-stored tables of code words, imposes a higher computational burden, but requires less memory space as there is no need to store tables with code words. The theory of arithmetic coding is well known to the skilled person. In general, the input data of an encoder is assumed to consist of a sequence of N source symbols s1, s2, . . . , sN. Each symbol si comes from an alphabet of K letters {a1, a2, . . . , aK}, and the probability Pi(aj) of each letter aj is known to the encoder as well as to the decoder. For each symbol si, the letter probabilities {Pi(aj)} add up to unity. Therefore, the probabilities {P1(aj)} pertaining to the first symbol s1 define a division of a line segment of length one into intervals of width P1(aj). The order of the intervals is taken to be the same as the order of the letters in the alphabet. For the second symbol s2, each of these intervals is again divided into subintervals of width P1(aj)P2(ak), and so on for the remaining symbols. The result is a division of the unit interval into adjacent, non-overlapping intervals, with one interval for each possible input sequence. The width of an interval is equal to the likelihood of the corresponding sequence. This method of generating intervals is the essence of arithmetic coding. In practice, rather than finding all possible intervals, it suffices to compute only the interval corresponding to the actual input data.
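The interval subdivision described above can be sketched as follows. This is a minimal illustration that computes only the interval of the actual input sequence, using exact fractions instead of the finite-precision arithmetic of a practical coder. The function name, the dictionary representation of the probability models, and the two-letter example alphabet are assumptions, not part of the described method.

```python
from fractions import Fraction

def sequence_interval(symbols, models):
    """Return (low, width) of the sub-interval of [0, 1) for `symbols`.

    models[i] maps each letter to its probability P_i(letter) for the
    i-th symbol; the dict order is taken as the alphabet order.
    """
    low, width = Fraction(0), Fraction(1)
    for sym, model in zip(symbols, models):
        assert sum(model.values()) == 1  # letter probabilities add up to unity
        # Cumulative probability of all letters that precede `sym`.
        offset = Fraction(0)
        for letter, p in model.items():
            if letter == sym:
                break
            offset += p
        low += width * offset   # move to the start of the chosen sub-interval
        width *= model[sym]     # shrink by the probability of the letter
    return low, width

# Hypothetical alphabet {a, b}, with the same model used for every symbol.
model = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
low, width = sequence_interval("aab", [model] * 3)
```

For the sequence "aab" the resulting width is 3/4 · 3/4 · 1/4 = 9/64, which equals the likelihood of the sequence under the model, in line with the description above.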
Thus, arithmetic coding relies on the probabilities of the input symbols. In practice, however, these probabilities are rarely known; instead, a model is used that provides an approximation of the true probabilities. Accordingly, when the term probability is used in connection with arithmetic coding, it in fact refers to such a hypothetical, model-based probability.
When using arithmetic coding, the efficiency of the code is governed by how closely the symbol probabilities used in the design of the code match the actual probabilities of the symbols that are being coded. If there is a mismatch, the code will on average produce longer code words than necessary and will be less efficient in its compression. Hence, to obtain an efficient code, it is crucial to have a description of the data statistics, i.e. the symbol probabilities, that is as accurate as possible. Traditionally this means that a lot of data is collected and a probability density function (PDF) is determined to fit all data in the set. A problem, however, is that many real-life data sources, such as audio or images, have characteristics that change significantly over the span of a block of collected data.
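The cost of a mismatch between the model and the actual probabilities can be quantified with a short sketch. For an ideal arithmetic coder, the average code length per symbol approaches the cross-entropy between the true distribution and the model, which is never smaller than the entropy of the true distribution. The two distributions below are hypothetical values chosen for illustration.

```python
import math

def entropy_bits(p):
    """Average bits per symbol when the model equals the true distribution p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy_bits(p, q):
    """Average bits per symbol when data distributed as p is coded with model q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical four-letter source and a mismatched (uniform) model.
true_p = [0.5, 0.25, 0.125, 0.125]
model_q = [0.25, 0.25, 0.25, 0.25]

ideal = entropy_bits(true_p)                  # matched model
actual = cross_entropy_bits(true_p, model_q)  # mismatched model
excess = actual - ideal                       # extra bits per symbol
```

In this example the matched model needs 1.75 bits per symbol on average, whereas the mismatched uniform model needs 2 bits, an excess of 0.25 bits per symbol.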