In recent years, there has been an increased interest in digital audio and video communications, such as video telephony, video go conferences and other applications, e.g. digital video editing, digital broadcasting etc. The typical data is a stream of digitised samples of sound or light intensity represented as fixed-width numbers. For audio with CD quality, the samples are 16 bit numbers representing the dynamic range [0, 2.sup.16 -1]=[0,65535]. One problem with digital audio/video data is that the amount of data is enormous and requires a high bandwidth for communication and much disk space for storage. Although there has been a tremendous development in transmission/storage technologies and bandwidth is getting cheaper, digital applications still need to code the information in an efficient way using data compression. In particular, for today's and future applications in wireless communication and multimedia, data compression is very essential since the total radio-frequency bandwidth is limited, and since time constraints exist in many applications.
There are basically two different strategies that can be employed in order to compress data.
The first strategy is to reduce the statistical redundancy. This is performed by using a statistical model of the correlation between the symbols, which, for example, predicts the probability of a symbol given the preceding symbols. By doing this, one can transform a stream of more-or-less equally probable symbols, into a new stream of symbols with less correlation and a peaked probability distribution. The entropy of the source is thus greatly reduced. Entropy coding can then be used to code probable symbols using short codes, while less probable events are represented by long codes, resulting in a shorter total length of the digital stream. Such a scheme is called lossless, because no distortion is introduced in the data by the compression, and the original data sequence can hence be perfectly reconstructed.
The second strategy that is used to achieve additional compression removes some of the information and is known as lossy data compression. The aim of such a procedure is to make the distortion introduced by the lossy scheme as perceptually small as possible. Lossy data compression is used in most of the efficient audio, image, and video compression algorithms.
One common technique to code data is called Bit-Plane Coding (BPC). Given a set of digital numbers in the dynamic range R=[0, 2.sup.k -1] (for example, R=[0, 255] corresponding to k=8-bit digital numbers), the numbers are reparded as k sets/planes of binary digits (bits) {0,1}. Each bit-plane corresponds to one bit in the binary representation of all the numbers. Using this standard bit-plane splitting, there remains a strong correlation between the bit-planes. This correlation can be exploited in a predictive coding scheme, for example, by coding the less significant bit planes with respect to the more significant planes. See for example, M. Rabbani, P. Melnychuck, "Conditioning Context for the Arith. Coding of Bit-Planes", IEEE trans. on Signal Proc., Vol. 40 No. 1, January 1992 and U.S. Pat. No. 5,442,448.
Another approach is to decorrelate the bit planes by applying a Gray coding of all the data values before bit-plane splitting is done. After that, the bit-planes can be coded independently with approximately the same result as for the predictive coding, but with a lower complexity.
The motivation for such a decomposition is that each bit-plane can be encoded efficiently using binary compression techniques.
Working in a binary domain reduces the complexity of the in-plane prediction model, and higher-order prediction can be used. For example, in the case of a video coding, a larger, spatial area around the element to be coded can be considered for the prediction.
Considering the bit-plane representation of ordinary natural image sequences, it is generally the case that less variation and more correlation is present between the samples in the most significant bit-planes. Hence, they can be coded more efficiently than the less significant bit-planes. From this point of view, a significant increase in the compression ratio can be achieved by doing lossy coding by neglecting the least significant bit-planes.
The compression of the most significant bit-planes, together with skipping of the least significant planes, has been used in the field of visual data compression, for example as described in the co-pending International patent application PCT/SE96/00943, which discloses an algorithm which is based on an extended BPC approach to video sequences.
If all symbols are equally probable, no (lossless) compression can be done. However, as discussed above, if one can find a statistical model to predict the data from previous values, one can construct a new sequence of symbols having a probability distribution with less variance, which can then be compressed. This corresponds to a reduction of entropy if the symbols are treated as being independent.
Consider a source of symbols, S={s(1), s(2),s(3), . . . }, represented as integers in the dynamic range R (s(i) is the symbol produced at the instant i). Each number has k bits, and the dynamic range is R=[0, 2.sup.k -1]. Thus, if k=3, R=[0,7].
If these symbols are a set of either spatially or temporally successive samples in an audio or video sequence, they are correlated in the sense that neighbouring values are often close to each other.
A general method for exploiting this correlation is to build a new source of symbols O={o(1),o(2),o(3), . . . } from the original one, whose probability distribution is much more peaked around a few values, and therefore allows a higher compression.
A known solution in the literature, is to make O the difference between successive values: EQU o(1)=s(1) EQU o(i)=s(i)-s(i-1)
from which the original sequence can be perfectly reconstructed as s(1)=o(1), EQU s(i)=o(i)+s(i-1)
The method is known as Differential Pulse Code Modulation (DPCM).
One way to look at s(i-1) is as a prediction of the value of s(i), and to look at o(i) as the prediction error. This is an example of exploiting first-order correlation among the data. For a stationary source, the general way of constructing O may be written as: EQU o(i)=s(i)-s'(i)
where s'(i) denotes the prediction which in the general case is given as: EQU s'(i)=f(s(i-1), s(i-2) . . . )
where f is the function that implements the new representation.
In the following s'(i) denotes the prediction.
In the simple case of a first order correlation, a two-dimensional graphical representation of the function f can be introduced. An example with a 3-bit dynamic-range [0, 7] showing the DPCM values for each combination of s'(i)=s(i-1) and s(i) is shown in the table of FIG. 1.
Note that each (sub-)diagonal represents a specific Euclidean distance between the values s(i) and the prediction s'(i).
The histogram for o(i) for typical sources, for example audio signals, is peaked around zero due to the similarity of successive samples, and the lower variance allows an efficient coding of o(i). This implies that s'(i)=s(i-1) is a good prediction.
However, the subtraction operation used in DPCM does not preserve the dynamic range. For example, if the source symbols are characterised by 8-bit dynamic range [0,255], the output will have a doubled dynamic-range [-255, 255] corresponding to 9-bit numbers. This leads to a reduction in the compression, when, for example, a bit-plane coding approach is considered (one more bit plane is to be coded). Secondly, an increase in the complexity is introduced from the architectural point of view. One example is that architectures of ordinary digital systems are based on bytes as smallest data unit (1 byte=8 bits); and the increase in the dynamic range can then require an extension to use 2 bytes for each sample.
Furthermore, in Ying Wang, "A Set of Transformations for Lossless Image Compression", IEEE Transactions on image processing, Vol. 4, No. 5, May 1995, pp. 677-699, a transformation which preserves the dynamic range of the DPCM for a still image coding scheme is described. This paper also addresses the possibility of permuting the entry symbols in table 2a in order to code the symbols more efficiently. Examples of the corresponding tables are shown in FIGS. 3a and 4a, respectively. The corresponding number of "ones" required for coding each difference symbols are shown in FIGS. 2b, 3b and 4b, respectively. The number of ones in this case means the number of binary ones used for coding a specific symbol of the table entries. For example, if the symbol to be coded is "4", the binary representation for this symbol is "100", which contains one binary one. A reduction of the number of ones will provide a more efficient bit-plane coding.
Thus, the table in FIG. 1 shows the different coded values for different combinations of preceding and current symbol, for a data sequence having 3-bit dynamic-range and coded according to the common DPCM algorithm. If the value 8 is added to all the negative entries in the table of FIG. 1, the table shown in FIG. 2a is obtained. The table of FIG. 2a is in many cases to be preferred to the table of FIG. 1, since it provides the same information without causing any increase in the dynamic range. Thus, in the graphical representation this means that only numbers in the original dynamic-range occur in the table 2a.
Furthermore, the tables 3a and 4a are modifications of FIG. 2a also showing invertible, permutative codes.
Invertible in this case means that the table is a group multiplication table and that the sequence can be decoded in reverse order, i.e. the sequence is reversible. Permutative means that the structure of the codes is the same as the structure shown in FIG. 2a, but the entry symbols have been permuted.