A digital image, such as a video image, a TV image, a still image or an image generated by a video recorder or a computer, consists of pixels arranged in horizontal and vertical lines. The number of pixels in a single image is typically in the tens of thousands. Each pixel typically contains luminance and chrominance information. Without compression, the quantity of information to be conveyed from an image encoder to an image decoder is so enormous that it renders real-time image transmission impossible. To reduce the amount of information to be transmitted, a number of different compression methods, such as the JPEG, MPEG and H.263 standards, have been developed. In a typical video encoder, each frame of the original video sequence is partitioned into rectangular regions or blocks, which are encoded in Intra-mode (I-mode) or Inter-mode (P-mode). The blocks are coded independently using some kind of transform coding, such as DCT coding. However, pure block-based coding only reduces the inter-pixel correlation within a particular block, without considering the inter-block correlation of pixels, and it still produces high bit-rates for transmission. Current digital image coding standards also exploit certain methods that reduce the correlation of pixel values between blocks.
In general, blocks encoded in P-mode are predicted from one of the previously coded and transmitted frames. The prediction information of a block is represented by a two-dimensional (2D) motion vector. For the blocks encoded in I-mode, the predicted block is formed using spatial prediction from already encoded neighboring blocks within the same frame. The prediction error, i.e., the difference between the block being encoded and the predicted block, is represented as a set of weighted basis functions of some discrete transform. The transform is typically performed on an 8×8 or 4×4 block basis. The weights (the transform coefficients) are subsequently quantized. Quantization introduces loss of information and, therefore, quantized coefficients have lower precision than the originals.
Quantized transform coefficients, together with motion vectors and some control information, form a complete coded sequence representation and are referred to as syntax elements. Prior to transmission from the encoder to the decoder, all syntax elements are entropy coded so as to further reduce the number of bits needed for their representation.
In the decoder, the block in the current frame is obtained by first constructing its prediction in the same manner as in the encoder and then adding the compressed prediction error to the prediction. The compressed prediction error is found by weighting the transform basis functions using the quantized coefficients. The difference between the reconstructed frame and the original frame is called the reconstruction error.
The compression ratio, i.e., the ratio of the number of bits used to represent the original sequence to the number used to represent the compressed sequence, is controlled for both I- and P-blocks by adjusting the value of the quantization parameter that is used to quantize the transform coefficients. The compression ratio also depends on the employed method of entropy coding.
An example of spatial prediction used in Working Draft Number 2 (WD2) of the JVT coder is described as follows. In order to perform the spatial prediction, the JVT coder offers 9 modes for prediction of 4×4 blocks, including DC prediction (Mode 0) and 8 directional modes, labeled 1 through 8, as shown in FIG. 1. The prediction process is illustrated in FIG. 2. As shown in FIG. 2, the pixels from a to p are to be encoded, and pixels A to Q from neighboring blocks that have already been encoded are used for prediction. If, for example, Mode 1 is selected, then pixels a, e, i and m are predicted by setting them equal to pixel A, and pixels b, f, j and n are predicted by setting them equal to pixel B, etc. Similarly, if Mode 2 is selected, pixels a, b, c and d are predicted by setting them equal to pixel I, and pixels e, f, g and h are predicted by setting them equal to pixel J, etc. Thus, Mode 1 is a predictor in the vertical direction, and Mode 2 is a predictor in the horizontal direction. These modes are described in document VCEG-N54, published by the ITU Telecommunication Standardization Sector, Video Coding Experts Group (VCEG), in September 2001, and in document JVT-B118r2, published by the Joint Video Team of ISO/IEC MPEG and ITU-T VCEG in March 2002.

Mode 0: DC Prediction
Generally all samples are predicted by (A + B + C + D + I + J + K + L + 4)>>3. If four of the samples are outside the picture, the average of the remaining four is used for prediction. If all eight samples are outside the picture, the prediction for all samples in the block is 128. A block may therefore always be predicted in this mode.

Mode 1: Vertical Prediction
If A, B, C, D are inside the picture, then
a, e, i, m are predicted by A,
b, f, j, n are predicted by B,
c, g, k, o are predicted by C,
d, h, l, p are predicted by D.

Mode 2: Horizontal Prediction
If E, F, G, H are inside the picture, then
a, b, c, d are predicted by E,
e, f, g, h are predicted by F,
i, j, k, l are predicted by G,
m, n, o, p are predicted by H.

Mode 3: Diagonal Down/Right Prediction
This mode is used only if all A, B, C, D, I, J, K, L, Q are inside the picture. This is a “diagonal” prediction.
m is predicted by (J + 2K + L + 2)>>2
i, n are predicted by (I + 2J + K + 2)>>2
e, j, o are predicted by (Q + 2I + J + 2)>>2
a, f, k, p are predicted by (A + 2Q + I + 2)>>2
b, g, l are predicted by (Q + 2A + B + 2)>>2
c, h are predicted by (A + 2B + C + 2)>>2
d is predicted by (B + 2C + D + 2)>>2

Mode 4: Diagonal Down/Left Prediction
This mode is used only if all A, B, C, D, I, J, K, L, Q are inside the picture. This is a “diagonal” prediction.
a is predicted by (A + 2B + C + I + 2J + K + 4)>>3
b, e are predicted by (B + 2C + D + J + 2K + L + 4)>>3
c, f, i are predicted by (C + 2D + E + K + 2L + M + 4)>>3
d, g, j, m are predicted by (D + 2E + F + L + 2M + N + 4)>>3
h, k, n are predicted by (E + 2F + G + M + 2N + O + 4)>>3
l, o are predicted by (F + 2G + H + N + 2O + P + 4)>>3
p is predicted by (G + H + O + P + 2)>>3

Mode 5: Vertical-Left Prediction
This mode is used only if all A, B, C, D, I, J, K, L, Q are inside the picture. This is a “diagonal” prediction.
a, j are predicted by (Q + A + 1)>>1
b, k are predicted by (A + B + 1)>>1
c, l are predicted by (B + C + 1)>>1
d is predicted by (C + D + 1)>>1
e, n are predicted by (I + 2Q + A + 2)>>2
f, o are predicted by (Q + 2A + B + 2)>>2
g, p are predicted by (A + 2B + C + 2)>>2
h is predicted by (B + 2C + D + 2)>>2
i is predicted by (Q + 2I + J + 2)>>2
m is predicted by (I + 2J + K + 2)>>2

Mode 6: Vertical-Right Prediction
This mode is used only if all A, B, C, D, I, J, K, L, Q are inside the picture. This is a “diagonal” prediction.
a is predicted by (2A + 2B + J + 2K + L + 4)>>3
b, i are predicted by (B + C + 1)>>1
c, j are predicted by (C + D + 1)>>1
d, k are predicted by (D + E + 1)>>1
l is predicted by (E + F + 1)>>1
e is predicted by (A + 2B + C + K + 2L + M + 4)>>3
f, m are predicted by (B + 2C + D + 2)>>2
g, n are predicted by (C + 2D + E + 2)>>2
h, o are predicted by (D + 2E + F + 2)>>2
p is predicted by (E + 2F + G + 2)>>2

Mode 7: Horizontal-Up Prediction
This mode is used only if all A, B, C, D, I, J, K, L, Q are inside the picture. This is a “diagonal” prediction.
a is predicted by (B + 2C + D + 2I + 2J + 4)>>3
b is predicted by (C + 2D + E + I + 2J + K + 4)>>3
c, e are predicted by (D + 2E + F + 2J + 2K + 4)>>3
d, f are predicted by (E + 2F + G + J + 2K + L + 4)>>3
g, i are predicted by (F + 2G + H + 2K + 2L + 4)>>3
h, j are predicted by (G + 3H + K + 3L + 4)>>3
l, n are predicted by (L + 2M + N + 2)>>3
k, m are predicted by (G + H + L + M + 2)>>2
o is predicted by (M + N + 1)>>1
p is predicted by (M + 2N + O + 2)>>2

Mode 8: Horizontal-Down Prediction
This mode is used only if all A, B, C, D, I, J, K, L, Q are inside the picture. This is a “diagonal” prediction.
a, g are predicted by (Q + I + 1)>>1
b, h are predicted by (I + 2Q + A + 2)>>2
c is predicted by (Q + 2A + B + 2)>>2
d is predicted by (A + 2B + C + 2)>>2
e, k are predicted by (I + J + 1)>>1
f, l are predicted by (Q + 2I + J + 2)>>2
i, o are predicted by (J + K + 1)>>1
j, p are predicted by (I + 2J + K + 2)>>2
m is predicted by (K + L + 1)>>1
n is predicted by (J + 2K + L + 2)>>2
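The first four modes can be illustrated with a minimal Python sketch. It is not taken from the JVT reference software. It assumes that pixels a to p are laid out row by row in the 4×4 block, that A to D lie directly above the block, that I to L lie directly to its left, and that Q is the above-left corner pixel; for Mode 2 it follows the narrative description, in which the left neighbors I to L are used. The function name predict_4x4 is hypothetical, and the handling of neighbors that fall outside the picture is omitted.

```python
def predict_4x4(mode, above, left, corner):
    """Return a 4x4 predicted block (list of rows) for modes 0 to 3.

    above  = [A, B, C, D]  pixels directly above the block (assumed layout)
    left   = [I, J, K, L]  pixels directly to the left (assumed layout)
    corner = Q             above-left pixel (assumed layout)
    All neighbors are assumed to be inside the picture.
    """
    A, B, C, D = above
    I, J, K, L = left
    Q = corner

    if mode == 0:  # DC: rounded average of the eight neighbors
        dc = (A + B + C + D + I + J + K + L + 4) >> 3
        return [[dc] * 4 for _ in range(4)]

    if mode == 1:  # vertical: each column copies the pixel above it
        return [list(above) for _ in range(4)]

    if mode == 2:  # horizontal: each row copies the pixel to its left
        return [[value] * 4 for value in left]

    if mode == 3:  # diagonal down/right: constant along col - row
        diagonal = {
            -3: (J + 2 * K + L + 2) >> 2,  # m
            -2: (I + 2 * J + K + 2) >> 2,  # i, n
            -1: (Q + 2 * I + J + 2) >> 2,  # e, j, o
            0: (A + 2 * Q + I + 2) >> 2,   # a, f, k, p
            1: (Q + 2 * A + B + 2) >> 2,   # b, g, l
            2: (A + 2 * B + C + 2) >> 2,   # c, h
            3: (B + 2 * C + D + 2) >> 2,   # d
        }
        return [[diagonal[col - row] for col in range(4)] for row in range(4)]

    raise ValueError("only modes 0 to 3 are sketched here")


# Example neighbors chosen arbitrarily for illustration.
block = predict_4x4(3, above=[10, 20, 30, 40], left=[50, 60, 70, 80], corner=5)
```

Modes 4 to 8 would follow the same pattern, each filling the block from the weighted sums listed for that mode.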
Since each block must have a prediction mode assigned and transmitted to the decoder, this would require a considerable number of bits if coded directly. In order to reduce the amount of information to be transmitted, the correlation of the prediction modes of adjacent blocks can be used. For example, Vahteri et al. (WO 01/54416 A1, “A Method for Encoding Images and An Image Coder”, hereafter referred to as Vahteri) disclose a block-based coding method wherein directionality information of the image within the blocks is used to classify a plurality of spatial prediction modes. The spatial prediction mode of a block is determined by the directionality information of at least one neighboring block.
In the JVT coder, when the prediction modes of neighboring, already-coded blocks U and L are known, an ordering of the most probable prediction mode, the next most probable prediction mode, etc., for block C is given (FIG. 3). The ordering of modes is specified for each combination of prediction modes of U and L. This order can be specified as a list of prediction modes for block C ordered from the most to the least probable one. The ordered list used in the WD2 of the JVT coder, as disclosed in VCEG-N54, is given below:
TABLE 1
Prediction mode as a function of ordering signalled in the bitstream

L/U      outside    0          1          2          3
outside  –––––––––  0––––––––  01–––––––  10–––––––  –––––––––
0        02–––––––  021648573  125630487  021876543  021358647
1        –––––––––  102654387  162530487  120657483  102536487
2        20–––––––  280174365  217683504  287106435  281035764
3        –––––––––  201385476  125368470  208137546  325814670
4        –––––––––  201467835  162045873  204178635  420615837
5        –––––––––  015263847  152638407  201584673  531286407
6        –––––––––  016247583  160245738  206147853  160245837
7        –––––––––  270148635  217608543  278105463  270154863
8        –––––––––  280173456  127834560  287104365  283510764

L/U      4          5          6          7          8
outside  –––––––––  –––––––––  –––––––––  –––––––––  –––––––––
0        206147583  512368047  162054378  204761853  208134657
1        162045378  156320487  165423078  612047583  120685734
2        287640153  215368740  216748530  278016435  287103654
3        421068357  531268470  216584307  240831765  832510476
4        426015783  162458037  641205783  427061853  204851763
5        125063478  513620847  165230487  210856743  210853647
6        640127538  165204378  614027538  264170583  216084573
7        274601853  271650834  274615083  274086153  278406153
8        287461350  251368407  216847350  287410365  283074165
Here, an example of the prediction modes for block C, as specified in the WD2 of the JVT coder, is given for the case in which the prediction mode for both U and L is 2. The string (2, 8, 7, 1, 0, 6, 4, 3, 5) indicates that Mode 2 is also the most probable mode for block C, Mode 8 is the next most probable mode, etc. Information indicating that the nth most probable mode is used for block C is transmitted to the decoder. The ordering of the modes for block C can also be specified by listing the rank of each mode: the higher the rank, the less probable the prediction mode. For the above example, the rank list would be (5, 4, 1, 8, 7, 9, 6, 3, 2). When the modes (0, 1, 2, 3, 4, 5, 6, 7, 8) are related to the rank list (5, 4, 1, 8, 7, 9, 6, 3, 2), we can tell that Mode 0 has a rank of 5, Mode 1 has a rank of 4, etc.
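The relation between the ordered mode list and the rank list can be sketched in a few lines of Python. The function names order_to_rank and nth_most_probable are hypothetical and are used only for illustration; ranks are counted from 1, as in the example above.

```python
def order_to_rank(order):
    """Convert a list of modes ordered from most to least probable
    into a rank list, where rank[mode] is the 1-based position of
    that mode in the ordering."""
    rank = [0] * len(order)
    for position, mode in enumerate(order, start=1):
        rank[mode] = position
    return rank


def nth_most_probable(order, n):
    """Return the mode selected when the bitstream signals that the
    nth most probable mode (n counted from 1) is used."""
    return order[n - 1]


# Ordering for block C when both U and L use Mode 2 (from Table 1).
order = [2, 8, 7, 1, 0, 6, 4, 3, 5]
rank = order_to_rank(order)            # [5, 4, 1, 8, 7, 9, 6, 3, 2]
second = nth_most_probable(order, 2)   # Mode 8
```

Either representation carries the same information, so a coder only needs to store one of them for each combination of U and L.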
For more efficient coding, information on intra prediction of two 4×4 blocks can be coded in one codeword.
The above-mentioned method has one major drawback: a considerable amount of memory is required to store the ordering of prediction modes for block C for each pair of prediction modes of blocks U and L. In WD2 of the JVT coder, because 9 modes are used for prediction, there are 9×9 possible combinations of modes for blocks U and L. For each combination, an ordering of 9 possible modes has to be specified. That means that 9×9×9 = 729 bytes (assuming that one number requires one byte) are needed to specify the ordering of prediction modes. In addition, more memory may be required to specify the special cases, for example, if one or both of blocks U and L are not available.
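The storage estimate can be made concrete with a small Python sketch of a flat lookup table, one byte per stored mode number. The layout and the helper names slot, store and lookup are assumptions made for illustration, not a description of how any particular coder stores the table; only the entry for L = 2, U = 2 is filled in, and the special cases with unavailable neighbors are left out.

```python
NUM_MODES = 9

# One 9-entry ordering per (L, U) pair, one byte per entry.
table = bytearray(NUM_MODES * NUM_MODES * NUM_MODES)


def slot(mode_l, mode_u):
    """Byte offset of the ordering stored for the pair (L, U)."""
    return (mode_l * NUM_MODES + mode_u) * NUM_MODES


def store(mode_l, mode_u, order):
    """Write the ordering of modes for block C for the pair (L, U)."""
    start = slot(mode_l, mode_u)
    table[start:start + NUM_MODES] = bytes(order)


def lookup(mode_l, mode_u):
    """Read back the ordering of modes for block C for the pair (L, U)."""
    start = slot(mode_l, mode_u)
    return list(table[start:start + NUM_MODES])


# Entry taken from the example in the text (both U and L use Mode 2).
store(2, 2, [2, 8, 7, 1, 0, 6, 4, 3, 5])
table_size_bytes = len(table)  # 9 * 9 * 9 bytes
```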
Thus, it is advantageous and desirable to provide a method and device for coding a digital image wherein the memory requirements are reduced while the loss in coding efficiency is minimal.