1. Field of the Invention
The present invention relates to an image data encoding technique.
2. Description of the Related Art
When compressing a moving image, the image data is generally divided into brightness components and color difference components, and the respective components are encoded. International standard techniques likewise divide a moving image into brightness components and color difference components and encode the respective components.
First, a given picture (frame) is encoded by itself, without referring to any other picture. Pictures input subsequently over the course of time are then predictively encoded by referring to that picture, using motion prediction and compensation. Encoding without referring to other pictures is called intra-coding. Encoding using motion prediction and compensation by referring to other pictures is called inter-coding.
In both intra-coding and inter-coding, the image data is lossy-compressed (lossy-encoded) by performing DCT (Discrete Cosine Transform), quantization, and entropy coding. In intra-coding, predictive coding within a picture (intra-frame predictive coding) is used without referring to other pictures.
There are known moving image compression techniques such as MPEG (Moving Picture Experts Group)-4 recommended by ISO/IEC (International Organization for Standardization/International Electrotechnical Commission) and H.263+ of the ITU-T (International Telecommunication Union - Telecommunication Standardization Sector). According to these techniques, in intra-frame prediction for intra-coding, blocks each having eight vertical pixels and eight horizontal pixels (to be expressed as 8×8 pixels hereinafter) are set. Each block is transformed by DCT, and the coefficients obtained by the DCT are quantized. The quantized DC (Direct Current) and AC (Alternating Current) values are then predicted from the DC and AC values of the neighboring blocks, thereby raising the compression efficiency.
The ISO/IEC MPEG and the ITU-T VCEG (Video Coding Experts Group) cooperatively organized the JVT (Joint Video Team) and recommended a new video encoding standard in March 2003. This standard is named MPEG-4 Part 10/AVC (Advanced Video Coding) in the ISO/IEC recommendation; in the ITU-T, it is known as H.264. This recommendation includes a compression technique using intra-frame predictive coding for intra-coding. The technique uses an intra-frame predictive coding method with block sizes that differ between brightness components and color difference components. A brightness component is predicted in units of 4×4 pixel blocks or 16×16 pixel blocks. Nine prediction methods are defined, depending on the prediction direction, for a 4×4 pixel block, and four prediction methods are defined, depending on the prediction direction, for a 16×16 pixel block (e.g., “Fast Mode Decision for Intra Prediction”, JVT-G013, 7-14 Mar. 2003). The prediction directions in each block are associated with numbers called prediction modes in a one-to-one correspondence.
FIG. 3 shows the relationship between prediction modes and prediction directions when encoding a 4×4 pixel block. The prediction modes of the 4×4 pixel block are represented by numbers 0 to 8. Prediction mode “2” represents mean value prediction.
FIG. 4 shows a 4×4 pixel block. One cell represents one pixel. A, B, C, . . . in upper case denote reference pixel values, and a, b, c, . . . in lower case denote encoding target pixels in a block of interest. Letting P(A), P(B), P(C), . . . be the brightness values of the reference pixel values A, B, C, . . . , and Pred(a), Pred(b), Pred(c), . . . be the predicted values of the encoding target pixels a, b, c, . . . , the prediction modes and predicted values of the 4×4 pixel block are represented in the following way.
Note that Pred(ALL) represents the predicted values of all pixels (4×4=16 pixels) in the block, and “>>i” indicates that an i-bit shift operation to the right side (i-bit shift to the lower bit side) should be performed. Hence, N>>1 represents that a value N is to be shifted to the right side by one bit. This is equivalent to calculating N/2 (fractions below the decimal point are dropped). N>>2 is equivalent to calculating N/4 (fractions below the decimal point are dropped). Hence, for example, “(N+1)>>1” is equivalent to calculating N/2 with its fraction below the decimal point rounded up.
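As an illustrative sketch (not part of any recommendation), the rounding behavior of these shift operations can be confirmed in a few lines of Python; the helper names floor_half and round_half are hypothetical:

```python
def floor_half(n):
    # N>>1: equivalent to N/2 with the fraction below the decimal point dropped
    return n >> 1

def round_half(n):
    # (N+1)>>1: equivalent to N/2 with the fraction below the decimal point rounded up
    return (n + 1) >> 1

print(floor_half(5))  # 2, i.e. 5/2 with the fraction dropped
print(round_half(5))  # 3, i.e. 5/2 with the fraction rounded up
print(5 >> 2)         # 1, i.e. 5/4 with the fraction dropped
```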
The scan order of 4×4 pixel blocks is a raster scan order starting from the pixel block at the upper left corner of an image. The coordinates of a pixel or a pixel block in an image are defined using x coordinates in the horizontal rightward direction and y coordinates in the vertical downward direction based on the origin (0,0) set at the upper left corner, as is well known. Hence, the first pixel block of a picture indicates the pixel block located at the upper left corner of the image.
<Prediction Mode “0” of 4×4 Pixel Block>
Pred(a)=Pred(e)=Pred(i)=Pred(m)=P(A)
Pred(b)=Pred(f)=Pred(j)=Pred(n)=P(B)
Pred(c)=Pred(g)=Pred(k)=Pred(o)=P(C)
Pred(d)=Pred(h)=Pred(l)=Pred(p)=P(D)
<Prediction Mode “1” of 4×4 Pixel Block>
Pred(a)=Pred(b)=Pred(c)=Pred(d)=P(I)
Pred(e)=Pred(f)=Pred(g)=Pred(h)=P(J)
Pred(i)=Pred(j)=Pred(k)=Pred(l)=P(K)
Pred(m)=Pred(n)=Pred(o)=Pred(p)=P(L)
<Prediction Mode “2” of 4×4 Pixel Block>
[When the 4×4 pixel block is located at the start of the picture]
Pred(ALL)=128
[When the 4×4 pixel block is located on the upper edge of the picture]
Pred(ALL)={P(I)+P(J)+P(K)+P(L)+2}>>2
[When the 4×4 pixel block is located on the left edge of the picture]
Pred(ALL)={P(A)+P(B)+P(C)+P(D)+2}>>2
[Other Cases]
Pred(ALL)={P(A)+P(B)+P(C)+P(D)+P(I)+P(J)+P(K)+P(L)+4}>>3
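The four cases of mean value prediction above can be sketched as follows (a hypothetical helper, assuming the reference pixels are passed as lists, with None when the corresponding neighbors are unavailable):

```python
def dc_predict_4x4(top, left):
    """Mean value prediction (mode 2) for a 4x4 pixel block.

    top  : [P(A), P(B), P(C), P(D)], or None when no upper neighbors exist
    left : [P(I), P(J), P(K), P(L)], or None when no left neighbors exist
    Returns the single predicted value Pred(ALL) shared by all 16 pixels.
    """
    if top is None and left is None:
        return 128                            # start of the picture
    if top is None:
        return (sum(left) + 2) >> 2           # upper edge: left neighbors only
    if left is None:
        return (sum(top) + 2) >> 2            # left edge: upper neighbors only
    return (sum(top) + sum(left) + 4) >> 3    # both sets of neighbors available
```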
<Prediction Mode “3” of 4×4 Pixel Block>
Pred(a)={P(A)+2P(B)+P(C)+2}>>2
Pred(b)=Pred(e)={P(B)+2P(C)+P(D)+2}>>2
Pred(c)=Pred(f)=Pred(i)={P(C)+2P(D)+P(E)+2}>>2
Pred(d)=Pred(g)=Pred(j)=Pred(m)={P(D)+2P(E)+P(F)+2}>>2
Pred(h)=Pred(k)=Pred(n)={P(E)+2P(F)+P(G)+2}>>2
Pred(l)=Pred(o)={P(F)+2P(G)+P(H)+2}>>2
Pred(p)={P(G)+3P(H)+2}>>2
<Prediction Mode “4” of 4×4 Pixel Block>
Pred(a)=Pred(f)=Pred(k)=Pred(p)={P(A)+2P(M)+P(I)+2}>>2
Pred(b)=Pred(g)=Pred(l)={P(M)+2P(A)+P(B)+2}>>2
Pred(c)=Pred(h)={P(A)+2P(B)+P(C)+2}>>2
Pred(d)={P(B)+2P(C)+P(D)+2}>>2
Pred(e)=Pred(j)=Pred(o)={P(M)+2P(I)+P(J)+2}>>2
Pred(i)=Pred(n)={P(I)+2P(J)+P(K)+2}>>2
Pred(m)={P(J)+2P(K)+P(L)+2}>>2
<Prediction Mode “5” of 4×4 Pixel Block>
Pred(a)=Pred(j)={P(M)+P(A)+1}>>1
Pred(b)=Pred(k)={P(A)+P(B)+1}>>1
Pred(c)=Pred(l)={P(B)+P(C)+1}>>1
Pred(d)={P(C)+P(D)+1}>>1
Pred(f)=Pred(o)={P(M)+2P(A)+P(B)+2}>>2
Pred(g)=Pred(p)={P(A)+2P(B)+P(C)+2}>>2
Pred(h)={P(B)+2P(C)+P(D)+2}>>2
Pred(e)=Pred(n)={P(I)+2P(M)+P(A)+2}>>2
Pred(i)={P(J)+2P(I)+P(M)+2}>>2
Pred(m)={P(K)+2P(J)+P(I)+2}>>2
<Prediction Mode “6” of 4×4 Pixel Block>
Pred(a)=Pred(g)={P(M)+P(I)+1}>>1
Pred(e)=Pred(k)={P(I)+P(J)+1}>>1
Pred(i)=Pred(o)={P(J)+P(K)+1}>>1
Pred(m)={P(K)+P(L)+1}>>1
Pred(f)=Pred(l)={P(M)+2P(I)+P(J)+2}>>2
Pred(j)=Pred(p)={P(I)+2P(J)+P(K)+2}>>2
Pred(n)={P(J)+2P(K)+P(L)+2}>>2
Pred(b)=Pred(h)={P(A)+2P(M)+P(I)+2}>>2
Pred(c)={P(B)+2P(A)+P(M)+2}>>2
Pred(d)={P(C)+2P(B)+P(A)+2}>>2
<Prediction Mode “7” of 4×4 Pixel Block>
Pred(a)={P(A)+P(B)+1}>>1
Pred(b)=Pred(i)={P(B)+P(C)+1}>>1
Pred(c)=Pred(j)={P(C)+P(D)+1}>>1
Pred(d)=Pred(k)={P(D)+P(E)+1}>>1
Pred(l)={P(E)+P(F)+1}>>1
Pred(e)={P(A)+2P(B)+P(C)+2}>>2
Pred(f)=Pred(m)={P(B)+2P(C)+P(D)+2}>>2
Pred(g)=Pred(n)={P(C)+2P(D)+P(E)+2}>>2
Pred(h)=Pred(o)={P(D)+2P(E)+P(F)+2}>>2
Pred(p)={P(E)+2P(F)+P(G)+2}>>2
<Prediction Mode “8” of 4×4 Pixel Block>
Pred(a)={P(I)+P(J)+1}>>1
Pred(c)=Pred(e)={P(J)+P(K)+1}>>1
Pred(g)=Pred(i)={P(K)+P(L)+1}>>1
Pred(b)={P(I)+2P(J)+P(K)+2}>>2
Pred(d)=Pred(f)={P(J)+2P(K)+P(L)+2}>>2
Pred(h)=Pred(j)={P(K)+3P(L)+2}>>2
Pred(k)=Pred(l)=Pred(m)=Pred(n)=Pred(o)=Pred(p)=P(L)
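As an illustrative sketch of how these per-pixel formulas map to computation (the helper names are hypothetical and not part of the recommendation), vertical (mode “0”) and horizontal (mode “1”) prediction of a 4×4 pixel block can be written as:

```python
def predict_4x4_vertical(top):
    """Mode 0: each column is filled with the reference pixel above it.
    top = [P(A), P(B), P(C), P(D)]; result is in raster order a..p."""
    return [top[x] for _ in range(4) for x in range(4)]

def predict_4x4_horizontal(left):
    """Mode 1: each row is filled with the reference pixel to its left.
    left = [P(I), P(J), P(K), P(L)]; result is in raster order a..p."""
    return [left[y] for y in range(4) for _ in range(4)]
```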
The relationship between prediction modes and prediction directions of a 16×16 pixel block will be described next. In a 16×16 pixel block, prediction mode “0” is vertical prediction, prediction mode “1” is horizontal prediction, prediction mode “2” is mean value prediction, and prediction mode “3” is planar prediction. FIG. 5 shows a 16×16 pixel block. Let P(x,y) be the pixel value of a pixel (x,y), and Pred(x,y) be the predicted value of the pixel (x,y). Predicted values in each prediction direction are represented in the following way.
<Prediction Mode “0” of 16×16 Pixel Block>
Pred(x,y)=P(x,−1)  (x=0 to 15, y=0 to 15)
<Prediction Mode “1” of 16×16 Pixel Block>
Pred(x,y)=P(−1,y)  (x=0 to 15, y=0 to 15)
<Prediction Mode “2” of 16×16 Pixel Block>
[At the Start of the Picture]
Pred(x,y)=128  (x=0 to 15, y=0 to 15)
[On the Upper Edge of the Picture]
Pred(x,y)={ΣP(−1,y)+8}>>4  (x=0 to 15, y=0 to 15)
[On the Left Edge of the Picture]
Pred(x,y)={ΣP(x,−1)+8}>>4  (x=0 to 15, y=0 to 15)
[Other Cases]
Pred(x,y)={ΣP(x,−1)+ΣP(−1,y)+16}>>5  (x=0 to 15, y=0 to 15)
<Prediction Mode “3” of 16×16 Pixel Block>
Pred(x,y)={a+b×(x−7)+c×(y−7)+16}>>5
a=16×{P(−1,15)+P(15,−1)}
b=(5×H+32)>>6
c=(5×V+32)>>6
H={P(8,−1)−P(6,−1)}+2×{P(9,−1)−P(5,−1)}+3×{P(10,−1)−P(4,−1)}+4×{P(11,−1)−P(3,−1)}+5×{P(12,−1)−P(2,−1)}+6×{P(13,−1)−P(1,−1)}+7×{P(14,−1)−P(0,−1)}+8×{P(15,−1)−P(−1,−1)}
V={P(−1,8)−P(−1,6)}+2×{P(−1,9)−P(−1,5)}+3×{P(−1,10)−P(−1,4)}+4×{P(−1,11)−P(−1,3)}+5×{P(−1,12)−P(−1,2)}+6×{P(−1,13)−P(−1,1)}+7×{P(−1,14)−P(−1,0)}+8×{P(−1,15)−P(−1,−1)}
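The planar prediction above can be sketched in Python as follows (the helper name is hypothetical; the final clipping of predicted values to the valid sample range, which the recommendation also performs, is omitted for brevity):

```python
def predict_16x16_plane(top, left, corner):
    """Planar prediction (mode 3) for a 16x16 brightness block.

    top    : [P(0,-1), ..., P(15,-1)]   reference row above the block
    left   : [P(-1,0), ..., P(-1,15)]   reference column left of the block
    corner : P(-1,-1)                    upper-left corner reference pixel
    Returns a 16x16 list of rows of predicted values Pred(x,y).
    """
    # Weighted sums of horizontal and vertical gradients of the references.
    H = sum((i + 1) * (top[8 + i] - top[6 - i]) for i in range(7))
    H += 8 * (top[15] - corner)
    V = sum((i + 1) * (left[8 + i] - left[6 - i]) for i in range(7))
    V += 8 * (left[15] - corner)
    a = 16 * (left[15] + top[15])
    b = (5 * H + 32) >> 6
    c = (5 * V + 32) >> 6
    return [[(a + b * (x - 7) + c * (y - 7) + 16) >> 5 for x in range(16)]
            for y in range(16)]
```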
A prediction mode having a smaller numerical value is more frequently selected for prediction.
For a color difference component, four prediction directions are used, and the block size is 8×8 pixels. The prediction method using an 8×8 pixel block for color difference components differs from the prediction method using a 16×16 pixel block for brightness components in the correspondence between prediction mode numbers and prediction directions. More specifically, in 8×8 pixel block prediction for color difference components, prediction mode “0” is mean value prediction, prediction mode “1” is horizontal prediction, prediction mode “2” is vertical prediction, and prediction mode “3” is planar prediction.
The JVT is also developing the H.264/AVC Fidelity Range Extensions amendment, which includes still image encoding. This amendment proposes an intra-frame prediction method using an 8×8 pixel block for brightness components. For this prediction method, nine prediction methods are defined depending on the prediction direction, as in the prediction method using a 4×4 pixel block. The prediction modes and prediction directions are the same as those of a 4×4 pixel block, and predicted values are calculated in the same manner.
However, the recommendation does not define how to select a block size and a prediction direction that ensure efficient prediction for predictive intra-coding. In principle, the block size and prediction direction that maximize efficiency can be determined from the prediction results for all block sizes and all prediction directions. However, such an exhaustive determination method requires a large circuit scale and an enormous amount of calculation, making it difficult to perform encoding in real time.
Alternatively, the block unit used for prediction (to be referred to as a prediction unit hereinafter) of the encoding target block may be determined in advance, in accordance with the prediction units used in blocks around the encoding target or with the image information of the encoding target block. Prediction is then performed in all prediction directions for the determined prediction unit, and the most efficient prediction direction is selected. In this case, if prediction units for all prediction modes of a 4×4 pixel block are provided, prediction for a larger unit such as a 16×16 pixel block or an 8×8 pixel block can be implemented by repetitively using the 4×4 pixel block prediction units. However, this method requires a longer process time for prediction.
For example, for predictive intra-coding in prediction mode “0” of a 4×4 pixel block, predicted values are calculated as shown in FIG. 6. One cell in FIG. 6 represents one pixel. A, B, C, . . . in upper case denote reference pixel values, and a, b, c, . . . in lower case denote predictive encoding target pixels. Letting P(A), P(B), P(C), . . . be the brightness values of the reference pixel values A, B, C, . . . , predicted values Pred(a), Pred(b), Pred(c), . . . of the predictive encoding target pixels a, b, c, . . . are obtained in the following way.
Pred(a)=Pred(e)=Pred(i)=Pred(m)=P(A)
Pred(b)=Pred(f)=Pred(j)=Pred(n)=P(B)
Pred(c)=Pred(g)=Pred(k)=Pred(o)=P(C)
Pred(d)=Pred(h)=Pred(l)=Pred(p)=P(D)
For intra-coding in prediction mode “0” of a 16×16 pixel block, predicted values are calculated as shown in FIG. 7. The coordinates of the pixel at the upper left corner of the pixel block shown in FIG. 7 are defined as (0,0). The pixel value of the pixel p(0,0) is represented by P(0,0). The predicted value Pred(x,y) of the pixel p(x,y) is obtained by
Pred(x,y)=P(x,−1)  (x=0 to 15, y=0 to 15)
At this time, predicted values for a 16×16 pixel block are calculated using the predicted value calculation units for intra-coding of a 4×4 pixel block. If P(A)=P(m,−1), P(B)=P(m+1,−1), P(C)=P(m+2,−1), and P(D)=P(m+3,−1) (m=0, 4, 8, and 12), the predicted values are
Pred(m,n)=Pred(m,n+1)=Pred(m,n+2)=Pred(m,n+3)=P(A)
Pred(m+1,n)=Pred(m+1,n+1)=Pred(m+1,n+2)=Pred(m+1,n+3)=P(B)
Pred(m+2,n)=Pred(m+2,n+1)=Pred(m+2,n+2)=Pred(m+2,n+3)=P(C)
Pred(m+3,n)=Pred(m+3,n+1)=Pred(m+3,n+2)=Pred(m+3,n+3)=P(D)
(m,n)={(0,0),(0,4),(0,8),(0,12),(4,0),(4,4),(4,8),(4,12),(8,0),(8,4),(8,8),(8,12),(12,0),(12,4),(12,8),(12,12)}
To implement prediction in prediction mode “0” of a 16×16 pixel block using the prediction units in prediction mode “0” of a 4×4 pixel block, the prediction units must be used 16 times repetitively.
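The 16-fold repetition described above can be illustrated with a small sketch (hypothetical helper names; the reference row P(x,−1) is assumed to be given as a 16-element list, and, as stated above, under vertical prediction the reference pixels of every sub-block equal the corresponding pixels of that row):

```python
def predict_4x4_vertical(top4):
    """Mode 0 on a 4x4 sub-block: each column copies the reference pixel above it."""
    return [[top4[x] for x in range(4)] for _ in range(4)]

def predict_16x16_vertical_via_4x4(top):
    """Tile the 4x4 predictor over a 16x16 block: 16 repetitive invocations."""
    pred = [[0] * 16 for _ in range(16)]
    for n in range(0, 16, 4):          # sub-block origin, vertical direction
        for m in range(0, 16, 4):      # sub-block origin, horizontal direction
            sub = predict_4x4_vertical(top[m:m + 4])
            for dy in range(4):
                for dx in range(4):
                    pred[n + dy][m + dx] = sub[dy][dx]
    return pred

# The tiled result matches direct 16x16 mode "0" prediction Pred(x,y)=P(x,-1).
top = list(range(16))                  # illustrative reference row
direct = [[top[x] for x in range(16)] for _ in range(16)]
assert predict_16x16_vertical_via_4x4(top) == direct
```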