1. Technical Field
The present invention relates to coding for compression of moving picture data, and in particular to a moving picture coding method, a moving picture decoding method, and apparatuses of the same which achieve high coding efficiency.
2. Background Art
Moving picture data is being adopted in an increasing number of applications, ranging from video-telephoning and video-conferencing to DVD and digital television. When moving picture data is transmitted or recorded, a substantial amount of data has to be sent through conventional transmission channels of limited available frequency bandwidth or has to be stored on conventional storage media of limited data capacity. In order to transmit and store digital data on conventional channels and media, it is therefore necessary to compress or otherwise reduce the volume of the digital data.
For the compression of moving picture data, a plurality of moving picture coding standards have been developed. Such moving picture standards are, for instance, the ITU-T standards denoted by H.26x and the ISO/IEC standards denoted by MPEG-x, where the letter “x” represents, for example, 1, 2, or 4. The most up-to-date and advanced moving picture coding standard is currently the standard denoted as H.264/MPEG-4 AVC.
The coding approach underlying most of these standards consists of the following main stages:
(a) Partitioning each individual video frame into blocks which include pixels, in order to subject each video frame to data compression at a block level.
(b) Scanning each block of moving picture data according to a fixed scanning scheme that defines the order in which the blocks will be coded.
(c) Predicting each scanned block by exploiting either temporal dependencies between blocks of successive frames (motion compensation) or spatial dependencies between the current block and previously coded blocks of the same frame (intra-frame prediction).
(d) Computing a residual between the scanned block and its prediction, and coding the residual of each block.
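Stages (a) and (d) above can be illustrated by the following minimal sketch. The function names, the 8×8 test frame, and the flat prediction are assumptions made only for illustration; they are not taken from any standard.

```python
def partition_into_blocks(frame, block_size):
    """Stage (a): split a 2-D list of pixels into non-overlapping square blocks.

    Returns a dict mapping (block_row, block_col) to a block.
    Assumes the frame dimensions are multiples of block_size.
    """
    height, width = len(frame), len(frame[0])
    blocks = {}
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            blocks[(top // block_size, left // block_size)] = [
                row[left:left + block_size]
                for row in frame[top:top + block_size]
            ]
    return blocks


def residual(block, prediction):
    """Stage (d): element-wise difference between a block and its prediction."""
    return [
        [pixel - predicted for pixel, predicted in zip(block_row, pred_row)]
        for block_row, pred_row in zip(block, prediction)
    ]


# Hypothetical 8x8 test frame with a smooth diagonal gradient.
frame = [[x + y for x in range(8)] for y in range(8)]
blocks = partition_into_blocks(frame, 4)

# Hypothetical flat prediction: every pixel predicted as the block's first pixel.
current = blocks[(1, 1)]
flat_prediction = [[current[0][0]] * 4 for _ in range(4)]
print(residual(current, flat_prediction))
```

Only the residual, which has a much smaller dynamic range than the block itself, would then be passed on to the transform and coding stages.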
It is a particular approach of current moving picture coding standards that the image information is transformed from the spatial domain into the frequency domain. Image compression is achieved by representing the image content by only a few frequency components. The content of a natural image is mostly concentrated in the low-frequency coefficients. Higher frequency parts, to which the human eye is less sensitive anyway, can thus be removed or coarsely quantized in order to lower the amount of data to be coded.
In many applications, the volume or bandwidth available for storing or transmitting coded moving picture data is seriously restricted. There is thus an urgent need to compress the moving picture data as far as possible. However, increasing the compression ratio by quantizing even more coarsely inevitably leads to a deterioration of image quality.
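The concentration of image content in the low-frequency coefficients can be illustrated by the following sketch. It uses a floating-point orthonormal two-dimensional DCT-II, which is an assumption chosen for simplicity and is not the integer transform of any particular standard. The smooth 4×4 test block is likewise hypothetical.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (floating point, for illustration)."""
    n = len(block)

    def scale(k):
        # Normalization that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


# Hypothetical smooth block, typical of natural image content.
smooth_block = [[100 + 2 * x + y for x in range(4)] for y in range(4)]
coeffs = dct_2d(smooth_block)

total_energy = sum(c * c for row in coeffs for c in row)
low_freq_energy = sum(coeffs[u][v] ** 2 for u in range(2) for v in range(2))
print("share of energy in the 2x2 low-frequency corner:",
      low_freq_energy / total_energy)
```

For such a smooth block, nearly all of the signal energy falls into the few lowest-frequency coefficients, which is why the remaining coefficients can be dropped or coarsely quantized at little cost.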
FIG. 1 is a block diagram illustrating a structure of a moving picture coding apparatus of the prior art. A moving picture coding apparatus 100 includes a subtractor 110, a transform and quantization unit 120, an inverse quantization and inverse transform unit 130, an adder 135, a deblocking filter 137, a memory 140, an intra-frame prediction unit 150, a motion compensation prediction unit 160, a motion estimator unit 170, an intra/inter switch 180, and an entropy coding unit 190. The subtractor 110 calculates a difference between a current block to be coded and a prediction signal which is based on a previously coded block stored in the memory 140. The transform and quantization unit 120 transforms the prediction error obtained from the subtractor 110 from the spatial domain to the frequency domain and quantizes the obtained transform coefficients. The entropy coding unit 190 entropy codes the quantized transform coefficients.
In accordance with the H.264/AVC standard, the input image is partitioned into macroblocks. The moving picture coding apparatus 100 only transmits differences between blocks of an input moving picture sequence and their predictions based on previously coded blocks (“the locally decoded image”). These differences are determined in the subtractor 110, which receives the blocks to be coded in order to subtract the prediction signal therefrom.
The locally decoded image is provided by a local decoding unit (the inverse quantization and inverse transform unit 130, the adder 135, and the deblocking filter 137) incorporated into the moving picture coding apparatus 100. The local decoding unit performs the coding steps in reverse manner. The inverse quantization and inverse transform unit 130 dequantizes the quantized coefficients and applies an inverse transform to the dequantized coefficients. In the adder 135, the differences obtained by the inverse transform are added to the prediction signal to form the locally decoded image. Further, the deblocking filter 137 reduces block noise in the decoded image.
The type of prediction that is employed by the moving picture coding apparatus 100 depends on whether the macroblocks are coded in “Intra” or “Inter” mode. In “Intra” mode, the moving picture coding standard H.264/AVC uses a prediction scheme based on already coded macroblocks of the same image in order to predict subsequent macroblocks. In “Inter” mode, motion compensation/prediction between corresponding blocks of several consecutive frames is employed.
Only Intra-coded images (I-pictures) can be decoded without reference to any previously decoded image. The I-pictures provide error propagation resilience for the coded moving picture sequence. Further, entry points into bit streams of coded data are provided by the I-pictures in order to enable random access, that is, access to I-pictures within the coded moving picture sequence. A switch between Intra-mode, that is, a processing by the intra-frame prediction unit 150, and Inter-mode, that is, a processing by the motion compensation prediction unit 160, is controlled by the intra/inter switch 180.
In “Inter” mode, a macroblock is predicted from blocks of previous frames by employing motion compensation. The motion estimation is accomplished by the motion estimator unit 170, which receives the current input signal and the locally decoded image. Motion estimation yields two-dimensional motion vectors which represent a pixel motion between the current block and the corresponding block in previous frames. Based on the estimated motion, the motion compensation prediction unit 160 provides a prediction signal.
For both the “Intra” and the “Inter” coding modes, the differences between the current and the predicted signal are transformed into transform coefficients by the transform and quantization unit 120. Generally, an orthogonal transform such as a two-dimensional Discrete Cosine Transform (DCT) or an integer version thereof is employed.
The transform coefficients are quantized in order to reduce the amount of data that has to be coded. The step of quantization is controlled by quantization tables that specify the accuracy and the number of bits that are used to code each frequency coefficient. Lower frequency components are usually more important for image quality than fine details so that more bits are spent for coding the low frequency components than for the higher ones.
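The effect of a quantization table can be sketched as follows. The step sizes and the coefficient values are hypothetical and chosen only to show the principle that low-frequency coefficients (top left) are quantized more finely than high-frequency ones (bottom right).

```python
def quantize(coeffs, steps):
    """Divide each coefficient by its step size and round to the nearest integer level."""
    return [
        [int(round(c / s)) for c, s in zip(coeff_row, step_row)]
        for coeff_row, step_row in zip(coeffs, steps)
    ]


def dequantize(levels, steps):
    """Inverse quantization: scale each level back by its step size."""
    return [
        [level * s for level, s in zip(level_row, step_row)]
        for level_row, step_row in zip(levels, steps)
    ]


# Hypothetical quantization table: small steps for low frequencies, large for high.
steps = [
    [4, 8, 16, 32],
    [8, 16, 32, 64],
    [16, 32, 64, 64],
    [32, 64, 64, 64],
]

# Hypothetical transform coefficients of one 4x4 block.
coeffs = [
    [420.0, -38.0, 10.0, -6.0],
    [-22.0, 12.0, 5.0, 2.0],
    [7.0, -4.0, 3.0, 1.0],
    [-3.0, 2.0, -1.0, 0.5],
]

levels = quantize(coeffs, steps)
reconstructed = dequantize(levels, steps)
nonzero = sum(1 for row in levels for level in row if level != 0)
print(levels)
print("non-zero levels:", nonzero, "of 16")
```

In this example only five of the sixteen levels remain non-zero, and the reconstructed coefficients differ from the originals by the quantization error, which grows with the step size.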
For reconstructing the coded images by a decoding apparatus, the coding process is applied in reverse manner.
FIG. 2 is a block diagram illustrating a structure of a moving picture decoding apparatus of the prior art. A moving picture decoding apparatus 200 includes an entropy decoding unit 210, an inverse quantization and inverse transform unit 220, an adder 230, a deblocking filter 240, a memory 250, an intra-frame prediction unit 260, and a motion compensation prediction unit 270.
In the entropy decoding unit 210, entropy coding of transform coefficients and motion data by the entropy coding unit 190 is reversed. The entropy decoded block is then submitted to the inverse quantization and inverse transform unit 220 and the entropy decoded motion data is sent to the motion compensation prediction unit 270. The result of the inverse quantization and inverse transform contains prediction errors. The prediction errors are added by the adder 230 to the prediction signal stemming from the motion compensation prediction unit 270 in Inter-mode or stemming from the intra-frame prediction unit 260 in Intra-mode. The reconstructed image is passed through the deblocking filter 240 and is stored in the memory 250 to be used by the intra-frame prediction unit 260 and the motion compensation prediction unit 270.
According to the H.264/AVC standard, an image is partitioned into non-overlapping macroblocks of 16×16 pixels size. These macroblocks may be further partitioned into 4-by-4 blocks of 4×4 pixels size or into 2-by-2 blocks of 8×8 pixels size. A macroblock partitioned into 4×4 pixel blocks, a macroblock partitioned into 8×8 pixel blocks, and an unpartitioned macroblock of 16×16 pixels size are referred to as I4MB, I8MB, and I16MB, respectively, where MB stands for macroblock.
The above described coding and decoding mechanisms are applied to each block separately. Consequently, the 2-dimensional arrangement of blocks has to be converted into a 1-dimensional sequence in which the blocks will be handled by the coding apparatus and the decoding apparatus. In other words, the blocks have to be scanned according to a certain scanning scheme that defines the order in which the blocks will be processed.
FIG. 3 is a diagram illustrating a scanning scheme for blocks in accordance with the H.264/AVC standard. Arrows indicate the order in which blocks of an I8MB macroblock (310) and an I4MB macroblock (320) are scanned. Numerals 0 to 15 in the figure give the order in which the 4×4 pixel blocks of the I4MB macroblock (320) are scanned. Reference 350 indicates individual pixels of the blocks.
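The scanning order of the sixteen 4×4 pixel blocks within a macroblock can be sketched as a mapping from the scan index to a block position. The formula below is an assumption that reproduces the commonly described order in which the four 8×8 quadrants are visited one after another and the four 4×4 blocks inside each quadrant are visited in the same pattern; the function name is hypothetical.

```python
def block_position(scan_index):
    """Map a scan index 0..15 to (row, col) of the 4x4 block inside the macroblock."""
    quadrant = scan_index // 4        # which 8x8 quadrant (0..3)
    within = scan_index % 4           # which 4x4 block inside that quadrant (0..3)
    row = 2 * (quadrant // 2) + within // 2
    col = 2 * (quadrant % 2) + within % 2
    return row, col


# Build the 4-by-4 arrangement of scan indices for visual inspection.
grid = [[None] * 4 for _ in range(4)]
for index in range(16):
    row, col = block_position(index)
    grid[row][col] = index

for line in grid:
    print(line)
```

With this mapping, the block with scan index 6 lies in the second row and third column, with the blocks 1, 4, and 5 in the row above it and the block 3 to its left.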
The order in which the blocks are scanned is particularly important for intra-coded blocks, that is, for blocks where spatial correlations are exploited in order to reduce the amount of information that has to be coded. As described above, intra-coded blocks are predicted from adjacent pixels of already coded blocks. This prediction value is subtracted from the actual block and only the residual is coded. Hence, prediction accuracy is crucial for a high compression ratio.
FIG. 4A is a diagram illustrating the intra-prediction of a macroblock as specified in the H.264/AVC standard. In the figure, the shaded area represents already coded blocks, whereas the non-shaded area represents blocks yet to be coded. The 4×4 pixels of the current block 6 have to be extrapolated from adjacent pixels of already coded blocks. The 13 pixels (430), made up of one pixel of a block 1, four pixels of a block 3, four pixels of a block 4, and four pixels of a block 5, are employed to predict the current block by replicating the corresponding pixel values in a certain prediction direction (440).
FIG. 4B summarizes the nine different prediction modes defined in the H.264/AVC standard. Modes 0, 1, and 3 to 8 are characterized by the prediction direction in which the reference pixels (430) are replicated into the current block. For example, in mode 0 (vertical), the four pixels of the block 4 located directly above the current block are replicated downward as prediction values of the 4×4 pixels of the current block. Mode 2 (DC) employs the average of the reference pixels to fill the current block homogeneously, as a prediction value of each pixel of the current block.
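Three of the prediction modes can be sketched as follows: replication of the pixels above the block (vertical), replication of the pixels to the left of the block (horizontal), and the DC mode. The function names, the rounded average over the four upper and four left reference pixels, and the sample pixel values are assumptions made for illustration; the directional diagonal modes are omitted.

```python
def predict_vertical(top):
    """Replicate the four reference pixels above the block down every column."""
    return [list(top) for _ in range(4)]


def predict_horizontal(left):
    """Replicate the four reference pixels left of the block along every row."""
    return [[left[y]] * 4 for y in range(4)]


def predict_dc(top, left):
    """Fill the block with the rounded average of the upper and left reference pixels."""
    mean = (sum(top) + sum(left) + 4) // 8
    return [[mean] * 4 for _ in range(4)]


def sad(block, prediction):
    """Sum of absolute differences, a simple measure of the prediction error."""
    return sum(
        abs(pixel - predicted)
        for block_row, pred_row in zip(block, prediction)
        for pixel, predicted in zip(block_row, pred_row)
    )


# Hypothetical current block with vertical stripes and its reference pixels.
block = [[10, 20, 30, 40] for _ in range(4)]
top = [10, 20, 30, 40]      # pixels of the block above
left = [12, 12, 12, 12]     # pixels of the block to the left

errors = {
    "vertical": sad(block, predict_vertical(top)),
    "horizontal": sad(block, predict_horizontal(left)),
    "dc": sad(block, predict_dc(top, left)),
}
best_mode = min(errors, key=errors.get)
print(errors, "->", best_mode)
```

For this block the vertical replication yields a zero residual, so a coding apparatus choosing the mode with the smallest prediction error would select it. The example also shows how strongly the achievable residual depends on which reference pixels are available.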
Non-Patent Reference 1: ITU-T Rec. H.264 | ISO/IEC 14496-10 version 1, “Information technology—Coding of audio-visual objects—Part 10: Advanced video coding”
However, according to the above described techniques of the prior art, there is a problem that it is not possible to improve the spatial prediction accuracy in the intra-prediction. This results in a problem that the coding efficiency cannot be improved either. With reference to FIGS. 5A and 5B, these problems shall be described concretely. In these figures, the shaded area represents previously coded blocks, whereas the non-shaded area represents blocks yet to be coded.
As is apparent from the scanning order and the examples shown in FIGS. 5A and 5B, not all reference pixels are available for all blocks. In FIG. 5A, for example, a block 3 has to be predicted without the four reference pixels (540) of the block 4, because the block 4 has not yet been coded. This impairs intra-prediction modes 3 and 7, and leads to a higher prediction error. Hence, the prediction accuracy degrades, more bits have to be spent to code the residual, and the coding efficiency drops. Further, the higher prediction error leads to lower image quality.
FIG. 5B shows another example of missing reference pixels due to the scanning order. A block 13 has to be coded without four reference pixels (540) that are part of the next macroblock. Consequently, the prediction accuracy is impaired, leading to higher prediction errors and degraded performance of the coding apparatus.
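The blocks affected by missing upper-right reference pixels can be enumerated with the following sketch. It assumes the 4-by-4 arrangement of scan indices implied by the neighbor relations described above, assumes that macroblocks are coded row by row from left to right so that the macroblock row above is already coded, and treats the macroblock to the right as not yet coded. The names are hypothetical.

```python
# Scan index of each 4x4 block by its (row, col) position inside the macroblock.
SCAN_GRID = [
    [0, 1, 4, 5],
    [2, 3, 6, 7],
    [8, 9, 12, 13],
    [10, 11, 14, 15],
]


def top_right_available(scan_index):
    """Return True if the block above and to the right is coded before this block."""
    for row in range(4):
        for col in range(4):
            if SCAN_GRID[row][col] != scan_index:
                continue
            neighbor_row, neighbor_col = row - 1, col + 1
            if neighbor_row < 0:
                # Assumption: the macroblock row above has already been coded.
                return True
            if neighbor_col > 3:
                # Neighbor lies in the next macroblock, which is not yet coded.
                return False
            # Inside the same macroblock: available only if scanned earlier.
            return SCAN_GRID[neighbor_row][neighbor_col] < scan_index
    raise ValueError("scan_index must be in the range 0..15")


missing = [index for index in range(16) if not top_right_available(index)]
print("blocks without upper-right reference pixels:", missing)
```

Under these assumptions, the blocks 3 and 13 discussed with reference to FIGS. 5A and 5B are among five blocks per macroblock that lack their upper-right reference pixels.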
Another problem of the H.264/AVC standard is related to the set of available prediction modes. This set of prediction modes is asymmetric in the sense that the prediction direction is primarily diagonally down right. In other words, there is, for instance, no horizontal left or vertical up prediction mode. This asymmetry is due to the primary scanning direction of the H.264/AVC standard, which is also diagonally down right. It is quite conceivable that, depending on the image content, certain prediction directions other than those shown in FIG. 4B may deliver superior prediction accuracy. However, due to the fixed scanning scheme of the H.264/AVC standard, no such improvement can be realized.
An object of the present invention is to provide a moving picture coding method, a moving picture decoding method, and apparatuses of the same that improve the spatial prediction accuracy in intra-prediction.