1. Field of the Invention
The present invention relates to a method for decoding an image block, and more particularly to a method for decoding an image block which includes padding for a block of a base layer corresponding to an image block of an enhanced layer when an image signal is scalably decoded.
2. Description of the Prior Art
It is difficult to allocate a bandwidth as wide as that available for TV signals to digital image signals that are wirelessly transmitted to and received by portable phones and notebook computers, which are already in extensive use, and mobile TVs and hand-held PCs, which are expected to be used extensively in the future. Accordingly, a standard to be used for an image compression scheme for such portable devices must enable an image signal to be compressed with a relatively high efficiency.
In addition, such portable mobile devices are equipped with various processing and presentation capabilities. Accordingly, compressed images must be prepared in various forms corresponding to the capabilities of the portable devices. Therefore, image data of various qualities, obtained by combining parameters including the number of transmission frames per second, the resolution, and the number of bits per pixel, must be prepared for one image source, which burdens content providers.
For this reason, the content provider prepares compressed image data having a high bit rate for one image source and, when a portable device requests the image data, decodes the compressed image and then encodes the decoded image into image data suitable for the image processing capability of the requesting device. However, since the above-described procedure necessarily requires trans-coding (decoding+scaling+encoding), the procedure causes a time delay when providing the image requested by the portable devices. In addition, the trans-coding requires complex hardware devices and algorithms because of the variety of target encodings.
In order to overcome these disadvantages, a Scalable Video Codec (SVC) scheme has been suggested. According to the SVC scheme, an image signal is encoded with the best image quality in such a manner that image quality can be ensured even if only a part of the picture sequence derived from the encoding (a frame sequence intermittently selected from the overall picture sequence) is decoded.
A motion compensated temporal filter (or filtering) (MCTF) is an encoding scheme suggested for the SVC scheme. The MCTF scheme requires high compression efficiency, that is, high coding efficiency in order to lower the number of transmitted bits per second because the MCTF scheme is mainly employed under a transmission environment such as mobile communication having a restricted bandwidth.
As described above, although it is possible to ensure image quality even if only a part of a picture sequence encoded through the MCTF, which is a kind of SVC scheme, is received and processed, image quality may be remarkably degraded if the bit rate is lowered. In order to overcome this problem, an additional assistant picture sequence having a low transmission rate, for example, a picture sequence having a small image size and/or a smaller number of frames per second, may be provided.
The assistant picture sequence is called a base layer, and a main picture sequence is called an enhanced (or enhancement) layer. The enhanced layer has a relative relationship with the base layer. Herein, several base layers having different resolutions may be provided. For example, on an assumption that the enhanced layer has a resolution of 4CIF (4 times common intermediate format), a first base layer having a resolution of CIF and a second base layer having a resolution of QCIF (Quarter CIF) may be provided together with the enhanced layer.
When image resolutions or image sizes are compared with each other, the 4CIF is four times the CIF and 16 times the QCIF based on the number of overall pixels, or on the area occupied by the overall pixels when the pixels are arranged at the same interval in the right and left directions. In addition, based on the number of pixels in a width direction and a length direction, the 4CIF is twice the CIF and four times the QCIF. Hereinafter, image resolutions or image sizes are compared based on the number of pixels in the width and length directions instead of the area or the number of overall pixels, so that the resolution of the CIF is half that of the 4CIF and twice that of the QCIF.
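By way of illustration only, the two ways of comparing resolutions can be checked numerically. The sketch below assumes the commonly used luma frame sizes for QCIF, CIF, and 4CIF; these pixel dimensions and the function names are not stated in the present description and are introduced here only for the example.

```python
# Commonly used luma frame sizes (width, height) in pixels.
# These dimensions are an assumption of this sketch, not taken from the description.
FORMATS = {
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "4CIF": (704, 576),
}


def linear_ratio(a, b):
    """Ratio of format a to format b by pixels per side (the convention used hereinafter)."""
    (wa, ha), (wb, hb) = FORMATS[a], FORMATS[b]
    # The comparison is only meaningful when both formats share an aspect ratio.
    assert wa * hb == wb * ha
    return wa / wb


def area_ratio(a, b):
    """Ratio of format a to format b by total number of pixels."""
    (wa, ha), (wb, hb) = FORMATS[a], FORMATS[b]
    return (wa * ha) / (wb * hb)
```

Under these assumed sizes, `linear_ratio("4CIF", "CIF")` is 2 while `area_ratio("4CIF", "CIF")` is 4, which matches the two conventions distinguished above.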
Since layers having different resolution are obtained by encoding the same image contents with different spatial resolution and frame rates, redundancy information exists in data streams encoded for the layers. Accordingly, in order to improve coding efficiency of a predetermined layer (e.g., the enhanced layer), there is suggested an inter-layer prediction scheme for predicting an image signal of the predetermined layer (the enhanced layer) using a data stream encoded for an image signal of a layer (e.g., the base layer) having relatively lower resolution as compared with that of the predetermined layer.
The inter-layer prediction scheme includes an intra texture prediction scheme, a residual prediction scheme, and a motion prediction scheme.
In the intra texture prediction scheme, a prediction image for a predetermined macro block of the enhanced layer is created in an intra base mode (intra_BASE_mode) based on a corresponding block of the base layer encoded in an intra mode (this corresponding block has a relative position in a frame identical to that of the predetermined macro block).
In the residual prediction scheme, an additional prediction operation is performed with respect to the prediction image of the enhanced layer generated for a main picture sequence based on a prediction image of a base layer generated for an assistant picture sequence. Since the prediction image denotes an image having image difference values, that is, residual data, obtained through a prediction operation for the macro block, the macro block is encoded into a block having difference values of residual data through the residual prediction scheme.
In the motion prediction scheme, a motion vector of a picture of the enhanced layer that is temporally simultaneous with a picture of the base layer is encoded based on the motion vector of the picture of the base layer.
FIG. 1 is a block diagram illustrating the structure of a scalable codec employing scalability according to temporal, spatial, and SNR or quality aspects based on a ‘2D+t’ structure.
One image source is encoded by classifying several layers having different resolutions including an image signal with an original resolution (an image size), an image signal with half original resolution, and an image signal with a quarter original resolution. In this case, the same encoding scheme or different encoding schemes may be employed for the several layers. The present invention employs an example in which the layers are individually encoded through the MCTF scheme.
In order to use redundancy information between layers, a block in a predetermined layer may be encoded into a block having difference values of residual data or into an intra base mode block by using a block in a data stream encoded for a layer having resolution lower than that of the predetermined layer (having an image size smaller than that of the predetermined layer). In addition, motion information relating to motion estimation generated through the MCTF may be used in order to predict motion information of a layer having relatively higher resolution.
In the meantime, the intra texture prediction scheme generating the intra base mode block may be applied to a case in which a corresponding block of the base layer is positioned in a block encoded in the intra mode. Herein, the corresponding block is temporally simultaneous with a macro block in the enhanced layer for a prediction image to be found and has a relative position identical to that of the macro block in a frame. In addition, when the macro block in the enhanced layer is encoded in an intra base mode, the corresponding block of the base layer encoded in the intra mode is reconstructed to an original image based on pixel values of another area for the intra mode, the size of the reconstructed corresponding block is enlarged to the size of the macro block through a padding process and an interpolation process using a de-blocking filter and a half-pel interpolation filter for the reconstructed corresponding area, and then, a prediction operation is performed. Through the padding process, the corresponding block of the base layer encoded in the intra mode is enlarged by three pixels in up, down, right, and left directions.
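As a simple illustration of the enlargement by three pixels in each direction, the extent of the padded block can be written out as follows. The sketch assumes a 16×16 block whose most left-up pixel is at coordinate [0,0]; the constant and function names are hypothetical.

```python
MB_SIZE = 16  # assumed width and height of the intra block, in pixels
PAD = 3       # pixels added in each of the up, down, right, and left directions


def padded_side(mb_size=MB_SIZE, pad=PAD):
    """Side length of the block after padding on both sides of each axis."""
    return mb_size + 2 * pad


def padded_extent(mb_size=MB_SIZE, pad=PAD):
    """Inclusive (x_min, x_max, y_min, y_max) of the padded block,
    with the block's own pixels spanning 0..mb_size-1 on both axes."""
    return (-pad, mb_size - 1 + pad, -pad, mb_size - 1 + pad)
```

Under these assumptions a 16×16 block becomes a 22×22 area covering coordinates −3 through 18 on each axis.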
FIG. 2 is a view illustrating an example in which pixels are padded in the upper direction of an up boundary of a corresponding block (this block is encoded in an intra mode and reconstructed into an original image) and of a right-up boundary, which extends in the right direction from the up boundary. The other boundaries shown in FIG. 2, such as the right and down-right boundaries, the down and left-down boundaries, and the left and up-left boundaries, match the up and right-up boundaries if they are sequentially rotated by 90 degrees, so that pixels can be padded on those boundaries in the same manner.
Padding areas provided in the upper direction of the up boundary and the right-up boundary of a 16×16-sized block I_MB0 (which has been encoded in the intra mode and is reconstructed into an original image) may be divided into A, B, C, and D blocks. The down boundaries of the A, B, and C blocks make contact with the up boundary of the I_MB0. However, in the 3×3-sized D block, only the most left-down pixel makes contact with the most right-up pixel of the I_MB0 in a diagonal direction thereof. The left boundary of the 4×3-sized A block is in line with the left boundary of the I_MB0, and the right boundary of the 4×3-sized C block is in line with the right boundary of the I_MB0. The left boundary and the right boundary of the 8×3-sized B block are positioned between the left boundary and the right boundary of the I_MB0. The down boundary of the D block may be described as making contact with a boundary generated by extending the up boundary of the I_MB0 in the right direction.
Hereinafter, schemes of padding data in pixels of the A, B, C, and D blocks will be sequentially described.
A coordinate (a position) of a pixel is represented as [x,y], and a pixel value is represented as p[x,y]. It is assumed that the most left-up pixel of the I_MB0 is [0,0]. A pixel of the A block may be represented as [i,−j], wherein i=0, . . . , 3, j=1, 2, 3. In the following description, a pixel value p[x,y] does not indicate a harmonic component of an encoded image, but indicates a pixel value of a reconstructed original image.
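For orientation, the coordinate ranges of the four padding areas in this description (A: i=0 to 3, B: i=4 to 11, C: i=12 to 15, D: i=16 to 18, each with j=1, 2, 3) can be enumerated as in the following sketch. The names `X_RANGES`, `block_pixels`, and `block_of` are hypothetical and serve only to illustrate the layout.

```python
# x ranges of the padding areas above I_MB0, as half-open Python ranges.
X_RANGES = {
    "A": range(0, 4),
    "B": range(4, 12),
    "C": range(12, 16),
    "D": range(16, 19),
}
ROWS = (-1, -2, -3)  # y = -j for j = 1, 2, 3


def block_pixels(name):
    """All [x, y] coordinates belonging to the named padding area."""
    return [(x, y) for y in ROWS for x in X_RANGES[name]]


def block_of(x, y):
    """Name of the padding area containing [x, y], or None if it is outside all four."""
    if y not in ROWS:
        return None
    for name, xs in X_RANGES.items():
        if x in xs:
            return name
    return None
```

With this layout the A and C areas each hold 12 pixels, the B area holds 24, and the D area holds 9.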
I) If the A block is positioned at a block encoded in the intra mode, a pixel value in the A block is maintained.
If the A block is not positioned at the block encoded in the intra mode, it is determined whether or not the pixel [−1, −1] is positioned in an intra block, that is, whether an intra block exists to the left of the A block.
II) If [−1,−1] is positioned in the intra block I_MB1 as the determination result, each pixel value p[i,−j] in the A block is calculated through a predetermined scheme based on the pixels ([i, 0], i=0, 1, 2, 3) of the up boundary of the I_MB0, the pixels ([−1, −j], j=1, 2, 3) of the right boundary of the I_MB1, and the pixel ([−1, 0]) making contact with the most left-down pixel of the A block in a diagonal direction. For example, an intra 8×8 diagonal down right prediction scheme of the H.264 shown in FIG. 2 may be employed for calculating each pixel value p[i,−j].
In this case, if the pixel [−1, 0] used for calculating each pixel value p[i,−j] in the A block is positioned in the intra block, the value of the pixel [−1, 0] is maintained. In contrast, if the pixel [−1, 0] is not positioned in the intra block, the value of the pixel [−1,0] is calculated by p[−1,0]=(p[−1,−1]+p[0,0]+1)/2.
III) In contrast, if the pixel [−1,−1] is not positioned in the intra block I_MB1 as the determination result, p[i,−j] becomes p[i,0], wherein i=0, 1, 2, 3, j=1, 2, 3. That is, each pixel of the A block is padded with a value identical to that of the pixel on the up boundary of the I_MB0 having the same x-axis coordinate.
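The three cases for the A block can be summarized in a minimal sketch. It assumes that reconstructed pixel values are held in a dictionary keyed by (x, y), that the division by 2 is an integer division, and that the predetermined scheme of case II (for example, the H.264 intra 8×8 diagonal down right prediction) is supplied by the caller as a hypothetical callback `diagonal_predict`, since that scheme is not reproduced here. All function and parameter names are illustrative.

```python
def left_corner_reference(p, corner_is_intra):
    """Value of pixel [-1, 0] used as a reference when padding the A block."""
    if corner_is_intra:
        return p[(-1, 0)]  # maintained as it is
    # Rounded average of [-1, -1] and [0, 0]; integer division is an assumption.
    return (p[(-1, -1)] + p[(0, 0)] + 1) // 2


def pad_block_a(p, a_is_intra, left_is_intra, corner_is_intra, diagonal_predict):
    """Return the padded values of the A block pixels [i, -j], i = 0..3, j = 1..3."""
    coords = [(i, -j) for j in (1, 2, 3) for i in range(4)]
    if a_is_intra:
        # Case I: the A block lies in an intra block, so its values are maintained.
        return {c: p[c] for c in coords}
    if left_is_intra:
        # Case II: [-1, -1] lies in the intra block I_MB1; delegate to the
        # caller-supplied predictor, which reads the reference pixels from ref.
        ref = dict(p)
        ref[(-1, 0)] = left_corner_reference(p, corner_is_intra)
        return {c: diagonal_predict(ref, c) for c in coords}
    # Case III: copy the up-boundary pixel of I_MB0 with the same x coordinate.
    return {(x, y): p[(x, 0)] for (x, y) in coords}
```

The function returns new values rather than overwriting the picture buffer, which is a design choice of this sketch and keeps the reference pixels unchanged while the predictor runs.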
Hereinafter, a scheme for padding data at a pixel [i,−j] in the B block will be described, wherein i=4, . . . , 11, j=1, 2, 3.
If the B block is positioned at a block encoded in the intra mode, pixel values in the B block are maintained as they are.
In contrast, if the B block is not positioned at the block encoded in the intra mode, p[i,−j] becomes p[i,0] similarly to case III) of the A block, wherein i=4, . . . , 11, j=1, 2, 3. That is, each pixel of the B block is padded with a value identical to that of the pixel on the up boundary of the I_MB0 having the same x-axis coordinate.
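The two cases for the B block reduce to a short sketch. As an assumption of the example, pixel values are held in a dictionary keyed by (x, y); the function name `pad_block_b` is hypothetical.

```python
def pad_block_b(p, b_is_intra):
    """Return the padded values of the B block pixels [i, -j], i = 4..11, j = 1..3."""
    coords = [(i, -j) for j in (1, 2, 3) for i in range(4, 12)]
    if b_is_intra:
        # The B block lies in an intra block, so its values are maintained.
        return {c: p[c] for c in coords}
    # Otherwise copy the up-boundary pixel of I_MB0 with the same x coordinate.
    return {(x, y): p[(x, 0)] for (x, y) in coords}
```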
Hereinafter, a scheme for padding data in a pixel ([i,−j], i=12, . . . , 15, j=1, 2, 3) in the C block will be described. The C block is padded in a manner similar to the A block.
If the C block is positioned in a block encoded in the intra mode, pixel values in the C block are maintained as they are.
If the C block is not positioned in the block encoded in the intra mode, it is determined whether or not a pixel [16, −1] is positioned in the intra block, that is, the intra block exists in the right of the C block.
If the pixel [16,−1] is positioned in the intra block I_MB2 as the determination result, each pixel value p[i,−j] in the C block is calculated through a scheme (in which a pixel coordinate moves along only a diagonal axis) similar to the predetermined scheme used when the A block is padded, based on the pixels ([i,0], i=12, 13, 14, 15) on the up boundary of the I_MB0, the pixels ([16,−j], j=1, 2, 3) on the left boundary of the I_MB2, and the pixel ([16,0]) making contact with the most right-down pixel of the C block in a diagonal direction.
In this case, if a pixel ([16,0]) used for calculating each pixel value p[i,−j] in the C block is positioned at the intra block, the pixel value of [16,0] is maintained as it is. If the pixel ([16,0]) is not positioned in the intra block, p[16, 0] is equal to (p[15,0]+p[16,−1]+1)/2.
In contrast, if the pixel [16,−1] is not positioned in the intra block I_MB2 as the determination result, p[i,−j]=p[i, 0], wherein i=12, 13, 14, 15, j=1, 2, 3. That is, each pixel of the C block is padded with a value identical to that of the pixel on the up boundary of the I_MB0 having the same x-axis coordinate.
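The cases for the C block mirror those of the A block and can be sketched in the same way. The sketch assumes pixel values held in a dictionary keyed by (x, y) and an integer division by 2; `diagonal_predict` is a hypothetical callback standing in for the diagonal scheme, which is not reproduced here, and the other names are likewise illustrative.

```python
def right_corner_reference(p, corner_is_intra):
    """Value of pixel [16, 0] used as a reference when padding the C block."""
    if corner_is_intra:
        return p[(16, 0)]  # maintained as it is
    # Rounded average of [15, 0] and [16, -1]; integer division is an assumption.
    return (p[(15, 0)] + p[(16, -1)] + 1) // 2


def pad_block_c(p, c_is_intra, right_is_intra, corner_is_intra, diagonal_predict):
    """Return the padded values of the C block pixels [i, -j], i = 12..15, j = 1..3."""
    coords = [(i, -j) for j in (1, 2, 3) for i in range(12, 16)]
    if c_is_intra:
        # The C block lies in an intra block, so its values are maintained.
        return {c: p[c] for c in coords}
    if right_is_intra:
        # [16, -1] lies in the intra block I_MB2; delegate to the caller-supplied
        # predictor, which reads the reference pixels from ref.
        ref = dict(p)
        ref[(16, 0)] = right_corner_reference(p, corner_is_intra)
        return {c: diagonal_predict(ref, c) for c in coords}
    # Otherwise copy the up-boundary pixel of I_MB0 with the same x coordinate.
    return {(x, y): p[(x, 0)] for (x, y) in coords}
```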
Hereinafter, a scheme for padding data in a pixel ([i,−j], i=16, 17, 18, j=1, 2, 3) in the D block will be described.
If the D block is positioned in a block encoded in the intra mode, pixel values in the D block are maintained as they are.
However, if the D block is not positioned at a block encoded in the intra mode, each pixel in the D block is padded with the same value as the most right-up pixel ([15,0]) of the I_MB0 making contact with the D block in a diagonal direction. That is, p[i,−j]=p[15,0], wherein i=16, 17, 18, j=1, 2, 3.
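The D block rule can be sketched as follows, again assuming pixel values held in a dictionary keyed by (x, y); the function name `pad_block_d` is hypothetical.

```python
def pad_block_d(p, d_is_intra):
    """Return the padded values of the D block pixels [i, -j], i = 16..18, j = 1..3."""
    coords = [(i, -j) for j in (1, 2, 3) for i in (16, 17, 18)]
    if d_is_intra:
        # The D block lies in an intra block, so its values are maintained.
        return {c: p[c] for c in coords}
    # Otherwise every pixel takes the value of the most right-up pixel [15, 0] of I_MB0.
    return {c: p[(15, 0)] for c in coords}
```

In the non-intra case all nine pixels receive one and the same value, regardless of the content of any neighboring block.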
Although the scheme for padding data in the pixel of the D block is simple, if the D block is not positioned at a block encoded in the intra mode, the pixel of the D block is not precisely padded.