In the age of multimedia, in which audio, video and other data are handled in an integrated manner, existing information media, specifically newspapers, magazines, television, radio, telephone and the like through which information is conveyed to people, have recently come to be included in the scope of multimedia. Generally, multimedia refers to a representation that associates not only characters but also graphics, sound and especially images with one another. To include the aforementioned existing information media in the scope of multimedia, it is a prerequisite that such information be represented in digital form.
However, when the amount of information carried by each of these information media is estimated as an amount of digital information, text requires 1 to 2 bytes per character, whereas sound requires 64 kbits per second (telephone quality) and moving images require 100 Mbits per second or more (current television reception quality). It is therefore not realistic for these information media to handle such an enormous amount of information directly in digital form. For example, although video phones are already in practical use over the Integrated Services Digital Network (ISDN), which offers transmission speeds of 64 kbps to 1.5 Mbps, it is impossible to transmit television images and camera images directly over ISDN.
Accordingly, information compression techniques have become necessary. For example, in the case of the video phone, the H.261 and H.263 standards for moving image compression, internationally standardized by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T), are employed. Moreover, with the information compression techniques of the MPEG-1 standard, it has also become possible to store video information together with audio information on ordinary music compact discs (CDs).
Here, MPEG (Moving Picture Experts Group) refers to international standards for the digital compression of moving image signals. MPEG-1 is a standard for compressing moving image signals down to 1.5 Mbps, in other words, for compressing the information in television signals to approximately one hundredth of its original size. Moreover, since the target picture quality of the MPEG-1 standard is limited to a medium quality that can be achieved at a transmission speed of about 1.5 Mbps, MPEG-2, which was standardized to meet demands for further improved picture quality, achieves television broadcasting quality with moving image signals compressed to 2 to 15 Mbps.
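The bit rates quoted above can be compared with a short calculation. The following is only an illustrative sketch: the constant and function names are hypothetical, and the raw television rate is taken at the quoted lower bound of 100 Mbits per second, so the resulting factors are themselves lower bounds.

```python
# Bit rates quoted in the text, in bits per second.
RAW_TV_BPS = 100_000_000   # moving images: "100 Mbits per second or more" (lower bound)
MPEG1_BPS = 1_500_000      # MPEG-1 target rate of about 1.5 Mbps
ISDN_MIN_BPS = 64_000      # lowest ISDN rate quoted in the text


def reduction_factor(raw_bps, target_bps):
    """Return how many times the raw rate exceeds the target rate."""
    return raw_bps / target_bps


# Reduction needed to reach the MPEG-1 rate from the lower-bound raw rate.
mpeg1_factor = reduction_factor(RAW_TV_BPS, MPEG1_BPS)

# Reduction needed to fit the same signal into a single 64 kbps ISDN channel.
isdn_factor = reduction_factor(RAW_TV_BPS, ISDN_MIN_BPS)
```

With the lower-bound raw rate, the factor for MPEG-1 is on the order of several tens, and a raw rate somewhat above 100 Mbits per second brings it to the "approximately one hundredth" mentioned above.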
Furthermore, MPEG-4, which has a higher compression ratio, has been standardized by the working group (ISO/IEC JTC1/SC29/WG11) that pursued the standardization of MPEG-1 and MPEG-2. MPEG-4 not only enables efficient coding at low bit rates, but also introduces strong error resilience techniques that can reduce subjective picture quality degradation even when a transmission line error occurs. In addition, H.264 is currently being standardized jointly by ISO/IEC and ITU-T as a next-generation picture coding method.
In general, in the encoding of a moving image, the amount of information is compressed by reducing redundancy in the temporal and spatial directions. In inter prediction encoding, which reduces temporal redundancy, motion estimation and generation of a prediction image are performed on a block-by-block basis by referring to preceding and following pictures, and encoding is performed on the difference between the obtained prediction image and the block to be encoded. In intra prediction encoding, which reduces spatial redundancy, a prediction image is generated from pixel information of neighboring encoded blocks, and encoding is performed on the difference between the obtained prediction image and the block to be encoded.
Here, a picture is a term indicating one screen. It indicates one frame when coded in a frame structure, and one field when coded in a field structure.
Each picture is divided into blocks called macro blocks, each of which is, for example, horizontal 16 × vertical 16 pixels, and is processed on a block-by-block basis. A picture of the field structure is encoded with all of its macro blocks as field macro blocks. On the other hand, a picture of the frame structure can be encoded not only with all of its macro blocks as frame macro blocks, but also by switching between the frame structure and the field structure in units of two vertically adjacent macro blocks (a macro block pair).
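The division of a picture into macro blocks and macro block pairs can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and rounding the picture size up to a whole number of macro blocks is an assumption that the text above does not address.

```python
MB_SIZE = 16  # macro block size from the text: horizontal 16 x vertical 16 pixels


def macro_block_grid(width, height, mb_size=MB_SIZE):
    """Return (columns, rows) of macro blocks needed to cover the picture.

    Assumption: a partial macro block at the right or bottom edge is
    counted as a whole macro block (ceiling division).
    """
    columns = -(-width // mb_size)
    rows = -(-height // mb_size)
    return columns, rows


def macro_block_pairs(width, height, mb_size=MB_SIZE):
    """Return the number of macro block pairs in a frame-structure picture.

    A pair is two vertically adjacent macro blocks; a leftover odd row of
    macro blocks is not counted here.
    """
    columns, rows = macro_block_grid(width, height, mb_size)
    return columns * (rows // 2)
```

For example, under these assumptions a 720 × 480 picture is covered by 45 × 30 macro blocks, which form 675 macro block pairs.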
FIG. 1 is a block diagram showing a structure of a moving image encoding apparatus which realizes a conventional moving image encoding method. The moving image encoding apparatus includes a picture memory 101, a prediction residual encoding unit 102, a bit stream generation unit 103, a prediction residual decoding unit 104, a deblocking unit 105, a picture memory 106, an encoding mode controlling unit 107, an inter prediction image generation unit 108 and an intra prediction image generation unit 109.
The moving image to be encoded is inputted to the picture memory 101 on a picture-by-picture basis in display order, and the pictures are rearranged into encoding order. Further, each of the pictures is divided into macro blocks, and the following processing is applied to each macro block.
There are mainly two types of encoding methods: inter prediction encoding and intra prediction encoding. The inter prediction encoding is explained first.
An input image signal read out from the picture memory 101 is inputted to a difference arithmetic unit 110, and a difference image signal, obtained by calculating the difference from the prediction image signal that is the output of the inter prediction image generation unit 108, is outputted to the prediction residual encoding unit 102. The prediction residual encoding unit 102 performs image encoding processing such as frequency conversion and quantization, and outputs a residual signal. The residual signal is inputted to the prediction residual decoding unit 104, which performs image decoding processing such as inverse quantization and inverse frequency conversion, and outputs a residual decoded signal. A sum arithmetic unit 111 adds the residual decoded signal and the prediction image signal so as to generate a reconstructed image signal. The deblocking unit 105 processes the reconstructed image signal so as to reduce the distortion that occurs at the boundaries between the blocks into which the picture was divided for encoding, and the result is stored as a reference picture in the picture memory 106.
On the other hand, the input image signal read out from the picture memory 101 on a macro block-by-block basis is also inputted to the inter prediction image generation unit 108. Here, one or more encoded pictures stored in the picture memory 106 are searched, and the image area closest to the input image signal is detected and outputted as a prediction image. The prediction image is used for generating the difference image signal in the difference arithmetic unit 110 and for generating the reconstructed image signal in the sum arithmetic unit 111.
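The detection of the image area closest to the input image signal can be illustrated by a simple exhaustive block-matching search. This is only a sketch under stated assumptions: the text does not specify the matching criterion or the search strategy, so the sum of absolute differences (SAD), the full search and all function names here are assumptions made for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def crop(picture, top, left, size):
    """Cut a size x size block out of a picture given as a list of rows."""
    return [row[left:left + size] for row in picture[top:top + size]]


def best_match(reference, target_block, top, left, search_range):
    """Search the reference picture around (top, left) for the closest area.

    Returns (cost, (dy, dx)), where (dy, dx) is the displacement of the
    best candidate, or None if no candidate lies inside the picture.
    """
    size = len(target_block)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that would fall outside the reference picture.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(crop(reference, y, x, size), target_block)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best
```

The block found at the returned displacement would then serve as the prediction image supplied to the difference arithmetic unit 110 and the sum arithmetic unit 111.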
The bit stream generation unit 103 performs variable length encoding on the various encoded information outputted by the above series of processing, so as to obtain the bit stream (moving image encoded data) to be outputted by the encoding processing.
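The path from the difference arithmetic unit 110 through the prediction residual encoding unit 102 and the prediction residual decoding unit 104 to the sum arithmetic unit 111 can be sketched as follows. This is a simplified illustration, not the actual processing: the frequency conversion is omitted, a uniform quantization step `qstep` stands in for the quantization, and all function names are hypothetical.

```python
def encode_residual(block, prediction, qstep):
    """Difference (unit 110) followed by quantization (part of unit 102).

    Assumption: the frequency conversion is omitted and a uniform
    quantization step is used.
    """
    return [
        [round((s - p) / qstep) for s, p in zip(row_s, row_p)]
        for row_s, row_p in zip(block, prediction)
    ]


def reconstruct(levels, prediction, qstep):
    """Inverse quantization (part of unit 104) followed by the sum (unit 111)."""
    return [
        [p + level * qstep for level, p in zip(row_l, row_p)]
        for row_l, row_p in zip(levels, prediction)
    ]


block = [[100, 105], [97, 91]]          # input image signal
prediction = [[96, 100], [100, 100]]    # prediction image signal
levels = encode_residual(block, prediction, qstep=4)
reconstructed = reconstruct(levels, prediction, qstep=4)
```

Because the encoder reconstructs the image from the quantized residual rather than from the original input, the reference picture it stores matches what a decoder can reproduce from the same data.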
The above flow of processing is the operation performed in the case of inter prediction encoding; it is switched to intra prediction encoding by the switch 112. The intra prediction encoding is explained hereafter.
The input image signal read out from the picture memory 101 is inputted to the difference arithmetic unit 110, and the difference image signal, obtained by calculating the difference from the prediction image signal that is the output of the intra prediction image generation unit 109, is outputted to the prediction residual encoding unit 102. The prediction residual encoding unit 102 performs image encoding processing such as frequency conversion and quantization, and outputs a residual signal. The residual signal is inputted to the prediction residual decoding unit 104, which performs image decoding processing such as inverse quantization and inverse frequency conversion, and outputs the residual decoded signal. The sum arithmetic unit 111 adds the residual decoded signal and the prediction image signal, and generates a reconstructed image signal. The deblocking unit 105 processes the reconstructed image signal so as to reduce the distortion that occurs at the boundaries between the blocks into which the picture was divided for encoding.
On the other hand, the input image signal read out from the picture memory 101 on a macro block-by-block basis is also inputted to the intra prediction image generation unit 109. Here, a prediction image is generated by referring to the reconstructed image signal of one or more neighboring blocks in the same picture generated as an output of the sum arithmetic unit 111. The prediction image is used for generating a difference image signal in the difference arithmetic unit 110 and for generating a reconstructed image signal in the sum arithmetic unit 111.
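One simple way of generating a prediction image from neighboring reconstructed pixels is to fill the block with their average. The sketch below is only an assumed example of such a method: the text does not specify how the intra prediction image generation unit 109 forms its prediction, and the function name, the rounding and the default value of 128 used when no neighbor is available are all assumptions.

```python
def dc_prediction(above, left, size, default=128):
    """Predict a size x size block as the rounded mean of neighboring pixels.

    above: reconstructed pixels of the row just above the block (may be empty)
    left:  reconstructed pixels of the column just left of the block (may be empty)
    default: assumed value used when no neighboring pixel is available
    """
    neighbours = list(above) + list(left)
    if neighbours:
        # Integer mean with rounding to the nearest value.
        dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    else:
        dc = default
    return [[dc] * size for _ in range(size)]
```

For instance, with four neighboring pixels of value 10 above and four of value 20 to the left, every pixel of the predicted 4 × 4 block takes the value 15.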
The bit stream generation unit 103 performs variable length encoding on various encoded information outputted by the series of processing so that a bit stream outputted by the encoding processing is obtained.
The encoding modes, namely the inter prediction encoding and the intra prediction encoding, are controlled by the encoding mode controlling unit 107 and switched on a macro block-by-block basis.
FIG. 2 is a block diagram showing a structure of a moving image decoding apparatus which realizes a conventional moving image decoding method. The moving image decoding apparatus includes a bit stream analyzing unit 201, a prediction residual decoding unit 202, a deblocking unit 203, a picture memory 204, a decoding mode controlling unit 205, an inter prediction image generation unit 206 and an intra prediction image generation unit 207.
First, the bit stream analyzing unit 201 extracts various information from the inputted bit stream (moving image encoded data), and outputs the information relating to the decoding mode to the decoding mode controlling unit 205 and the residual encoded signal to the prediction residual decoding unit 202.
There are two types of decoding methods: inter prediction decoding and intra prediction decoding. The inter prediction decoding is explained first.
The prediction residual decoding unit 202 performs image decoding processing such as inverse quantization and inverse frequency conversion on the inputted residual encoded signal, and outputs the residual decoded signal. The sum arithmetic unit 208 adds the residual decoded signal and the prediction image signal outputted from the inter prediction image generation unit 206, and generates a decoded image signal. The deblocking unit 203 performs processing for reducing distortion which occurs in a boundary between blocks on the decoded image signal before being stored into the picture memory 204 as a picture for reference or display.
On the other hand, the inter prediction image generation unit 206 takes out a specified image area from one or more decoded pictures stored in the picture memory 204, and generates a prediction image. The prediction image is used for generating a decoded image signal by the sum arithmetic unit 208.
The decoded image generated by the series of processing is outputted as an image signal for display from the picture memory 204 according to the timing to be displayed.
The above flow of processing is the operation performed in the case of inter prediction decoding; it is switched to intra prediction decoding by the switch 209. The intra prediction decoding is explained hereafter.
The prediction residual decoding unit 202 performs image decoding processing such as inverse quantization and inverse frequency conversion on the inputted residual encoded signal, and outputs a residual decoded signal. The sum arithmetic unit 208 adds the residual decoded signal and the prediction image signal outputted from the intra prediction image generation unit 207, and generates a decoded image signal. The deblocking unit 203 performs processing on the decoded image signal for reducing the distortion that occurs at the boundaries between blocks, before the decoded image signal is stored in the picture memory 204 as a picture for display.
On the other hand, the intra prediction image generation unit 207 generates a prediction image by referring to a decoded image signal of one or more neighboring blocks in the same picture generated as an output of the sum arithmetic unit 208. The prediction image is used for generating the decoded image signal in the sum arithmetic unit 208.
The decoded image generated by the series of processing is outputted as an image signal for display from the picture memory 204 according to the timing to be displayed.
Note that, each of the decoding modes of the inter prediction decoding and the intra prediction decoding is controlled by the decoding mode controlling unit 205 and switched on a macro block-by-block basis.
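The switching between the two decoding modes on a macro block-by-block basis can be sketched as follows. This is a minimal illustration in which the function name and the mode strings are hypothetical; the residual is assumed to be already decoded by the prediction residual decoding unit 202, and the deblocking that follows is not shown.

```python
def decode_macro_block(mode, residual, inter_prediction, intra_prediction):
    """Add the residual decoded signal to the prediction selected by the mode.

    mode: "inter" or "intra" (hypothetical labels for the decoding mode)
    residual, inter_prediction, intra_prediction: blocks given as lists of rows
    """
    # Corresponds to the switch 209 selecting the prediction image source.
    if mode == "inter":
        prediction = inter_prediction
    elif mode == "intra":
        prediction = intra_prediction
    else:
        raise ValueError(f"unknown decoding mode: {mode!r}")
    # Corresponds to the sum arithmetic unit 208.
    return [
        [p + r for p, r in zip(row_p, row_r)]
        for row_p, row_r in zip(prediction, residual)
    ]
```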
Next, the processing in the deblocking units 105 and 203 is explained in detail. The details of this processing are identical in the encoding processing and in the decoding processing. Therefore, they are explained together as the same processing.
FIGS. 3A and 3B are drawings for explaining a method of determining the types of filters used for deblocking. Here, as an example, it is assumed that there are five types of filters, which are switched according to the characteristics of a block boundary. A stronger filter (here, Filter 4) is applied to a portion where block distortion is more likely to be conspicuous, and a weaker filter (here, Filter 0) is applied to a portion where block distortion is less likely to be conspicuous.
FIG. 3A is a drawing showing a boundary between blocks to which the filters are applied. In the drawing, the center line indicates the boundary between blocks; the pixel on the right side shown as Q indicates a pixel adjacent to the boundary in the target block; and the pixel on the left side shown as P indicates a pixel adjacent to the boundary in the adjacent block. FIG. 3B is a table showing which filter is selected according to the conditions satisfied by the pixel P and the pixel Q shown in FIG. 3A. For example, Filter 4 is selected in the case where the boundary is a vertical edge and one of the pixels P and Q belongs to a block which is intra prediction encoded. Similarly, Filter 3 is selected in the case where the boundary is a horizontal edge and one of the pixels P and Q belongs to a block which is intra prediction encoded. Also, Filter 2 is selected in the case where one of the pixels P and Q belongs to a block which has a non-zero coefficient of a spatial frequency component obtained by frequency conversion. Further, Filter 1 is selected in the case where the pixels P and Q belong to blocks which are inter prediction encoded and which refer to different pictures or have different motion vectors. Furthermore, Filter 0 is selected in the case where none of the above conditions applies.
Here, the table of FIG. 3B shows an example of a method of selecting filters. The number of filters and the selection conditions are not limited to the example. Therefore, the other cases can be similarly treated.
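The selection conditions described for FIG. 3B can be sketched as a single function. This is an illustration under stated assumptions: the `PixelInfo` record and its field names are hypothetical, and the conditions are assumed to be tested in the order in which they are listed above, from the strongest filter to the weakest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelInfo:
    """Properties of the block that a boundary pixel belongs to (hypothetical)."""
    intra: bool              # block is intra prediction encoded
    has_coefficients: bool   # block has a non-zero frequency coefficient
    ref_picture: int         # identifier of the referenced picture
    motion_vector: tuple     # motion vector as (horizontal, vertical)


def select_filter(p, q, vertical_edge):
    """Return the filter number (0 to 4) for the boundary between P and Q."""
    if p.intra or q.intra:
        # Intra coded on either side: Filter 4 at a vertical edge,
        # Filter 3 at a horizontal edge.
        return 4 if vertical_edge else 3
    if p.has_coefficients or q.has_coefficients:
        return 2
    if p.ref_picture != q.ref_picture or p.motion_vector != q.motion_vector:
        return 1
    return 0
```

Under these assumptions, two inter coded blocks without coefficients that refer to the same picture with the same motion vector receive Filter 0, and changing either the reference picture or the motion vector on one side raises the result to Filter 1.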
Next, a flow of deblocking processing is explained with reference to a flowchart shown in FIG. 4. The target data is managed in separated forms of data for luminance and data of chrominance. Therefore, deblocking is separately applied to each component.
First, in order to perform deblocking on the luminance component, loop processing is repeated as many times as the number of pixels of the luminance component adjacent to the target block boundary (F1 and F4); in each iteration, a type of filter is selected as explained with reference to FIGS. 3A and 3B (F2), and the selected filter is applied (F3). The information on the type of the selected filter is used for filtering the target pixel of the luminance component, and is also stored in a memory region where it can be referred to in later processing (F5). Since both the boundary at the vertical edge on the left side and the boundary at the horizontal edge on the upper side are targeted, the above-mentioned processing is applied eight times in the case of a block made up of, for example, horizontal 4 × vertical 4 pixels.
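The count of eight iterations for a 4 × 4 block can be checked by enumerating the boundary pixels. The following is only a sketch: the function name and the representation of each entry as an (x, y, edge) tuple are hypothetical.

```python
def boundary_pixels(block_x, block_y, size=4):
    """List the pixels of a block that are adjacent to its left and upper boundaries.

    block_x, block_y: coordinates of the top-left pixel of the block
    Returns a list of (x, y, edge) tuples, where edge is "vertical" for the
    boundary on the left side and "horizontal" for the boundary on the upper side.
    """
    left_edge = [(block_x, block_y + i, "vertical") for i in range(size)]
    top_edge = [(block_x + i, block_y, "horizontal") for i in range(size)]
    return left_edge + top_edge


# One filter selection and one filter application per entry.
iterations = len(boundary_pixels(8, 8))
```

The top-left pixel appears twice in the list, once for each edge, because it is adjacent to both boundaries.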
Next, in order to perform deblocking on the chrominance components, loop processing is repeated as many times as the number of pixels of the chrominance components adjacent to the target block boundary (F6 and F10); in each iteration, a type of filter is selected (F8), and the selected filter is applied (F9). Here, the filter to be applied is determined according to the type of filter used for the luminance component. Specifically, the type of filter applied at the position of the corresponding pixel of the luminance component is read out from the memory region that stores the information about the types of filters determined in the processing for the luminance component, and is used. The following equations are used for converting the position of the target pixel of the chrominance component to the position of the corresponding pixel of the luminance component (F7). Note that XL indicates a horizontal coordinate value of the luminance, XC indicates a horizontal coordinate value of the chrominance, YL indicates a vertical coordinate value of the luminance, and YC indicates a vertical coordinate value of the chrominance.
XL = 2 × XC  (equation 1(a))
YL = 2 × YC  (equation 1(b))
Thus, deblocking is performed on the chrominance component by applying a filter determined by the above mentioned processing.
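The lookup of the filter type for a chrominance pixel can be sketched as follows. This is a minimal illustration: the dictionary `luma_filter_types`, keyed by luminance coordinates, stands in for the memory region that stores the types of filters selected for the luminance component, and the function names are hypothetical.

```python
def chroma_to_luma(xc, yc):
    """Convert a chrominance pixel position to the corresponding luminance position.

    Implements equation 1(a), XL = 2 x XC, and equation 1(b), YL = 2 x YC.
    """
    return 2 * xc, 2 * yc


def chroma_filter_type(luma_filter_types, xc, yc):
    """Return the filter type stored for the luminance pixel corresponding to (xc, yc).

    luma_filter_types: hypothetical mapping from (XL, YL) to a filter number
    """
    return luma_filter_types[chroma_to_luma(xc, yc)]


# Filter types stored during the luminance pass (illustrative values only).
luma_filter_types = {(0, 0): 4, (0, 2): 4, (0, 4): 1, (0, 6): 0}
```

For example, the chrominance pixel at (0, 1) is converted to the luminance position (0, 2) and therefore takes the filter type stored there.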
Next, the relationship between the luminance components and the chrominance components is explained. FIGS. 5A-5C are drawings for explaining the positional relationships between the luminance components and the chrominance components. In the drawings, the x mark indicates a sample position of the luminance component and the O mark indicates a sample position of the chrominance component.
In general, human eyes are insensitive to changes in the chrominance components. Therefore, the chrominance components are commonly decimated before use. While there are various decimation methods, FIG. 5A indicates the positional relationship in the case where the chrominance components are decimated by half in both the vertical and horizontal directions. FIG. 5B indicates the positional relationship in the case where the chrominance components are decimated by half only in the horizontal direction. FIG. 5C indicates the positional relationship in the case where decimation is not performed. In the case of the positional relationship shown in FIG. 5A, the equation 1(a) and the equation 1(b) are used for calculating the pixel position of the corresponding luminance component for deblocking the chrominance component.
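The effect of the three decimation cases on the size of a chrominance plane can be sketched as follows. This is an illustration only: the dictionary keys reuse the figure labels for convenience, and the function name and the assumption that the luminance dimensions are divisible by the decimation factors are not taken from the text.

```python
# (horizontal factor, vertical factor) by which the chrominance is decimated.
DECIMATION = {
    "5A": (2, 2),  # half in both the vertical and horizontal directions
    "5B": (2, 1),  # half in the horizontal direction only
    "5C": (1, 1),  # no decimation
}


def chroma_plane_size(luma_width, luma_height, layout):
    """Return (width, height) of one chrominance plane for the given layout.

    Assumption: the luminance dimensions are divisible by the factors.
    """
    horizontal, vertical = DECIMATION[layout]
    return luma_width // horizontal, luma_height // vertical
```

In the case of FIG. 5A, each chrominance plane therefore holds one quarter of the number of samples of the luminance plane.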
Further, FIGS. 6A-6C show the positional relationships in the frame structure and in the field structure in the case where the chrominance components are decimated by half in both the vertical and horizontal directions. FIG. 6A shows the frame structure when processing is performed after decimating the chrominance components. FIG. 6B shows the field structure obtained by rearranging the frame structure. Specifically, the zeroth, second and fourth lines of the luminance components are assigned to the top field, and the first, third and fifth lines are assigned to the bottom field. (Refer to: ITU-T Rec. H.264|ISO/IEC 14496-10 AVC Draft Text of Final Draft International Standard (FDIS) of Joint Video Specification (2003-3-31)).
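The assignment of lines to the top field and the bottom field described above can be sketched as follows. The function names are hypothetical.

```python
def field_of_line(line):
    """Return the field that a frame line belongs to.

    Even lines (zeroth, second, fourth, ...) belong to the top field and
    odd lines (first, third, fifth, ...) to the bottom field.
    """
    return "top" if line % 2 == 0 else "bottom"


def split_fields(frame_lines):
    """Split the lines of a frame into (top field lines, bottom field lines)."""
    return frame_lines[0::2], frame_lines[1::2]


top, bottom = split_fields(["L_0", "L_1", "L_2", "L_3", "L_4", "L_5"])
```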
However, in the conventional structure, the type of filter used for the luminance component at the pixel position converted using the equation 1(a) and the equation 1(b) is applied to the pixel of the chrominance component. Therefore, in the case where an image to be displayed in interlaced-scan form is encoded and decoded in the frame structure, there is a problem of inconsistency in that the filter to be applied to a chrominance component in the bottom field is determined by referring to a luminance component in the top field. FIGS. 7A and 7B are drawings for explaining this reference relationship. FIG. 7A shows the positional relationship between the luminance components and the chrominance components when the picture is encoded and decoded in the frame structure. FIG. 7B shows the positional relationship between the luminance components and the chrominance components when the picture is rearranged into the field structure. Here, L_0 indicates the position of the luminance components at the zeroth line, and C_0 indicates the position of the chrominance components at the zeroth line. According to the equation 1(b), the luminance component at L_2 is referred to when the deblocking filter is applied to the chrominance component at C_1. However, when the picture is rearranged into the field structure, it is found that the type of filter for the chrominance component at C_1, which belongs to the bottom field, is determined by referring to the luminance component at L_2, which belongs to the top field.
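The reference relationship described above can be checked line by line. The sketch below assumes, consistently with the example of C_1 and L_2, that chrominance lines are assigned to fields by the same even and odd rule as luminance lines; the function names are hypothetical.

```python
def field_of_line(line):
    """Even lines belong to the top field, odd lines to the bottom field."""
    return "top" if line % 2 == 0 else "bottom"


def referenced_luma_line(yc):
    """Luminance line referred to for chrominance line yc (equation 1(b))."""
    return 2 * yc


def crosses_fields(yc):
    """Return True if chrominance line yc refers to a luminance line in the other field."""
    return field_of_line(yc) != field_of_line(referenced_luma_line(yc))


# Chrominance lines whose filter type is taken from a different field.
mismatched = [yc for yc in range(6) if crosses_fields(yc)]
```

Because 2 × YC is always even, the referenced luminance line always lies in the top field, so every chrominance line in the bottom field (C_1, C_3, C_5 and so on) takes its filter type from the other field.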
As described above, in the picture having the frame structure, all macro blocks can be encoded not only as frame macro blocks but also by switching to the frame structure or the field structure on a macro block pair-by-pair basis. Also, in the case where the picture is encoded in the field structure, it is possible to use respective encoding modes for the top field and the bottom field.
Accordingly, for example, in the case where an intra prediction encoding mode is used in the top field and an inter prediction encoding mode is used in the bottom field, picture quality is degraded in the chrominance components in the bottom field. In other words, a strong filter is basically applied for the intra prediction encoding mode and a weak filter is applied for the inter prediction encoding mode, so that a weak filter should properly be applied to the chrominance components in the bottom field. However, as described above, the type of filter for the chrominance components in the bottom field is determined by referring to the luminance components in the top field, and a strong filter is therefore applied. Consequently, the picture quality is degraded in the chrominance components in the bottom field, so that the image is not consistent when it is displayed in interlaced-scan form.
Further, the same applies to the case where, even if the same encoding mode is used for the top field and the bottom field of the target macro block, an adjacent macro block is encoded in the field structure and different encoding modes are used for its top field and bottom field.
As described in the above, in the case where the image to be displayed in the interlaced-scan form is encoded and decoded in the frame structure, there is a problem that an inappropriate type of filter is applied because there is a case where the type of filter applied to the chrominance components is determined by referring to the luminance components in a different field.