1. Field of the Invention
This invention relates to a picture decoding method and apparatus for decoding compressed picture data of a first resolution obtained on predictive coding by motion prediction in terms of a pre-set pixel block (macro-block) as a unit and on performing orthogonal transform in terms of a pre-set pixel block (orthogonal transform block) as a unit. More particularly, it relates to a picture decoding method and apparatus for decoding compressed picture data of the first resolution and for decimating the data to moving picture data of a second resolution lower than the first resolution.
2. Description of the Related Art
Standardization is now under way for digital television signals employing a picture compression system, such as Moving Picture Experts Group Phase 2 (MPEG2). Among the standards for digital television broadcast, there are a standard for standard resolution pictures, such as those with the number of effective lines in the vertical direction of 576, and a standard for high-resolution pictures, such as those with the number of effective lines in the vertical direction of 1152. Recently, a demand has arisen for a downdecoder for decoding compressed picture data of a high-resolution picture and for reducing the resolution of the picture data by ½ to generate picture data of standard resolution for display on a television monitor adapted to cope with the standard resolution.
There is proposed in a publication entitled "Scalable Decoder free of low-range Drift" (written by Iwahashi, Kanbayashi and Takaya, Shingaku-Gihou CS94-186, DSP 94-108, 1995-01) a downdecoder for decoding a bitstream of, for example, MPEG2, obtained on predictive coding with motion prediction of a high-resolution picture and compression coding by discrete cosine transform, and for downsampling the picture to a picture of standard resolution. This Publication, referred to below as Publication 1, shows the following first to third downdecoders.
Referring to FIG. 1, the first downdecoder includes an inverse discrete cosine transform unit 1001, for processing a bitstream of a high resolution picture with 8 (number of coefficients as counted from the dc component in the horizontal direction) × 8 (number of coefficients as counted from the dc component in the vertical direction), an adder 1002 for adding an inverse discrete cosine transformed high resolution picture and a motion-compensated reference picture, and a frame memory 1003 for transient storage of the reference picture. The first downdecoder also includes a motion compensation unit 1004 for motion-compensating the reference picture stored in the frame memory 1003 with ½ pixel precision, and a downsampling unit 1005 for converting the reference picture stored in the frame memory 1003 to a picture of standard resolution.
This first downdecoder reduces an output picture, obtained on decoding as a high resolution picture by inverse discrete cosine transform, by the downsampling unit 1005, to output resulting picture data with the standard resolution.
Referring to FIG. 2, the second downdecoder includes an inverse discrete cosine transform unit 1011 for performing 8×8 inverse discrete cosine transform, as it substitutes 0 for the high-frequency components of the discrete cosine transform (DCT) block of the high resolution picture, an adder 1012 for summing the inverse discrete cosine transformed high resolution picture to the motion-compensated reference picture, and a frame memory 1013 for transient storage of the reference picture. The second downdecoder also includes a motion compensation unit 1014 for motion-compensating the reference picture stored in the frame memory 1013 with ½ pixel precision, and a downsampling unit 1015 for converting the reference picture stored in the frame memory 1013 to a picture of standard resolution.
This second downdecoder performs inverse discrete cosine transform to obtain a decoded output picture, as a high-resolution picture, as it substitutes 0 for coefficients of high-frequency components among the totality of coefficients of the DCT block, and reduces the output picture in size by the downsampling unit 1015 to output picture data of standard resolution.
Referring to FIG. 3, the third downdecoder includes a decimating inverse discrete cosine transform unit 1021 for performing, e.g., 4×4 inverse discrete cosine transform, using only the coefficients of the low-frequency components of the DCT block of the bitstream of the high resolution picture, for decoding to a standard resolution picture, and an adder 1022 for summing the standard resolution picture processed with decimating inverse discrete cosine transform and the motion-compensated reference picture. The third downdecoder also includes a frame memory 1023 for transiently storing the reference picture and a motion compensation unit 1024 for motion-compensating the reference picture stored in the frame memory 1023 with ¼ pixel precision.
In this third downdecoder, IDCT is executed using only low-frequency components of all coefficients of the DCT block to decode a picture of low resolution from a picture of high resolution.
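The decimating inverse transform described for the third downdecoder can be illustrated with a minimal sketch. It assumes an orthonormal DCT, keeps only the 4×4 low-frequency corner of an 8×8 coefficient block, and rescales the coefficients so that a flat block keeps its brightness. The function names `idct_1d` and `decimating_idct` and the rescaling convention are assumptions made for illustration; they are not taken from Publication 1.

```python
import math

def idct_1d(coeffs):
    """Orthonormal inverse DCT of a short list of coefficients."""
    n = len(coeffs)
    out = []
    for x in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
        out.append(total)
    return out

def decimating_idct(block, keep=4):
    """Decode a keep x keep pixel block from the low-frequency corner of a
    square block of DCT coefficients (hypothetical helper, block[v][u])."""
    size = len(block)
    # Assumed normalisation: scale by keep/size so a flat block stays flat
    # at the same brightness after the block shrinks.
    gain = keep / size
    low = [[block[v][u] * gain for u in range(keep)] for v in range(keep)]
    # Horizontal pass over each row of retained coefficients.
    rows = [idct_1d(row) for row in low]
    # Vertical pass over each column of the intermediate result.
    cols = [idct_1d([rows[v][x] for v in range(keep)]) for x in range(keep)]
    return [[cols[x][y] for x in range(keep)] for y in range(keep)]

# An 8x8 block holding only a DC coefficient (a flat picture area of value 100).
flat = [[0.0] * 8 for _ in range(8)]
flat[0][0] = 800.0
pixels = decimating_idct(flat)
```

Because only 16 of the 64 coefficients are transformed and the result is a quarter of the original area, both the transform workload and the reference picture storage shrink, which is the advantage attributed to the third downdecoder.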
The above-described first downdecoder performs inverse discrete cosine transform on the totality of the coefficients in the DCT block to obtain a high-resolution picture on decoding. Thus, the inverse discrete cosine transform unit 1001 of high processing capability and the frame memory 1003 of high capacity are needed. The second downdecoder performs inverse discrete cosine transform on the coefficients in the DCT block to obtain a high-resolution picture on decoding, as it sets the high-frequency components of the coefficients to zero, so that a lower processing capability of the inverse discrete cosine transform unit 1011 suffices. However, the frame memory 1013 of high capacity is still needed. In contradistinction from these first and second downdecoders, the third downdecoder performs inverse discrete cosine transform using only the coefficients of the low-frequency components of the totality of the coefficients in the DCT block, so that a low processing capability of the inverse discrete cosine transform unit 1021 suffices. Moreover, since the reference picture is decoded as a standard resolution picture, a lower capacity of the frame memory 1023 suffices.
Meanwhile, the display system of a moving picture in television broadcast is classified into a sequential scanning system and an interlaced scanning system. The sequential scanning system sequentially displays a picture obtained on sampling the totality of pixels in a given frame at the same timing. The interlaced scanning system alternately displays pictures obtained on sampling pixels in a given frame at different timings from one horizontal line to another.
In this interlaced scanning system, one of the pictures obtained on sampling pixels in a frame at different timings from one horizontal line to another is termed a top field or a first field, with the other picture being termed a bottom field or a second field. The picture containing the leading line in the horizontal direction of a frame becomes the top field, while the picture containing the second line in the horizontal direction of a frame becomes the bottom field. Thus, in the interlaced scanning system, a sole frame is made up of two fields.
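As a small illustration of the frame-to-field relationship just described, the sketch below separates a frame, held as a list of horizontal lines, into its two fields. The function name `split_fields` is hypothetical, and the frame is assumed to be stored with the leading line at index 0.

```python
def split_fields(frame_lines):
    """Split a frame (list of horizontal lines) into its two fields."""
    # The field containing the leading line is the top field.
    top_field = frame_lines[0::2]
    # The field containing the second line is the bottom field.
    bottom_field = frame_lines[1::2]
    return top_field, bottom_field

top, bottom = split_fields(["line0", "line1", "line2", "line3"])
```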
With the MPEG2, not only a frame but also a field can be allocated to a picture as a picture compressing unit in order to compress the moving picture signals efficiently in the interlaced scanning system.
If, in the MPEG2, a field is allocated to a picture, the resulting bitstream structure is termed a field structure, while if a frame is allocated to a picture, the resulting bitstream structure is termed a frame structure. In the field structure, a DCT block is constituted by pixels in the field and discrete cosine transform is applied on the field basis. The processing mode of performing field-based discrete cosine transform is termed the field DCT mode. In the frame structure, a DCT block is constituted by pixels in the frame and discrete cosine transform is applied on the frame basis. The processing mode of performing frame-based discrete cosine transform is termed the frame DCT mode. In the field structure, a macro-block is constituted from pixels in a field and motion prediction is performed on the field basis. The processing mode of performing motion prediction on the field basis is termed the field motion prediction mode. In the frame structure, a macro-block is constituted from pixels in a frame and motion prediction is performed on the frame basis. The processing mode of performing motion prediction on the frame basis is termed the frame motion prediction mode.
Meanwhile, a picture decoding apparatus, adapted for decoding compressed picture data for the interlaced scanning system, using the third downdecoder shown in the Publication 1, is proposed in, for example, a Publication entitled "A Compensation Method of Drift Errors in Scalability" written by N. Obikane, K. Tahara and J. Yonemitsu, HDTV Work Shop '93. This Publication is hereinafter termed the Publication 2.
Referring to FIG. 4, the conventional picture decoding device, shown in Publication 2, includes a bitstream analyzer 1031, fed with a bitstream obtained on compressing a high resolution picture in accordance with the MPEG2, for analyzing this bitstream, a variable length encoding/decoding unit 1032 for variable length encoding data for allocating codes of lengths corresponding to the data occurrence frequency and for decoding the variable length encoded bitstream, and a dequantizer 1033 for multiplying the respective coefficients of the DCT block with quantization steps. The conventional picture decoding device also includes a decimating inverse discrete cosine transform unit 1034 for decoding a standard resolution picture by, e.g., 4×4 inverse discrete cosine transform using only coefficients of low-frequency components of the totality of the coefficients of the DCT block, and an adder 1035 for summing the standard resolution picture processed with decimating inverse discrete cosine transform to a motion-compensated reference picture. The conventional picture decoding device also includes a frame memory 1036 for transiently storing the reference picture and a motion compensation unit 1037 for motion compensating the reference picture stored in the frame memory 1036 to ¼ pixel precision.
The decimating inverse discrete cosine transform unit 1034 of the conventional picture decoding device, shown in the Publication 2, performs the inverse discrete cosine transform, using only the coefficients of the low-frequency components of the totality of the coefficients in the DCT block. It is noted that the positions of the coefficients of the frame DCT mode, processed with the inverse discrete cosine transform, differ from those of the field DCT mode.
Specifically, in the field DCT mode, the decimating inverse discrete cosine transform unit 1034 applies the inverse discrete cosine transform only on the 4×4 of 8×8 coefficients in the DCT block, as shown in Fig. On the other hand, in the frame DCT mode, the decimating inverse discrete cosine transform unit 1034 applies the inverse discrete cosine transform only on the 4×2+4×2 of 8×8 coefficients in the DCT block, as shown in FIG. 32.
Also, the motion compensation unit 1037 of the conventional picture decoding device performs motion compensation to ¼ pixel precision, adapted to cope with the field motion prediction mode or with the frame motion prediction mode, based on the information (motion vector) on the motion prediction performed on the high resolution picture. Specifically, while the MPEG2 usually provides that the motion compensation be performed to ½ pixel precision, the number of pixels in a picture is thinned out to one-half if a standard resolution picture is to be decoded from a high resolution picture. Thus, the motion compensation unit 1037 performs motion compensation as it sets the pixel precision for motion compensation to ¼ pixel.
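The change from ½ to ¼ pixel precision can be checked with a short sketch. It assumes that a motion vector component is an integer count of half pixels of the high resolution picture; on the half-size picture the same integer then counts quarter pixels. The function name `to_quarter_pel` and the floor convention used for negative vectors are assumptions made for illustration.

```python
def to_quarter_pel(mv_half_pel):
    """Reinterpret a high resolution motion vector component (in half-pixel
    units) as a position on the half-size standard resolution picture."""
    # One half pixel at high resolution equals one quarter pixel after the
    # picture is decimated by two, so the integer value is reused unchanged
    # and split into a whole-pixel offset and a quarter-pixel phase (0..3).
    whole_pixels, quarter_phase = divmod(mv_half_pel, 4)
    return whole_pixels, quarter_phase

# 5 half pixels = 2.5 pixels at high resolution = 1.25 pixels at standard resolution.
offset = to_quarter_pel(5)
```

A non-zero `quarter_phase` is what forces the motion compensation unit to interpolate between stored pixels of the standard resolution reference picture.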
Therefore, the motion compensation unit 1037 performs linear interpolation on the pixels of the reference picture stored in the frame memory 1036 as a standard resolution picture to generate pixels to ¼ pixel accuracy.
Specifically, the processing for linear interpolation of pixels in the perpendicular direction for the field motion prediction mode and that for the frame motion prediction mode are explained with reference to FIGS. 7 and 8, in which the phase of pixels in the vertical direction is indicated in the perpendicular direction, with the phase of each pixel in a displayed picture being indicated by an integer.
Referring to FIG. 7, the processing for interpolation of a picture motion-predicted in the field motion prediction mode is explained. For a high resolution picture (upper layer), motion compensation is independently performed to ½ pixel precision, from field to field, as shown in FIG. 7A. On the other hand, for a standard resolution picture (lower layer), motion compensation is achieved by generating pixels dephased by ¼, ½ and ¾ pixel in the perpendicular direction by linear interpolation in a field based on the pixels of integer number precision, as shown in FIG. 7B. That is, in the standard resolution picture (lower layer), pixels with ¼ pixel precision of the top field are generated by linear interpolation based on the pixels of the integer number precision of the top field, while those with ¼ pixel precision of the bottom field are generated by linear interpolation based on the pixels of the integer number precision of the bottom field. It is assumed, for example, that the value of a pixel of the top field having the phase in the perpendicular direction at the 0-position is a, with the value of a pixel having the phase in the perpendicular direction at the 1-position being b. In this case, the pixel of the top field with the phase in the perpendicular direction of ¼ is (3a+b)/4, while the pixel of the top field with the phase in the perpendicular direction of ½ is (a+b)/2, with the pixel of the top field with the phase in the perpendicular direction of ¾ being (a+3b)/4.
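The in-field linear interpolation described above can be written as a short sketch. The function name `field_quarter_pel` is hypothetical; `a` and `b` are the values of two vertically adjacent pixels of the same field, and `quarter` is the phase between them in quarter-pixel steps. Exact fractions are used so the results can be compared directly with the expressions in the text.

```python
from fractions import Fraction

def field_quarter_pel(a, b, quarter):
    """Linear interpolation between two adjacent pixels of one field.

    quarter = 0, 1, 2, 3 selects the phase 0, 1/4, 1/2, 3/4 measured from a.
    """
    if quarter not in (0, 1, 2, 3):
        raise ValueError("quarter must be 0, 1, 2 or 3")
    # Weights (4 - quarter)/4 and quarter/4 reproduce (3a+b)/4, (a+b)/2
    # and (a+3b)/4 for quarter = 1, 2 and 3.
    return Fraction((4 - quarter) * a + quarter * b, 4)

a, b = 8, 16
samples = [field_quarter_pel(a, b, q) for q in range(4)]
```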
Referring to FIG. 8, the processing of interpolation of a picture motion-predicted in the frame motion prediction mode is explained. For a high resolution picture (upper layer), interpolation processing is performed across the fields, that is across the bottom field and the top field, as shown in FIG. 8A, with the motion compensation precision being ½ pixel precision. For a standard resolution picture (lower layer), motion compensation is achieved by generating pixels dephased by ¼, ½ and ¾ pixels in the perpendicular direction, based on the pixels of the integer number precision of two fields, that is the top field and the bottom field, as shown in FIG. 8B. For example, it is assumed that the value of a pixel of the bottom field having the phase in the perpendicular direction of −1 is a, the value of a pixel of the top field having the phase in the perpendicular direction of 0 is b, the value of a pixel of the bottom field having the phase in the perpendicular direction of 1 is c, the value of a pixel of the top field having the phase in the perpendicular direction of 2 is d, and the value of a pixel of the bottom field having the phase in the perpendicular direction of 3 is e. In this case, the pixels of ¼ pixel precision, having the phase in the perpendicular direction in a range from 0 to 2, may be found as follows:
The pixel having the phase in the perpendicular direction of ¼ is (a+4b+3c)/8, while the pixel having the phase in the perpendicular direction of ½ is (a+3c)/4. The pixel having the phase in the perpendicular direction of ¾ is (a+2b+3c+2d)/8, while the pixel having the phase in the perpendicular direction of 5/4 is (2b+3c+2d+e)/8. The pixel having the phase in the perpendicular direction of 3/2 is (3c+e)/4, while the pixel having the phase in the perpendicular direction of 7/4 is (3c+4d+e)/8.
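The six expressions above can be collected into a short sketch and checked on a linear ramp, for which every interpolated value should fall on the ramp. The function name `frame_quarter_pel` is hypothetical; the arguments a to e are the pixel values at the phases −1, 0, 1, 2 and 3 in the perpendicular direction, as defined in the text.

```python
from fractions import Fraction

def frame_quarter_pel(a, b, c, d, e):
    """Interpolated pixel values for the frame motion prediction mode,
    keyed by phase in the perpendicular direction."""
    return {
        Fraction(1, 4): Fraction(a + 4 * b + 3 * c, 8),
        Fraction(1, 2): Fraction(a + 3 * c, 4),
        Fraction(3, 4): Fraction(a + 2 * b + 3 * c + 2 * d, 8),
        Fraction(5, 4): Fraction(2 * b + 3 * c + 2 * d + e, 8),
        Fraction(3, 2): Fraction(3 * c + e, 4),
        Fraction(7, 4): Fraction(3 * c + 4 * d + e, 8),
    }

# Linear ramp: value = 8 * (phase + 1) at the phases -1, 0, 1, 2, 3.
ramp = frame_quarter_pel(0, 8, 16, 24, 32)
```

On this ramp the values at the phases ½ and 3/2 use only a, c and e, while the values at the odd quarter phases average a neighbouring pair of entries, which shows how the two fields are mixed in this mode.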
With the above-described picture decoding device, disclosed in the Publication 2, the compressed picture data of the high resolution picture, associated with the interlaced scanning system, can be decoded to a standard resolution picture.
However, with the conventional picture decoding device, shown in the above Publication 2, the pixels of the standard resolution picture obtained with the field DCT mode are dephased with respect to the pixels of the standard resolution picture obtained with the frame DCT mode. Specifically, with the field DCT mode, the phases in the perpendicular direction of the respective pixels of the top field of the lower layer are ½, 5/2, . . . , with the phases in the perpendicular direction of the respective pixels of the bottom field of the lower layer being 1, 3, . . . , as shown in FIG. 9. On the other hand, with the frame DCT mode, the phases in the perpendicular direction of the respective pixels of the top field of the lower layer are 0, 2, . . . , with the phases in the perpendicular direction of the respective pixels of the bottom field of the lower layer being 1, 3, . . . , as shown in FIG. 10. Thus, pictures with different phases co-exist in the frame memory 1036, deteriorating the picture quality of the output picture.
With the conventional picture decoding device, shown in the Publication 2, no correction is made for phase deviations or dephasing of the pixels at the time of the motion compensation with the field motion prediction mode and the frame motion prediction mode, resulting in deteriorated picture quality.
It is therefore an object of the present invention to provide a picture decoding method and a picture decoding device for decoding standard resolution picture data from compressed picture data of the high resolution picture whereby the processing volume necessary for decoding and the storage capacity may be reduced and phase deviations of pixels between the field motion prediction mode and the frame motion prediction mode during motion compensation may be eliminated to prevent picture quality deterioration ascribable to motion compensation.
In one aspect, the present invention provides a picture decoding apparatus for decoding moving picture data of a second resolution from compressed picture data of a first resolution, obtained on predictive coding by motion prediction in terms of a pre-set pixel block (macro-block) as a unit and on compression coding in terms of a pre-set pixel block (orthogonal transform block) as a unit, the second resolution being lower than the first resolution. The apparatus includes inverse orthogonal transform means for inverse orthogonal transforming coefficients of low-frequency components of respective coefficients of an orthogonal transform block of the orthogonal transformed compressed data, addition means for summing compressed picture data inverse orthogonal transformed by the inverse orthogonal transform means to motion compensated reference picture data to output moving picture data of the second resolution, storage means for storing output moving picture data of the addition means as reference picture data, first motion compensation means for motion compensating a macro-block of reference picture data motion-predicted in accordance with a motion prediction system associated with interlaced scanning (field motion prediction mode) and second motion compensation means for motion compensating a macro-block of reference picture data motion-predicted in accordance with a motion prediction system associated with sequential scanning (frame motion prediction mode). The first and second motion compensation means interpolate respective pixels of the macro-block of the reference picture data stored by the storage means to generate a macro-block constructed by pixels having ¼ pixel precision with respect to the reference picture data stored by the storage means.
In the present picture decoding device, respective pixels of a macro-block of stored reference picture data are interpolated to generate a macro-block constituted by pixels having ¼ pixel precision. The present picture decoding device outputs moving picture data of a second resolution lower than a first resolution.
With the picture decoding device of the present invention, first and second motion compensation means switch the number of taps of a filter used for interpolating respective pixels of a macro-block of reference picture data stored in the storage means every pre-set unit to generate a macro-block constructed by pixels having ¼ pixel precision with respect to reference picture data stored in the storage means.
With the picture decoding device of the present invention, the number of filter taps is switched to interpolate the respective pixels of the macro-block of the stored reference picture data to generate a macro-block constructed by pixels having ¼ pixel precision.
In another aspect, the present invention provides a picture decoding method for decoding moving picture data of a second resolution from compressed picture data of a first resolution, obtained on predictive coding by motion prediction in terms of a pre-set pixel block (macro-block) as a unit and on compression coding in terms of a pre-set pixel block (orthogonal transform block) as a unit, the second resolution being lower than the first resolution. The method includes inverse orthogonal transforming coefficients of low-frequency components of respective coefficients of an orthogonal transform block of the orthogonal transformed compressed data, adding inverse orthogonal transformed compressed picture data to motion compensated reference picture data, storing moving picture data obtained on addition as reference picture data, motion compensating a macro-block of the reference picture data motion-predicted by a motion prediction system associated with interlaced scanning (field motion prediction mode), motion compensating a macro-block of the reference picture data motion-predicted by a motion prediction system associated with sequential scanning (frame motion prediction mode), interpolating respective pixels of a macro-block motion-predicted by the field motion prediction mode or frame motion prediction mode to generate a macro-block constructed by pixels having ¼ pixel precision with respect to stored reference picture data, and motion compensating the generated macro-block.
In the present picture decoding device, respective pixels of a macro-block of stored reference picture data are interpolated to generate a macro-block constituted by pixels having ¼ pixel precision. The present picture decoding device outputs moving picture data of a second resolution lower than a first resolution.
In the present picture decoding device, the number of filter taps used for interpolating respective pixels of a macro-block of reference picture data, motion-predicted in accordance with the field motion prediction mode or frame motion prediction mode, is switched every pre-set unit to generate a macro-block constructed by pixels having ¼ pixel precision with respect to stored reference picture data.
With the picture decoding device of the present invention, the number of filter taps is switched to interpolate the respective pixels of the macro-block of the stored reference picture data to generate a macro-block constructed by pixels having ¼ pixel precision.
According to the present invention, respective pixels of a macro-block of stored reference picture data are interpolated to generate a macro-block constituted by pixels having ¼ pixel precision. The present picture decoding device outputs moving picture data of a second resolution lower than a first resolution.
It is therefore possible with the present invention to reduce the processing volume and storage volume necessary for decoding and to eliminate pixel dephasing due to the field motion prediction mode or frame motion prediction mode at the time of motion compensation to prevent deterioration in the picture quality otherwise produced due to motion compensation.
According to the present invention, the number of filter taps is switched to interpolate respective pixels of stored reference picture data to generate a macro-block constructed by pixels of ¼ pixel precision.
It is therefore possible with the present invention to reduce the processing volume and storage volume at the time of motion compensation without deteriorating the picture quality to expedite the processing.