1. Field of the Invention
This invention relates to a picture decoding method and apparatus for decoding compressed picture data of a first resolution obtained on predictive coding by motion prediction in terms of a pre-set pixel block (macro-block) as a unit and on performing orthogonal transform in terms of a pre-set pixel block (orthogonal transform block) as a unit. More particularly, it relates to a picture decoding method and apparatus for decoding compressed picture data of the first resolution and for decimating the data to moving picture data of a second resolution lower than the first resolution.
2. Description of the Related Art
Standardization is now under way for digital television signals employing a picture compression system such as Moving Picture Experts Group Phase 2 (MPEG2). Among the standards for digital television broadcast, there are a standard for standard resolution pictures, such as those with the number of effective lines in the vertical direction of 576, and a standard for high-resolution pictures, such as those with the number of effective lines in the vertical direction of 1152. Recently, there has arisen a demand for a downdecoder for decoding compressed picture data of a high-resolution picture and for reducing the resolution of the compressed picture data by ½ to generate picture data of standard resolution for display on a television monitor adapted to cope with the standard resolution.
There is proposed in a publication entitled “Scalable Decoder free of low-range Drift” (written by Iwahashi, Kanbayashi and Takaya, Shingaku-Gihou CS94-186, DSP 94-108, 1995-01) a downdecoder for decoding a bitstream of, for example, MPEG2, obtained on predictive coding with motion prediction of a high-resolution picture and compression coding by discrete cosine transform, and for downsampling the picture to a picture of standard resolution. This Publication, referred to below as Publication 1, shows the following first to third downdecoders.
Referring to FIG. 1, the first downdecoder includes an inverse discrete cosine transform unit 1001, for processing a bitstream of a high resolution picture with 8 (number of coefficients as counted from the dc component in the horizontal direction) × 8 (number of coefficients as counted from the dc component in the vertical direction), an adder 1002 for adding an inverse discrete cosine transformed high resolution picture and a motion-compensated reference picture, and a frame memory 1003 for transient storage of the reference picture. The first downdecoder also includes a motion compensation unit 1004 for motion-compensating the reference picture stored in the frame memory 1003 with ½ pixel precision, and a downsampling unit 1005 for converting the reference picture stored in the frame memory 1003 to a picture of standard resolution.
This first downdecoder reduces an output picture, obtained on decoding as a high resolution picture by inverse discrete cosine transform, by the downsampling unit 1005, to output resulting picture data with the standard resolution.
Referring to FIG. 2, the second downdecoder includes an inverse discrete cosine transform unit 1011 for performing 8×8 inverse discrete cosine transform, as it substitutes 0 for the high-frequency components of the discrete cosine transform (DCT) block of the high resolution picture, an adder 1012 for summing the inverse discrete cosine transformed high resolution picture to the motion-compensated reference picture, and a frame memory 1013 for transient storage of the reference picture. The second downdecoder also includes a motion compensation unit 1014 for motion-compensating the reference picture stored in the frame memory 1013 with ½ pixel precision, and a downsampling unit 1015 for converting the reference picture stored in the frame memory 1013 to a picture of standard resolution.
This second downdecoder performs inverse discrete cosine transform to obtain a decoded output picture, as a high-resolution picture, as it substitutes 0 for coefficients of high-frequency components among the totality of coefficients of the DCT block, and reduces the output picture in size by the downsampling unit 1015 to output picture data of standard resolution.
Referring to FIG. 3, the third downdecoder includes a decimating inverse discrete cosine transform unit 1021 for performing, e.g., 4×4 inverse discrete cosine transform, using only the coefficients of the low-frequency components of the DCT block of the bitstream of the high resolution picture, for decoding to a standard resolution picture, and an adder 1022 for summing the standard resolution picture processed with decimating inverse discrete cosine transform and the motion-compensated reference picture. The third downdecoder also includes a frame memory 1023 for transiently storing the reference picture and a motion compensation unit 1024 for motion-compensating the reference picture stored in the frame memory 1023 with ¼ pixel precision.
In this third downdecoder, IDCT is executed using only low-frequency components of all coefficients of the DCT block to decode a picture of low resolution from a picture of high resolution.
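The decimating transform of the third downdecoder can be illustrated with a minimal sketch. It assumes an orthonormal DCT and a simple gain of keep/n to preserve the mean level; the function names (dct_1d, idct_1d, transform_2d, decimating_idct) are hypothetical and do not come from Publication 1, which may use a different normalization.

```python
import math

def _scale(k, n):
    # Orthonormal DCT scaling factor for coefficient index k of an n-point transform.
    return math.sqrt((1.0 if k == 0 else 2.0) / n)

def dct_1d(x):
    """Forward orthonormal DCT-II of a list of samples."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
            for k in range(n)]

def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]

def transform_2d(block, fn):
    """Apply a 1-D transform to every row, then to every column."""
    rows = [fn(list(r)) for r in block]
    cols = [fn(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def decimating_idct(coeffs, keep=4):
    """Inverse-transform only the keep x keep low-frequency corner of an n x n block."""
    n = len(coeffs)
    gain = keep / n  # assumed gain: sqrt(keep/n) per dimension keeps the mean level
    low = [[gain * coeffs[v][u] for u in range(keep)] for v in range(keep)]
    return transform_2d(low, idct_1d)

# A flat 8x8 block of value 100 decodes to a flat 4x4 block of value 100.
flat = [[100.0] * 8 for _ in range(8)]
small = decimating_idct(transform_2d(flat, dct_1d))
```

Because only a 4×4 transform is evaluated and only a quarter of the pixels are produced, both the transform workload and the reference memory shrink, which is the advantage attributed to the third downdecoder.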
The above-described first downdecoder performs inverse discrete cosine transform on the totality of the coefficients in the DCT block to obtain a high-resolution picture on decoding. Thus, the inverse discrete cosine transform unit 1001 of high processing capability and the frame memory 1003 of high capacity are needed. The second downdecoder performs inverse discrete cosine transform on the coefficients in the DCT block to obtain a high-resolution picture on decoding, as it sets the high-frequency components of the coefficients to zero, so that a lower processing capability of the inverse discrete cosine transform unit 1011 suffices. However, the frame memory 1013 of high capacity is still needed. In contradistinction from these first and second downdecoders, the third downdecoder performs inverse discrete cosine transform using only the coefficients of the low-frequency components among the totality of the coefficients in the DCT block, so that a low processing capability of the inverse discrete cosine transform unit 1021 suffices. Moreover, since the reference picture is decoded as a standard resolution picture, a lower capacity of the frame memory 1023 suffices.
Meanwhile, the display system of a moving picture in television broadcast is classified into a sequential scanning system and an interlaced scanning system. The sequential scanning system sequentially displays a picture obtained on sampling the totality of pixels in a given frame at the same timing. The interlaced scanning system alternately displays pictures obtained on sampling pixels in a given frame at different timings from one horizontal line to another.
In this interlaced scanning system, one of the pictures obtained on sampling pixels in a frame at different timings from one horizontal line to another is termed a top field or a first field, with the other picture being termed a bottom field or a second field. The picture containing the leading line in the horizontal direction of a frame becomes the top field, while the picture containing the second line in the horizontal direction of a frame becomes the bottom field. Thus, in the interlaced scanning system, a sole frame is made up of two fields.
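As a minimal sketch of this frame and field relationship, a frame can be modelled as a list of horizontal lines, with the top field taking the even-indexed lines (starting from the leading line) and the bottom field the odd-indexed lines. The names split_fields and merge_fields are hypothetical, and an even number of lines is assumed.

```python
def split_fields(frame):
    """Separate a frame (list of lines) into its top and bottom fields."""
    top = frame[0::2]      # leading line and every second line after it
    bottom = frame[1::2]   # second line and every second line after it
    return top, bottom

def merge_fields(top, bottom):
    """Interleave top and bottom field lines back into a single frame."""
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame

# Four-line frame; each line is represented by a one-pixel list.
frame = [[0], [1], [2], [3]]
top, bottom = split_fields(frame)
```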
With the MPEG2, not only a frame but also a field can be allocated to a picture as a picture compressing unit in order to compress the moving picture signals efficiently in the interlaced scanning system.
If, in the MPEG2, a field is allocated to a picture, the resulting bitstream structure is termed a field structure, while if a frame is allocated to a picture, the resulting bitstream structure is termed a frame structure. In the field structure, a DCT block is constituted by pixels in the field and discrete cosine transform is applied on the field basis. The processing mode of performing field-based discrete cosine transform is termed the field DCT mode. In the frame structure, a DCT block is constituted by pixels in the frame and discrete cosine transform is applied on the frame basis. The processing mode of performing frame-based discrete cosine transform is termed the frame DCT mode. In the field structure, a macro-block is constituted from pixels in a field and motion prediction is performed on the field basis. The processing mode of performing motion prediction on the field basis is termed the field motion prediction mode. In the frame structure, a macro-block is constituted from pixels in a frame and motion prediction is performed on the frame basis. The processing mode of performing motion prediction on the frame basis is termed the frame motion prediction mode.
Meanwhile, a picture decoding apparatus, adapted for decoding compressed picture data for the interlaced scanning system, using the third downdecoder shown in the Publication 1, is proposed in, for example, a Publication entitled “A Compensation Method of Drift Errors in Scalability” written by N. Obikane, K. Tahara and J. Yonemitsu, HDTV Work Shop '93. This Publication is hereinafter termed the Publication 2.
Referring to FIG. 4, the conventional picture decoding device, shown in Publication 2, includes a bitstream analyzer 1031, fed with a bitstream obtained on compressing a high resolution picture in accordance with the MPEG2, for analyzing this bitstream, a variable length encoding/decoding unit 1032 for variable length encoding data for allocating codes of lengths corresponding to the data occurrence frequency and for decoding the variable length encoded bitstream, and a dequantizer 1033 for multiplying the respective coefficients of the DCT block with quantization steps. The conventional picture decoding device also includes a decimating inverse discrete cosine transform unit 1034 for decoding a standard resolution picture by e.g., 4×4 inverse discrete cosine transform using only coefficients of low-frequency components of the totality of the coefficients of the DCT block, and an adder 1035 for summing the standard resolution picture processed with decimating inverse discrete cosine transform to a motion-compensated reference picture. The conventional picture decoding device also includes a frame memory 1036 for transiently storing the reference picture and a motion compensation unit 1037 for motion compensating the reference picture stored in the frame memory 1036 to a ¼ pixel precision.
The decimating inverse discrete cosine transform unit 1034 of the conventional picture decoding device, shown in the Publication 2, performs the inverse discrete cosine transform, using only the coefficients of the low-frequency components of the totality of the coefficients in the DCT block. It is noted that the positions of the coefficients of the frame DCT mode, processed with the inverse discrete cosine transform, differ from those of the field DCT mode.
Specifically, in the field DCT mode, the decimating inverse discrete cosine transform unit 1034 applies the inverse discrete cosine transform only to 4×4 of the 8×8 coefficients in the DCT block, as shown in FIG. 5. On the other hand, in the frame DCT mode, the decimating inverse discrete cosine transform unit 1034 applies the inverse discrete cosine transform only to 4×2+4×2 of the 8×8 coefficients in the DCT block, as shown in FIG. 6.
Also, the motion compensation unit 1037 of the conventional picture decoding device performs motion compensation to ¼ pixel precision, adapted to cope with the field motion prediction mode or with the frame motion prediction mode, based on the information (motion vector) on the motion prediction performed on the high resolution picture. Specifically, while the MPEG2 usually provides that the motion compensation be performed to ½ pixel precision, the number of pixels in a picture is thinned out to one-half if a standard resolution picture is to be decoded from a high resolution picture. Thus, the motion compensation unit 1037 performs motion compensation as it sets the pixel precision for motion compensation to ¼ pixel.
Therefore, the motion compensation unit 1037 performs linear interpolation on the pixels of the reference picture stored in the frame memory 1036 as a standard resolution picture to generate pixels to a ¼ pixel accuracy.
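The change of precision can be made concrete with a small sketch. It assumes that a motion vector component is given as an integer in half-pixel units of the high resolution picture and that the picture is decimated by two in that direction; the helper name to_quarter_pel is hypothetical. One half-pixel step at full resolution then corresponds to one quarter-pixel step on the half-size grid, so the integer value carries over unchanged and only its interpretation differs.

```python
def to_quarter_pel(mv_half_pel):
    """Split a high-resolution half-pel vector component for use on the half-size grid.

    Returns (whole, quarter): `whole` is the displacement in whole pixels of the
    standard resolution picture (rounded toward minus infinity), and `quarter`
    (0..3) is the remaining fraction in quarter-pixel units.
    """
    whole, quarter = divmod(mv_half_pel, 4)
    return whole, quarter

# 5 half-pels at high resolution = 2.5 pixels = 1.25 pixels at standard resolution.
example = to_quarter_pel(5)
```

The quarter value selects which interpolated phase (¼, ½ or ¾) has to be generated from the integer-position pixels of the stored reference picture.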
Specifically, the processing for linear interpolation of pixels in the perpendicular direction for the field motion prediction mode and that for the frame motion prediction mode are explained with reference to FIGS. 7 and 8, in which the phase of pixels in the vertical direction is indicated in the perpendicular direction, with the phase of each pixel in a displayed picture being indicated by an integer.
Referring to FIG. 7, the processing for interpolation of a picture motion-predicted in the field motion prediction mode is explained. For a high resolution picture (upper layer), motion compensation is independently performed to a ½ pixel precision, from field to field, as shown in FIG. 7A. On the other hand, for a standard resolution picture (lower layer), motion compensation is achieved by generating pixels dephased by ¼, ½ and ¾ pixel in the perpendicular direction by linear interpolation in a field based on the pixels of integer number precision, as shown in FIG. 7B. That is, in the standard resolution picture (lower layer), pixels with ¼ pixel precision of the top field are generated by linear interpolation based on the pixels of the integer number precision of the top field, while those with ¼ pixel precision of the bottom field are generated by linear interpolation based on the pixels of the integer number precision of the bottom field. It is assumed for example that the value of a pixel of the top field, having the phase in the perpendicular direction at the 0-position, is a, with the value of a pixel having the phase in the perpendicular direction at the 1-position being b. In this case, the pixel of the top field with the phase in the perpendicular direction of ¼ is (3a+b)/4, while the pixel of the top field with the phase in the perpendicular direction of ½ is (a+b)/2, with the pixel of the top field with the phase in the perpendicular direction of ¾ being (a+3b)/4.
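The three in-field weights quoted above are instances of one linear interpolation rule, which can be sketched as follows. The function name interpolate_in_field is hypothetical, and exact fractions are used only to make the arithmetic easy to check; a practical decoder would use integer arithmetic with rounding.

```python
from fractions import Fraction

def interpolate_in_field(a, b, quarter):
    """Linear interpolation between two same-field pixels.

    a is the pixel at phase 0, b the pixel at phase 1, and quarter (0..3)
    selects the phase quarter/4 between them.
    """
    return Fraction((4 - quarter) * a + quarter * b, 4)

# With a = 8 and b = 16:
p_quarter = interpolate_in_field(8, 16, 1)         # (3a + b) / 4
p_half = interpolate_in_field(8, 16, 2)            # (a + b) / 2
p_three_quarter = interpolate_in_field(8, 16, 3)   # (a + 3b) / 4
```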
Referring to FIG. 8, the processing of interpolation of a picture motion-predicted in the frame motion prediction mode is explained. For a high resolution picture (upper layer), interpolation processing is performed across the fields, that is across the bottom field and the top field, as shown in FIG. 8A, with the motion compensation precision being ½ pixel precision. For a standard resolution picture (lower layer), motion compensation is achieved by generating pixels dephased by ¼, ½ and ¾ pixels in the perpendicular direction, based on the pixels of the integer number precision of two fields, that is the top field and the bottom field, as shown in FIG. 8B. For example, it is assumed that the value of a pixel of the bottom field having the phase in the perpendicular direction of −1 is a, the value of a pixel of the top field having the phase in the perpendicular direction of 0 is b, the value of a pixel of the bottom field having the phase in the perpendicular direction of 1 is c, the value of a pixel of the top field having the phase in the perpendicular direction of 2 is d, and the value of a pixel of the bottom field having the phase in the perpendicular direction of 3 is e. In this case, the pixels of ¼ pixel precision, having the phase in the perpendicular direction in a range from 0 to 2, may be found as follows:
The pixel having the phase in the perpendicular direction of ¼ is (a+4b+3c)/8, while the pixel having the phase in the perpendicular direction of ½ is (a+3c)/4. The pixel having the phase in the perpendicular direction of ¾ is (a+2b+3c+2d)/8, while the pixel having the phase in the perpendicular direction of 5/4 is (2b+3c+2d+e)/8. The pixel having the phase in the perpendicular direction of 3/2 is (3c+e)/4, while the pixel having the phase in the perpendicular direction of 7/4 is (3c+4d+e)/8.
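The six expressions can be tabulated in a short sketch for checking. The function name frame_mode_phases is hypothetical; a, b, c, d and e are the pixel values at the perpendicular phases −1, 0, 1, 2 and 3 as defined above, and exact fractions are used instead of the rounded integer arithmetic a real decoder would presumably apply.

```python
from fractions import Fraction

def frame_mode_phases(a, b, c, d, e):
    """Interpolated values for phases 1/4 .. 7/4, keyed by phase."""
    return {
        Fraction(1, 4): Fraction(a + 4 * b + 3 * c, 8),
        Fraction(1, 2): Fraction(a + 3 * c, 4),
        Fraction(3, 4): Fraction(a + 2 * b + 3 * c + 2 * d, 8),
        Fraction(5, 4): Fraction(2 * b + 3 * c + 2 * d + e, 8),
        Fraction(3, 2): Fraction(3 * c + e, 4),
        Fraction(7, 4): Fraction(3 * c + 4 * d + e, 8),
    }

values = frame_mode_phases(8, 16, 24, 32, 40)
```

As a consistency check on the formulas, the value at phase ¼ equals the average of b and the value at phase ½, and the value at phase 7/4 equals the average of d and the value at phase 3/2.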
With the above-described picture decoding device, disclosed in the Publication 2, the compressed picture data of the high resolution picture, associated with the interlaced scanning system, can be decoded to standard resolution picture.
However, with the conventional picture decoding device, shown in the above Publication 2, the pixels of the standard resolution picture obtained with the field DCT mode are dephased with respect to the pixels of the standard resolution picture obtained with the frame DCT mode. Specifically, with the field DCT mode, the phases in the perpendicular direction of the respective pixels of the top field of the lower layer are ½, 5/2, . . . , with the phases in the perpendicular direction of the respective pixels of the bottom field of the lower layer being 1, 3, . . . , as shown in FIG. 9. On the other hand, with the frame DCT mode, the phases in the perpendicular direction of the respective pixels of the top field of the lower layer are 0, 2, . . . , with the phases in the perpendicular direction of the respective pixels of the bottom field of the lower layer being 1, 3, . . . , as shown in FIG. 10. Thus, pictures with different phases co-exist in the frame memory 1036, deteriorating the picture quality of the output picture.
With the conventional picture decoding device, shown in the Publication 2, no correction is made for the phase deviations, or dephasing, of the pixels at the time of the motion compensation with the field motion prediction mode and the frame motion prediction mode, resulting in deteriorated picture quality.
It is therefore an object of the present invention to provide a picture decoding method and a picture decoding device for decoding standard resolution picture data from compressed picture data of the high resolution picture whereby phase deviations of pixels of output moving picture data may be eliminated without detracting from characteristics proper to a picture obtained on interlaced scanning.
In one aspect, the present invention provides a picture decoding apparatus for decoding moving picture data of a second resolution from compressed picture data of a first resolution, obtained on predictive coding by motion prediction in terms of a pre-set pixel block (macro-block) as a unit and on compression coding in terms of a pre-set pixel block (orthogonal transform block) as a unit, the second resolution being lower than the first resolution. The picture decoding apparatus includes first inverse orthogonal transform means for inverse orthogonal transforming an orthogonal transform block of the compressed picture data, orthogonal transformed by an orthogonal transform system (field orthogonal transform mode) associated with the interlaced scanning, second inverse orthogonal transform means for inverse orthogonal transforming an orthogonal transform block of the compressed picture data, orthogonal transformed in accordance with an orthogonal transform system (frame orthogonal transform mode) associated with the sequential scanning, addition means for summing the compressed picture data, inverse orthogonal transformed by the first inverse orthogonal transform means or the second inverse orthogonal transform means, to motion compensated reference picture data to output moving picture data of the second resolution, memory means for storing moving picture data outputted by the addition means as reference picture data, and motion compensation means for motion compensating the macro-block of the reference picture data stored in the memory means. 
The first inverse orthogonal transform means inverse orthogonal transforms coefficients of low-frequency components of respective coefficients of the orthogonal transform block and corrects the phase of a ¼ pixel for the vertical direction of respective pixels of the top field obtained on inverse orthogonal transform, the first inverse orthogonal transform means correcting the phase of a ¾ pixel for the vertical direction of respective pixels of the bottom field obtained on inverse orthogonal transform. The second inverse orthogonal transform means inverse orthogonal transforms the coefficients of the totality of the frequency components of the orthogonal transform block, separates the inverse orthogonal transformed block into two pixel blocks associated with the interlaced scanning, inverse orthogonal transforms low-frequency components of the coefficients of the two orthogonal transformed pixel blocks, corrects the phase of ¼ pixel for the vertical direction of respective pixels of the top field obtained on inverse orthogonal transform, corrects the phase of ¾ pixel for the vertical direction of respective pixels of the bottom field obtained on inverse orthogonal transform, and synthesizes the phase-corrected top and bottom fields.
In this picture decoding device, the coefficients of the totality of the frequency components of the orthogonal transform block are inverse orthogonal transformed, the inverse orthogonal transformed block is separated into two pixel blocks associated with the interlaced scanning, low-frequency components of the coefficients of the two orthogonal transformed pixel blocks are inverse orthogonal transformed, the phase of ¼ pixel for the vertical direction of respective pixels of the top field obtained on inverse orthogonal transform is corrected, the phase of ¾ pixel for the vertical direction of respective pixels of the bottom field obtained on inverse orthogonal transform is corrected, and the phase-corrected top and bottom fields are synthesized.
With the picture decoding device of the present invention, the motion compensation means includes first motion compensation means for motion compensating a macro-block of reference picture data motion-predicted in accordance with a motion prediction system associated with interlaced scanning (field motion prediction mode), and second motion compensation means for motion compensating a macro-block of reference picture data motion-predicted in accordance with a motion prediction system associated with sequential scanning (frame motion prediction mode). The first motion compensation means and the second motion compensation means interpolate respective pixels of the macro-block of reference picture data stored in the memory means to generate a macro-block constituted by pixels of the ¼ pixel precision for the reference picture data stored in the memory means to execute motion compensation on the generated macro-block.
The present picture decoding device interpolates the pixels of the macro-block of stored reference picture data to generate a macro-block constructed by pixels of the ¼ pixel precision.
In the present picture decoding device, the first and second motion compensation means switch the number of taps of a filter used for interpolating respective pixels of a macro-block of reference picture data stored in the memory means, every pre-set unit, to generate a macro-block constructed by pixels of ¼ pixel precision for reference picture data stored in the memory means.
In the present picture decoding device, the number of taps of a filter is switched to interpolate respective pixels of a macro-block of stored reference picture data to generate a macro-block constructed from pixels of ¼ pixel precision.
In the present picture decoding device, the second motion compensation means interpolates respective pixels of the macro-block of reference picture data stored in the memory means between top and bottom fields to generate a macro-block constructed by pixels of ¼ pixel precision for reference picture data stored in the memory means.
In the present picture decoding device, respective pixels of the macro-block of reference picture data orthogonal transformed by the frame orthogonal transform mode are interpolated between top and bottom fields to generate a macro-block constructed by pixels of ¼ pixel precision.
In another aspect, the present invention provides a picture decoding method for decoding moving picture data of a second resolution from compressed picture data of a first resolution, obtained on predictive coding by motion prediction in terms of a pre-set pixel block (macro-block) as a unit and on compression coding in terms of a pre-set pixel block (orthogonal transform block) as a unit, the second resolution being lower than the first resolution. The picture decoding method includes inverse orthogonal transforming an orthogonal transform block of the compressed picture data, orthogonal transformed in accordance with an orthogonal transform system (field orthogonal transform mode) associated with the interlaced scanning, inverse orthogonal transforming an orthogonal transform block of the compressed picture data, orthogonal transformed in accordance with an orthogonal transform system (frame orthogonal transform mode) associated with the sequential scanning, summing the inverse orthogonal transformed compressed picture data to motion compensated reference picture data, storing moving picture data, obtained on summation, as reference picture data, motion compensating a macro-block of stored reference picture data, inverse orthogonal transforming coefficients of low-frequency components of respective coefficients of an orthogonal transform block orthogonal transformed in accordance with the field orthogonal transform mode, correcting the phase of a ¼ pixel for the vertical direction of respective pixels of the top field obtained on inverse orthogonal transform, correcting the phase of a ¾ pixel for the vertical direction of respective pixels of the bottom field obtained on inverse orthogonal transform, inverse orthogonal transforming coefficients of the totality of frequency components of an orthogonal transform block in accordance with a frame orthogonal transform mode, separating the inverse orthogonal transformed orthogonal transform block into two pixel blocks associated with interlaced scanning, orthogonal transforming the separated two pixel blocks, inverse orthogonal transforming coefficients of the low-frequency components of respective coefficients of the two orthogonal transformed blocks, correcting the phase of a ¼ pixel for the vertical direction of respective pixels of the top field obtained on inverse orthogonal transform, correcting the phase of a ¾ pixel for the vertical direction of respective pixels of the bottom field obtained on inverse orthogonal transform, and synthesizing the phase-corrected top and bottom fields.
According to the present invention, an orthogonal transform block of the compressed picture data, orthogonal transformed in accordance with an orthogonal transform system (field orthogonal transform mode) associated with the interlaced scanning, is inverse orthogonal transformed, an orthogonal transform block of the compressed picture data, orthogonal transformed in accordance with an orthogonal transform system (frame orthogonal transform mode) associated with the sequential scanning, is inverse orthogonal transformed, and the inverse orthogonal transformed compressed picture data is summed to motion compensated reference picture data. The moving picture data, obtained on summation, are stored as reference picture data, a macro-block of stored reference picture data is motion compensated and coefficients of low-frequency components of respective coefficients of an orthogonal transform block orthogonal transformed in accordance with the field orthogonal transform mode are inverse orthogonal transformed. The phase of a ¼ pixel for the vertical direction of respective pixels of the top field obtained on inverse orthogonal transform is corrected, and the phase of a ¾ pixel for the vertical direction of respective pixels of the bottom field obtained on inverse orthogonal transform is corrected. The coefficients of the totality of frequency components of an orthogonal transform block are inverse orthogonal transformed in accordance with a frame orthogonal transform mode. The inverse orthogonal transformed orthogonal transform block is separated into two pixel blocks associated with interlaced scanning.
The separated two pixel blocks are respectively orthogonal transformed, and coefficients of the low-frequency components of respective coefficients of the two orthogonal transformed blocks are inverse orthogonal transformed. The phase of a ¼ pixel for the vertical direction of respective pixels of the top field obtained on inverse orthogonal transform is corrected, while the phase of a ¾ pixel for the vertical direction of respective pixels of the bottom field obtained on inverse orthogonal transform is also corrected, and the phase-corrected top and bottom fields are synthesized. The present picture decoding device outputs moving picture data of a second resolution lower than a first resolution.
Thus, in accordance with the present invention, the processing volume necessary for decoding can be reduced, while dephasing of pixels of output moving picture data of the second resolution can be eliminated without impairing the characteristics proper to the interlaced picture. That is, output moving picture data can be displayed without filtering. In addition, the moving picture data of the second resolution can be improved in quality.
In the present picture decoding device, coefficients of the totality of frequency components of two orthogonal transform blocks in accordance with a frame orthogonal transform mode are inverse orthogonal transformed, and the inverse orthogonal transformed orthogonal transform blocks are separated into two pixel blocks associated with interlaced scanning. The separated two pixel blocks are orthogonal transformed, and the coefficients of the low-frequency components of respective coefficients of the two orthogonal transformed blocks are inverse orthogonal transformed. The phase of a ¼ pixel for the vertical direction of respective pixels of the top field obtained on inverse orthogonal transform is corrected, while the phase of a ¾ pixel for the vertical direction of respective pixels of the bottom field obtained on inverse orthogonal transform is also corrected, and the phase-corrected top and bottom fields are synthesized.
Thus, in accordance with the present invention, the processing volume necessary for decoding can be reduced, while dephasing of pixels of output moving picture data of the second resolution can be eliminated without impairing picture characteristics proper to the interlaced picture. Specifically, output moving picture data can be displayed without filtering. In addition, the moving picture data of the second resolution can be improved in quality.
In the present invention, pixels of a macro-block of stored reference picture data are interpolated to generate a macro-block constructed by pixels of ¼ pixel precision.
Thus, according to the present invention, dephasing of pixels between the field motion prediction mode and the frame motion prediction mode is eliminated to prevent deterioration of the picture quality ascribable to motion compensation.
According to the present invention, the number of filter taps can be switched to interpolate the pixels of the macro-block of the stored reference picture data to generate a macro-block constructed by pixels of ¼ pixel precision.
Thus, according to the present invention, the processing volume at the time of motion compensation can be reduced without deteriorating the picture quality to expedite the processing.
According to the present invention, the respective pixels of the macro-block of reference picture data orthogonal transformed by the frame orthogonal transform mode can be interpolated between the top and bottom fields to generate a macro-block constructed by pixels of the ¼ pixel precision.
In this manner, it is possible with the present invention to prevent deterioration of the picture quality ascribable to motion compensation.