1. Field of the Invention
This invention relates to a picture decoding method and apparatus for decoding compressed picture data of a first resolution, obtained on predictive coding by motion prediction in terms of a pre-set pixel block (macro-block) as a unit, and on orthogonal transforming in terms of a pre-set pixel block (orthogonal transform block) as a unit. More particularly, it relates to a picture decoding method and apparatus for decoding compressed picture data of the first resolution and for decimating the data to moving picture data of a second resolution lower than the first resolution.
2. Description of the Related Art
The standardization of digital television signals employing a picture compression system, such as Moving Picture Experts Group Phase 2 (MPEG2), is now under way. Among the standards for digital television broadcast, there are a standard for standard resolution pictures, such as those with 576 effective lines in the vertical direction, and a standard for high-resolution pictures, such as those with 1152 effective lines in the vertical direction. Recently, a demand has arisen for a downdecoder that decodes compressed picture data of a high-resolution picture and reduces its resolution by ½ to generate picture data of standard resolution, so that the picture data can be displayed on a television monitor designed for the standard resolution.
There is proposed in a publication entitled “Scalable Decoder free of Low-Range Drift” (written by Iwahashi, Kanbayashi and Takaya, Shingaku-Gihou CS94-186, DSP 94-108, 1995-01) a downdecoder for decoding a bitstream of, for example, MPEG2, obtained on predictive coding with motion prediction of a high-resolution picture and compression coding by discrete cosine transform, and for downsampling the picture to a picture of standard resolution. This publication, referred to below as Publication 1, shows the following first to third downdecoders.
Referring to FIG. 1, the first downdecoder includes an inverse discrete cosine transform unit 1001 for processing a bitstream of a high resolution picture with 8 (number of coefficients as counted from the dc component in the horizontal direction) × 8 (number of coefficients as counted from the dc component in the vertical direction) inverse discrete cosine transform, an adder 1002 for adding the inverse discrete cosine transformed high resolution picture and a motion-compensated reference picture, and a frame memory 1003 for transient storage of the reference picture. The first downdecoder also includes a motion compensation unit 1004 for motion-compensating the reference picture stored in the frame memory 1003 with ½ pixel precision, and a downsampling unit 1005 for converting the reference picture stored in the frame memory 1003 to a picture of standard resolution.
This first downdecoder reduces an output picture, obtained on decoding as a high resolution picture by inverse discrete cosine transform, by the downsampling unit 1005, to output resulting picture data with the standard resolution.
Referring to FIG. 2, the second downdecoder includes an inverse discrete cosine transform unit 1011 for performing 8×8 inverse discrete cosine transform as it substitutes 0 for the high-frequency components of the discrete cosine transform (DCT) block of the high resolution picture, an adder 1012 for summing the inverse discrete cosine transformed high resolution picture and the motion-compensated reference picture, and a frame memory 1013 for transient storage of the reference picture. The second downdecoder also includes a motion compensation unit 1014 for motion-compensating the reference picture stored in the frame memory 1013 with ½ pixel precision, and a downsampling unit 1015 for converting the reference picture stored in the frame memory 1013 to a picture of standard resolution.
This second downdecoder performs inverse discrete cosine transform to obtain a decoded output picture, as a high-resolution picture, as it substitutes 0 for coefficients of high-frequency components among the totality of coefficients of the DCT block, and reduces the output picture in size by the downsampling unit 1015 to output picture data of standard resolution.
Referring to FIG. 3, the third downdecoder includes a decimating inverse discrete cosine transform unit 1021 for performing, e.g., 4×4 inverse discrete cosine transform, using only the coefficients of the low-frequency components of the DCT block of the bitstream of the high resolution picture, for decoding to a standard resolution picture, and an adder 1022 for summing the standard resolution picture processed with decimating inverse discrete cosine transform and the motion-compensated reference picture. The third downdecoder also includes a frame memory 1023 for transiently storing the reference picture and a motion compensation unit 1024 for motion-compensating the reference picture stored in the frame memory 1023 with ¼ pixel precision.
In this third downdecoder, IDCT is executed using only low-frequency components of all coefficients of the DCT block to decode a picture of low resolution from a picture of high resolution.
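The decimating transform of the third downdecoder may be illustrated by the following minimal sketch. It assumes an orthonormal DCT, a 4×4 low-frequency region, and a rescaling factor of 4/8 to keep pixel levels unchanged; the function names are hypothetical and are not taken from Publication 1.

```python
import math

def dct_1d(x):
    """Orthonormal forward DCT (DCT-II) of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out

def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out

def transform_2d(block, fn):
    """Apply a 1-D transform along rows, then along columns."""
    rows = [fn(list(r)) for r in block]
    cols = [fn(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def decimating_idct(coeffs8, keep=4):
    """Reconstruct a keep x keep block from the low-frequency corner
    of an 8x8 coefficient block (assumed layout: coeffs8[v][u])."""
    n = len(coeffs8)
    # Rescale so that a flat block keeps its pixel level (assumption).
    low = [[coeffs8[v][u] * keep / n for u in range(keep)]
           for v in range(keep)]
    return transform_2d(low, idct_1d)

flat = [[100.0] * 8 for _ in range(8)]
small = decimating_idct(transform_2d(flat, dct_1d))
```

In this sketch only 16 of the 64 coefficients enter the inverse transform, and the reconstructed block has one quarter of the pixels of the original block.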
The above-described first downdecoder performs inverse discrete cosine transform on the totality of the coefficients in the DCT block to obtain a high-resolution picture on decoding. Thus, the inverse discrete cosine transform unit 1001 of high processing capability and the frame memory 1003 of high capacity are needed. The second downdecoder performs inverse discrete cosine transform on the coefficients in the DCT block to obtain a high-resolution picture on decoding, as it sets the high-frequency components of the coefficients to zero, so that a lower processing capability of the inverse discrete cosine transform unit 1011 suffices. However, the frame memory 1013 of high capacity is still needed. In contradistinction from these first and second downdecoders, the third downdecoder performs inverse discrete cosine transform using only the coefficients of the low-frequency components among the totality of the coefficients in the DCT block, so that a low processing capability of the inverse discrete cosine transform unit 1021 suffices. Moreover, since the reference picture is decoded as a standard resolution picture, a lower capacity of the frame memory 1023 suffices.
Meanwhile, the display system of a moving picture in television broadcast is classified into a sequential scanning system and an interlaced scanning system. The sequential scanning system sequentially displays a picture obtained on sampling the totality of pixels in a given frame at the same timing. The interlaced scanning system alternately displays pictures obtained on sampling pixels in a given frame at different timings from one horizontal line to another.
In this interlaced scanning system, one of the pictures obtained on sampling pixels in a frame at different timings from one horizontal line to another is termed a top field or a first field, with the other picture being termed a bottom field or a second field. The picture containing the leading line in the horizontal direction of a frame becomes the top field, while the picture containing the second line in the horizontal direction of a frame becomes the bottom field. Thus, in the interlaced scanning system, a sole frame is made up of two fields.
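The relation between a frame and its two fields may be sketched as follows, assuming that a frame is held as a list of horizontal lines with the leading line at index 0. The function name is hypothetical.

```python
def split_fields(frame):
    """Split a frame (list of horizontal lines) into its two fields.

    The top field holds the leading line and every second line after it;
    the bottom field holds the remaining lines.
    """
    top_field = frame[0::2]
    bottom_field = frame[1::2]
    return top_field, bottom_field

# Six lines, each labelled with its own line number.
frame = [[line] * 4 for line in range(6)]
top, bottom = split_fields(frame)
```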
With the MPEG2, not only a frame but also a field can be allocated to a picture as a picture compressing unit in order to compress the moving picture signals efficiently in the interlaced scanning system.
If, in the MPEG2, a field is allocated to a picture, the resulting bitstream structure is termed a field structure, while if a frame is allocated to a picture, the resulting bitstream structure is termed a frame structure. In the field structure, a DCT block is constituted by pixels in the field and discrete cosine transform is applied on the field basis. The processing mode of performing field-based discrete cosine transform is termed the field DCT mode. In the frame structure, a DCT block is constituted by pixels in the frame and discrete cosine transform is applied on the frame basis. The processing mode of performing frame-based discrete cosine transform is termed the frame DCT mode. In the field structure, a macro-block is constituted from pixels in a field and motion prediction is performed on the field basis. The processing mode of performing motion prediction on the field basis is termed the field motion prediction mode. In the frame structure, a macro-block is constituted from pixels in a frame and motion prediction is performed on the frame basis. The processing mode of performing motion prediction on the frame basis is termed the frame motion prediction mode.
Meanwhile, a picture decoding apparatus, adapted for decoding compressed picture data for the interlaced scanning system, using the third downdecoder shown in Publication 1, is proposed in, for example, a publication entitled “A Compensation Method of Drift Errors in Scalability” written by N. Obikane, K. Tahara and J. Yonemitsu, HDTV Work Shop '93. This publication is hereinafter termed Publication 2.
Referring to FIG. 4, the conventional picture decoding device, shown in Publication 2, includes a bitstream analyzer 1031, fed with a bitstream obtained on compressing a high resolution picture in accordance with the MPEG2, for analyzing this bitstream, a variable length encoding/decoding unit 1032 for decoding the bitstream that has been variable length encoded, that is, encoded with codes of lengths corresponding to the data occurrence frequency, and a dequantizer 1033 for multiplying the respective coefficients of the DCT block with quantization steps. The conventional picture decoding device also includes a decimating inverse discrete cosine transform unit 1034 for decoding a standard resolution picture by, e.g., 4×4 inverse discrete cosine transform using only coefficients of low-frequency components of the totality of the coefficients of the DCT block, and an adder 1035 for summing the standard resolution picture processed with decimating inverse discrete cosine transform and a motion-compensated reference picture. The conventional picture decoding device also includes a frame memory 1036 for transiently storing the reference picture and a motion compensation unit 1037 for motion compensating the reference picture stored in the frame memory 1036 to ¼ pixel precision.
The decimating inverse discrete cosine transform unit 1034 of the conventional picture decoding device, shown in the Publication 2, performs the inverse discrete cosine transform, using only the coefficients of the low-frequency components of the totality of the coefficients in the DCT block. It is noted that the positions of the coefficients of the frame DCT mode, processed with the inverse discrete cosine transform, differ from those of the field DCT mode.
Specifically, in the field DCT mode, the decimating inverse discrete cosine transform unit 1034 applies the inverse discrete cosine transform only to 4×4 of the 8×8 coefficients in the DCT block, as shown in FIG. 5. On the other hand, in the frame DCT mode, the decimating inverse discrete cosine transform unit 1034 applies the inverse discrete cosine transform only to 4×2+4×2 of the 8×8 coefficients in the DCT block, as shown in FIG. 6.
Also, the motion compensation unit 1037 of the conventional picture decoding device performs motion compensation to ¼ pixel precision, adapted to cope with the field motion prediction mode or with the frame motion prediction mode, based on the information (motion vector) on the motion prediction performed on the high resolution picture. Specifically, while the MPEG2 usually provides that the motion compensation be performed to ½ pixel precision, the number of pixels in a picture is thinned out to one-half if a standard resolution picture is to be decoded from a high resolution picture. Thus, the motion compensation unit 1037 performs motion compensation as it sets the pixel precision for motion compensation to ¼ pixel.
Therefore, the motion compensation unit 1037 performs linear interpolation on the pixels of the reference picture stored in the frame memory 1036 as a standard resolution picture to generate pixels to ¼ pixel precision.
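The change of precision may be sketched as follows. The sketch assumes that a motion vector component of the high resolution picture is held as an integer count of ½ pixel units, so that the same integer counts ¼ pixel units of the standard resolution picture; this representation and the function name are assumptions made for illustration.

```python
def split_motion_vector(mv_half_pel):
    """Split a high-resolution vector component, given in half-pixel units,
    into a whole-pixel offset and a quarter-pixel phase on the
    standard resolution picture.

    One half pixel at high resolution equals one quarter pixel at
    half the resolution, so the integer value is reused unchanged.
    """
    whole_pixels = mv_half_pel // 4    # floor division also handles negatives
    quarter_phase = mv_half_pel % 4    # 0, 1, 2 or 3 quarter pixels
    return whole_pixels, quarter_phase

# 5 half pixels = 2.5 high-resolution pixels = 1.25 standard-resolution pixels
offset, phase = split_motion_vector(5)
```

The quarter-pixel phase selects which interpolated pixel is generated between two neighboring pixels of integer precision.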
Specifically, the processing for linear interpolation of pixels in the perpendicular direction for the field motion prediction mode and that for the frame motion prediction mode are explained with reference to FIGS. 7A, 7B, 8A and 8B, in which the phase of pixels in the vertical direction is indicated in the perpendicular direction, with the phase of each pixel in a displayed picture being indicated by an integer.
Referring to FIGS. 7A, 7B, the processing for interpolation of a picture motion-predicted in the field motion prediction mode is explained. For a high resolution picture (upper layer), motion compensation is independently performed to ½ pixel precision, from field to field, as shown in FIG. 7A. On the other hand, for a standard resolution picture (lower layer), motion compensation is achieved by generating pixels dephased by ¼, ½ and ¾ pixel in the perpendicular direction by linear interpolation within a field, based on the pixels of integer number precision, as shown in FIG. 7B. That is, in the standard resolution picture (lower layer), pixels with ¼ pixel precision of the top field are generated by linear interpolation based on the pixels of the integer number precision of the top field, while those with ¼ pixel precision of the bottom field are generated by linear interpolation based on the pixels of the integer number precision of the bottom field. It is assumed, for example, that the value of a pixel of the top field having the phase in the perpendicular direction at the 0-position is a, and that the value of a pixel having the phase in the perpendicular direction at the 1-position is b. In this case, the pixel of the top field with the phase in the perpendicular direction of ¼ is (3a+b)/4, the pixel of the top field with the phase in the perpendicular direction of ½ is (a+b)/2, and the pixel of the top field with the phase in the perpendicular direction of ¾ is (a+3b)/4.
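The three interpolated values given for the field motion prediction mode are instances of a single linear interpolation, as the following minimal sketch shows. The function name and the use of exact fractions are choices made for illustration only.

```python
from fractions import Fraction

def field_interpolate(a, b, quarter_phase):
    """Linear interpolation within one field between pixel a (phase 0)
    and pixel b (phase 1), at quarter_phase / 4 of the distance."""
    if quarter_phase not in (0, 1, 2, 3):
        raise ValueError("quarter_phase must be 0, 1, 2 or 3")
    return Fraction((4 - quarter_phase) * a + quarter_phase * b, 4)

a, b = 8, 16
quarter = field_interpolate(a, b, 1)          # (3a+b)/4
half = field_interpolate(a, b, 2)             # (a+b)/2
three_quarters = field_interpolate(a, b, 3)   # (a+3b)/4
```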
Referring to FIGS. 8A, 8B, the processing of interpolation of a picture motion-predicted in the frame motion prediction mode is explained. For a high resolution picture (upper layer), interpolation processing is performed across the fields, that is across the bottom field and the top field, as shown in FIG. 8A, with the motion compensation precision being ½ pixel precision. For a standard resolution picture (lower layer), motion compensation is achieved by generating pixels dephased by ¼, ½ and ¾ pixels in the perpendicular direction, based on the pixels of the integer number precision of two fields, that is the top field and the bottom field, as shown in FIG. 8B. For example, it is assumed that the value of a pixel of the bottom field having the phase in the perpendicular direction of −1 is a, the value of a pixel of the top field having the phase in the perpendicular direction of 0 is b, the value of a pixel of the bottom field having the phase in the perpendicular direction of 1 is c, the value of a pixel of the top field having the phase in the perpendicular direction of 2 is d, and the value of a pixel of the bottom field having the phase in the perpendicular direction of 3 is e. In this case, the pixels of ¼ pixel precision, having the phase in the perpendicular direction in a range from 0 to 2, may be found as follows:
The pixel having the phase in the perpendicular direction of ¼ is (a+4b+3c)/8, while the pixel having the phase in the perpendicular direction of ½ is (a+3c)/4. The pixel having the phase in the perpendicular direction of ¾ is (a+2b+3c+2d)/8, while the pixel having the phase in the perpendicular direction of 5/4 is (2b+3c+2d+e)/8. The pixel having the phase in the perpendicular direction of 3/2 is (3c+e)/4, while the pixel having the phase in the perpendicular direction of 7/4 is (3c+4d+e)/8.
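The six expressions above can be reproduced by a two-stage construction: linear interpolation within each field at the phases ½, 1 and 3/2, followed by averaging of adjacent values. This construction is one reading that is consistent with the listed formulas and is not stated in Publication 2. In the sketch below, a, c and e are taken as bottom-field pixels at the phases −1, 1 and 3, and b and d as top-field pixels at the phases 0 and 2; the function name is hypothetical.

```python
from fractions import Fraction as F

def frame_mode_phases(a, b, c, d, e):
    """Return interpolated values keyed by vertical phase (1/4 to 7/4)."""
    a, b, c, d, e = (F(v) for v in (a, b, c, d, e))
    half = (a + 3 * c) / 4          # phase 1/2, bottom field, between -1 and 1
    one_top = (b + d) / 2           # phase 1, top field, between 0 and 2
    three_half = (3 * c + e) / 4    # phase 3/2, bottom field, between 1 and 3
    return {
        F(1, 4): (b + half) / 2,              # (a+4b+3c)/8
        F(1, 2): half,                        # (a+3c)/4
        F(3, 4): (half + one_top) / 2,        # (a+2b+3c+2d)/8
        F(5, 4): (one_top + three_half) / 2,  # (2b+3c+2d+e)/8
        F(3, 2): three_half,                  # (3c+e)/4
        F(7, 4): (three_half + d) / 2,        # (3c+4d+e)/8
    }

values = frame_mode_phases(1, 2, 3, 5, 7)
```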
With the above-described picture decoding device, disclosed in the Publication 2, the compressed picture data of the high resolution picture, associated with the interlaced scanning system, can be decoded to standard resolution picture.
However, with the conventional picture decoding device, shown in the above Publication 2, the pixels of the standard resolution picture obtained with the field DCT mode are dephased with respect to the pixels of the standard resolution picture obtained with the frame DCT mode. Specifically, with the field DCT mode, the phases in the perpendicular direction of the respective pixels of the top field of the lower layer are ½, 5/2, . . . , with the phases in the perpendicular direction of the respective pixels of the bottom field of the lower layer being 1, 3, . . . , as shown in FIGS. 9A, 9B. On the other hand, with the frame DCT mode, the phases in the perpendicular direction of the respective pixels of the top field of the lower layer are 0, 2, . . . , with the phases in the perpendicular direction of the respective pixels of the bottom field of the lower layer being 1, 3, . . . , as shown in FIGS. 10A, 10B. Thus, pictures with different phases co-exist in the frame memory 1036, deteriorating the picture quality of the output picture.
Moreover, with the conventional picture decoding device shown in Publication 2, no correction is made for phase deviations, or dephasing, of the pixels at the time of motion compensation with the field motion prediction mode and the frame motion prediction mode, which results in deteriorated picture quality.
It is therefore an object of the present invention to provide a picture decoding method and apparatus for decoding standard resolution picture data from compressed picture data of the high resolution picture with a reduced processing volume, whereby phase deviation of pixels by the field orthogonal transform mode and frame orthogonal transform mode may be eliminated without detracting from properties inherent in a picture obtained on interlaced scanning.
In one aspect, the present invention provides a picture decoding apparatus for decoding, from compressed picture data of a first resolution, moving picture data of a second resolution lower than the first resolution, the compressed picture data of the first resolution having been obtained by predictive coding by performing motion prediction in terms of a pre-set pixel block (macro-block) as a unit and orthogonal transform in terms of a pre-set pixel block (orthogonal transform block) as a unit, wherein the apparatus includes variable length decoding means for variable length decoding compressed picture data of the first resolution, obtained on back-scanning respective coefficients in the orthogonal transform block in accordance with a pre-set scanning system, and inverse orthogonal transform means for inverse orthogonal transforming the coefficients of low frequency components of coefficients of the orthogonal transform blocks of variable length decoded compressed picture data. The variable length decoding means variable length decodes the coefficients up to the highest frequency component among the coefficients of the low frequency components to be inverse orthogonal transformed by the inverse orthogonal transform means, and does not variable length decode the coefficients of higher frequency components.
In another aspect, the present invention provides a picture decoding apparatus for decoding, from compressed picture data of a first resolution, moving picture data of a second resolution lower than the first resolution, the compressed picture data of the first resolution having been obtained by predictive coding by performing motion prediction in terms of a pre-set pixel block (macro-block) as a unit and orthogonal transform in terms of a pre-set pixel block (orthogonal transform block) as a unit, wherein the apparatus includes variable length decoding means for variable length decoding compressed picture data of the first resolution, obtained on back-scanning respective coefficients in the orthogonal transform block in accordance with a pre-set scanning system, and inverse orthogonal transform means for inverse orthogonal transforming the coefficients of low frequency components of coefficients of the orthogonal transform blocks of variable length decoded compressed picture data. The variable length decoding means variable length decodes the coefficients up to the highest frequency component among the coefficients of the low frequency components to be inverse orthogonal transformed by the inverse orthogonal transform means, does not variable length decode the coefficients of higher frequency components, and sets to zero the coefficients which have not been variable length decoded and which are to be inverse orthogonal transformed by the inverse orthogonal transform means.
With the present picture decoding apparatus, the coefficients up to the highest frequency component among the coefficients of the low frequency components to be inverse orthogonal transformed are variable length decoded and the coefficients of higher frequency components are not variable length decoded, while the coefficients which have not been variable length decoded and which are to be inverse orthogonal transformed are set to zero.
In still another aspect, the present invention provides a picture decoding method for decoding, from compressed picture data of a first resolution, moving picture data of a second resolution lower than the first resolution, the compressed picture data of the first resolution having been obtained by predictive coding by performing motion prediction in terms of a pre-set pixel block (macro-block) as a unit and orthogonal transform in terms of a pre-set pixel block (orthogonal transform block) as a unit, wherein the method includes variable length decoding compressed picture data of the first resolution, obtained on back-scanning respective coefficients in the orthogonal transform block in accordance with a pre-set scanning system, and inverse orthogonal transforming the coefficients of low frequency components of coefficients of the orthogonal transform blocks of variable length decoded compressed picture data, and wherein the coefficients up to the highest frequency component among the coefficients of the low frequency components to be inverse orthogonal transformed are variable length decoded, with the coefficients of higher frequency components not being decoded.
With the present picture decoding method, up to the highest frequency components of the coefficients of the low frequency components, to be inverse orthogonal transformed by the inverse orthogonal transform means, are variable length decoded, with the coefficients of higher frequency components not being decoded.
In yet another aspect, the present invention provides a picture decoding method for decoding, from compressed picture data of a first resolution, moving picture data of a second resolution lower than the first resolution, the compressed picture data of the first resolution having been obtained by predictive coding by performing motion prediction in terms of a pre-set pixel block (macro-block) as a unit and orthogonal transform in terms of a pre-set pixel block (orthogonal transform block) as a unit, wherein the method includes variable length decoding compressed picture data of the first resolution, obtained on back-scanning respective coefficients in the orthogonal transform block in accordance with a pre-set scanning system, and inverse orthogonal transforming the coefficients of low frequency components of coefficients of the orthogonal transform blocks of variable length decoded compressed picture data, and wherein the coefficients up to the highest frequency component among the coefficients of the low frequency components to be inverse orthogonal transformed are variable length decoded. The coefficients of higher frequency components are not decoded, and the coefficients which have not been variable length decoded and which are to be inverse orthogonal transformed are set to zero.
With the present picture decoding method, the coefficients up to the highest frequency component among the coefficients of the low frequency components to be inverse orthogonal transformed are variable length decoded and the coefficients of higher frequency components are not variable length decoded, while the coefficients which have not been variable length decoded and which are to be inverse orthogonal transformed are set to zero.
With the picture decoding method and apparatus according to the present invention, the coefficients up to the maximum frequency component among the coefficients of the low frequency components to be subjected to IDCT are variable length decoded, whilst coefficients of higher frequencies are not variable length decoded. This enables the processing volume to be reduced, since redundant information that is not used in the decimating IDCT is not variable length decoded.
Also, with the picture decoding method and apparatus according to the present invention, the coefficients up to the maximum frequency component among the coefficients of the low frequency components to be subjected to IDCT are variable length decoded, whilst coefficients of higher frequencies are not variable length decoded. The coefficients which are not variable length decoded and which are to be inverse orthogonal transformed are set to zero. This enables the processing volume to be reduced in accordance with the processing capacity while the picture quality is prevented from being lowered.
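The truncated variable length decoding may be illustrated by the following minimal sketch. It assumes that the pre-set scanning system is the conventional zigzag scan, that the low frequency region is the 4×4 corner of an 8×8 block, and that the bitstream has already been parsed into (run, level) pairs; all names are hypothetical, and how the undecoded remainder of the block is skipped is outside the sketch.

```python
def zigzag_order(n=8):
    """(row, column) positions of an n x n block in zigzag scan order."""
    order = []
    for d in range(2 * n - 1):
        rows = list(range(max(0, d - n + 1), min(d, n - 1) + 1))
        if d % 2 == 0:
            rows.reverse()   # even diagonals run from bottom-left to top-right
        order.extend((r, d - r) for r in rows)
    return order

def last_needed_scan_index(keep=4, n=8):
    """Scan index of the highest frequency coefficient inside the
    keep x keep low frequency region."""
    return max(i for i, (r, c) in enumerate(zigzag_order(n))
               if r < keep and c < keep)

def decode_low_frequency(run_level_pairs, cutoff, keep=4, n=8):
    """Place decoded levels until the scan position passes the cutoff.

    Coefficients of the low frequency region that are never reached
    keep their initial value of zero.
    """
    order = zigzag_order(n)
    block = [[0] * keep for _ in range(keep)]
    position = -1
    decoded = 0
    for run, level in run_level_pairs:
        position += run + 1
        if position > cutoff:
            break            # higher frequency coefficients are not decoded
        decoded += 1
        r, c = order[position]
        if r < keep and c < keep:
            block[r][c] = level
    return block, decoded

cutoff = last_needed_scan_index()
pairs = [(0, 50), (0, -3), (1, 7), (30, 2)]
block, decoded = decode_low_frequency(pairs, cutoff)
```

Passing a cutoff smaller than the value returned by last_needed_scan_index corresponds to the case in which some coefficients inside the low frequency region are left undecoded and are set to zero.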
In the picture decoding method and apparatus, decimating IDCT is performed with certain coefficients of the frequency components in the vertical direction of an orthogonal transform block being set to zero. This enables the processing volume to be decreased as picture deterioration is suppressed to a minimum.