1. Field of the Invention
This invention relates to a picture encoding method and apparatus, a picture decoding method and apparatus and a furnishing medium. More particularly, it relates to a picture encoding method and apparatus, a picture decoding method and apparatus and a furnishing medium which may be used for encoding moving picture data for recording on a recording medium, such as a magneto-optical disc or a magnetic tape, and for reproducing and displaying the data on a display, and which may also be used for transmitting the moving picture data over a transmission route from a transmitting side to a receiving side, as in a teleconferencing system, television telephone system, broadcast equipment or a multimedia database retrieval system, so that the receiving side can display, edit and record the received moving picture data.
2. Description of the Related Art
In a system for transmitting moving picture data to a remote site, such as a teleconferencing system or a television telephone system, line correlation or frame-to-frame picture correlation is utilized for efficiently exploiting the transmission route, in order to effect compression encoding of the picture data.
Typical of the high-efficiency encoding systems for moving pictures is the Moving Picture Experts Group (MPEG) system, which has been discussed in ISO-IEC/JTC1/SC2/WG11 and proposed as a standard draft. The MPEG system employs a hybrid system combining motion compensation predictive encoding and discrete cosine transform (DCT) encoding.
In the MPEG system, several profiles and levels are defined for coping with various applications and functions. Most basic is the Main Profile at Main Level (MP@ML).
FIG. 1 shows an illustrative structure of a MP@ML encoder in the MPEG system.
The input picture data to be encoded are inputted to and temporarily stored in a frame memory 31.
A motion vector detector 32 reads out the picture data stored in the frame memory 31 in terms of a macro-block of, for example, 16×16 pixels, as a unit, to detect its motion vector.
The motion vector detector 32 processes picture data of the respective frames as an intra-frame picture (I-picture), a predictive-coded picture (P-picture) or a bidirectional-coded picture (B-picture). It is predetermined as which one of the I-, P- and B-pictures each of the sequentially inputted frames is to be processed. For example, these pictures are processed in a sequence of I, B, P, B, P, . . . B, P.
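The fixed assignment just described can be sketched in Python. The helper name and the simple alternation rule are illustrative assumptions, since a real encoder uses a configurable group-of-pictures structure rather than this hard-wired sequence.

```python
def picture_type(n):
    """Assign picture types in the fixed sequence I, B, P, B, P, ...:
    the first frame is an I-picture, after which B- and P-pictures
    alternate."""
    if n == 0:
        return "I"
    return "B" if n % 2 == 1 else "P"
```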
That is, the motion vector detector 32 refers to a predetermined pre-set reference frame in the picture data stored in the frame memory 31 and effects pattern matching (block matching) between the reference frame and a small block (macro-block) of 16 pixels by 16 lines of a frame being encoded to detect the motion vector of the macro-block.
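A minimal full-search sketch of the pattern matching described above, assuming 8-bit greyscale frames held as NumPy arrays; the function name, the sum-of-absolute-differences criterion and the ±8-pixel search range are illustrative choices, not values taken from the MPEG standard (real encoders use half-pel refinement and faster search strategies).

```python
import numpy as np

def block_matching(ref, cur, mb_y, mb_x, mb=16, search=8):
    """Find the motion vector for one macro-block by full-search block
    matching: minimise the sum of absolute differences (SAD) between
    the current macro-block and candidate blocks in the reference
    frame within a +/-search window."""
    target = cur[mb_y:mb_y + mb, mb_x:mb_x + mb].astype(np.int32)
    best, best_sad = (0, 0), np.inf
    h, w = ref.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = mb_y + dy, mb_x + dx
            if y < 0 or x < 0 or y + mb > h or x + mb > w:
                continue  # candidate block falls outside the reference frame
            cand = ref[y:y + mb, x:x + mb].astype(np.int32)
            sad = np.abs(target - cand).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad
```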
In the MPEG system, there are four picture prediction modes, namely an intracoding (intra-frame coding), a forward prediction coding, a backward prediction coding and bidirectional prediction coding. An I-picture is encoded by intra-frame coding, while a P-picture is encoded by intra-frame coding or forward prediction coding, and a B-picture is encoded by intra-frame coding, forward prediction coding, backward prediction coding or by bidirectional prediction coding.
Therefore, the motion vector detector 32 sets the intra-frame coding as a prediction mode for an I-picture. In this case, the motion vector detector 32 outputs only the information on the prediction mode (herein the intra-frame prediction mode) to a variable length encoding (VLC) unit 36 and to a motion compensation unit 42, without detecting the motion vector.
The motion vector detector 32 makes forward prediction for the P-picture to detect its motion vector. The motion vector detector 32 compares the prediction error arising from forward prediction to, for example, the variance of the macro-block being encoded (herein a macro-block of a P-picture). If, as a result of comparison, the variance of the macro-block is smaller than the prediction error, the motion vector detector 32 sets the intra-coding mode as the prediction mode and outputs the information on this mode to the VLC unit 36 and to the motion compensation unit 42. If it is the prediction error arising from forward prediction that is smaller, the motion vector detector 32 sets the forward prediction mode as the prediction mode and sends the detected motion vector and the information on the mode to the VLC unit 36 and to the motion compensation unit 42.
The motion vector detector 32 also effects forward prediction, backward prediction and bidirectional prediction for a B-picture to detect the respective motion vectors. The motion vector detector 32 detects the smallest of the prediction errors incurred in the forward prediction, backward prediction and bidirectional prediction; this detected error is referred to below as the smallest prediction error. The motion vector detector 32 then compares this smallest prediction error to, for example, the variance of the macro-block being encoded (a macro-block of the B-picture). If, as the result of comparison, the variance of the macro-block is smaller than the smallest prediction error, the motion vector detector 32 sets the intra-coding mode as the prediction mode and outputs the information on the mode to the VLC unit 36 and to the motion compensation unit 42. If it is the smallest prediction error that is smaller, the motion vector detector 32 sets, as the prediction mode, that prediction mode for which the smallest prediction error has been obtained, and outputs the mode information along with the detected motion vector to the VLC unit 36 and to the motion compensation unit 42.
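The mode decision described in the two preceding paragraphs can be sketched as follows; the function and mode names are hypothetical, and the variance-versus-error comparison mirrors the text rather than any normative MPEG rule.

```python
def select_prediction_mode(mb_variance, pred_errors):
    """Pick the inter mode with the smallest prediction error, then
    compare that smallest error with the variance of the macro-block
    being encoded; if the variance is the smaller, intra coding is
    chosen instead."""
    mode, smallest = min(pred_errors.items(), key=lambda kv: kv[1])
    return "intra" if mb_variance < smallest else mode
```

For a P-picture the dictionary would hold only the forward-prediction error; for a B-picture it holds all three inter modes.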
On reception of both the prediction mode and the motion vector from the motion vector detector 32, the motion compensation unit 42 reads out encoded and previously locally decoded picture data which is stored in the frame memory 41, in accordance with the prediction mode and the motion vector, to route the read-out picture data as prediction picture data to arithmetic units 33, 40.
The arithmetic unit 33 reads out from the frame memory 31 the same macro-block as that read out by the motion vector detector 32, and computes the difference between the macro-block and the prediction picture from the motion compensation unit 42. This difference value is sent to a DCT unit 34.
If the motion compensation unit 42 has received only the prediction mode from the motion vector detector 32, that is, if the prediction mode is the intra-coding mode, the motion compensation unit 42 does not output a prediction picture. In this case, the arithmetic units 33, 40 do not perform any particular processing, and the arithmetic unit 33 outputs the macro-block read out from the frame memory 31 directly to the DCT unit 34.
The DCT unit 34 performs DCT processing on the output data of the arithmetic unit 33 and routes the resulting DCT coefficients to a quantizer 35. The quantizer 35 quantizes the DCT coefficients from the DCT unit 34 at a quantization step (quantization scale) which is set in the quantizer 35 in accordance with the volume of data stored in a buffer 37, that is, by buffer feedback. The quantized DCT coefficients, sometimes referred to below as quantization coefficients, are routed to the VLC unit 36 along with the quantization step as set.
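The step-size mechanism that the buffer feedback controls can be sketched as plain uniform quantization; MPEG's actual scheme additionally applies per-frequency weighting matrices, so the helpers below are simplified illustrations with assumed names.

```python
import numpy as np

def quantize(dct, qstep):
    """Divide each DCT coefficient by the quantization step and round;
    a larger step yields smaller integers and hence fewer bits."""
    return np.rint(dct / qstep).astype(np.int64)

def dequantize(q, qstep):
    """Inverse operation, as performed by the dequantizer 38: multiply
    the quantization coefficients back by the quantization step."""
    return q * qstep
```

The round trip loses at most half a quantization step per coefficient, which is the distortion the rate control trades against bit volume.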
The VLC unit 36 converts the quantization coefficients routed from the quantizer 35 into a variable length code, such as a Huffman code, and outputs the codes to the buffer 37. The VLC unit 36 also variable-length encodes the prediction mode (the mode indicating which of the intra-prediction, forward prediction, backward prediction or bidirectional prediction has been set) and the motion vector from the motion vector detector 32, and outputs the resulting encoded data to the buffer 37.
The buffer 37 temporarily stores the encoded data from the VLC unit 36 to smooth the data volume, and outputs the data as an encoded bitstream to, for example, a transmission route, or records the data on a recording medium.
The buffer 37 outputs the stored data volume to the quantizer 35, which then sets the quantization step in accordance with the volume of data stored in the buffer 37. That is, in case of impending overflow of the buffer 37, the quantizer 35 increases the quantization step to lower the volume of data of the quantization coefficients. In case of impending underflow of the buffer 37, the quantizer 35 decreases the quantization step to increase the volume of data of the quantization coefficients. This prevents overflow or underflow of the buffer 37.
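The buffer-feedback behaviour just described can be sketched as below; the occupancy thresholds and the ±1 step adjustment are illustrative assumptions, not values from the MPEG standard.

```python
def adjust_qstep(qstep, buffer_fill, capacity, high=0.8, low=0.2):
    """Buffer-feedback rate control sketch: the fuller the buffer, the
    coarser the quantization (larger step means fewer bits), and the
    emptier the buffer, the finer the quantization."""
    occupancy = buffer_fill / capacity
    if occupancy > high:          # impending overflow: coarsen
        return qstep + 1
    if occupancy < low:           # impending underflow: refine
        return max(1, qstep - 1)  # quantization step stays positive
    return qstep
```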
The quantization coefficients and the quantization step outputted by the quantizer 35 are routed not only to the VLC unit 36 but also to the dequantizer 38. The dequantizer 38 dequantizes the quantization coefficients from the quantizer 35 in accordance with the quantization step from the quantizer 35. This converts the quantization coefficients to DCT coefficients which are then routed to an inverse DCT (IDCT) unit 39. The IDCT unit 39 inverse discrete cosine transforms the DCT coefficients to route the resulting data to the arithmetic unit 40.
The arithmetic unit 40 is fed not only with the output data of the IDCT unit 39 but also with the same data as the prediction picture supplied from the motion compensation unit 42 to the arithmetic unit 33. The arithmetic unit 40 adds the output data of the IDCT unit 39 (prediction residuals, or difference data) to the prediction picture data from the motion compensation unit 42 to locally decode the original picture data, and outputs the locally decoded picture data. However, if the prediction mode is the intra-coding mode, the output data of the IDCT unit 39 is passed through the arithmetic unit 40 so as to be directly routed as the locally decoded picture data to the frame memory 41. Meanwhile, this decoded picture data is the same as the decoded picture data obtained on the receiver side.
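The role of the arithmetic unit 40 can be summarised in a short sketch, assuming the residual and prediction are NumPy arrays; the function name is hypothetical.

```python
import numpy as np

def local_decode(idct_output, prediction, intra):
    """Reconstruct a macro-block as the arithmetic unit 40 does: in an
    inter mode, add the decoded prediction residual to the prediction
    picture; in intra mode the IDCT output already is the decoded
    block and is passed through unchanged."""
    return idct_output if intra else idct_output + prediction
```

Because the decoder's arithmetic unit 105 performs the identical addition, the locally decoded picture stored in the frame memory 41 matches what the receiver reconstructs, which is what makes it usable as a reference picture.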
The decoded picture data obtained by the arithmetic unit 40, that is the locally decoded picture data, is sent to and stored in the frame memory 41 so as to be used subsequently as reference picture data (reference frame) for a picture encoded by inter-coding (forward prediction, backward prediction or bidirectional prediction).
FIG. 2 shows an illustrative structure of a MP@ML decoder in the MPEG used for decoding the encoded data outputted by the encoder of FIG. 1.
In the decoder, a buffer 101 is fed with an encoded bitstream which is received by a receiver, not shown, over a transmission route, or which is reproduced by a reproducing device, not shown, from an encoded bitstream recorded on a recording medium. The buffer 101 temporarily stores this encoded bitstream.
An IVLC unit (variable-length decoding unit) 102 reads out the encoded data stored in the buffer 101 and variable-length decodes the read-out data to separate it, on the macro-block basis, into a motion vector, a prediction mode, a quantization step and quantization coefficients. Of these data, the motion vector and the prediction mode are sent to a motion compensation unit 107, while the quantization coefficients of the macro-block and the quantization step are routed to a dequantizer 103.
The dequantizer 103 dequantizes the quantization coefficients of a macro-block supplied from the IVLC unit 102, in accordance with the quantization step similarly supplied by the IVLC unit 102, to output the resulting DCT coefficients to an IDCT unit 104. The IDCT unit 104 inverse discrete cosine transforms the DCT coefficients from the dequantizer 103 to route the resulting data to the arithmetic unit 105.
The arithmetic unit 105 is fed not only with the output data of the IDCT unit 104, but also with output data of the motion compensation unit 107. That is, similarly to the motion compensation unit 42 of FIG. 1, the motion compensation unit 107 reads out the previously decoded picture data in accordance with the motion vector and the prediction mode from the IVLC unit 102, to route the read-out picture data to the arithmetic unit 105 as prediction picture data. The arithmetic unit 105 adds the output data of the IDCT unit 104 (prediction residuals, or difference values) to the prediction picture data from the motion compensation unit 107 to decode the original picture data. The decoded picture data is outputted as playback picture data, while being sent to and stored in the frame memory 106. If the output data of the IDCT unit 104 is intra-coded data, the output data is passed through the arithmetic unit 105 so as to be directly supplied to and stored in the frame memory 106.
The decoded picture data stored in the frame memory 106 is used as reference picture data for subsequently decoded picture data. The decoded picture data is routed to and displayed on, for example, a display, not shown, as a reproduced output picture.
Meanwhile, since B-pictures are not used as reference picture data in MPEG1 or MPEG2, these B-pictures are stored neither in the frame memory 41 (FIG. 1) of the encoder nor in the frame memory 106 (FIG. 2) of the decoder.
The encoder and the decoder shown in FIGS. 1 and 2 are constructed in accordance with the MPEG1 or MPEG2 standards. Standardization work on MPEG4, an encoding system on the video object basis, is now going on in ISO-IEC/JTC1/SC29/WG11. A video object (VO) is a sequence of objects making up a picture.
Meanwhile, the MPEG4 provides that the picture format shown in FIG. 3, termed the 4:2:0 format, is the sole format for encoded/decoded pictures.
In this 4:2:0 format, luminance signals Y and two chrominance signals Cr, Cb, as shown in FIG. 3, are used.
The 4:2:0 format is a picture format in which one pixel each of the chrominance Cr, Cb is allocated to two scanning lines of the luminance Y and two pixels in the horizontal direction, that is, in which one pixel each of the chrominance Cr, Cb is allocated to four pixels of the luminance Y, with the positions of the chrominance Cr, Cb being the same as those of the luminance Y.
Meanwhile, the positions of the chrominance Cr, Cb with respect to the luminance Y are not limited to those of FIG. 3, but differ with the device in use.
With the 4:2:0 format, since one pixel each of the chrominance Cr, Cb is allocated to four pixels of the luminance Y, the chrominance Cr, Cb are lower in resolution than the luminance Y.
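One common way to produce the 4:2:0 chrominance planes is to average each 2×2 block of chrominance samples, as sketched below; averaging is an illustrative choice, and, as noted above, the siting of the chrominance sample relative to the luminance varies between devices.

```python
import numpy as np

def subsample_420(chroma):
    """Average each 2x2 block of full-resolution chrominance samples
    down to one sample, so that one Cr (or Cb) pixel serves four
    luminance pixels, as in the 4:2:0 format.  Assumes even plane
    dimensions."""
    c = chroma.astype(np.float64)
    return (c[0::2, 0::2] + c[0::2, 1::2] + c[1::2, 0::2] + c[1::2, 1::2]) / 4.0
```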
Therefore, the 4:2:2 format or the 4:4:4 format is used, in place of the 4:2:0 format, for pictures of high quality, such as those required by broadcast stations, depending on the usage.
In the 4:2:2 format, one pixel each of the chrominance Cr, Cb is used for one horizontal scanning line of the luminance Y and two pixels in the horizontal direction (one Cr pixel and one Cb pixel for every two pixels of the luminance Y), as shown in the pixel arraying diagram of FIG. 4.
In the 4:4:4 format, one pixel each of the chrominance Cr, Cb is used for one horizontal scanning line of the luminance Y and one pixel in the horizontal direction, as shown in the pixel arraying diagram of FIG. 5. That is, the luminance Y and the chrominance Cr, Cb have the same positions and the same number of pixels.
Thus, in the 4:2:2 format or in the 4:4:4 format, the number of pixels of the chrominance signals is larger than in the case of the 4:2:0 format, so that these 4:2:2 and 4:4:4 formats can be used with advantage for a picture in need of high picture quality.
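The pixel counts implied by the three formats can be made concrete with a small helper (the function name is hypothetical): for example, with a 352×288 luminance plane, 4:2:0 carries a quarter as many Cr samples as luminance, 4:2:2 half as many, and 4:4:4 the full count.

```python
def chroma_plane_size(fmt, luma_w, luma_h):
    """Width and height of one chrominance plane (Cr or Cb) for a given
    luminance plane: 4:2:0 halves both dimensions, 4:2:2 halves only
    the horizontal dimension, 4:4:4 keeps full resolution."""
    if fmt == "4:2:0":
        return luma_w // 2, luma_h // 2
    if fmt == "4:2:2":
        return luma_w // 2, luma_h
    if fmt == "4:4:4":
        return luma_w, luma_h
    raise ValueError("unknown chrominance format: " + fmt)
```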
However, since the MPEG4 provides only for pictures of the 4:2:0 format, it is impossible to use the 4:2:2 format or the 4:4:4 format for encoded or decoded pictures.
The MPEG4 also is formulated to encode not only a picture but also the shape information. However, the method for encoding the shape information is defined only for the 4:2:0 format, and not for the 4:2:2 format or the 4:4:4 format.
It is therefore an object of the present invention to provide a picture encoding method and apparatus, a picture decoding method and apparatus and a furnishing medium whereby the MPEG4 is expanded such as to permit the use of the 4:2:2 format or the 4:4:4 format in the MPEG4.
In one aspect, the present invention provides a picture encoding method and apparatus in which reading of a flag indicating the encoding state of the chrominance block and a flag indicating the encoding state of the chrominance block associated with the chrominance type is adaptively changed responsive to a flag indicating the chrominance format adapted for setting the type and the number of chrominance pixels allocated to the luminance pixels constituting the luminance block and a flag indicating the state of the encoding of the chrominance block.
In another aspect, the present invention provides a picture encoding method and apparatus in which the position of a block used for prediction of AC coefficients and DC coefficients by the discrete cosine transform is changed responsive to a flag indicating the chrominance format adapted for setting the type and the number of chrominance pixels allocated to the luminance pixels constituting the luminance block.
In a further aspect, the present invention provides a picture decoding method and apparatus in which reading of a flag indicating the encoding state of the chrominance block and a flag indicating the encoding state of the chrominance block associated with the chrominance type is adaptively changed responsive to a flag indicating the chrominance format adapted for setting the type and the number of chrominance pixels allocated to the luminance pixels constituting the luminance block and a flag indicating the state of the encoding of the chrominance block, and in which the encoded picture data is decoded responsive to the read-in flags.
In a further aspect, the present invention provides a picture decoding method and apparatus in which the position of a block used for prediction of AC coefficients and DC coefficients by the discrete cosine transform is set responsive to a flag indicating the chrominance format adapted for setting the type and the number of chrominance pixels allocated to the luminance pixels constituting the luminance block.
In a further aspect, the present invention provides a furnishing medium in which the encoded picture data furnished has been generated responsive to a read-in flag indicating the encoding state of the chrominance block and a read-in flag indicating the encoding state of the chrominance block associated with the chrominance type as the reading of the flags is adaptively changed responsive to a flag indicating the chrominance format adapted for setting the type and the number of chrominance pixels allocated to the luminance pixels constituting the luminance block and a flag indicating the state of the encoding of the chrominance block.
In yet another aspect, the present invention provides a furnishing medium in which the encoded picture data is furnished as the position of a block used for prediction of AC coefficients and DC coefficients by the discrete cosine transform is set responsive to a flag indicating the chrominance format adapted for setting the type and the number of chrominance pixels allocated to the luminance pixels constituting the luminance block.
According to the present invention, the MPEG4 can be expanded to permit the use of the 4:2:2 format or the 4:4:4 format, by employing a flag indicating the chrominance format of a picture and a flag showing the encoding pattern of the chrominance block, using the above-mentioned means, to enable the encoding/decoding of the respective chrominance formats.