The present invention relates to an image encoder, an image encoding method, an image decoder, an image decoding method, and distribution media. More particularly, the invention relates to an image encoder, an image encoding method, an image decoder, an image decoding method, and distribution media suitable for use, for example, in the case where dynamic image data is recorded on storage media, such as a magneto-optical disk, magnetic tape, etc., and the recorded data is regenerated and displayed on a display, or in the case where dynamic image data is transmitted from a transmitter side to a receiver side through a transmission path and, on the receiver side, the received dynamic image data is displayed or is edited and recorded, as in videoconference systems, videophone systems, broadcasting equipment, and multimedia database retrieval systems.
For instance, in systems which transmit dynamic image data to a remote place, such as videoconference systems and videophone systems, image data is compressed and encoded by taking advantage of line correlation and interframe correlation in order to use transmission paths efficiently.
As a representative high-efficiency dynamic image encoding system, there is a dynamic image encoding system for storage media based on the Moving Picture Experts Group (MPEG) standard. This MPEG standard has been discussed by the International Organization for Standardization (ISO)-IEC/JTC1/SC2/WG11 and has been put forward as a proposal for a standard. The MPEG standard has adopted a hybrid system using a combination of motion compensative predictive coding and discrete cosine transform (DCT) coding.
The MPEG standard defines some profiles and levels in order to support a wide range of applications and functions. The MPEG standard is primarily based on Main Profile at Main Level (MP@ML).
FIG. 1 illustrates an example of the constitution of an MP@ML encoder in the MPEG standard system.
Image data to be encoded is input to a frame memory 31 and stored temporarily. A motion vector detector 32 reads out image data stored in the frame memory 31, for example, at a macroblock unit constituted by 16 × 16 pixels, and detects the motion vectors.
Here, the motion vector detector 32 processes the image data of each frame as any one of an intracoded picture (I-picture), a forward predictive-coded picture (P-picture), or a bidirectionally predictive-coded picture (B-picture). Note that how images of frames input in sequence are processed as I-, P-, and B-pictures has been predetermined (e.g., images are processed as I-picture, B-picture, P-picture, B-picture, P-picture, . . . , B-picture, and P-picture in the recited order).
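The predetermined processing order recited above can be sketched as follows. This is a minimal illustration only; the function name assign_picture_types is hypothetical, and the sketch assumes exactly the example order I, B, P, B, P, . . . given in the text, not any other order an encoder might be configured with.

```python
def assign_picture_types(num_frames):
    """Assign picture types in the example order I, B, P, B, P, ...

    Assumption: the first frame is an I-picture and the remaining frames
    alternate between B-pictures and P-pictures, as in the recited example.
    """
    types = []
    for index in range(num_frames):
        if index == 0:
            types.append("I")
        elif index % 2 == 1:
            types.append("B")
        else:
            types.append("P")
    return types


if __name__ == "__main__":
    print(assign_picture_types(7))
```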
That is, in the motion vector detector 32, reference is made to a predetermined reference frame in the image data stored in the frame memory 31, and a small block of 16 pixels × 16 lines (macroblock) in the current frame to be encoded is matched with a set of blocks of the same size in the reference frame. With block matching, the motion vector of the macroblock is detected.
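Block matching of this kind can be sketched as an exhaustive search. The sketch below is an illustration under stated assumptions, not the detector 32 itself: the matching criterion (sum of absolute differences), the search range, and the names sad and find_motion_vector are all assumptions, since the text does not specify them. Frames are plain lists of rows of pixel values.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block of the current frame at
    (cx, cy) and a block of the reference frame at (rx, ry)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
    return total


def find_motion_vector(cur, ref, cx, cy, size=16, search=7):
    """Full search over displacements within +/- search pixels.

    Returns ((dx, dy), cost) for the best-matching reference block.
    The SAD criterion and the search range are assumptions.
    """
    height, width = len(ref), len(ref[0])
    best_cost = None
    best_mv = (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate blocks that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

The returned cost doubles as the prediction error of the chosen block, which is the quantity the predictive-mode decision compares against.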
Here, in the MPEG standard, predictive modes for an image include four kinds: intracoding, forward predictive coding, backward predictive coding, and bidirectionally predictive coding. An I-picture is encoded by intracoding. A P-picture is encoded by either intracoding or forward predictive coding. A B-picture is encoded by either intracoding, forward predictive coding, backward predictive coding, or bidirectionally predictive coding.
That is, the motion vector detector 32 sets the intracoding mode to an I-picture as a predictive mode. In this case, the motion vector detector 32 outputs the predictive mode (intracoding mode) to a variable word length coding (VLC) unit 36 and a motion compensator 42 without detecting the motion vector.
The motion vector detector 32 also performs forward prediction for a P-picture and detects the motion vector. Furthermore, in the motion vector detector 32, a prediction error caused by performing forward prediction is compared with dispersion, for example, of macroblocks to be encoded (macroblocks in the P-picture). As a result of the comparison, when the dispersion of the macroblocks is smaller than the prediction error, the motion vector detector 32 sets an intracoding mode as the predictive mode and outputs it to the VLC unit 36 and motion compensator 42. Also, if the prediction error caused by performing forward prediction is smaller, the motion vector detector 32 sets a forward predictive coding mode as the predictive mode. The forward predictive coding mode, along with the detected motion vector, is output to the VLC unit 36 and motion compensator 42.
The motion vector detector 32 further performs forward prediction, backward prediction, and bidirectional prediction for a B-picture and detects the respective motion vectors. Then, the motion vector detector 32 detects the minimum error from among the prediction errors in the forward prediction, backward prediction, and bidirectional prediction (hereinafter referred to as the minimum prediction error as needed), and compares the minimum prediction error with dispersion, for example, of macroblocks to be encoded (macroblocks in the B-picture). As a result of the comparison, when the dispersion of the macroblocks is smaller than the minimum prediction error, the motion vector detector 32 sets an intracoding mode as the predictive mode and outputs it to the VLC unit 36 and motion compensator 42. Also, if the minimum prediction error is smaller, the motion vector detector 32 sets as the predictive mode the predictive mode in which the minimum prediction error was obtained. The predictive mode, along with the corresponding motion vector, is output to the VLC unit 36 and motion compensator 42.
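The predictive-mode decision described for I-, P-, and B-pictures can be sketched as follows. This is a hedged illustration: the text does not define how the prediction error or the dispersion is measured, so the sketch assumes a sum of absolute differences for the former and a sum of absolute deviations from the block mean for the latter. The names dispersion, prediction_error, and choose_mode are hypothetical, and the handling of an exact tie (here, the predictive mode is kept) is also an assumption.

```python
def dispersion(block):
    """Assumed measure: sum of absolute deviations from the block mean."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels)


def prediction_error(block, predicted):
    """Assumed measure: sum of absolute differences from the prediction."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block, predicted)
               for a, b in zip(row_a, row_b))


def choose_mode(block, candidates):
    """Pick a predictive mode for one macroblock.

    candidates maps a mode name ("forward", "backward", "bidirectional")
    to its predicted block: empty for an I-picture, forward only for a
    P-picture, and all three for a B-picture.
    """
    if not candidates:
        return "intra"
    errors = {mode: prediction_error(block, pred)
              for mode, pred in candidates.items()}
    best_mode = min(errors, key=errors.get)
    # Intracoding is chosen when the dispersion is smaller than the
    # minimum prediction error.
    if dispersion(block) < errors[best_mode]:
        return "intra"
    return best_mode
```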
If the motion compensator 42 receives both the predictive mode and the motion vector from the motion vector detector 32, the motion compensator 42 will read out the coded and previously locally decoded image data stored in the frame memory 41 in accordance with the received predictive mode and motion vector. This read image data is supplied to arithmetic units 33 and 40 as predicted image data.
The arithmetic unit 33 reads from the frame memory 31 the same macroblock as the image data read out from the frame memory 31 by the motion vector detector 32, and computes the difference between the macroblock and the predicted image which was supplied from the motion compensator 42. This differential value is supplied to a DCT unit 34.
On the other hand, in the case where a predictive mode alone is received from the motion vector detector 32, i.e., the case where a predictive mode is an intracoding mode, the motion compensator 42 does not output a predicted image. In this case, the arithmetic unit 33 (the arithmetic unit 40 as well) outputs to the DCT unit 34 the macroblock read out from the frame memory 31 without processing it.
In the DCT unit 34, DCT is applied to the output data of the arithmetic unit 33, and the resultant DCT coefficients are supplied to a quantizer 35. In the quantizer 35, a quantization step (quantization scale) is set in correspondence to the quantity of data stored in a buffer 37 (buffer feedback). Using this quantization step, the DCT coefficients from the DCT unit 34 are quantized. The quantized DCT coefficients (hereinafter referred to as quantized coefficients as needed), along with the set quantization step, are supplied to the VLC unit 36.
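The transform and quantization stages can be illustrated with the following sketch. It assumes an orthonormal two-dimensional DCT-II and a uniform quantizer with a single step size; the MPEG standards additionally weight the coefficients with quantization matrices, which are omitted here. The names dct_2d and quantize are hypothetical.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (direct form, for clarity)."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantization with one step size (a simplifying assumption)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


if __name__ == "__main__":
    flat_block = [[8] * 8 for _ in range(8)]
    print(quantize(dct_2d(flat_block), 16)[0][:3])
```

A larger step maps more coefficients to zero, which is what lets the buffer feedback trade image quality against the quantity of generated data.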
In the VLC unit 36, the quantized coefficients supplied by the quantizer 35 are transformed to variable word length codes such as Huffman codes and output to the buffer 37. Furthermore, in the VLC unit 36, the quantization step from the quantizer 35 is encoded by variable word length coding, and likewise the predictive mode (indicating either intracoding (image predictive intracoding), forward predictive coding, backward predictive coding, or bidirectionally predictive coding) and motion vector from the motion vector detector 32 are encoded. The resultant coded data is output to the buffer 37.
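The idea of variable word length coding can be sketched with a Huffman code built from symbol frequencies. This is only an illustration of the principle that frequent values receive short code words: the MPEG standards use fixed, predefined code tables rather than tables built per stream, and the names build_huffman_code and encode are hypothetical.

```python
import heapq
from collections import Counter


def build_huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Each heap entry is (count, unique id, partial code table); the id
    # breaks ties so that the tables themselves are never compared.
    heap = [(count, i, {sym: ""})
            for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c0, _, t0 = heapq.heappop(heap)
        c1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t0.items()}
        merged.update({s: "1" + code for s, code in t1.items()})
        heapq.heappush(heap, (c0 + c1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the code words of the given symbols."""
    return "".join(code[s] for s in symbols)
```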
The buffer 37 temporarily stores the coded data supplied from the VLC unit 36, thereby smoothing the stored quantity of data. For example, the smoothed data is output to a transmission path or recorded on a storage medium, as a coded bit stream.
The buffer 37 also outputs the stored quantity of data to the quantizer 35. The quantizer 35 sets a quantization step in correspondence to the stored quantity of data output by this buffer 37. That is, when there is a possibility that the buffer 37 will overflow, the quantizer 35 increases the size of the quantization step, thereby reducing the data quantity of quantized coefficients. When there is a possibility that the buffer 37 will underflow, the quantizer 35 reduces the size of the quantization step, thereby increasing the data quantity of quantized coefficients. In this manner, the overflow and underflow of the buffer 37 are prevented.
The quantized coefficients and quantization step, output by the quantizer 35, are supplied not only to the VLC unit 36 but also to an inverse quantizer 38. In the inverse quantizer 38, the quantized coefficients from the quantizer 35 are inversely quantized according to the quantization step supplied from the quantizer 35, whereby the quantized coefficients are transformed to DCT coefficients. The DCT coefficients are supplied to an inverse DCT unit (IDCT unit) 39. In the IDCT unit 39, an inverse DCT is applied to the DCT coefficients and the resultant data is supplied to the arithmetic unit 40.
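The buffer feedback can be sketched as a mapping from buffer occupancy to quantization step. The linear mapping, the step range of 1 to 31, and the name choose_quantization_step are assumptions made for illustration; the text only requires that the step grow as the buffer approaches overflow and shrink as it approaches underflow.

```python
def choose_quantization_step(stored, capacity, min_step=1, max_step=31):
    """Map the quantity of data stored in the buffer to a quantization step.

    Assumption: a linear mapping from occupancy to step size. A fuller
    buffer yields a larger step (less data); an emptier buffer yields a
    smaller step (more data).
    """
    fullness = max(0.0, min(1.0, stored / capacity))
    return min_step + int(round(fullness * (max_step - min_step)))


if __name__ == "__main__":
    for stored in (0, 25, 50, 75, 100):
        print(stored, choose_quantization_step(stored, 100))
```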
In addition to the output data of the IDCT unit 39, the same data as the predicted image supplied to the arithmetic unit 33 is supplied from the motion compensator 42 to the arithmetic unit 40, as described above. The arithmetic unit 40 adds the output data (prediction residual (differential data)) of the IDCT unit 39 and the predicted image data of the motion compensator 42, thereby decoding the original image data locally. The locally decoded image data is output. (However, in the case where a predictive mode is an intracoding mode, the output data of the IDCT unit 39 is passed through the arithmetic unit 40 and supplied to the frame memory 41 as locally decoded image data without being processed.) Note that this decoded image data is consistent with decoded image data that is obtained at the receiver side.
The decoded image data obtained in the arithmetic unit 40 (locally decoded image data) is supplied to the frame memory 41 and stored. Thereafter, the decoded image data is employed as reference image data (reference frame) with respect to an image to which intercoding (forward predictive coding, backward predictive coding, or bidirectionally predictive coding) is applied.
Next, FIG. 2 illustrates an example of the constitution of an MP@ML decoder in the MPEG standard system which decodes the coded data output from the encoder of FIG. 1.
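The local decoding path (inverse quantization, inverse DCT, and addition of the predicted image) can be sketched as follows. The sketch assumes a uniform quantizer with a single step, an orthonormal inverse DCT, and 8-bit pixels clipped to the range 0 to 255; none of these details is stated in the text, and the names idct_2d and locally_decode are hypothetical.

```python
import math


def idct_2d(coeffs):
    """Orthonormal 2-D inverse DCT of a square block (direct form)."""
    n = len(coeffs)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            s = 0.0
            for u in range(n):
                for v in range(n):
                    s += (alpha(u) * alpha(v) * coeffs[u][v]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[y][x] = s
    return out


def locally_decode(quantized, step, predicted=None):
    """Inverse-quantize, inverse-transform, and add the prediction.

    predicted is None for an intracoded macroblock, in which case the
    inverse DCT output is used as the decoded data without any addition.
    """
    coeffs = [[q * step for q in row] for row in quantized]
    residual = idct_2d(coeffs)
    n = len(residual)
    decoded = []
    for y in range(n):
        row = []
        for x in range(n):
            value = residual[y][x]
            if predicted is not None:
                value += predicted[y][x]
            # Assumption: 8-bit pixels, so clip to the range 0..255.
            row.append(max(0, min(255, int(round(value)))))
        decoded.append(row)
    return decoded
```

Because the encoder runs the same reconstruction that the receiver side will run, both sides hold identical reference frames, and prediction errors do not accumulate from frame to frame.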
The coded bit stream (coded data) transmitted through a transmission path is received by a receiver (not shown), or the coded bit stream (coded data) recorded in a storage medium is regenerated by a regenerator (not shown). The received or regenerated bit stream is supplied to a buffer 101 and stored.
An inverse VLC unit (IVLC unit, i.e., a variable word length decoder) 102 reads out the coded data stored in the buffer 101 and performs variable word length decoding, thereby separating the coded data into the motion vector, predictive mode, quantization step, and quantized coefficients at a macroblock unit. Among them, the motion vector and the predictive mode are supplied to a motion compensator 107, while the quantization step and the quantized macroblock coefficients are supplied to an inverse quantizer 103.
In the inverse quantizer 103, the quantized macroblock coefficients supplied from the IVLC unit 102 are inversely quantized according to the quantization step supplied from the same IVLC unit 102. The resultant DCT coefficients are supplied to an IDCT unit 104. In the IDCT unit 104, an inverse DCT is applied to the macroblock DCT coefficients supplied from the inverse quantizer 103, and the resultant data is supplied to an arithmetic unit 105.
In addition to the output data of the IDCT unit 104, the output data of the motion compensator 107 is also supplied to the arithmetic unit 105. That is, in the motion compensator 107, as in the case of the motion compensator 42 of FIG. 1, the previously decoded image data stored in the frame memory 106 is read out according to the motion vector and predictive mode supplied from the IVLC unit 102 and is supplied to the arithmetic unit 105 as predicted image data. The arithmetic unit 105 adds the output data (prediction residual (differential value)) of the IDCT unit 104 and the predicted image data of the motion compensator 107, thereby decoding the original image data. This decoded image data is supplied to the frame memory 106 and stored. Note that, in the case where the output data of the IDCT unit 104 is intracoded data, the output data is passed through the arithmetic unit 105 and supplied to the frame memory 106 as decoded image data without being processed.
The decoded image data stored in the frame memory 106 is employed as reference image data for the next image data to be decoded. Furthermore, the decoded image data is supplied, for example, to a display (not shown) and displayed as an output reproduced image.
Note that in the MPEG-1 and MPEG-2 standards, a B-picture is stored in neither the frame memory 41 in the encoder (FIG. 1) nor the frame memory 106 in the decoder (FIG. 2), because it is not employed as reference image data.
The aforementioned encoder and decoder shown in FIGS. 1 and 2 are based on the MPEG-1/2 standard. Currently, a system for encoding video at a unit of the video object (VO) of an object sequence constituting an image is being standardized as the MPEG-4 standard by the ISO-IEC/JTC1/SC29/WG11.
Incidentally, since the MPEG-4 standard is being standardized on the assumption that it is primarily used in the field of communication, it does not prescribe the group of pictures (GOP) prescribed in the MPEG-1/2 standard. Therefore, in the case where the MPEG-4 standard is utilized in storage media, efficient random access will be difficult.
The present invention has been made in view of such circumstances and therefore the object of the invention is to make efficient random access possible.
An image encoder comprises encoding means for partitioning one or more layers of each sequence of objects constituting an image into a plurality of groups and encoding the groups.
An image encoding method partitions one or more layers of each sequence of objects constituting an image into a plurality of groups and encodes the groups.
An image decoder comprises decoding means for decoding a coded bit stream obtained by partitioning one or more layers of each sequence of objects constituting an image into a plurality of groups which are encoded.
An image decoding method decodes a coded bit stream obtained by partitioning one or more layers of each sequence of objects constituting an image into a plurality of groups which were encoded.
A distribution medium distributes the coded bit stream which is obtained by partitioning one or more layers of each sequence of objects constituting an image into a plurality of groups which are encoded.
An image encoder comprises: second-accuracy time information generation means for generating second-accuracy time information which indicates time within accuracy of a second; and detailed time information generation means for generating detailed time information which indicates a time period between the second-accuracy time information directly before display time of the I-VOP, P-VOP, or B-VOP and the display time within accuracy finer than accuracy of a second.
An image encoding method generates second-accuracy time information which indicates time within accuracy of a second; and generates detailed time information which indicates a time period between the second-accuracy time information directly before display time of the I-VOP, P-VOP, or B-VOP and the display time within accuracy finer than accuracy of a second.
An image decoder comprises display time computation means for computing display time of I-VOP, P-VOP, or B-VOP on the basis of the second-accuracy time information and detailed time information.
An image decoding method comprises computing display time of I-VOP, P-VOP, or B-VOP on the basis of the second-accuracy time information and detailed time information.
A distribution medium distributes a coded bit stream which is obtained by generating second-accuracy time information which indicates time within accuracy of a second, also by generating detailed time information which indicates a time period between the second-accuracy time information directly before display time of the I-VOP, P-VOP, or B-VOP and the display time within accuracy finer than accuracy of a second, and adding the second-accuracy time information and detailed time information to a corresponding I-VOP, P-VOP, or B-VOP as information which indicates display time of the I-VOP, P-VOP, or B-VOP.
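The relationship between the second-accuracy time information, the detailed time information, and the display time can be sketched as follows. The sketch assumes that display times are expressed as integer milliseconds, so that the detailed time information is a millisecond offset from the second boundary directly before the display time; the unit and the names split_display_time and compute_display_time are assumptions for illustration.

```python
def split_display_time(display_time_ms):
    """Encoder side: split a display time into second-accuracy time
    information and detailed time information.

    Assumption: times are integer milliseconds. The detailed part is the
    period between the second boundary directly before the display time
    and the display time itself.
    """
    seconds, detail_ms = divmod(display_time_ms, 1000)
    return seconds, detail_ms


def compute_display_time(seconds, detail_ms):
    """Decoder side: recover the display time from the two parts."""
    return seconds * 1000 + detail_ms


if __name__ == "__main__":
    seconds, detail_ms = split_display_time(3350)
    print(seconds, detail_ms, compute_display_time(seconds, detail_ms))
```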