In general, the present invention relates to a picture encoding apparatus, a picture encoding method, a picture decoding apparatus, a picture decoding method and a presentation medium. More particularly, the present invention relates to a good picture encoding apparatus, a good picture encoding method, a good picture decoding apparatus, a good picture decoding method and a good presentation medium typically used for recording moving-picture data onto a magneto-optical disc or a magnetic tape, playing back the recorded data and displaying the reproduced data on a display, as well as for transmitting moving-picture data from a transmitter side to a receiver side by way of a transmission line and displaying, editing and recording the received moving-picture data on the receiver side in systems such as a television conference system, a television telephone system, broadcasting equipment and a multimedia data-base search system.
In a system for transmitting moving-picture data to a remote destination, such as a television conference system or a television telephone system, the moving-picture data is subjected to a compression-encoding process by taking advantage of line correlation or interframe correlation in order to allow the transmission line to be utilized with a high degree of efficiency.
As a representative high-performance system for encoding a moving picture, there is provided an MPEG (Moving Picture Experts Group) system which is a kind of cumulative moving-picture encoding technique. The MPEG system is proposed as a standard system, as a result of discussions by an ISO-IEC/JTC1/SC2/WG11. The MPEG system adopts a hybrid method combining a motion-compensation predicting/encoding technique and a DCT (Discrete Cosine Transform) encoding technique.
In the MPEG system, some profiles and levels are defined in order to keep up with a variety of applications and functions. The most basic one is the so-called main profile at main level (MP@ML).
FIG. 53 is a block diagram showing a typical configuration of an MP@ML encoder of the MPEG system.
Picture data to be encoded is supplied to a frame memory 31 to be temporarily stored therein. A motion-vector detector 32 reads out the picture data stored in the frame memory 31 in macroblock units each composed of typically 16 pixels × 16 pixels and detects motion vectors thereof.
The motion-vector detector 32 processes picture data of each frame as an I-picture (in an intraframe encoding process), a P-picture (in a forward-prediction encoding process) or a B-picture (in a bidirectional-prediction encoding process). It should be noted that pictures of frames sequentially supplied to the motion-vector detector 32 are processed as I-, P- or B-pictures in an order determined in advance. For example, the frames are processed in the order of I, B, P, B, P, ..., B and P-pictures.
To put it in detail, the motion-vector detector 32 refers to a predetermined reference frame set in advance in the picture data stored in the frame memory 31 and detects a motion vector of a macroblock between the reference frame and the frame currently being encoded by carrying out a pattern matching process (a block matching process) on a small block which is the macroblock with dimensions of 16 pixels × 16 lines.
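The block matching process described above can be illustrated with a minimal sketch. It assumes an exhaustive search over a small window and a sum of absolute differences as the matching cost; the text does not specify either choice, and the names `sad` and `find_motion_vector` as well as the `search` parameter are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the block of `cur` at (cx, cy)
    and the block of `ref` at (rx, ry). Frames are lists of pixel rows."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def find_motion_vector(cur, ref, cx, cy, size=16, search=4):
    """Exhaustive block matching (an assumed search strategy).

    Returns ((dx, dy), error) for the displacement into the reference
    frame that minimises the matching cost for the block at (cx, cy).
    """
    height, width = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_err = sad(cur, ref, cx, cy, cx, cy, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            err = sad(cur, ref, cx, cy, rx, ry, size)
            if err < best_err:
                best_mv, best_err = (dx, dy), err
    return best_mv, best_err
```

The returned error can serve as the prediction error that is later compared against the macroblock's own activity when a prediction mode is selected.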
The MPEG system has four picture-prediction modes, namely, an intra-encoding mode (an intraframe encoding mode), a forward-prediction encoding mode, a backward-prediction encoding mode and a bidirectional-prediction encoding mode. An I-picture is encoded in the intra-encoding mode and a P-picture is encoded in either the intra-encoding mode or the forward-prediction encoding mode. A B-picture is encoded in any one of the intra-encoding mode, the forward-prediction encoding mode, the backward-prediction encoding mode and the bidirectional-prediction encoding mode.
That is to say, the motion-vector detector 32 sets the intra-encoding mode as a prediction mode for an I-picture. In this case, the motion-vector detector 32 does not detect a motion vector. Instead, the motion-vector detector 32 supplies information indicating the prediction mode (that is, the intra-encoding mode in this case) to a VLC (Variable-Length Coder) unit 36 and a motion compensator 42.
In the case of a P-picture, the motion-vector detector 32 carries out forward prediction on the picture to detect a motion vector thereof. Furthermore, the motion-vector detector 32 compares a prediction error obtained as a result of the forward prediction with, for example, a variance of a macroblock being encoded. If the comparison indicates that the variance of the macroblock is smaller than the prediction error, the motion-vector detector 32 sets the intra-encoding mode as a prediction mode and supplies information indicating the intra-encoding mode to the VLC unit 36 and the motion compensator 42. If the comparison indicates that the prediction error obtained as a result of the forward prediction is smaller than the variance of the macroblock, on the other hand, the motion-vector detector 32 sets the forward-prediction encoding mode as a prediction mode and supplies information indicating the forward-prediction encoding mode to the VLC unit 36 and the motion compensator 42 along with a detected motion vector.
In the case of a B-picture, the motion-vector detector 32 carries out forward, backward and bidirectional predictions on the picture to detect a motion vector thereof. Furthermore, the motion-vector detector 32 detects the smallest one among prediction errors resulting from the forward, backward and bidirectional predictions and compares the smallest prediction error thus detected (which is referred to hereafter also as a minimum prediction error for the sake of convenience) with, for example, a variance of a macroblock being encoded. If the comparison indicates that the variance of the macroblock is smaller than the smallest prediction error, the motion-vector detector 32 sets the intra-encoding mode as a prediction mode and supplies information indicating the intra-encoding mode to the VLC unit 36 and the motion compensator 42. If the comparison indicates that the minimum prediction error is smaller than the variance of the macroblock, on the other hand, the motion-vector detector 32 sets an encoding mode producing the minimum prediction error as a prediction mode and supplies information indicating the set prediction mode to the VLC unit 36 and the motion compensator 42 along with a detected motion vector.
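The mode selection for I-, P- and B-pictures described above can be sketched as follows. This is an illustration under stated assumptions: the "variance" is approximated by the sum of absolute deviations from the block mean, the prediction error by a sum of absolute differences, and a tie is resolved in favour of the predictive mode. The text specifies none of these details, and the function names are hypothetical.

```python
def intra_activity(block):
    """Assumed stand-in for the 'variance' of a macroblock:
    sum of |pixel - mean| over the block."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels)


def prediction_error(block, predicted):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block, predicted)
               for a, b in zip(row_a, row_b))


def choose_mode(block, candidates):
    """Pick a prediction mode for one macroblock.

    `candidates` maps a mode name to its predicted block:
      I-picture: {} (no prediction is attempted)
      P-picture: {"forward": ...}
      B-picture: {"forward": ..., "backward": ..., "bidirectional": ...}
    """
    if not candidates:
        return "intra"
    errors = {mode: prediction_error(block, pred)
              for mode, pred in candidates.items()}
    best_mode = min(errors, key=errors.get)
    # Intra wins only when the block's own activity is strictly smaller
    # than the minimum prediction error (tie handling is an assumption).
    if intra_activity(block) < errors[best_mode]:
        return "intra"
    return best_mode
```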
Receiving the prediction mode and the detected motion vector from the motion-vector detector 32, the motion compensator 42 reads out picture data from a frame memory 41 and supplies the picture data to processors 33 and 40. It should be noted that the picture data stored in the frame memory 41 to be read out by the motion compensator 42 has been encoded and then decoded back locally in the MP@ML encoder.
The processor 33 reads out the macroblock from the frame memory 31 and computes a difference between the macroblock and a predicted picture received from the motion compensator 42. The processor 33 then supplies the difference to a DCT unit 34. It should be noted that the macroblock is a macroblock of the picture data read out by the motion-vector detector 32 from the frame memory 31.
If, on the other hand, the motion compensator 42 receives only information on a prediction mode from the motion-vector detector 32, that is, if the intra-encoding mode is set as the prediction mode, no predicted picture is output. In this case, the processor 33 (and the processor 40) does not carry out any processing in particular, merely passing on a macroblock read out from the frame memory 31 to the DCT unit 34 as it is.
In the DCT unit 34, the data output by the processor 33 is subjected to DCT processing. A DCT coefficient obtained as a result of the DCT processing is supplied to a quantizer 35. In the quantizer 35, a quantization step (quantization scale) appropriate for a data accumulation quantity of a buffer 37 (the amount of data stored in the buffer 37, which is also referred to as a buffer feedback) is set. The DCT coefficient received from the DCT unit 34 is quantized at the quantization step. The quantized DCT coefficient (which is referred to hereafter simply as a quantized coefficient for the sake of convenience) and information on the quantization step are supplied to the VLC unit 36.
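As a rough illustration of the DCT and quantization stages, the sketch below applies an orthonormal 8 × 8 DCT-II to a block and divides each coefficient by a single quantization step. The 8 × 8 block size and the uniform step are simplifying assumptions (a practical MPEG encoder also applies a quantization matrix), and the names `dct_2d` and `quantize` are hypothetical.

```python
import math

N = 8  # assumed DCT block size


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of an N x N block."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[v][u] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step):
    """Uniform quantization: a larger step yields smaller levels
    and therefore fewer bits after variable-length coding."""
    return [[round(c / step) for c in row] for row in coeffs]
```

For a flat block, all of the energy lands in the DC coefficient, so every other quantized level is zero.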
In the VLC unit 36, the quantized coefficient received from the quantizer 35 is converted into a variable-length code such as a Huffman code which is then output to the buffer 37. In addition, the VLC unit 36 also converts the information on the quantization step received from the quantizer 35, as well as the information on the prediction mode and the motion vector which are received from the motion-vector detector 32, each into a variable-length code. As described above, the prediction mode can be any one of the intra-encoding mode (or the intraframe encoding mode), the forward-prediction encoding mode, the backward-prediction encoding mode and the bidirectional-prediction encoding mode. An encoded bitstream obtained as a result of the conversion is supplied to the buffer 37.
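To show why a variable-length code shortens the bitstream, the following sketch builds a Huffman table from the symbols it is about to encode. This is only an illustration of the principle: MPEG encoders use predefined code tables rather than tables derived from the data, and the function names here are hypothetical.

```python
import heapq
from collections import Counter


def build_huffman_table(symbols):
    """Return {symbol: bit string}; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tie_breaker, partial code table); the
    # unique tie breaker keeps the dictionaries from being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, table):
    return "".join(table[s] for s in symbols)


def decode(bits, table):
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first hit is a symbol
            out.append(inverse[current])
            current = ""
    return out
```

Quantized coefficients are dominated by zeros, so the zero symbol receives the shortest code and the total bit count drops well below that of a fixed-length representation.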
By temporarily storing the encoded bitstream generated by the VLC unit 36 in the buffer 37, the data amount thereof is smoothed before the bitstream is output to a transmission line or recorded onto a recording medium.
The buffer 37 outputs the data accumulation quantity thereof to the quantizer 35. As described above, the quantizer 35 sets a quantization step appropriate for the data accumulation quantity received from the buffer 37. To put it in detail, when an overflow is about to occur in the buffer 37, the quantizer 35 increases the quantization step so as to reduce the data amount of the quantized coefficient. When an underflow is about to occur in the buffer 37, on the other hand, the quantizer 35 decreases the quantization step so as to increase the data amount of the quantized coefficient. As a result, an overflow and an underflow in the buffer 37 are prevented from occurring.
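The buffer feedback can be sketched as a simple control rule. The thresholds, the unit step change and the 1 to 31 range of the quantization step are all assumptions made for illustration, as is the function name `next_quantization_step`; the text states only the direction of the adjustment.

```python
def next_quantization_step(step, fullness, capacity,
                           low=0.2, high=0.8, min_step=1, max_step=31):
    """Adjust the quantization step from the buffer occupancy.

    fullness / capacity above `high`: overflow is near, so quantize
    more coarsely. Below `low`: underflow is near, so quantize more
    finely. Otherwise the step is left unchanged.
    """
    ratio = fullness / capacity
    if ratio > high:
        step += 1
    elif ratio < low:
        step -= 1
    # Keep the step inside the assumed legal range.
    return max(min_step, min(max_step, step))
```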
The quantizer 35 outputs a quantized coefficient and information on a quantization step not only to the VLC unit 36, but also to an inverse quantizer 38. In the inverse quantizer 38, the quantized coefficient received from the quantizer 35 is subjected to inverse quantization in accordance with the quantization step set by the quantizer 35 for being converted back into a DCT coefficient. The DCT coefficient is then supplied to an IDCT unit (Inverse DCT unit) 39. In the IDCT unit 39, the DCT coefficient is subjected to inverse-DCT processing and data obtained as a result of the inverse-DCT processing is supplied to the processor 40.
In addition to the data output by the IDCT unit 39, the processor 40 also receives data from the motion compensator 42 which is also supplied to the processor 33 as described earlier. The processor 40 adds the data output by the IDCT unit 39 (a prediction residual or differential data) to prediction picture data received from the motion compensator 42 in order to carry out a local decoding process to produce the original picture data. The picture data output to the frame memory 41 as a result of the local decoding process is also referred to as locally decoded picture data. (If the intra-encoding mode is set as a prediction mode, however, the data output by the IDCT unit 39 is passed through the processor 40 as it is, being supplied to the frame memory 41 as locally decoded picture data.) It should be noted that the locally decoded picture data is the same as data obtained as a result of a decoding process carried out on the receiver side.
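The local decoding path (inverse quantization, inverse DCT and addition of the prediction) can be sketched as follows. The sketch assumes an 8 × 8 orthonormal inverse DCT and a single uniform quantization step, and it omits clipping to the valid pixel range; the names `idct_2d`, `dequantize` and `local_decode` are hypothetical.

```python
import math

N = 8  # assumed DCT block size


def idct_2d(coeffs):
    """Orthonormal two-dimensional inverse DCT of an N x N block."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            total = 0.0
            for v in range(N):
                for u in range(N):
                    total += (scale(u) * scale(v) * coeffs[v][u]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[y][x] = total
    return out


def dequantize(levels, step):
    """Inverse of uniform quantization: scale the levels back up."""
    return [[level * step for level in row] for row in levels]


def local_decode(levels, step, prediction=None):
    """Rebuild a block the same way a decoder would.

    With `prediction=None` (intra mode) the inverse-DCT output is the
    block itself; otherwise it is a residual added to the prediction.
    """
    residual = idct_2d(dequantize(levels, step))
    if prediction is None:
        return [[round(r) for r in row] for row in residual]
    return [[round(r + p) for r, p in zip(res_row, pred_row)]
            for res_row, pred_row in zip(residual, prediction)]
```

Because the same function models both the encoder's local decoder and the receiver's decoder, both sides obtain identical reference pictures, which is what keeps the prediction from drifting.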
The decoded picture data (locally decoded picture data) output by the processor 40 is supplied to the frame memory 41 to be stored therein. Later on, the locally decoded picture data is used as reference picture data (a reference frame) for a picture subjected to an interframe encoding process (forward-prediction encoding, backward-prediction encoding or bidirectional-prediction encoding).
FIG. 54 is a block diagram showing a typical configuration of an MP@ML decoder of the MPEG system for decoding an encoded bitstream output by the encoder shown in FIG. 53.
An encoded bitstream transmitted through a transmission line is received by a receiver not shown in the figure, or an encoded bitstream recorded on a recording medium is played back by a playback apparatus also not shown in the figure. The encoded bitstream received by the receiver or the encoded bitstream played back by the playback apparatus is supplied to a buffer 101 to be stored therein.
An IVLC (Inverse VLC) unit 102 serving as a variable-length decoder reads out the encoded bitstream stored in the buffer 101, carrying out a variable-length decoding process on the encoded bitstream to split the bitstream into a motion vector, information on a prediction mode and a quantization step, and a quantized coefficient in macroblock units. The motion vector and the information on a prediction mode are supplied to a motion compensator 107 whereas the information on a quantization step and the quantized coefficient of a macroblock are supplied to an inverse quantizer 103.
The inverse quantizer 103 carries out inverse quantization on the quantized coefficient of a macroblock received from the IVLC unit 102 in accordance with a quantization step supplied also by the IVLC unit 102, outputting a DCT coefficient obtained as a result of the inverse quantization to an IDCT unit 104. The IDCT unit 104 carries out an inverse DCT process on the DCT coefficient of the macroblock received from the inverse quantizer 103, supplying data obtained as a result of the inverse DCT processing to a processor 105.
In addition to the data output by the IDCT unit 104, the processor 105 also receives data output by the motion compensator 107. To put it in detail, much like the motion compensator 42 shown in FIG. 53, the motion compensator 107 reads out already-decoded picture data stored in a frame memory 106 in accordance with the motion vector and the prediction mode received from the IVLC unit 102 and supplies the decoded picture data to the processor 105 as prediction picture data. The processor 105 adds the data output by the IDCT unit 104 (a prediction residual or differential value) to the prediction picture data received from the motion compensator 107 in order to carry out a decoding process to produce the original picture data. The original picture data is output to the frame memory 106 to be stored therein. It should be noted, however, that if the intra-encoding mode is set as the prediction mode for data output by the IDCT unit 104, the data is passed through the processor 105 as it is, being supplied to the frame memory 106 as decoded picture data to be stored therein.
The decoded picture data stored in the frame memory 106 is used as reference picture data (a reference frame) for picture data to be decoded later. In addition, the decoded picture data is supplied as an output playback picture to typically a display not shown in the figure to be displayed thereon.
It should be noted that, in the MPEG1 and MPEG2 systems, a B-picture is not used as reference picture data. Thus, locally decoded picture data of a B-picture is stored neither in the frame memory 41 shown in FIG. 53, nor in the frame memory 106 shown in FIG. 54.
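The handling of reference pictures can be sketched as a small bookkeeping routine. It assumes that the frame memory retains the two most recently decoded I- or P-pictures, which is a common arrangement but is not stated in the text; the function name `track_references` and the picture labels are hypothetical.

```python
def track_references(pictures):
    """Follow the contents of the reference frame memory.

    `pictures` is a list of (picture_type, label) in decoding order.
    Returns a list of (label, references held after decoding it).
    """
    references = []
    log = []
    for picture_type, label in pictures:
        if picture_type in ("I", "P"):
            # Only I- and P-pictures are stored for later prediction;
            # keeping the latest two is an assumption of this sketch.
            references = (references + [label])[-2:]
        # A B-picture leaves the reference memory untouched.
        log.append((label, tuple(references)))
    return log
```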
The encoder and the decoder shown in FIGS. 53 and 54, respectively, conform to MPEG1/2 specifications. With regard to a system for encoding data in VO (Video Object) units, standardization work on an MPEG4 (Moving Picture Experts Group) system is being done by ISO-IEC/JTC1/SC29/WG11. A VO is a sequence of objects, such as bodies, composing a picture.
By the way, the standardization work on the MPEG4 system is under way to define specifications to be utilized mainly in the field of communication. For this reason, a GOP (Group Of Pictures) prescribed in the MPEG1/2 system is not prescribed in the MPEG4 system for the time being. Accordingly, when the MPEG4 system is utilized in storage media, it is expected that efficient random accesses are difficult to make.
In order to solve this problem, the applicants of a patent for the present invention have already proposed a GOV (Group of VOP) corresponding to the GOP prescribed in the MPEG1/2 system as disclosed in Japanese Patent Laid-open No. Hei10-80758 and the GOV was introduced in the MPEG4 system.
In the MPEG4 system, on the other hand, picture data is converted into a hierarchical structure comprising at least two hierarchical layers, making it possible to carry out flexible scalable encoding/decoding processes utilizing a picture at each of the hierarchical layers.
By the way, since a relation between GOVs of picture data at hierarchical layers is not prescribed in the MPEG4 system, a GOV can be inserted for each hierarchical layer independently. Since pieces of picture data at hierarchical layers are not independent, however, it is expected that efficient random accesses are difficult to make in some cases when a GOV is inserted for each hierarchical layer independently.
It is thus an object of the present invention addressing the problems described above to provide a picture encoding apparatus, a picture encoding method, a picture decoding apparatus, a picture decoding method and a presentation medium which allow efficient random accesses to be made.
According to the present invention, there is provided a picture encoding apparatus for encoding a picture and outputting an encoded bitstream obtained as a result of encoding the picture including; a hierarchy forming means for converting a sequence of objects composing the picture into a hierarchy comprising two or more hierarchical layers, a first encoding means for encoding a first hierarchical layer of the sequence of objects output by the hierarchy forming means by dividing the first hierarchical layer into a plurality of groups, and a second encoding means for encoding a second hierarchical layer of the sequence of objects output by the hierarchy forming means by dividing the second hierarchical layer into a plurality of groups in such a way that an object at the second hierarchical layer to be displayed at a time coinciding with or immediately after a display time of an object first displayed in a group pertaining to the first hierarchical layer is first displayed in a group pertaining to the second hierarchical layer.
According to the present invention, there is provided a picture encoding method for encoding a picture and outputting an encoded bitstream obtained as a result of encoding the picture having the steps of; converting a sequence of objects composing the picture into a hierarchy comprising two or more hierarchical layers, encoding a first hierarchical layer of the sequence of objects by dividing the first hierarchical layer into a plurality of groups, and encoding a second hierarchical layer of the sequence of objects by dividing the second hierarchical layer into a plurality of groups in such a way that an object at the second hierarchical layer to be displayed at a time coinciding with or immediately after a display time of an object first displayed in a group pertaining to the first hierarchical layer is first displayed in a group pertaining to the second hierarchical layer.
By virtue of the picture encoding apparatus and the picture encoding method, it is possible to make efficient random accesses at a high speed.
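The grouping rule for the second hierarchical layer can be illustrated with a short sketch. It assumes that display times are integers in milliseconds and that the objects of the second layer are listed in display order; the function names `second_layer_group_starts` and `split_into_groups` are hypothetical.

```python
from bisect import bisect_left


def second_layer_group_starts(first_layer_group_times, second_layer_times):
    """For each group of the first layer, find the index of the
    second-layer object displayed at the same time as, or immediately
    after, the object displayed first in that group.

    Both inputs are display times in ascending order (assumed ms).
    """
    starts = []
    for t0 in first_layer_group_times:
        i = bisect_left(second_layer_times, t0)  # first time >= t0
        if i < len(second_layer_times) and (not starts or starts[-1] != i):
            starts.append(i)
    return starts


def split_into_groups(second_layer_times, starts):
    """Cut the second layer so each chosen object opens a group."""
    bounds = list(starts) + [len(second_layer_times)]
    if not starts or starts[0] != 0:
        bounds = [0] + bounds  # objects ahead of the first chosen start
    return [second_layer_times[a:b] for a, b in zip(bounds, bounds[1:])]
```

With the groups aligned this way, a random access to a group of the first layer can locate the matching group of the second layer by its first display time instead of scanning backwards through the second layer.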
According to the present invention, there is provided a picture decoding apparatus for decoding a picture including a receiving means for receiving an encoded bitstream and a decoding means for decoding the encoded bitstream, wherein the encoded bitstream is obtained by executing the steps of; converting a sequence of objects composing the picture into a hierarchy comprising two or more hierarchical layers, encoding a first hierarchical layer of the sequence of objects by dividing the first hierarchical layer into a plurality of groups, and encoding a second hierarchical layer of the sequence of objects by dividing the second hierarchical layer into a plurality of groups in such a way that an object at the second hierarchical layer to be displayed at a time coinciding with or immediately after a display time of an object first displayed in a group pertaining to the first hierarchical layer is first displayed in a group pertaining to the second hierarchical layer.
According to the present invention, there is provided a picture decoding method for decoding a picture having the steps of receiving an encoded bitstream and decoding the encoded bitstream, wherein the encoded bitstream is obtained by executing the steps of; converting a sequence of objects composing the picture into a hierarchy comprising two or more hierarchical layers, encoding a first hierarchical layer of the sequence of objects by dividing the first hierarchical layer into a plurality of groups, and encoding a second hierarchical layer of the sequence of objects by dividing the second hierarchical layer into a plurality of groups in such a way that an object at the second hierarchical layer to be displayed at a time coinciding with or immediately after a display time of an object first displayed in a group pertaining to the first hierarchical layer is first displayed in a group pertaining to the second hierarchical layer.
As a result, it is possible to make efficient random accesses at a high speed.
According to the present invention, there is provided a presentation medium for presenting an encoded bitstream obtained as a result of encoding a picture, wherein the encoded bitstream is obtained as a result of executing the steps of; converting a sequence of objects composing the picture into a hierarchy comprising two or more hierarchical layers, encoding a first hierarchical layer of the sequence of objects by dividing the first hierarchical layer into a plurality of groups, and encoding a second hierarchical layer of the sequence of objects by dividing the second hierarchical layer into a plurality of groups in such a way that an object at the second hierarchical layer to be displayed at a time coinciding with or immediately after a display time of an object first displayed in a group pertaining to the first hierarchical layer is first displayed in a group pertaining to the second hierarchical layer. As a result, it is possible to make efficient random accesses to the encoded bitstream at a high speed.
According to the present invention, there is provided a picture encoding apparatus for encoding a picture and outputting an encoded bitstream obtained as a result of encoding the picture including; a hierarchy forming means for converting a sequence of objects composing the picture into a hierarchy comprising two or more hierarchical layers, an encoding means for dividing a first or second hierarchical layer of the sequence of objects output by the hierarchy forming means into one or more groups, for encoding each of the groups and for including a one-second resolution first-object display time representing a display time of a first displayed object of each of the groups pertaining to the first or second hierarchical layer in terms of one-second resolution units in the group, an adding means for providing each of the objects pertaining to the first or second hierarchical layer with one-second resolution relative time information representing a display time of the object relative to the one-second resolution first-object display time in terms of one-second resolution units, and a resetting means for resetting the one-second resolution relative time information added to any one of the objects pertaining to the second hierarchical layer in dependence on a difference in display time between the object and another object of the second hierarchical layer adjacent to the object in a display sequence.
According to the present invention, there is provided a picture encoding method for encoding a picture and outputting an encoded bitstream obtained as a result of encoding the picture having the steps of; converting a sequence of objects composing the picture into a hierarchy comprising two or more hierarchical layers, dividing a first or second hierarchical layer of the sequence of objects into one or more groups, encoding each of the groups and including a one-second resolution first-object display time representing a display time of a first displayed object of each of the groups pertaining to the first or second hierarchical layer in terms of one-second resolution units in the group, and providing each of the objects pertaining to the first or second hierarchical layer with one-second resolution relative time information representing a display time of the object relative to the one-second resolution first-object display time in terms of one-second resolution units, wherein the one-second resolution relative time information added to any one of the objects pertaining to the second hierarchical layer is reset in dependence on a difference in display time between the object and another object of the second hierarchical layer adjacent to the object in a display sequence.
As a result, it is possible to prevent the encoding efficiency from becoming poor.
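One way to read the resetting of the one-second resolution relative time information is sketched below. It is an interpretation, not the definitive rule: the sketch assumes display times in milliseconds, assumes that the reset makes each value count whole seconds elapsed since the object adjacent to it in the display sequence, and assumes a unary-style code of n + 1 bits for a value n. All function names are hypothetical.

```python
def seconds_from_group_start(group_second, display_ms):
    """Relative time of each object, in whole seconds, measured from
    the one-second resolution first-object display time of the group."""
    return [t // 1000 - group_second for t in display_ms]


def seconds_with_reset(group_second, display_ms):
    """Relative time reset against the previous object in display order
    (assumed interpretation of the reset)."""
    out = []
    previous_second = group_second
    for t in display_ms:
        second = t // 1000
        out.append(second - previous_second)
        previous_second = second
    return out


def unary_bits(values):
    """Bits needed if a value n costs n + 1 bits (assumed coding)."""
    return sum(v + 1 for v in values)
```

Under these assumptions, values measured from the start of the group grow with the length of the group, whereas reset values stay small, so the bit cost no longer increases with the distance from the start of the group.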
According to the present invention, there is provided a picture decoding apparatus for decoding a picture including; a receiving means for receiving an encoded bitstream and a decoding means for decoding the encoded bitstream which is obtained by executing the steps of; converting a sequence of objects composing the picture into a hierarchy comprising two or more hierarchical layers, dividing a first or second hierarchical layer of the sequence of objects into one or more groups, encoding each of the groups and including a one-second resolution first-object display time representing a display time of a first displayed object of each of the groups pertaining to the first or second hierarchical layer in terms of one-second resolution units in the group, and providing each of the objects pertaining to the first or second hierarchical layer with one-second resolution relative time information representing a display time of the object relative to the one-second resolution first-object display time in terms of one-second resolution units, wherein the one-second resolution relative time information added to any one of the objects pertaining to the second hierarchical layer is reset in dependence on a difference in display time between the object and another object pertaining to the second hierarchical layer adjacent to the object in a display sequence.
According to the present invention, there is provided a picture decoding method for decoding a picture having the steps of; receiving an encoded bitstream and decoding the encoded bitstream which is obtained by executing the steps of; converting a sequence of objects composing a picture into a hierarchy comprising two or more hierarchical layers, dividing a first or second hierarchical layer of the sequence of objects into one or more groups, encoding each of the groups and including a one-second resolution first-object display time representing a display time of a first displayed object of each of the groups pertaining to the first or second hierarchical layer in terms of one-second resolution units in the group, and providing each of the objects pertaining to the first or second hierarchical layer with one-second resolution relative time information representing a display time of the object relative to the one-second resolution first-object display time in terms of one-second resolution units, wherein the one-second resolution relative time information added to any one of the objects pertaining to the second hierarchical layer is reset in dependence on a difference in display time between the object and another object pertaining to the second hierarchical layer adjacent to the object in a display sequence.
As a result, it is possible to decode an encoded bitstream whose encoding efficiency has been prevented from deteriorating.
According to the present invention, there is provided a presentation medium for presenting an encoded bitstream obtained as a result of encoding a picture, in which the encoded bitstream is generated by executing the steps of; converting a sequence of objects composing a picture into a hierarchy comprising two or more hierarchical layers, dividing a first or second hierarchical layer of the sequence of objects into one or more groups, encoding each of the groups and including a one-second resolution first-object display time representing a display time of a first displayed object of each of the groups pertaining to the first or second hierarchical layer in terms of one-second resolution units in the group, and providing each of the objects pertaining to the first or second hierarchical layer with one-second resolution relative time information representing a display time of the object relative to the one-second resolution first-object display time in terms of one-second resolution units, wherein the one-second resolution relative time information added to any one of the objects pertaining to the second hierarchical layer is reset in dependence on a difference in display time between the object and another object pertaining to the second hierarchical layer adjacent to the object in a display sequence.
As a result, it is possible to present an encoded bitstream which has been subjected to a process to prevent an encoding efficiency thereof from deteriorating.
According to the present invention, there is provided a picture encoding apparatus for encoding a picture and outputting an encoded bitstream including; a hierarchy forming means for converting a sequence of objects composing the picture into a hierarchy comprising two or more hierarchical layers including a high-level hierarchical layer and a low-level hierarchical layer for enabling space scalability, a first encoding means for encoding a sequence of objects pertaining to the low-level hierarchical layer output by the hierarchy forming means, and a second encoding means for encoding a sequence of objects pertaining to the high-level hierarchical layer output by the hierarchy forming means in the same order as a display sequence of the objects pertaining to the high-level hierarchical layer.
According to the present invention, there is provided a picture encoding method for encoding a picture and outputting an encoded bitstream having the steps of; receiving the picture, converting a sequence of objects composing the picture into a hierarchy comprising two or more hierarchical layers including a high-level hierarchical layer and a low-level hierarchical layer for enabling space scalability, and encoding a sequence of objects pertaining to the low-level hierarchical layer as well as encoding a sequence of objects pertaining to the high-level hierarchical layer in the same order as a display sequence of the objects pertaining to the high-level hierarchical layer.
As a result, the encoding process is made simple.
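The difference between the two encoding orders can be sketched as follows. The reordering shown for the low-level layer is the conventional one in which each B-picture is encoded after the later reference picture it depends on; the text does not spell this out, so it is an assumption, and both function names are hypothetical.

```python
def low_layer_coding_order(display_types):
    """Conventional reordering (assumed): hold back each B-picture
    until the next I- or P-picture in display order has been encoded.

    `display_types` lists picture types in display order; the result
    lists display indices in encoding order.
    """
    order, pending_b = [], []
    for index, picture_type in enumerate(display_types):
        if picture_type == "B":
            pending_b.append(index)
        else:
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)  # trailing B-pictures with no later reference
    return order


def high_layer_coding_order(display_types):
    """The high-level layer is encoded in its display sequence,
    so no reordering buffer is needed."""
    return list(range(len(display_types)))
```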
According to the present invention, there is provided a picture decoding apparatus for decoding a picture including; a receiving means for receiving an encoded bitstream and a decoding means for decoding the encoded bitstream, wherein the encoded bitstream is obtained by executing the steps of; converting a sequence of objects composing the picture into a hierarchy comprising two or more hierarchical layers including a high-level hierarchical layer and a low-level hierarchical layer for enabling space scalability, and encoding a sequence of objects pertaining to the low-level hierarchical layer as well as encoding a sequence of objects pertaining to the high-level hierarchical layer in the same order as a display sequence of the objects pertaining to the high-level hierarchical layer.
According to the present invention, there is provided a picture decoding method for decoding a picture having the steps of; receiving an encoded bitstream and decoding the encoded bitstream obtained by executing the steps of; converting a sequence of objects composing the picture into a hierarchy comprising two or more hierarchical layers including a high-level hierarchical layer and a low-level hierarchical layer for enabling space scalability, and encoding a sequence of objects pertaining to the low-level hierarchical layer as well as encoding a sequence of objects pertaining to the high-level hierarchical layer in the same order as a display sequence of the objects pertaining to the high-level hierarchical layer.
As a result, the decoding process is made simple.
According to the present invention, there is provided a presentation medium for presenting an encoded bitstream obtained as a result of encoding a picture, wherein the encoded bitstream is generated by executing the steps of; converting a sequence of objects composing the picture into a hierarchy comprising two or more hierarchical layers including a high-level hierarchical layer and a low-level hierarchical layer for enabling space scalability, and encoding a sequence of objects pertaining to the low-level hierarchical layer as well as encoding a sequence of objects pertaining to the high-level hierarchical layer in the same order as a display sequence of the objects pertaining to the high-level hierarchical layer. As a result, it is possible to present an encoded bitstream which can be decoded with ease.