1. Field of the Invention
The present invention relates to a technology for reproducing or recording images, and particularly to a method and apparatus for reproducing coded image data, an image recording apparatus, and a data structure of moving image data that can be used by such methods and apparatuses.
2. Description of the Related Art
The digitalization of TV broadcasting is advancing rapidly. Digital broadcasting has already begun in BS (Broadcast Satellite) broadcasts and CS (Communication Satellite) broadcasts, and plans have been made for the digitalization of ground wave broadcasts as well. In digital TV broadcasting, the use of MPEG-2 (Moving Picture Experts Group 2), which is an international standard for data compression and expansion, makes it possible not only to transmit and store information at high efficiency but also to transmit multiple channels through a single repeater. It is thus also expected to provide greater convenience to users.
On the other hand, the widespread use of portable terminals in recent years is expected to create a greater need for coding systems with high data compression ratios. Accordingly, the use of MPEG-4 coding methods, which can transmit images compressed at low bit rates, is being investigated. In future digital TV broadcasting, it appears likely that MPEG-4 will be used along with MPEG-2 for the distribution of image information.
In MPEG, a compression technique called “inter-frame prediction” is employed in coding moving image data. Inter-frame prediction is a technique in which the data of a frame to be coded are predicted, and thereby compressed, based on the data of an already coded frame that lies in the past or future of said frame. Inter-frame prediction in MPEG adopts not only forward prediction, which performs prediction based on past frames, but also bidirectional prediction, which performs prediction based on both past frames and future frames.
In MPEG-2, three types of pictures called I picture (Intra-Picture), P picture (Predictive-Picture) and B picture (Bidirectionally Predictive-Picture) are defined to realize this bidirectional prediction. An I picture is an image produced independently by intra-frame coding processing, irrespective of past and future reproduced images, and can be decoded using only the data within that image. All macroblocks within an I picture are produced by intra-frame coding processing. A P picture is produced by inter-frame forward coding processing using prediction based on a past I or P picture. The macroblocks within a P picture include both intra-frame coded macroblocks and macroblocks inter-frame coded by forward prediction.
The B picture is produced by the inter-frame coding processing using the bidirectional prediction. In the bidirectional prediction, a B picture is produced by one of the following three predictions.
    (1) Forward Prediction: prediction from a past I picture or P picture.
    (2) Backward Prediction: prediction from a future I picture or P picture.
    (3) Bidirectional Prediction: prediction from past and future I pictures or P pictures.
The macroblocks within the B picture include intra-frame coded macroblocks and macroblocks inter-frame coded by forward prediction, backward prediction or interpolation prediction.
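The relationship between the three picture types and their reference pictures can be illustrated with a minimal Python sketch. It assumes a sequence of picture types given in display order and assumes that each P or B picture refers to the nearest I or P picture on each side; the function name `reference_pictures` and the string labels are hypothetical and are not taken from the MPEG-2 standard.

```python
def reference_pictures(display_types, index):
    """Return display-order indices of the candidate reference pictures.

    display_types: sequence of "I", "P" or "B" in display order (assumed input).
    An I picture has no references, a P picture refers to the nearest past
    I or P picture, and a B picture may refer to the nearest past and the
    nearest future I or P picture.
    """
    def nearest_anchor(candidates):
        # Return the first index in candidates that holds an I or P picture.
        for i in candidates:
            if display_types[i] in ("I", "P"):
                return i
        return None

    ptype = display_types[index]
    if ptype == "I":
        return []

    past = nearest_anchor(range(index - 1, -1, -1))
    if ptype == "P":
        return [past] if past is not None else []

    future = nearest_anchor(range(index + 1, len(display_types)))
    return [ref for ref in (past, future) if ref is not None]


if __name__ == "__main__":
    types = list("IBBPBBP")
    for i, t in enumerate(types):
        print(i, t, reference_pictures(types, i))
```

A B picture coded by forward or backward prediction alone would use only one of the two candidates returned here.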
In MPEG-4, a video object in time sequence is called a VO (Video Object) and each image that constitutes the VO is called a VOP (Video Object Plane). The VOP corresponds to the picture in MPEG-2. The following four types of VOPs are available depending on the prediction coding used.
    (1) I-VOP: intra-frame coded VOP.
    (2) P-VOP: inter-frame forward-prediction coded VOP.
    (3) B-VOP: inter-frame bidirectional-prediction coded VOP.
    (4) S-VOP: sprite VOP.
The first three VOPs, which are I-VOP, P-VOP and B-VOP, correspond to I picture, P picture and B picture in MPEG-2, respectively.
In MPEG, the coded image data are expressed as bit stream data having a hierarchical structure. Motion pictures handled in MPEG are constituted by, for example, 30 frames per second. In MPEG-2, the frame generally corresponds to the picture. In MPEG-2, a collection of pictures is called a GOP (Group of Pictures), and random access is possible in units of GOPs. In order to carry out random access, at least one I picture is required within each GOP. In MPEG-4, a collection of VOPs is handled as a GOV (Group of VOPs).
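As a rough illustration of random access in units of GOPs, the following Python sketch maps a frame number to the GOP that contains it and to the offset of the frame within that GOP, so that decoding can start from the head of the GOP. The function name `locate_gop` and the list of GOP sizes are hypothetical; an actual decoder would obtain GOP boundaries from the bit stream itself.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_gop(gop_sizes, frame_index):
    """Return (gop_number, offset_in_gop) for a frame counted from zero.

    gop_sizes: number of pictures in each GOP, in stream order (assumed input).
    The sizes need not be equal.
    """
    total = sum(gop_sizes)
    if not 0 <= frame_index < total:
        raise IndexError("frame_index is outside the stream")

    # First frame number of each GOP, e.g. [15, 15, 7] -> [0, 15, 30].
    starts = [0] + list(accumulate(gop_sizes))[:-1]

    # The containing GOP is the last one whose start is <= frame_index.
    gop_number = bisect_right(starts, frame_index) - 1
    return gop_number, frame_index - starts[gop_number]


if __name__ == "__main__":
    sizes = [15, 15, 7, 15]
    print(locate_gop(sizes, 32))
```

Because decoding has to begin at the I picture of the located GOP, the offset indicates how many pictures lie between the access point and the requested frame.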
Thus, when a coded data stream conforming to the MPEG standard is decoded and reproduced, data are accessed in units of GOPs or GOVs. However, the number of pictures (or VOPs) contained in one GOP (or GOV) is not necessarily fixed. In MPEG-2, at a standard level there are usually 15 to 30 pictures per GOP, whereas in MPEG-4 there are about 120 VOPs per GOV. However, when a scene change occurs, for example, a new GOP may be started from the very picture at which the scene change occurred. The structure of a GOP may also have been changed at the time of editing. Thus, the number, types or order of the pictures contained in a GOP are not necessarily fixed even within a single series of coded data streams. Since bidirectional prediction is performed in MPEG, the order in which pictures appear in the coded data stream differs from the actual display order, so that complicated decoding processing is required. Where the structure of pictures is irregular, the processing becomes even more complicated, particularly if special reproduction processing such as high-speed reproduction or reverse reproduction is to be performed. In some cases, this complexity makes it impossible to perform the appropriate processing.
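The difference between display order and the order of appearance in the coded data stream can be illustrated by the following Python sketch. It assumes the common arrangement in which each B picture is placed in the stream after the I or P picture that follows it in display order, since that picture must be decoded first to serve as the future reference. The function names and the string labels are hypothetical.

```python
def display_to_coded_order(display_types):
    """Reorder pictures from display order to coded (stream) order.

    display_types: sequence of "I", "P" or "B" in display order (assumed input).
    Returns a list of (display_index, type) pairs in coded order.
    """
    coded = []
    pending_b = []  # B pictures waiting for their future reference picture
    for index, ptype in enumerate(display_types):
        if ptype == "B":
            pending_b.append((index, ptype))
        else:
            # The I or P picture goes first, then the B pictures that
            # precede it in display order.
            coded.append((index, ptype))
            coded.extend(pending_b)
            pending_b = []
    # Trailing B pictures have no future reference within this sequence.
    coded.extend(pending_b)
    return coded


def coded_to_display_order(coded):
    """Restore display order by sorting on the display index."""
    return sorted(coded, key=lambda item: item[0])


if __name__ == "__main__":
    coded = display_to_coded_order(list("IBBPBBP"))
    print([index for index, _ in coded])
```

For the display sequence I B B P B B P, the sketch yields the coded order 0, 3, 1, 2, 6, 4, 5. When the GOP structure is irregular, this mapping changes from one GOP to the next, which is one source of the complexity described above.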