Technologies for recording an AV (Audio/Video) stream, into which video data and audio data are multiplexed, on a record medium are in practical use. In addition, Patent Document 1 “Japanese Patent Application Laid-Open No. 2000-341640” and Patent Document 2 “Japanese Patent Application Laid-Open No. 2002-158972” describe technologies that record information about randomly accessible positions of an AV stream on a record medium as attribute information along with the AV stream, and that reproduce the AV stream using the attribute information, so that read positions can be decided and a decoding process can be performed quickly.
As a more specific example, the case in which the AV stream is a transport stream into which MPEG2 video streams are multiplexed will be described. An MPEG2 video stream is made by compression-encoding video data according to the MPEG2 (Moving Pictures Experts Group 2) system.
According to the MPEG2 system, video data are compression-encoded by intra-frame compression-encoding using DCT (Discrete Cosine Transform) and inter-frame compression-encoding using prediction encoding in the time base direction. In this case, a B (Bidirectionally predictive) picture and a P (Predictive) picture, which are prediction-encoded in the time base direction, and an I (Intra) picture, which is complete within one screen (one frame), are defined. A self-contained group that contains at least one I picture is referred to as a GOP (Group Of Pictures). One GOP is the minimum accessible unit of an MPEG stream.
A transport stream is transmitted, recorded, and reproduced with transport packets each of which has a predetermined size. A data stream is divided by the size of a payload of a transport packet. A header is added to a payload. As a result, a transport packet is completed.
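The packetizing step described above can be sketched as follows. This is a minimal illustration, not an implementation taken from the cited documents. It assumes the MPEG-2 transport packet size of 188 bytes with a 4-byte header, uses a simplified header (sync byte, PID, payload-only flag, continuity counter), and pads the last payload with 0xFF bytes for simplicity, whereas an actual multiplexer would use adaptation-field stuffing. The function name `packetize` and its parameters are hypothetical.

```python
PACKET_SIZE = 188                          # transport packet size in bytes (assumed MPEG-2 TS value)
HEADER_SIZE = 4                            # header size in bytes
PAYLOAD_SIZE = PACKET_SIZE - HEADER_SIZE   # 184 bytes of payload per packet


def packetize(stream: bytes, pid: int) -> list[bytes]:
    """Divide a data stream by the payload size and add a header to each payload."""
    packets = []
    for counter, offset in enumerate(range(0, len(stream), PAYLOAD_SIZE)):
        payload = stream[offset:offset + PAYLOAD_SIZE]
        # Simplification: pad a short final payload with 0xFF bytes.
        payload = payload.ljust(PAYLOAD_SIZE, b"\xff")
        header = bytes([
            0x47,                     # sync byte
            (pid >> 8) & 0x1F,        # upper 5 bits of the 13-bit PID (flag bits left at 0)
            pid & 0xFF,               # lower 8 bits of the PID
            0x10 | (counter & 0x0F),  # payload only, 4-bit continuity counter
        ])
        packets.append(header + payload)
    return packets


if __name__ == "__main__":
    pkts = packetize(bytes(400), pid=0x100)
    print(len(pkts), [len(p) for p in pkts])
```

Every packet has the same predetermined size, so the position of a packet in a file can be computed directly from its packet number.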
According to the foregoing Patent Document 1 and Patent Document 2, two values are taken out of the transport stream: the time management information (PTS: Presentation Time Stamp) of the reproduction output of an I picture that starts with a sequence header of MPEG2 video, and the source packet number, in the AV stream file, of the transport packet (source packet) whose payload contains the first byte of that sequence header. The obtained PTS and source packet number are recorded for each entry point, as information about a randomly accessible position, namely an entry point (EP), in attribute information referred to as EP_map.
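How such attribute information might be used to decide a read position can be sketched as follows. This is an illustrative model only: the class names `EntryPoint` and `EPMap`, the field names, and the lookup rule (take the last entry point whose PTS does not exceed the requested time) are assumptions made for this example, not the data format defined in Patent Document 1 or Patent Document 2.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryPoint:
    pts: int  # presentation time stamp of the I picture
    spn: int  # source packet number holding the first byte of the sequence header


class EPMap:
    """Hypothetical in-memory EP_map: entry points kept sorted by PTS."""

    def __init__(self):
        self._entries = []

    def add(self, pts, spn):
        self._entries.append(EntryPoint(pts, spn))
        self._entries.sort(key=lambda e: e.pts)

    def lookup(self, target_pts):
        """Return the last entry point at or before target_pts, or None."""
        keys = [e.pts for e in self._entries]
        i = bisect_right(keys, target_pts)
        return self._entries[i - 1] if i else None


if __name__ == "__main__":
    ep_map = EPMap()
    ep_map.add(0, 0)
    ep_map.add(90000, 120)
    ep_map.add(180000, 260)
    print(ep_map.lookup(100000))
```

A player would start reading the AV stream file at the returned source packet number and begin decoding from the I picture found there.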
On the other hand, an encoding method has been proposed that uses a prediction mode in which a picture that is later in the display order than an I picture belonging to the current GOP is predicted from a picture that belongs to a GOP earlier than the current GOP in the display order. When a transport stream encoded using this prediction mode is randomly accessed in units of GOPs, the GOPs may not be fully reproduced. Patent Document 3 “U.S. Pat. No. 5,543,847” discloses a technology that prohibits such a prediction mode so that such an AV stream file can be randomly accessed at an I picture that belongs to the current GOP.
Next, this technology will be described with reference to FIG. 1A and FIG. 1B. In FIG. 1A and FIG. 1B, “i12” represents an I picture; “p02”, “p03”, . . . represent P pictures; and “b00”, “b01”, . . . represent B pictures. The upper row and the lower row of each of FIG. 1A and FIG. 1B represent for example even fields and odd fields, respectively.
Patent Document 3 proposes that a P picture is predicted from the two nearest P pictures. Thus, in the example shown in FIG. 1A, the picture p16 that belongs to GOP 1 is encoded using the two nearest P pictures as reference pictures, namely the picture p13 that belongs to the current GOP 1 and the picture p03 that belongs to GOP 0, which is earlier than GOP 1. When GOP 1 is randomly accessed, it is reproduced from the picture i12. Since the picture p13 cannot reference the picture p03, which it uses as a reference picture, the picture p13 cannot be decoded. In addition, the picture p16, which uses the pictures p03 and p13 as reference pictures, cannot be decoded. Likewise, the picture p17, which uses the pictures p13 and p16 as reference pictures, cannot be decoded.
Thus, when video data are encoded, the pictures p13 and p16 are prohibited from using the picture p03, which belongs to GOP 0 earlier than GOP 1, as a reference picture. Instead, the pictures p13 and p16 use the picture i12 that belongs to GOP 1 as a reference picture. Thus, when GOP 1 is randomly accessed, the pictures p13 and p16 are predicted from the picture i12 as a reference picture. As a result, pictures after the picture p17 can be decoded.
Likewise, in FIG. 1B, the picture p18 that belongs to GOP 1 is encoded using the two nearest reference pictures, namely the picture p15 that belongs to GOP 1 and the picture p03 that belongs to GOP 0 earlier than GOP 1. When GOP 1 is randomly accessed, it is reproduced from the picture i12. Since the picture p15 cannot reference the picture p03, which it uses as a reference picture, the picture p15 cannot be decoded. Likewise, the picture p18, which uses the pictures p03 and p15 as reference pictures, cannot be decoded.
In this case, when the video stream is encoded, the pictures p15 and p18 are prohibited from using the picture p03, which belongs to GOP 0 earlier than GOP 1, as a reference picture. Instead, the pictures p15 and p18 use the picture i12 that belongs to GOP 1 as a reference picture. Thus, when GOP 1 is randomly accessed, the pictures p15 and p18 are predicted from the picture i12 as a reference picture. As a result, the picture p18 can be decoded.
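The effect of this restriction can be checked with a small sketch that walks the pictures in decoding order and marks a picture as decodable only when all of its reference pictures have been decoded since the random access point. The reference lists below are an illustrative reading of the FIG. 1A example, not data taken from Patent Document 3, and the function name `decodable_after_random_access` is hypothetical.

```python
def decodable_after_random_access(pictures, start):
    """Return the names of pictures that can be decoded when reproduction starts at `start`.

    pictures: list of (name, reference_names) tuples in decoding order.
    """
    decoded = []
    started = False
    for name, refs in pictures:
        if name == start:
            started = True
        if not started:
            continue  # pictures before the entry point are never read
        if all(ref in decoded for ref in refs):
            decoded.append(name)
    return decoded


# Assumed reference structure without the restriction: p13 and p16 use p03 of GOP 0.
# References of the GOP 0 picture itself are omitted because it precedes the entry point.
unrestricted = [
    ("p03", []),
    ("i12", []),
    ("p13", ["p03"]),
    ("p16", ["p03", "p13"]),
    ("p17", ["p13", "p16"]),
]

# Assumed reference structure with the restriction: p03 is replaced by i12.
restricted = [
    ("p03", []),
    ("i12", []),
    ("p13", ["i12"]),
    ("p16", ["i12", "p13"]),
    ("p17", ["p13", "p16"]),
]

if __name__ == "__main__":
    print(decodable_after_random_access(unrestricted, "i12"))
    print(decodable_after_random_access(restricted, "i12"))
```

Under these assumptions, only the picture i12 is decodable in the unrestricted case, whereas all pictures of GOP 1 listed here are decodable in the restricted case.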
In the foregoing EP_map, the position of an I picture of a video stream is used as an entry point. In MPEG2 video, there is no prediction mode in which a picture that is later in the display order than an I picture belonging to the current GOP is predicted from a picture that belongs to a GOP earlier than the current GOP in the display order. Thus, when an I picture is used as an entry point, it is assured that the current GOP can be randomly accessed and reproduced from the I picture.
However, in recent years, a moving picture compression-encoding system, MPEG-4 AVC|H.264, has been internationally standardized by ISO (International Organization for Standardization). The MPEG-4 AVC|H.264 system accomplishes higher encoding efficiency and a higher compression rate than conventional encoding systems such as the MPEG2 and MPEG4 systems. In addition, the MPEG-4 AVC|H.264 system achieves high transmission efficiency using a plurality of transmission channels through which data are transmitted. Thus, the MPEG-4 AVC|H.264 system can transmit video streams with a higher degree of freedom than the related art systems.
Since the MPEG-4 AVC|H.264 system can have a plurality of reference pictures, it can reference a plurality of past pictures. For example, in the MPEG-4 AVC|H.264 system, a P picture that is later than a particular I picture can be predicted from P pictures that are earlier than the I picture in the display order.
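Whether a given I picture is affected by this kind of prediction can be tested with a sketch like the following, which lists the pictures that are later than the I picture in the display order but directly reference a picture earlier than it. The `Picture` type, the function `crosses_i_picture`, and the picture names in the example are hypothetical. Only direct references are examined, so pictures that depend on a listed picture are not reported separately.

```python
from dataclasses import dataclass, field


@dataclass
class Picture:
    name: str
    display_order: int
    refs: list = field(default_factory=list)  # names of reference pictures


def crosses_i_picture(pictures, i_name):
    """List pictures displayed after the I picture that reference a picture displayed before it."""
    order = {p.name: p.display_order for p in pictures}
    i_pos = order[i_name]
    return [
        p.name
        for p in pictures
        if p.display_order > i_pos and any(order[r] < i_pos for r in p.refs)
    ]


if __name__ == "__main__":
    pics = [
        Picture("p0", 0),
        Picture("i1", 1),
        Picture("p2", 2, ["i1", "p0"]),  # references a P picture earlier than i1
        Picture("p3", 3, ["p2"]),
    ]
    print(crosses_i_picture(pics, "i1"))
```

An empty result means that no picture displayed after the I picture directly depends on a picture displayed before it.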
Thus, in the related art, when a video stream that has been encoded by an encoding system that can reference a plurality of past pictures, such as the MPEG-4 AVC|H.264 system, is recorded to a record medium and then reproduced therefrom, recording an I picture in EP_map as a randomly accessible position (entry point) does not assure that random-access reproduction can start normally from that I picture.