Various types of data streams have been standardized to encode and compress video data at low bit rates. A system stream compliant with MPEG2 system standard ISO/IEC 13818-1 is known as one such data stream. A “system stream” is a generic term for the three types of streams, namely, program stream (PS), transport stream (TS) and PES stream.
In recent years, more and more attention has been paid to phase change optical discs, MOs and other optical discs as data stream storage media to replace magnetic tapes. A DVD video recording standard (i.e., DVD Specifications for Rewritable/Re-recordable Discs Part 3 VIDEO RECORDING, version 1.0, Sep. 1999, which will be referred to herein as “VR standard”) is currently known as a standard for recording the data stream of some content on a phase change optical disc (such as a DVD) in real time and for making it editable. Also, a DVD Video standard (which will be referred to herein as “Video standard”) is also defined as a standard for a package medium to store the data stream of a read-only content such as a movie thereon.
FIG. 1 shows a data structure for an MPEG2 program stream 10 compliant with the VR standard (which will be referred to herein as a “VR-compliant stream 10”).
The VR-compliant stream 10 includes a plurality of video objects (VOBs) #1, #2, . . . , and #k. Supposing the VR-compliant stream 10 is a content that was taken with a camcorder, for example, each VOB stores moving picture data that was generated during a single video recording session (i.e., from when the user started recording the video until he or she stopped).
Each VOB includes a plurality of VOB units (VOBUs) #1, #2, . . . , and #n. Each VOBU is a data unit containing video data with a video playback time falling within the range of 0.4 second to 1 second in most cases.
Hereinafter, the data structure of VOBUs will be described with the first and second video object units VOBU #1 and VOBU #2 shown in FIG. 1 taken as an example.
VOBU #1 is composed of a number of packs, which belong to a low-order layer of an MPEG program stream. In the VR-compliant stream 10, each pack has a fixed data length (also called a "pack length") of 2 kilobytes (i.e., 2,048 bytes). At the top of the VOBU, a real time information pack (RDI pack) 11 is positioned, as indicated by "R" in FIG. 1. The RDI pack 11 is followed by multiple video packs "V" (including video pack 12) and multiple audio packs "A" (including audio pack 13). It should be noted that if the video data has a variable bit rate, the data size of each VOBU can vary, even for the same playback time, within a range defined by the maximum read/write rate. If the video data has a fixed bit rate, on the other hand, the data size of each VOBU is substantially constant.
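The fixed-length pack layout described above can be sketched as follows. This is only an illustrative model of the structure, not the normative definition from the VR standard; the names `Pack` and `VOBU` and their fields are hypothetical, chosen for this sketch.

```python
# Illustrative model of the VR-compliant pack/VOBU layout described above.
# The class names and fields are hypothetical, not from the standard.
from dataclasses import dataclass, field
from typing import List

PACK_LENGTH = 2048  # fixed pack length in bytes (2 KB)

@dataclass
class Pack:
    kind: str          # "R" (RDI), "V" (video), or "A" (audio)
    payload: bytes = b""

@dataclass
class VOBU:
    packs: List[Pack] = field(default_factory=list)

    def size_bytes(self) -> int:
        # Every pack has the same fixed length, so the VOBU size
        # is simply the pack count times the pack length.
        return len(self.packs) * PACK_LENGTH

# A VOBU headed by an RDI pack, followed by video and audio packs.
vobu1 = VOBU([Pack("R"), Pack("V"), Pack("V"), Pack("A")])
print(vobu1.size_bytes())  # 4 packs * 2048 bytes = 8192
```

Because the pack length is fixed, a variable-bit-rate VOBU changes size only by containing more or fewer packs, as noted above.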
Each pack stores the following information. As disclosed in Japanese Laid-Open Publication No. 2001-197417, for example, the RDI pack 11 stores various information for controlling the playback of the VR-compliant stream 10, e.g., information representing the playback timing of the VOBU and information for controlling copying of the VR-compliant stream 10. The video packs 12 store MPEG2-compressed video data thereon. The audio packs 13 store audio data that was compressed so as to comply with the MPEG2 Audio standard, for example. Video and audio data to be played back synchronously with each other may be stored in adjacent video and audio packs 12 and 13.
VOBU #2 is also made up of a plurality of packs. An RDI pack 14 is placed at the top of VOBU #2, and then followed by a plurality of video packs 15 and a plurality of audio packs 16. The contents of the information to be stored in each of these packs are similar to those of VOBU #1.
It should be noted that the RDI pack is not always positioned at the top of each VOBU within a VOB. Whenever the RDI pack is not located at the top of a VOBU, a video pack is always positioned there.
FIG. 2 shows a relationship between a video stream composed of the video data stored in video packs and an audio stream composed of the audio data stored in audio packs.
Specifically, in VOBU #i, a picture 21b of the video stream is composed of the video data that has been stored in at least one pack including the video pack 21a, the next picture is composed of the video data that has been stored in at least one pack including the video pack 22, and each of the following pictures is also composed of the video data that has been stored in following video packs. Meanwhile, an audio frame 23b is composed of the audio data that has been stored in the audio pack 23a. The same statement applies to the other audio packs. It should be noted that the data in one audio frame may be stored in two or more audio packs separately. Alternatively, multiple audio frames may be included in one audio pack.
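The point that one audio frame's data may be stored separately in two or more packs can be sketched as follows. The payload capacity and frame size below are hypothetical values chosen only to illustrate the splitting.

```python
# Sketch of how one audio frame's data may be split across multiple
# fixed-size packs, as described above. The payload capacity per pack
# and the frame size are hypothetical, illustrative values.
PACK_PAYLOAD = 2000  # assumed usable payload bytes per 2 KB pack

def split_frame_into_packs(frame: bytes, capacity: int = PACK_PAYLOAD):
    """Return the list of pack payloads that store one audio frame."""
    return [frame[i:i + capacity] for i in range(0, len(frame), capacity)]

frame = bytes(4500)               # an audio frame larger than one pack
payloads = split_frame_into_packs(frame)
print(len(payloads))              # the frame is stored in 3 packs
```

Conversely, when audio frames are small relative to the pack payload, several whole frames can share a single pack, which is the other case mentioned above.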
Also, any audio frame data included in a VOBU is supposed herein to be complete within that VOBU. That is to say, the audio frame data contained in a VOBU is all included within that VOBU and never included in the next VOBU.
The video and audio frames are played back in accordance with information specifying their presentation times (i.e., presentation time stamps (PTS)), which is stored in the packet headers of the respective video and audio packs. In the example shown in FIG. 2, the video picture 21b and the audio frame 23b are played back at substantially the same time, i.e., synchronously with each other.
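PTS-driven synchronization can be sketched minimally as follows. MPEG system streams express PTS values in units of a 90 kHz clock; the specific presentation time of 1.0 second is a hypothetical example standing in for picture 21b and audio frame 23b.

```python
# Minimal sketch of PTS-driven synchronization. MPEG system streams
# express PTS in 90 kHz clock ticks; the 1.0 s time is hypothetical.
PTS_CLOCK_HZ = 90_000

def seconds_to_pts(t: float) -> int:
    """Convert a presentation time in seconds to 90 kHz PTS ticks."""
    return round(t * PTS_CLOCK_HZ)

video_pts = seconds_to_pts(1.0)   # e.g., picture 21b presented at 1.0 s
audio_pts = seconds_to_pts(1.0)   # e.g., frame 23b presented at 1.0 s

# Units carrying equal PTS values are presented together, i.e.,
# synchronously, regardless of where their packs sit in the stream.
print(video_pts == audio_pts)     # True
```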
Look at the video packs 24a and 24b of VOBU #i. The last picture 24c of VOBU #i is made up of the video data stored in the video packs 24a through 24b. As described above, each VOBU is constructed with reference to the video playback times, with no special attention paid to the sound. Accordingly, although the data of the audio frame 25c includes presentation time information (PTS) so that the frame is played back synchronously with the video picture 24c, that data may nevertheless be stored in the audio packs 25a and 25b of the next VOBU #(i+1).
In this manner, the audio frame to be played back synchronously with a video frame has its storage location shifted from that of the video frame. This is because, in the program stream system target decoder (P-STD) model that defines the rules for multiplexing the video and audio packs, the data size (e.g., 224 kilobytes) of the video data buffer is much greater than the data size (e.g., 4 kilobytes) of the audio data buffer. Only a small amount of audio data can be accumulated in the buffer, so the audio data is multiplexed so as to be retrieved just before its playback timing.
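The effect of this buffer asymmetry can be estimated with simple arithmetic: the longest time data can wait in a decoder buffer before presentation is bounded by the buffer size divided by the bit rate. The buffer sizes below are the ones given above, while the bit rates are hypothetical values chosen only for illustration.

```python
# Rough sketch of why audio is multiplexed "just in time": the maximum
# time data can sit in a decoder buffer is bounded by buffer size over
# bit rate. Buffer sizes follow the text; bit rates are hypothetical.
VIDEO_BUFFER = 224 * 1024   # P-STD video data buffer, in bytes
AUDIO_BUFFER = 4 * 1024     # P-STD audio data buffer, in bytes

def max_lead_seconds(buffer_bytes: int, bitrate_bps: int) -> float:
    """Longest time data can wait in the buffer before presentation."""
    return buffer_bytes * 8 / bitrate_bps

video_lead = max_lead_seconds(VIDEO_BUFFER, 6_000_000)  # assume ~6 Mbps
audio_lead = max_lead_seconds(AUDIO_BUFFER, 256_000)    # assume 256 kbps

# Video data can be multiplexed well ahead of its presentation time,
# while audio data must arrive shortly before it is played back.
print(round(video_lead, 3), round(audio_lead, 3))
```

Under these assumed rates the audio lead time is only about 0.13 second, which is why an audio frame synchronized with the last picture of a VOBU can end up stored in the following VOBU.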
With respect to such a program stream, the user can register his or her desired VOBU playback order as a "play list". In accordance with the play list, the player acquires the data of one specified VOBU, plays back the video and so on, and then continues the playback by reading data out from the beginning of the next specified VOBU.
However, if the video data and the audio data to be played back synchronously with each other were stored in different VOBUs, then the sound could be discontinued while the video and audio data are being played back in accordance with the play list. This is because the data is read out continuously from the target VOBU, but the audio data stored in the next, non-target VOBU is not read. In that case, only the video is played back, and the audio that should be played back synchronously with the video is not.
In the example illustrated in FIG. 2, the play list may specify that VOBU #k (where k≠(i+1)) should be played back after VOBU #i has been played back. In that case, after the data has been read out from the video picture 24c of VOBU #i, data is read out from the next specified VOBU #k. Accordingly, the data of the audio frame 25c, which is stored in VOBU #(i+1) and which should be played back synchronously with the video picture 24c, is not read out and the sound is not reproduced. As a result, the user hears the sound discontinued during the playback.
Also, even in VOBU #k, the storage location of the audio frame associated with its top video picture changes from one VOBU to another, and is determined by the correlation between VOBU #k and its previous VOBU (i.e., VOBU #(k−1)). More specifically, the storage location is determined by the bit rate of the program stream and the buffer size of the system target decoder (P-STD). Accordingly, even if VOBU #i includes every audio frame to be played back synchronously, VOBU #k does not always store every audio frame to be played back synchronously from its very beginning. This is also why the user hears the sound discontinued during the playback.
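The discontinuity described in the preceding paragraphs can be sketched as follows: when a play list jumps from VOBU #i directly to VOBU #k (where k ≠ (i+1)), any audio frame that belongs to VOBU #i's timeline but is stored in VOBU #(i+1) is never read. The data layout below is hypothetical, chosen only to reproduce the effect.

```python
# Sketch of the play-list audio discontinuity described above. Each
# audio frame is modeled as (presentation_time_in_seconds, index_of_the
# _VOBU_where_it_is_stored); the values are hypothetical.
audio_frames = [(0.90, 0), (0.95, 0), (1.00, 1)]  # last frame shifted
                                                  # into the next VOBU

def skipped_audio(frames, played_vobu, video_end_time):
    """Frames synchronized with the played VOBU but stored elsewhere,
    which are therefore never read when the play list jumps away."""
    return [t for (t, stored_in) in frames
            if t <= video_end_time and stored_in != played_vobu]

# Playing only VOBU #0 (whose video ends at t = 1.0 s) and then jumping
# to a non-adjacent VOBU skips the audio frame presented at 1.0 s.
print(skipped_audio(audio_frames, played_vobu=0, video_end_time=1.0))
```

The skipped frame at 1.0 second corresponds to the audio frame 25c of FIG. 2: its PTS falls within VOBU #i's video timeline, but its packs lie in VOBU #(i+1) and are never read, so the user hears the sound drop out.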
An object of the present invention is to reduce significantly, or eliminate if possible, the period in which the sound is discontinued even if the video and audio data are played back in accordance with a play list, for example.