There are various methods of recording audio data and video data on a recording medium, which include analog recording and digital recording onto a video tape and analog recording and digital recording onto a disc medium. In recent years, digital recording on a disc medium has been becoming mainstream because of its high quality, high accessibility, and the like. A representative example of the digital recording is a DVD (Digital Versatile Disc). High-quality video data and audio data are recorded/reproduced seamlessly to/from recordable media such as DVD-RW and DVD-RAM (refer to, for example, republication of WO97/13364 (FIGS. 47 and 61)).
As a method of encoding video data, MPEG (Moving Picture Experts Group) video is generally used. In MPEG video (mainly MPEG-2 video), each frame (or field) of video data is encoded as one of three picture types: I picture, P picture, or B picture. The I picture is a picture which can be decoded independently, and I stands for “Intra”. The P picture is a picture which is encoded by using forward prediction from an I picture or another P picture, and P stands for “Predictive”. The B picture is a picture which is encoded by using bi-directional prediction from I pictures or P pictures, and B stands for “Bi-directionally predictive”. The video frame period in the NTSC system is about 1/30 second (to be accurate, 1/29.97 second), and that in the PAL system is 1/25 second.
On the other hand, there are, roughly, two kinds of audio data encoding methods which are linear PCM and compression encoding. In the compression encoding, Dolby digital (AC-3), DTS (Digital Theater Systems), MPEG audio, and the like are often used.
In the linear PCM, digital data obtained by sampling and quantization is transmitted without compression. The number of transmission bits is 16, 20, 24, or the like. As the sampling frequency, 48 kHz or the like is used. The audio frame period can be set arbitrarily. For example, 1/600 second (80 samples at 48 kHz) or the like is used.
In the compression encoding, linear PCM data is compressed by using the orthogonal transformation or a psychoacoustic model. As an audio frame period, a number of samples that is a power of two (or an integral multiple of such a number) is often applied. For example, when the number is 1,024 samples at 48 kHz, the audio frame period is about 21 msec (=1,024/48,000). The reason why a power-of-two number of samples is used as the audio frame in the compression coding is that the orthogonal transformation for transforming linear PCM sample data to a spectrum is adapted to a power-of-two number of input/output samples.
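The frame-period arithmetic in the two preceding paragraphs can be reproduced with a short sketch. It is an illustration only; the function names are hypothetical and are not taken from any encoder.

```python
from fractions import Fraction


def audio_frame_period(samples_per_frame: int, sampling_hz: int) -> Fraction:
    """Duration of one audio frame in seconds, kept as an exact fraction."""
    return Fraction(samples_per_frame, sampling_hz)


def is_power_of_two(n: int) -> bool:
    """True when n is 1, 2, 4, 8, ... (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


# Linear PCM example from the text: 80 samples at 48 kHz.
pcm_period = audio_frame_period(80, 48_000)

# Compression-encoding example from the text: 1,024 samples at 48 kHz.
compressed_period = audio_frame_period(1024, 48_000)

print(pcm_period, float(pcm_period))                # exact and decimal seconds
print(compressed_period, float(compressed_period))  # roughly 0.0213 s
print(is_power_of_two(1024), is_power_of_two(80))
```

Keeping the periods as exact fractions avoids rounding when frame borders are compared later.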
In the case of encoding a video signal with sound and recording the result onto a recording medium such as a DVD, video data and audio data are encoded by the encoding methods described above and, further, multiplexed by an MPEG system. The resultant data is recorded onto the recording medium as, for example, an MPEG program stream. Such multiplexed stream data will be called a video object (VOB) hereinbelow.
At this time, the video data and audio data in a VOB are synchronized. A video frame period and an audio frame period usually do not coincide with each other but are recorded in different periods. The video frame period is uniquely determined by the TV system. As the audio frame period, an optimum length is separately set in consideration of, for example, efficiency of compression coding. FIG. 1 shows this state.
As shown in FIG. 1, a video frame period (TV) and an audio frame period (TA) are different from each other. This is an example in which a series of video data and audio data is synchronously recorded and is a typical example in which the frame head of the video data and that of the audio data coincide with each other in the data head portion. In an intermediate portion of the data, basically, a video frame border and an audio frame border do not coincide with each other (except for positions corresponding to common multiples of the video frame period and the audio frame period).
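How rarely the two frame borders line up can be estimated with a minimal sketch. It assumes the exact NTSC frame period is 1001/30000 second (the usual exact value behind "1/29.97") and reuses the 1,024-sample, 48 kHz audio frame from the earlier example. The function name is hypothetical.

```python
from fractions import Fraction
from math import gcd


def first_coincidence(tv: Fraction, ta: Fraction) -> Fraction:
    """Smallest positive time that is a whole number of both periods.

    For fractions in lowest terms, the least common multiple is
    lcm(numerators) / gcd(denominators).
    """
    num = tv.numerator * ta.numerator // gcd(tv.numerator, ta.numerator)
    den = gcd(tv.denominator, ta.denominator)
    return Fraction(num, den)


TA = Fraction(1024, 48_000)        # compressed audio frame, 1,024 samples at 48 kHz
TV_NTSC = Fraction(1001, 30_000)   # assumed exact NTSC frame period
TV_PAL = Fraction(1, 25)           # PAL frame period from the text

for label, tv in (("NTSC", TV_NTSC), ("PAL", TV_PAL)):
    t = first_coincidence(tv, TA)
    print(label, float(t), "s =", t / tv, "video frames =", t / TA, "audio frames")
```

Under these assumptions the borders first meet again after 640 video frames (1,001 audio frames) for NTSC and after 8 video frames (15 audio frames) for PAL, so an arbitrary edit point almost never falls on a shared border.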
One application is editing in which two VOBs recorded separately from each other are connected, in whole or in part, and the result is reproduced continuously. FIG. 2 shows an example of this application. The diagram shows an example of connection from time X in a VOB(i) to time Y in a VOB(j). The characters I, P, and B attached to video frames indicate the above-described types of pictures. It should be noted that, in the diagram, the video frames and audio frames are displayed in the same order as that of reproduction. In an actual VOB, it is necessary to record frames in the order of Ii1, Pi1, Bi1, and Bi2 so that Bi1 and Bi2 can be decoded by using Ii1 and Pi1.
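The difference between reproduction order and recording order can be illustrated with a short sketch. It assumes a reproduction order of Ii1, Bi1, Bi2, Pi1 (the figure is not reproduced here, so this order is inferred from the recording order given in the text) and uses a hypothetical helper that moves each I or P picture ahead of the B pictures that depend on it.

```python
def display_to_recording_order(display):
    """Reorder picture names so each I/P picture precedes the B pictures
    that are reproduced before it but predicted from it."""
    recorded, pending_b = [], []
    for name in display:
        if name.startswith("B"):
            # A B picture must wait until its following I/P picture is recorded.
            pending_b.append(name)
        else:
            recorded.append(name)
            recorded.extend(pending_b)
            pending_b = []
    # Trailing B pictures with no later I/P picture are appended unchanged.
    recorded.extend(pending_b)
    return recorded


display_order = ["Ii1", "Bi1", "Bi2", "Pi1"]  # assumed reproduction order
print(display_to_recording_order(display_order))
```

With the assumed input, the result is the recording order Ii1, Pi1, Bi1, Bi2 stated in the text.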
In order to reproduce pictures from time A to time X in VOB(i) and subsequently reproduce pictures from time Y to time B in VOB(j) at the time of reproduction, attention has to be paid to picture types around the times X and Y included in the reproduction path. Specifically, Pi2 is necessary to decode Bi3 and Bi4 in VOB(i) but Pi2 does not exist in the reproduction path. It is therefore necessary to convert Bi4 to a P-picture type Pi4′ and convert Bi3 to Bi3′ (Bi3′ is encoded from Pi1 and Pi4′).
Similarly, Ij1 is necessary for decoding Pj1 in VOB(j) but does not exist in the reproduction path. Therefore, for example, Pj1 has to be converted to I picture type Ij1′. In such a manner, data can be reproduced in a path extending via Pi1, Bi3′, Pi4′, Ij1′, Bj3, and Bj4. For convenience, a connection point after connection is Z.
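The need for picture conversion can be checked mechanically: any picture on the reproduction path whose reference picture lies off the path has to be re-encoded. The sketch below is illustrative only. The reference table is hypothetical and merely mirrors the dependencies named in the text (Bi3 and Bi4 need Pi2; Pj1 needs Ij1); the remaining entries, including the picture Pj2, are assumptions made to complete the example.

```python
def missing_references(path, refs):
    """Map each picture on the path to the references it needs
    that are not themselves on the path."""
    on_path = set(path)
    problems = {}
    for picture in path:
        absent = [r for r in refs.get(picture, []) if r not in on_path]
        if absent:
            problems[picture] = absent
    return problems


# Hypothetical reference table (picture -> pictures it is predicted from).
refs = {
    "Pi1": ["Ii1"],
    "Bi3": ["Pi1", "Pi2"],
    "Bi4": ["Pi1", "Pi2"],
    "Pj1": ["Ij1"],
    "Bj3": ["Pj1", "Pj2"],
    "Bj4": ["Pj1", "Pj2"],
    "Pj2": ["Pj1"],
}

# Reproduction path before any conversion: A..X in VOB(i), then Y..B in VOB(j).
path = ["Ii1", "Pi1", "Bi3", "Bi4", "Pj1", "Bj3", "Bj4", "Pj2"]

print(missing_references(path, refs))
```

The pictures reported here are exactly the ones the text converts: Bi3 and Bi4 (to Bi3′ and Pi4′) and Pj1 (to Ij1′).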
The above description relates to an example of the case where interframe prediction such as MPEG is used as the video encoding system. In the case where encoding completes within each frame, as in the DV system, such a picture converting process is unnecessary. Even in the case of MPEG or the like, the picture converting process is unnecessary if the connection point is chosen such that the frame before the connection point is a B picture immediately preceding an I picture and the frame after the connection point is an I picture.
Next, reproduction of video frames at a connection point Z will be considered. In the above example, basically, it is required to reproduce a frame Pi4′ for one video frame period and, immediately after that, reproduce an Ij1′ frame. That is, seamless reproduction in which the video frame pictures do not stop at the point Z is required. To perform the seamless reproduction, it is generally necessary to satisfy the following conditions.
(1) Data of video frames necessary to decode a path extending via A, Z, and B is included in the path (as described above).
(2) The recorded data is arranged so that no underflow occurs in a specific buffer of the system when the data of the path extending via A, Z, and B is read. For this purpose, data around Z or the data in the path extending via A, Z, and B is, in some cases, re-recorded in whole or in part at a position different from its position before the connection edition.
(3) A process of resetting a system time clock (STC) is performed before and after the connection point Z for the reason that, generally, the time base in VOB(i) and the time base in VOB(j) are different from each other and, therefore, an STC value at the time X and an STC value at the time Y are different from each other.
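Condition (3) can be sketched as simple offset arithmetic. This is a minimal illustration, assuming the STC values are expressed in 90 kHz ticks (the unit used by MPEG time stamps) and ignoring counter wrap-around. The function names and the sample values are hypothetical.

```python
TICKS_PER_SECOND = 90_000  # assumed tick rate for the STC values below


def stc_offset(stc_at_x: int, stc_at_y: int) -> int:
    """Difference between the VOB(i) time base at X and the VOB(j) time base at Y."""
    return stc_at_x - stc_at_y


def reset_stc(stc_now: int, offset: int) -> int:
    """Move the running STC from the VOB(i) time base onto the VOB(j) time base."""
    return stc_now - offset


# Hypothetical values: X lies 10 s into VOB(i), Y lies 3 s into VOB(j).
stc_x = 10 * TICKS_PER_SECOND
stc_y = 3 * TICKS_PER_SECOND

offset = stc_offset(stc_x, stc_y)
print(offset)                    # ticks by which the two time bases differ
print(reset_stc(stc_x, offset))  # STC value immediately after the reset at Z
```

After the reset, the time stamps recorded in VOB(j) can be compared against the STC without modification.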
Handling of an audio frame in the connection edition will now be considered with reference to FIG. 2. In the case of constructing data so as to connect video frames seamlessly as described above, basically, audio frames cannot be connected seamlessly at a connection point. This is because the audio frame period is different from the video frame period, and it cannot be expected that reproduction end time of the final audio frame before the connection point and reproduction start time of the head audio frame after the connection point coincide with each other. Consequently, as shown by G in FIG. 2, the existence of a gap is conventionally allowed between audio frames.
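The size of the gap G can be estimated with a minimal sketch under stated assumptions: audio frames in each VOB start at multiples of the audio frame period TA measured from the head of that VOB (as in FIG. 1), only whole audio frames that end by X are kept before the connection point, and only whole audio frames that start at or after Y are kept after it. This selection policy and the function name are assumptions for illustration, not the method of any particular apparatus.

```python
from fractions import Fraction


def audio_gap(x: Fraction, y: Fraction, ta: Fraction) -> Fraction:
    """Silent interval at the connection point, in seconds.

    x: video cut-out time in VOB(i); y: video cut-in time in VOB(j).
    """
    last_audio_end = (x // ta) * ta         # end of last whole frame before X
    first_audio_start = -((-y) // ta) * ta  # start of first frame at or after Y
    return (x - last_audio_end) + (first_audio_start - y)


TA = Fraction(1024, 48_000)   # 1,024 samples at 48 kHz
TV = Fraction(1001, 30_000)   # assumed exact NTSC frame period

# Hypothetical edit: cut out after 100 video frames, cut in at video frame 50.
gap = audio_gap(100 * TV, 50 * TV, TA)
print(gap, float(gap))

# When X and Y both fall on audio frame borders, no gap remains.
print(audio_gap(10 * TA, 5 * TA, TA))
```

Under this policy the gap is the sum of two partial audio frames, so it is always shorter than two audio frame periods and vanishes only when both X and Y fall on audio frame borders.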
FIG. 3 is a block diagram showing an example of a conventional reproducing apparatus in the case where the connection edition as shown in FIG. 2 is considered.
In the diagram, data in the path extending via A, Z, and B in FIG. 2, which is read from a recording medium 101 by a reproducing mechanism (not shown), is input to a demultiplexer 103 via a track buffer 102. The demultiplexer 103 demultiplexes the data into a video stream and an audio stream (or another stream which is not shown), inputs the video stream to a video decoder 105 via a video buffer 104, and inputs the audio stream to an audio decoder 107 via an audio buffer 106.
The video decoder 105 decodes the video, and the audio decoder 107 decodes the audio. In the case where the video encoding system is the MPEG or the like, a re-order buffer 109 for re-arranging pictures in a reproduction order is disposed on the output side of the video decoder 105. An STC circuit 108 is a circuit for counting STCs on the basis of reference time signals such as system clock references (SCR) extracted from VOB data from the demultiplexer 103. The STC circuit 108 also resets the STCs at the connection point Z. The STC circuit 108 also has the role of generating, in a position where a gap occurs in audio reproduction as shown by G in FIG. 2, a control signal (called a mute signal here) indicative of the gap and muting the audio decoder 107 for the period of the gap.
Next, an example in which the reproduction end time X of the final audio frame before the connection point and the reproduction start time Y of the head audio frame after the connection point coincide with each other at the connection edition point as shown in FIG. 4 will be considered. In this case as well, although Pi2 is necessary to decode Bi3 and Bi4 in VOB(i), Pi2 does not exist in the reproduction path. It is therefore necessary to convert, for example, Bi4 to a P-picture type Pi4′ and convert Bi3 to Bi3′ (Bi3′ is encoded from Pi1 and Pi4′).
Similarly, Ij1 is necessary to decode Pj1 in VOB(j), but Ij1 does not exist in the reproduction path. Therefore, it is necessary to convert, for example, Pj1 to the I picture type Ij1′. In such a manner, data can be reproduced in a path extending via Pi1, Bi3′, Pi4′, Ij1′, Bj3, and Bj4.
On the other hand, with respect to audio data, the reproduction end time X of the final audio frame before the connection point and the reproduction start time Y of the head audio frame after the connection point coincide with each other at the connection edition point. Consequently, as shown in FIG. 4, no gap is created between the audio frame before the connection point and the audio frame after the connection point.