Recently, various methods have been developed for reducing the size of video and audio data with a long playback duration by compressing and encoding the data before writing it on a storage medium. The Moving Picture Experts Group (MPEG), a joint working group of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), has been standardizing audio and video coding methods. For example, a video compression method was defined in ISO/IEC 13818-2, an audio compression method was defined in ISO/IEC 13818-3, and a method for multiplexing them was defined in ISO/IEC 13818-1. The last of these is known as the “MPEG System standard”. By using these compression coding techniques, a data stream representing video and audio with a long playback duration, such as a movie (i.e., an MPEG system stream), can now be stored on a single storage medium (e.g., an optical disk) while maintaining its high quality.
Meanwhile, methods for storing such data on a storage medium have also been standardized. For instance, a DVD standard called “DVD Specification for Read-Only Disc Version 1.0” is known. Also, the DVD Video Recording standard “DVD Specifications for Rewritable/Re-recordable Discs” was defined in September 1999 as a standard for recording video and audio on a storage medium.
Hereinafter, processing for playing back video and audio synchronously with each other from a data stream on a storage medium will be described. FIG. 1 shows an arrangement of functional blocks in a conventional player 10 that can play back a system stream. In this example, the system stream is assumed to be a program stream that carries a system clock reference SCR.
The player 10 includes an AV parser 1, a video decoder 2, an audio decoder 3 and an STC register 4.
The AV parser 1 receives a system stream that has been provided externally and breaks that system stream into audio data and video data. In the following description, a data stream representing audio will be referred to as an “audio stream”, while a data stream representing video will be referred to as a “video stream”. Also, the AV parser 1 extracts a system clock reference SCR, an audio presentation time stamp APTS and a video presentation time stamp VPTS from the system stream. The AV parser 1 sets a reference value for the STC register 4 based on the system clock reference SCR, and outputs the video stream and VPTS to the video decoder 2 and the audio stream and APTS to the audio decoder 3, respectively. In response, the STC register 4 generates a sync signal STC based on the reference value.
The video decoder 2 decodes the video stream by reference to the sync signal STC and a video decoding time stamp VDTS and then outputs the decoded video data at a timing when the sync signal STC matches the VPTS. In the NTSC standard, for example, video presentation time stamps VPTS are added to video at an interval corresponding to about 16.7 ms so as to be synchronized with the times at which field pictures are presented. This is because, according to the NTSC standard, 30 video frames are presented per second and one frame consists of two fields, so that each field is refreshed approximately every 16.7 ms.
On the other hand, the audio decoder 3 decodes the audio stream by reference to the sync signal STC and an audio decoding time stamp ADTS and then outputs the decoded audio data at a timing when the sync signal STC matches the APTS. For example, audio presentation time stamps APTS are added to audio at an interval corresponding to the audio frame playback timing of about 32 ms.
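The time stamp intervals quoted above follow from simple arithmetic, which can be sketched in Python as below. The rate of 30 frames per second and the two fields per frame are taken from the description; the audio figures (1536 samples per frame at 48 kHz) are only an assumed example that happens to yield 32 ms, and all function names are hypothetical. Treating "STC matches the time stamp" as "STC has reached the time stamp" is likewise an assumption made for illustration.

```python
FRAME_RATE = 30        # frames per second, nominal NTSC rate as stated in the text
FIELDS_PER_FRAME = 2   # one frame consists of two fields


def field_period_ms():
    """Interval between field pictures, i.e. between successive VPTS values."""
    return 1000 / (FRAME_RATE * FIELDS_PER_FRAME)


def audio_frame_period_ms(samples_per_frame=1536, sample_rate_hz=48000):
    """Interval between successive APTS values.

    The default arguments are an assumed example, not values from the text.
    """
    return 1000 * samples_per_frame / sample_rate_hz


def is_due(stc, pts):
    """Assumption: a decoded unit is output once STC has reached its time stamp."""
    return stc >= pts


print(round(field_period_ms(), 1))   # about 16.7 ms per field
print(audio_frame_period_ms())       # 32.0 ms per audio frame
```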
By performing these processing steps, audio and video can be played back synchronously with each other at the timings that were intended by the maker of the system stream during encoding.
In this example, the sync signal STC is assumed to be generated by reference to the system clock reference SCR. The same reference is used when a digital broadcast is received in real time and clock signals on transmitting and receiving ends need to be synchronized with each other. If the digital broadcast is a transport stream, however, a program clock reference PCR is used.
Meanwhile, in playing back video and audio by reading out a system stream that has already been stored on a storage medium such as an optical disk, it is not always necessary to reproduce the clock signal at the time of encoding by reference to the system clock reference SCR. Alternatively, the sync signal STC may be set by using the audio presentation time stamps APTS, for example. An example of such playback will be described with reference to FIG. 2.
FIG. 2 shows an arrangement of functional blocks in another conventional player 20. The player 20 decodes an audio stream and a video stream from a system stream stored on a storage medium, and outputs video synchronously with audio by reference to audio presentation time stamps APTS. Such a player 20 is disclosed in Japanese Patent Application Laid-Open Publication No. 10-136308, for example.
An AV separating section 12 reads a digitally encoded system stream from a data storage device 11 and separates audio and video data that are stored there after having been multiplexed.
A video processing section 13 decodes the video data and sends video header information, obtained during the decoding process, to a delay detecting section 16. The video presentation time stamps VPTS are described in the header information. Also, the video processing section 13 saves the total number of frames of the video data that have been played back since the start of the playback in a video frame counter 18. An audio processing section 14 decodes audio data and sends audio header information, obtained during the decoding process, to a clock generating section 17. The audio presentation time stamps APTS are described in the header information. Also, the audio processing section 14 saves the total amount of the audio data that has been played back since the start of the playback in an audio data counter 19.
The clock generating section 17 calculates a reference time, which is expressed as an audio playback duration, based on the total amount of data saved in the audio data counter 19 and the audio header information obtained from the audio processing section 14. The delay detecting section 16 calculates the ideal number of frames of the video data that should have been output in accordance with the information about the reference time obtained from the clock generating section 17 and the video header information received from the video processing section 13. Also, the delay detecting section 16 compares the ideal number of frames with the actual number of frames obtained from the video frame counter 18, thereby sensing how far the video playback has progressed relative to the audio playback.
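The calculation performed by the clock generating section 17 and the delay detecting section 16 might look roughly like the following Python sketch. It assumes that the audio header information supplies a constant byte rate and that the video header information supplies a frame rate; neither detail is stated in the description, and the function names are hypothetical.

```python
def reference_time_s(audio_bytes_played, audio_bytes_per_second):
    """Reference time expressed as audio playback duration (seconds).

    Assumes a constant byte rate taken from the audio header information.
    """
    return audio_bytes_played / audio_bytes_per_second


def ideal_frame_count(reference_time, frame_rate=30.0):
    """Number of video frames that should have been output by the reference time."""
    return int(reference_time * frame_rate)


def video_lag_frames(ideal_frames, actual_frames):
    """Positive when the video output is behind the audio output."""
    return ideal_frames - actual_frames


# Example: 48000 bytes played at an assumed 24000 bytes/s gives 2.0 s of audio,
# so 60 frames should have been output; with 58 actually output the lag is 2.
t = reference_time_s(48000, 24000)
lag = video_lag_frames(ideal_frame_count(t), 58)
```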
If the delay detecting section 16 has sensed that the video output is behind the audio output, then a frame skipping control section 15 determines frames not to output (i.e., frames to skip) and provides the AV separating section 12 and the video processing section 13 with that information. The video processing section 13 omits the output of those frames to skip but outputs their succeeding frames. As a result, the video delay can be cut down by an amount of time corresponding to the playback duration of one frame (e.g., 33 ms in NTSC) and the video output is no longer trailing behind the audio output. The player 20 can play back audio and video synchronously with each other by such a technique.
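As a minimal sketch of the frame skipping described above, the following Python code drops as many frames from the head of the output queue as the video is behind and estimates the remaining delay. It does not reproduce how the frame skipping control section 15 actually selects frames, which the description does not specify; the queue representation and function names are assumptions, and the 33 ms frame duration is the NTSC figure quoted in the text.

```python
FRAME_MS = 33.0  # playback duration of one NTSC frame, as quoted in the text


def output_with_skips(frames, lag_frames):
    """Return the frames that are still output after skipping `lag_frames` frames.

    A non-positive lag means the video is not behind, so nothing is skipped.
    """
    skip = max(0, lag_frames)
    return frames[skip:]


def remaining_delay_ms(delay_ms, skipped_frames):
    """Video delay left after skipping the given number of frames."""
    return delay_ms - skipped_frames * FRAME_MS


# Example: the video is one frame (33 ms) behind, so frame 10 is skipped.
queue = [10, 11, 12, 13]
shown = output_with_skips(queue, 1)       # [11, 12, 13]
left = remaining_delay_ms(33.0, 1)        # 0.0
```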
Whether the video output is trailing behind the audio output is judged as follows. Suppose the “ideal state” is a state in which video and audio are played back at the timings that were originally intended by the maker during encoding. Generally speaking, if the video is played back within a range of −50 ms to +30 ms from the ideal state (i.e., with respect to the audio playback), then a person senses that the audio and video are synchronous with each other. Accordingly, if the video presentation time falls within this permissible range with respect to the audio presentation time, then the video output may be judged as not trailing behind the audio output. Otherwise, the video output may be judged as trailing behind the audio output.
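The permissible range can be expressed as a simple check, sketched here in Python. The sign convention is an assumption inferred from the examples in this description: the lag is positive when the video is presented later than the audio and negative when it is presented earlier. The function and parameter names are hypothetical.

```python
def within_permissible_range(lag_ms, lower_ms=-50.0, upper_ms=30.0):
    """True when the video lag relative to the audio is perceived as synchronous.

    lag_ms > 0: video is presented later than the audio (assumed convention).
    lag_ms < 0: video is presented earlier than the audio.
    """
    return lower_ms <= lag_ms <= upper_ms


# A 20 ms video delay is acceptable, whereas a 40 ms delay is not.
ok = within_permissible_range(20.0)
too_late = not within_permissible_range(40.0)
```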
However, if the conventional player played back the video by reference to the audio playback duration, then the following problems would arise.
Specifically, in determining whether or not audio and video are being played back synchronously with each other, the conventional player might judge that the audio and video time lag has exceeded its permissible range, even though the time lag actually falls within the permissible range.
For example, suppose the video is being played back 20 ms behind the audio, a delay that falls within the permissible range. As described above, the APTS is added to an audio frame approximately every 32 ms and the VPTS is added to a video field approximately every 16.7 ms. Consequently, depending on the timing at which the VPTS is compared with the APTS, the video might be played back at most 52 ms behind or ahead of the audio. In particular, if the VPTS and APTS are compared with each other only just after the VPTS has been updated, then the video would be played back +20 ms to +52 ms later than the audio. Thus, if the lag falls within the range of +30 ms to +52 ms, the viewer would find it uncomfortable to see the video and audio played back out of sync.
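One reading of the arithmetic in this example is that a nominal 20 ms lag can be stretched by up to one APTS interval (32 ms), because the audio time stamp used for the comparison may be almost a full audio frame old. The Python sketch below illustrates that reading only; it is an interpretation rather than a calculation given in the description, and the function names are hypothetical.

```python
def effective_lag_range_ms(nominal_lag_ms, apts_interval_ms=32.0):
    """Range of video lag when the APTS may be up to one interval stale."""
    return (nominal_lag_ms, nominal_lag_ms + apts_interval_ms)


def exceeds_upper_bound(lag_range_ms, upper_ms=30.0):
    """True when part of the range lies beyond the permissible +30 ms bound."""
    return lag_range_ms[1] > upper_ms


# A nominal 20 ms lag can grow to 52 ms, which is outside the permissible range.
lag_range = effective_lag_range_ms(20.0)    # (20.0, 52.0)
problem = exceeds_upper_bound(lag_range)    # True
```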
Also, if the audio playback ended earlier than the video playback, then the conventional player could no longer continue the video playback after that. This is because once the audio playback has ended, the audio playback duration is not counted anymore and the audio can no longer function as a time reference for playing back the video. In a system stream, in particular, audio data and video data are included as a mixture, and the presentation time of the audio data does not always match that of the video data. Accordingly, the audio data and video data obtained at the end of the data reading operation will finish being played back at mutually different times. And the audio playback may end earlier than the video playback, thus causing various inconveniences.
Thus, an object of the present invention is to synchronize audio and video with each other just as intended when the video is played back by reference to the audio presentation time. Another object of the present invention is to play back the video continuously even if the audio playback has ended earlier than the video playback.