1. Field of the Invention
The present invention relates to a video and audio reproducing device. In particular, it relates to a video and audio synchronization controller that decodes coded video and audio data and synchronizes the video data with the audio data, and to a video decoding device in the video and audio reproducing device that prevents the video buffer memory from becoming empty (underflow) or full (overflow).
2. Description of the Prior Art
FIG. 19 shows a conventional video and audio reproducing device comprising a video and audio separator 2, an audio buffer memory 25, an audio decoder 30, a video buffer memory 45, a video and audio synchronization controller 4, and a video decoder 50. The video and audio separator 2 separates coded video and audio data 1, received via a transmission line such as a satellite communication line or a CATV cable or read from a storage medium (a package) such as a CD-ROM or a DVD, into video data, audio data, video time stamps (V-TS), and audio time stamps (A-TS). The audio buffer memory 25 stores and delays the separated audio data together with the audio time stamps. The audio decoder 30 starts decoding in response to an audio decoding start controlling signal 32 from the video and audio synchronization controller 4. The video buffer memory 45 stores and delays the separated video data together with the video time stamps. The video and audio synchronization controller 4 generates the audio decoding start controlling signal 32 and a video decoding start controlling signal 52 using a video time stamp 49, an audio time stamp 29, and a system clock reference (SCR) 3. The video decoder 50 starts decoding in response to the video decoding start controlling signal 52 output from the video and audio synchronization controller 4.
The operation of the conventional video and audio reproducing device is explained below. The device of FIG. 19 is used as a receiver for video and audio communication media or as a reproducing means for a storage medium such as a CD-ROM or a DVD. The video and audio separator 2 receives the coded video and audio data 1 from a communication line or a storage medium, and separates the coded video and audio data 1 into coded audio data 21, a coded audio time stamp 22, coded video data 41, and a coded video time stamp 42.
Then, the video buffer memory 45 delays the coded video time stamp 42, and outputs it to the video and audio synchronization controller 4 as a video time stamp (V-TS) 49. The video buffer memory 45 delays the coded video data 41, and outputs it to the video decoder 50 as delayed coded video data 48.
The audio buffer memory 25 delays the coded audio time stamp 22 and outputs it to the video and audio synchronization controller 4 as a delayed audio time stamp (A-PTS) 29. The audio buffer memory 25 also delays the coded audio data 21 and outputs it to the audio decoder 30 as delayed coded audio data 28.
The video and audio synchronization controller 4 generates the audio decoding start controlling signal 32 using the audio time stamp 29 and the system clock reference (SCR) 3, and generates the video decoding start controlling signal 52 using the video time stamp 49 and the system clock reference (SCR) 3. Activated by the audio decoding start controlling signal 32, the audio decoder 30 starts decoding the coded audio data 28 and outputs the audio output 31. Activated by the video decoding start controlling signal 52, the video decoder 50 starts decoding the delayed coded video data 48 and outputs the video display signal 51.
In this manner, the audio output 31 and the video display signal 51 can be synchronized using the audio decoding start controlling signal 32 and the video decoding start controlling signal 52. This synchronization of the audio and the video is explained in detail below.
FIG. 20 shows a video and audio synchronization controller 4 according to the conventional art. The video and audio synchronization controller 4 of FIG. 20 comprises a system time counter 101 that outputs a system time clock (STC) 102, an audio synchronization comparator 103, and a video synchronization comparator 109. The system time counter 101 sets its time using the system clock reference (SCR) 3 separated by the video and audio separator 2. The audio synchronization comparator 103 outputs the audio decoding start controlling signal 32 when the system time clock (STC) 102 matches the delayed audio time stamp (A-PTS) 29. The video synchronization comparator 109 outputs the video decoding start controlling signal 52 when the system time clock (STC) 102 matches the delayed video time stamp (V-TS) 49. The system clock reference (SCR) 3 is included in the header of the bit stream and serves as the reference for determining the absolute time of the entire system. It is inserted into the header of the bit stream at the transmission station that transmits the video signal, or at the recorder where the video is recorded on the package. After its time has been set using the system clock reference (SCR) 3, the system time counter 101 counts the system time clock (STC) 102. This STC 102 is used as the reference clock for generating the audio output 31 of the audio decoder 30 and the video display signal 51 of the video decoder 50.
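The controller of FIG. 20 can be sketched as a counter that is loaded once from the SCR and two equality comparators. This is a minimal software model under stated assumptions, not the circuit itself: the class and function names (`SystemTimeCounter`, `sync_comparator`, `simulate_controller`) and the event labels are hypothetical, and the counter is advanced one count per simulated 90 kHz cycle. The SCR, V-DTS, and A-PTS values are the ones given for FIG. 21A and FIG. 21B.

```python
from dataclasses import dataclass


@dataclass
class SystemTimeCounter:
    """Toy model of system time counter 101: loaded once from the SCR,
    then advanced by one count per cycle of the 90 kHz clock."""
    stc: int = 0

    def set_from_scr(self, scr: int) -> None:
        # Load the absolute time carried in the bit-stream header.
        self.stc = scr

    def tick(self) -> None:
        # One cycle of the 90 kHz system clock.
        self.stc += 1


def sync_comparator(stc: int, time_stamp: int) -> bool:
    # Comparators 103 and 109 assert their output when STC equals the time stamp.
    return stc == time_stamp


def simulate_controller(scr: int, a_pts: int, v_ts: int, n_ticks: int) -> list:
    """Run the counter for n_ticks cycles and record when each decoding
    start controlling signal would fire (hypothetical helper)."""
    counter = SystemTimeCounter()
    counter.set_from_scr(scr)
    events = []
    for _ in range(n_ticks):
        if sync_comparator(counter.stc, v_ts):
            events.append(("video_start_52", counter.stc))
        if sync_comparator(counter.stc, a_pts):
            events.append(("audio_start_32", counter.stc))
        counter.tick()
    return events


# SCR of V(I0), V-DTS of V(I0), and A-PTS of A0 as given for FIG. 21A/21B.
print(simulate_controller(scr=93994, a_pts=104500, v_ts=100000, n_ticks=11000))
# [('video_start_52', 100000), ('audio_start_32', 104500)]
```

An exact-equality test mirrors the "matches" wording above; a practical software model would more likely fire on `stc >= time_stamp` so that a skipped count cannot miss the event.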
FIG. 21A-FIG. 21G are timing charts explaining the operation of the conventional video and audio reproducing device. In FIG. 21A, the coded video and audio data 1 contain video and audio data. The header of each video data frame includes the system clock reference (SCR) 3, a video decoding time stamp (V-DTS), and a video presentation time stamp (V-PTS). The header of each audio data frame includes the system clock reference (SCR) 3 and an audio presentation time stamp (A-PTS). Both V-DTS and V-PTS in the video data are referred to below as a video time stamp (V-TS), or simply as a time stamp (TS). That is, unless otherwise specified, the term "time stamp" covers both V-DTS and V-PTS. Pictures are displayed on a monitor (or display) using either V-DTS or V-PTS, depending on the type of the video frame to be displayed. The video presentation time stamp (V-PTS) indicates the time at which the picture appears at the upper left corner of the monitor. The video decoding time stamp (V-DTS) indicates the time at which decoding of the video frame starts in an ideal decoder whose decoding time is zero. If an actual decoder requires a time "T" to decode the data, decoding should start at the time V-DTS-T. That is, decoding should start a time "T" earlier than the ideal time V-DTS, to allow for the time the actual decoding process requires.
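The V-DTS-T rule above is a single subtraction in STC counts. The sketch below only restates it; the function name `actual_decode_start` and the decoding time of 1500 counts are illustrative assumptions, since the text gives no value for "T".

```python
def actual_decode_start(v_dts: int, decode_ticks: int) -> int:
    """Return the STC count at which a real decoder should start.

    v_dts: decoding time stamp for an ideal, zero-delay decoder.
    decode_ticks: assumed decoding time T, in 90 kHz counts.
    """
    if decode_ticks < 0:
        raise ValueError("decoding time T must be non-negative")
    # Start T counts early so the picture is ready at the ideal time V-DTS.
    return v_dts - decode_ticks


# V(I0) has V-DTS = 100000 in FIG. 21A; T = 1500 counts is a made-up value.
print(actual_decode_start(100000, 1500))  # 98500
```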
The audio signal does not include a decoding time stamp, but only the presentation time stamp (A-PTS). The audio presentation time stamp (A-PTS) indicates a time when the head of an audio frame is output.
After the absolute time is set using the SCR, the system time clock (STC) starts counting at 90 kHz. FIG. 21B is a timing chart showing this STC count. In FIG. 21B, the horizontal scale represents the time indicated by the STC count, which is incremented by a 90 kHz clock. Immediately after the system is powered on, the system time counter 101 is set to the SCR count included in the header of a frame of the input coded signal. For example, in FIG. 21B, the system time counter 101 is set to the SCR count (=93994) included in the header of the I picture V(I0), which is the first picture frame to arrive after power-on. Thereafter, the system time counter 101 counts up the system time clock. One NTSC video frame lasts approximately 33 ms (1001/30 ms), which corresponds to 3003 clocks. Therefore, as shown in FIG. 21B, each frame unit equals 3003 STC counts. The intermediate counts are omitted in FIG. 21B.
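The 3003-count frame unit follows from dividing the 90 kHz clock rate by the NTSC frame rate of 30000/1001 frames per second. The sketch below does that division with exact fractions and lists the frame boundaries after the counter is loaded with SCR = 93994; the helper names are hypothetical.

```python
from fractions import Fraction

STC_HZ = 90_000                          # system time clock rate
NTSC_FRAME_RATE = Fraction(30000, 1001)  # about 29.97 frames per second


def ticks_per_frame(clock_hz: int = STC_HZ, frame_rate: Fraction = NTSC_FRAME_RATE) -> int:
    """Number of STC counts in one video frame; must be a whole number."""
    ticks = Fraction(clock_hz) / frame_rate
    if ticks.denominator != 1:
        raise ValueError(f"frame period is not a whole number of ticks: {ticks}")
    return int(ticks)


FRAME_TICKS = ticks_per_frame()


def frame_boundaries(scr: int, n: int) -> list[int]:
    """STC counts of the first n frame boundaries after loading the SCR."""
    return [scr + k * FRAME_TICKS for k in range(n)]


print(FRAME_TICKS)                               # 3003
print(f"{1000 * FRAME_TICKS / STC_HZ:.2f} ms")   # 33.37 ms
print(frame_boundaries(93994, 3))                # [93994, 96997, 100000]
```

The third boundary, 93994 + 2 × 3003 = 100000, coincides with the V-DTS of V(I0) in FIG. 21A.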
I pictures, P pictures, and B pictures are briefly explained below. These picture types are described in Coding Of Moving Pictures And Associated Audio, ISO/IEC JTC1/SC29/WG11 N0803, which defines them as follows:
I-picture: intra-coded picture: A picture coded using information only from itself.
P-picture: predictive-coded picture: A picture that is coded using motion compensated prediction from past reference fields or frames.
B-picture: bidirectionally predictive-coded picture: A picture that is coded using motion compensated prediction from a past and/or future reference picture.
FIGS. 22A-22C illustrate the MPEG coding and decoding method according to the conventional art. FIG. 22A shows the frame order of the original video as displayed. FIG. 22B shows the frame order of the coded video in the signal stream, such as a video signal stream transmitted via a communication line or recorded on a CD-ROM. FIG. 22C shows the frame order of the video stream of FIG. 22B after decoding, as displayed on a monitor.
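The change of frame order between FIG. 22B and FIG. 22C can be sketched as a simple reordering step: B pictures are shown as soon as they are decoded, while each I or P picture is held until the next I or P picture arrives. This is a simplified model assuming the coded order I0, P3, B1, B2, P6, B4, B5; the picture names follow FIG. 21, the B4 and B5 entries are hypothetical, and `coded_to_display` is an illustrative name rather than part of the described device.

```python
def coded_to_display(coded: list[str]) -> list[str]:
    """Reorder pictures from coded (transmission) order to display order.

    Names are "<type><display index>", e.g. "P3". A B picture is shown as
    soon as it is decoded; an I or P picture is held back until the next
    I or P picture arrives (or the stream ends).
    """
    display = []
    held = None  # most recent reference (I or P) picture awaiting display
    for pic in coded:
        if pic[0] in ("I", "P"):
            if held is not None:
                display.append(held)
            held = pic
        else:  # B picture: decoded and displayed almost at once
            display.append(pic)
    if held is not None:
        display.append(held)
    return display


coded_order = ["I0", "P3", "B1", "B2", "P6", "B4", "B5"]
print(coded_to_display(coded_order))
# ['I0', 'B1', 'B2', 'P3', 'B4', 'B5', 'P6']
```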
The decoding and display timing of the conventional video and audio reproducing device is explained below. An I picture or a P picture is decoded first and is displayed on the monitor when the decoding of the subsequent I or P picture starts. A B picture is decoded and displayed almost simultaneously. In other words, each B picture is decoded according to its V-DTS and displayed according to its V-PTS. Looking more closely at the relationship between V-DTS and V-PTS in the headers illustrated in FIG. 21A, the V-PTS count of an I or P picture indicates when that picture will be displayed, which is when the next I or P picture starts decoding. For example, the V-PTS count in the I picture V(I0) is 103003, which equals the V-DTS count in the subsequent P picture V(P3). The V-PTS count in the P picture V(P3) is 112012, which equals the V-DTS count in the subsequent P picture V(P6). In contrast, in the B pictures the V-PTS count and the V-DTS count are equal. For example, the V-PTS count in the B picture V(B1) is 106006, which equals its V-DTS count.
Explaining this in more detail, since the V-DTS count in the header of the I picture V(I0) frame is 100000 and its V-PTS count is 103003, the decoding of the I picture V(I0) ideally starts when the STC count reaches 100000 (=V-DTS), as shown in FIG. 21C, and the decoded picture is ideally displayed when the STC count reaches 103003 (=V-PTS), that is, when the decoding of the subsequent P picture V(P3) begins. Because these are ideal decoding and display times that differ from the actual times, they are drawn as dotted lines in the charts.
Since the V-DTS count in the header of the P picture V(P3) frame is 103003 and its V-PTS count is 112012, the decoding of the P picture V(P3) ideally starts when the STC count reaches 103003 (=V-DTS), as shown by the dotted line in FIG. 21C, and V(P3) is ideally displayed when the STC count reaches 112012 (=V-PTS), that is, when the decoding of the subsequent P picture V(P6) begins. Since both the V-DTS count and the V-PTS count in the header of the subsequent B picture V(B1) are 106006, the decoding of the B picture V(B1) ideally starts when the STC count reaches 106006 (=V-DTS), as shown by the dotted line in FIG. 21C, and V(B1) is displayed at that same count, 106006, as shown in FIG. 21D.
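The time-stamp pattern walked through above can be checked mechanically. The sketch below stores the header counts in coded order, verifies the two rules stated for FIG. 21A (an I/P picture's V-PTS equals the next I/P picture's V-DTS; a B picture's V-DTS equals its V-PTS), and derives the ideal display order by sorting on V-PTS. The counts for V(B2) are inferred from its display count 109009 and the B-picture rule; `PictureHeader` and the function names are hypothetical.

```python
from typing import NamedTuple


class PictureHeader(NamedTuple):
    name: str   # e.g. "P3"; first letter is the picture type
    v_dts: int  # video decoding time stamp, 90 kHz counts
    v_pts: int  # video presentation time stamp, 90 kHz counts


# Headers in coded order, with the counts given for FIG. 21A.
HEADERS = [
    PictureHeader("I0", 100000, 103003),
    PictureHeader("P3", 103003, 112012),
    PictureHeader("B1", 106006, 106006),
    PictureHeader("B2", 109009, 109009),
]


def follows_stamp_pattern(headers) -> bool:
    """Each I/P picture's V-PTS equals the next I/P picture's V-DTS,
    and each B picture has V-DTS == V-PTS."""
    refs = [h for h in headers if h.name[0] in "IP"]
    refs_ok = all(a.v_pts == b.v_dts for a, b in zip(refs, refs[1:]))
    b_ok = all(h.v_dts == h.v_pts for h in headers if h.name[0] == "B")
    return refs_ok and b_ok


def ideal_display_schedule(headers):
    """Ideal display instants: order by V-PTS rather than by coded order."""
    return [(h.name, h.v_pts) for h in sorted(headers, key=lambda h: h.v_pts)]


print(follows_stamp_pattern(HEADERS))  # True
print(ideal_display_schedule(HEADERS))
# [('I0', 103003), ('B1', 106006), ('B2', 109009), ('P3', 112012)]
```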
Since the A-PTS count in the header of the audio signal A0 in the second frame in FIG. 21A is 104500, the audio is output when the STC count reaches 104500. Thus, there is no decoding time stamp (DTS) for the audio; all audio is processed according to its presentation time stamp (PTS).
As explained above, when the I picture V(I0), the B picture V(B1), the B picture V(B2), the P picture V(P3), . . . are displayed at the counts 103003, 106006, 109009, 112012, . . . and the audio frames A0, A2, . . . are output at the counts 104500, 110506, . . . , the video and the audio are perfectly synchronized, which avoids the problem of the movement of a speaker's mouth displayed on the monitor not matching the voice.
However, since the picture is actually displayed in synchronization with the frame pulses of the display system shown in FIG. 21F, it is displayed at a time different from the ideal display frame time illustrated in FIG. 21D. The video synchronization comparator 109 compares the system time clock (STC) 102 with the delayed video time stamp (V-PTS) 49 and outputs the video decoding start controlling signal 52, and the video frame is displayed in synchronization with the frame pulse of the display system closest to the time at which STC=PTS is satisfied. Therefore, the time at which a picture actually appears on the monitor deviates from the ideal frame display time by up to 1/2 frame; that is, the picture is actually displayed in synchronization with the actual display frames of the display system, as shown in FIG. 21G.
In this case, the audio is output in synchronization with the STC illustrated in FIG. 21B, whereas the video is displayed in synchronization with the clock of the display system shown in FIG. 21F. The display of the video therefore deviates from the STC by up to 1/2 video frame, as shown in FIG. 21G, and hence deviates from the audio output by up to 1/2 video frame. As a result, problems occur such as the movement of a speaker's mouth on the display not matching the voice.
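The 1/2-frame bound can be illustrated by snapping each ideal V-PTS to the nearest pulse of a free-running display clock. This sketch assumes the display frame period equals the 3003-count NTSC frame and uses a hypothetical pulse phase of 101500, since no phase is given for FIG. 21F; `nearest_frame_pulse` is an illustrative name. Because the audio follows the STC directly, the computed skew is also the audio-video offset.

```python
FRAME_TICKS = 3003  # one NTSC frame at 90 kHz


def nearest_frame_pulse(pts: int, pulse_phase: int, frame: int = FRAME_TICKS) -> int:
    """Return the STC count of the display frame pulse nearest to pts.

    pulse_phase is the STC count of any one display pulse; pulses repeat
    every `frame` counts. Integer arithmetic rounds to the nearest pulse
    (a tie, impossible when `frame` is odd, would round up).
    """
    offset = pts - pulse_phase
    k = (2 * offset + frame) // (2 * frame)
    return pulse_phase + k * frame


PULSE_PHASE = 101500  # hypothetical phase of the display clock

for name, pts in [("I0", 103003), ("B1", 106006), ("B2", 109009), ("P3", 112012)]:
    shown = nearest_frame_pulse(pts, PULSE_PHASE)
    skew = shown - pts
    # With this phase every picture is shown 1500 counts (about 16.7 ms) late.
    print(name, pts, shown, skew, f"{1000 * skew / 90000:.1f} ms")
```

For any phase, the skew never exceeds 1501 counts (just under half of 3003), which is the roughly 16.7 ms worst case implied by the 1/2-frame figure above.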