Personal computers have been adapted to run multimedia software applications that include audio and video information. Several multimedia specification committees have established and proposed standards for encoding/compressing and decoding/decompressing audio and video information. MPEG I and II, established by the Moving Picture Experts Group, are the most widely accepted international standards for multimedia PC software applications. Other standards include JPEG and Motion JPEG, established by the Joint Photographic Experts Group.
FIG. 1A illustrates an MPEG audio and video decoding system 120 that decompresses video and/or audio data compressed and coded according to the MPEG algorithm. System decoder 110 reads encoded MPEG data stream 101, which may include interspersed compressed video and/or audio data, and generates timing information in the form of a Video Presentation Time Stamp (VPTS) 104, a System Clock Reference (SCR) 105, and an Audio Presentation Time Stamp (APTS) 106. Video decoder 111 decodes and decompresses video data stream 102 and generates decoded/decompressed video signal 107. Audio decoder 112 decodes and decompresses audio data stream 103 and generates decoded/decompressed audio signal 108. Decoded/decompressed video signal 107 may be coupled to a PC monitor or other type of display, while decoded/decompressed audio signal 108 may be coupled to an audio speaker or other audio generation means (not shown).
FIG. 1B, from page 49 of the ISO/IEC 11172-1:1993(E) International Standard specification for MPEG, incorporated herein by reference, is a detailed diagram of how a data stream of encoded/compressed data may be encapsulated and communicated using packets. Data stream 160 may have different layers, such as an ISO layer and a Pack layer. In the ISO layer, a series of packages 161 is communicated until an ISO end code 164 is reached. Each package 161 may be defined as having a Pack Start Code 162 and Pack Data 163. At the Pack layer, each package 161 may be defined as having a pack start code 162, a system clock reference 117, a system header 180, and packets of data 165-168, where ellipses 167 indicate that any number of packets may be present. System clock reference 117 may be further defined as the bit pattern 0010, three bits of X 185, the bit pattern 1, fifteen bits of Y 186, the bit pattern 1, fifteen bits of Z 187, the bit pattern 11, multiplexer rate 188, and the bit pattern 1. The three bits of X 185, fifteen bits of Y 186, and fifteen bits of Z 187 together make up a 33-bit value representing the system clock reference (SCR), which gives the reference system time.
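The SCR field layout can be decoded with a few shifts and masks. The following is a minimal Python sketch, assuming the MPEG-1 (ISO/IEC 11172-1) pack header byte order described above; the names `build_pack_header` and `parse_pack_header` are hypothetical, and the encoder is included only so that the decoder can be checked against known values.

```python
PACK_START_CODE = b"\x00\x00\x01\xba"


def build_pack_header(scr: int, mux_rate: int) -> bytes:
    """Pack a 33-bit SCR and a 22-bit multiplexer rate into a 12-byte pack header."""
    x = (scr >> 30) & 0x7      # three bits of X: SCR[32..30]
    y = (scr >> 15) & 0x7FFF   # fifteen bits of Y: SCR[29..15]
    z = scr & 0x7FFF           # fifteen bits of Z: SCR[14..0]
    return PACK_START_CODE + bytes([
        0x20 | (x << 1) | 1,                 # '0010', X, marker '1'
        y >> 7, ((y & 0x7F) << 1) | 1,       # Y, marker '1'
        z >> 7, ((z & 0x7F) << 1) | 1,       # Z, marker '1'
        0x80 | ((mux_rate >> 15) & 0x7F),    # second '1' of '11', mux_rate[21..15]
        (mux_rate >> 7) & 0xFF,              # mux_rate[14..7]
        ((mux_rate & 0x7F) << 1) | 1,        # mux_rate[6..0], final marker '1'
    ])


def parse_pack_header(data: bytes) -> tuple[int, int]:
    """Return (scr, mux_rate) from a pack header that begins with the pack start code."""
    if data[:4] != PACK_START_CODE or len(data) < 12:
        raise ValueError("not a complete pack header")
    b = data[4:12]
    if b[0] >> 4 != 0b0010:
        raise ValueError("bad SCR prefix")
    x = (b[0] >> 1) & 0x7                    # skip prefix and marker bit
    y = (b[1] << 7) | (b[2] >> 1)
    z = (b[3] << 7) | (b[4] >> 1)
    scr = (x << 30) | (y << 15) | z          # reassemble the 33-bit SCR
    mux_rate = ((b[5] & 0x7F) << 15) | (b[6] << 7) | (b[7] >> 1)
    return scr, mux_rate
```

For example, `parse_pack_header(build_pack_header(123456789, 2520))` recovers `(123456789, 2520)`, since the marker bits are simply skipped when the three pieces are reassembled.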
Multiplexer rate 188 specifies the rate at which the decoder receives the multiplexed data stream, measured in units of 50 bytes per second. Each of packets 165-168 may be structured like packet 166. Packet 166 has a three-byte packet start code prefix 170A, a one-byte stream ID 170B, a two-byte packet length 171, h bytes of other header data 172, and N bytes of packet data 173. The N bytes of packet data 173 may represent audio or video data. When a compression/encoding method such as MPEG I, MPEG II, or JPEG is used, the packet data is encoded according to that method. The h bytes of other header data 172 may comprise up to sixteen stuffing bytes 140, code bits 141, one flag bit 142 for a standard buffer scale, thirteen standard buffer size bits 143, and one, five, or ten bytes of time stamp information 150, respectively representing no time stamp, a presentation time stamp (PTS), or a PTS together with a decoding time stamp (DTS).
The presentation time stamp is an audio presentation time stamp (APTS) if the packet data 173 that follows contains audio information, or a video presentation time stamp (VPTS) if it contains video information. In either case, the APTS or VPTS occupies five bytes (40 bits), of which 33 bits carry the time stamp value and the remaining 7 bits form a 4-bit prefix and three marker bits.
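The packet layout and the five-byte time stamp can be parsed as in the following minimal Python sketch. It assumes MPEG-1 packet headers for audio and video streams, with stream IDs 0xC0-0xDF for audio and 0xE0-0xEF for video as defined in ISO/IEC 11172-1; private_stream_2 packets, whose headers omit these fields, are not handled, and the standard buffer fields are skipped rather than decoded. The names `Packet`, `decode_timestamp`, and `parse_packet` are hypothetical. The time stamp uses the same 3/15/15-bit split with marker bits as the SCR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    stream_id: int
    kind: str              # "audio", "video", or "other"
    pts: Optional[int]     # 33-bit presentation time stamp, if present
    dts: Optional[int]     # 33-bit decoding time stamp, if present
    payload: bytes         # the N bytes of packet data

    @property
    def pts_label(self) -> str:
        """APTS for audio packets, VPTS for video packets."""
        return {"audio": "APTS", "video": "VPTS"}.get(self.kind, "PTS")


def decode_timestamp(field: bytes) -> int:
    """Decode a 5-byte time-stamp field: 4-bit prefix, 3 marker bits, 33 value bits."""
    if len(field) != 5 or not (field[0] & field[2] & field[4] & 1):
        raise ValueError("malformed time-stamp field")
    x = (field[0] >> 1) & 0x7
    y = (field[1] << 7) | (field[2] >> 1)
    z = (field[3] << 7) | (field[4] >> 1)
    return (x << 30) | (y << 15) | z


def parse_packet(data: bytes) -> Packet:
    if data[:3] != b"\x00\x00\x01":
        raise ValueError("missing packet start code prefix")
    stream_id = data[3]
    length = int.from_bytes(data[4:6], "big")   # bytes following the length field
    body = data[6:6 + length]
    i = 0
    while i < len(body) and body[i] == 0xFF:    # stuffing bytes
        i += 1
    if i > 16:
        raise ValueError("more than 16 stuffing bytes")
    if i < len(body) and body[i] >> 6 == 0b01:  # code bits '01': skip 2-byte buffer fields
        i += 2
    if i >= len(body):
        raise ValueError("truncated packet header")
    pts = dts = None
    prefix = body[i] >> 4
    if prefix == 0b0010:                        # PTS only (5 bytes)
        pts = decode_timestamp(body[i:i + 5])
        i += 5
    elif prefix == 0b0011:                      # PTS followed by DTS (10 bytes)
        pts = decode_timestamp(body[i:i + 5])
        dts = decode_timestamp(body[i + 5:i + 10])
        i += 10
    elif body[i] == 0x0F:                       # no time stamp (1 byte)
        i += 1
    else:
        raise ValueError("unrecognized time-stamp field")
    if 0xC0 <= stream_id <= 0xDF:
        kind = "audio"
    elif 0xE0 <= stream_id <= 0xEF:
        kind = "video"
    else:
        kind = "other"
    return Packet(stream_id, kind, pts, dts, body[i:])
```

A video packet carrying a PTS of 90000 therefore reports `pts_label == "VPTS"`, and the same value in an audio packet is reported as an APTS.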
FIG. 3A illustrates an example 315 of encoded/compressed data stream 101, simplified as compared to FIG. 1B. An encoded/compressed data stream may contain a plurality of encoded/compressed video data packets or blocks and a plurality of encoded/compressed audio data packets or blocks. MPEG encodes/compresses video packets based on video frames or pictures.
Three types of video frames may be used. An intra or I-type frame comprises a frame of video data coded using only information from that frame itself; a given noncompressed video frame is encoded/compressed into exactly one I-type frame of encoded/compressed video data. A predictive or P-type frame comprises a frame of video data encoded/compressed using motion-compensated prediction from a past reference frame: a previously encoded/compressed I-type or P-type frame may be used to encode/compress the current noncompressed frame of video data into a P-type frame of encoded/compressed video data. A bi-directional or B-type frame comprises a frame of video data encoded/compressed using motion-compensated prediction from a past reference frame, a future reference frame, or both. A reference frame may be either an I-type frame or a P-type frame.
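The dependencies among the three frame types can be made concrete with a short Python sketch. It assumes, as is usual in MPEG, that a P-type frame predicts from the nearest preceding reference frame and a B-type frame from the nearest preceding and following reference frames; the function name `reference_frames` and the string notation for a sequence of frame types in display order are hypothetical.

```python
from typing import List, Optional, Tuple


def reference_frames(types: str) -> List[Tuple[Optional[int], Optional[int]]]:
    """For each frame in display order, return (past_ref, future_ref) indices.

    I-type frames use no reference, P-type frames use the nearest preceding
    I- or P-type frame, and B-type frames may use the nearest preceding and
    the nearest following I- or P-type frame (None where none exists).
    """
    refs = [i for i, t in enumerate(types) if t in "IP"]
    result = []
    for i, t in enumerate(types):
        past = max((r for r in refs if r < i), default=None)
        future = min((r for r in refs if r > i), default=None)
        if t == "I":
            result.append((None, None))
        elif t == "P":
            result.append((past, None))
        elif t == "B":
            result.append((past, future))
        else:
            raise ValueError(f"unknown frame type {t!r}")
    return result
```

For the display-order sequence `IBBPBBP`, the two B-type frames between the I-type frame and the first P-type frame may both predict from frames 0 and 3, while the first P-type frame predicts from frame 0 alone.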
B-type frames are usually inserted between reference frames (I-type frames, P-type frames, or a combination of the two) where fast motion occurs within an image across frames. Motion compensation refers to using motion vectors from one frame to the next to improve the efficiency of predicting pixel values for encoding/compression and decoding/decompression. In this method of prediction, the motion vectors provide offsets into a past or future frame of video data whose pixel values have already been decoded, and error data gives the difference between those offset pixel values and the actual ones; the offset pixel values and the error data together are used to compress/encode or decompress/decode a given frame of video data.
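The decoder side of motion-compensated prediction can be sketched in a few lines of Python: the motion vector selects offset pixel values in a reference frame, and the error data is added to them. This is a simplified illustration that assumes whole-pixel motion vectors, a single reference frame, and 8-bit samples; MPEG also allows half-pixel vectors and, for B-type frames, averaging of a past and a future prediction, which this sketch omits. The function name `reconstruct_block` and its parameters are hypothetical.

```python
from typing import List, Tuple

Block = List[List[int]]


def reconstruct_block(ref: Block, top: int, left: int,
                      motion_vector: Tuple[int, int], error: Block) -> Block:
    """Reconstruct one block of the current frame.

    ref: decoded pixel values of the past or future reference frame
    top, left: position of the block in the current frame
    motion_vector: (dy, dx) offset into the reference frame, in whole pixels
    error: error (residual) data for the block
    """
    dy, dx = motion_vector
    block = []
    for r, error_row in enumerate(error):
        row = []
        for c, e in enumerate(error_row):
            y, x = top + r + dy, left + c + dx
            if not (0 <= y < len(ref) and 0 <= x < len(ref[0])):
                raise ValueError("motion vector points outside the reference frame")
            predicted = ref[y][x]                       # offset pixel value
            row.append(min(255, max(0, predicted + e)))  # add error, clip to 8 bits
        block.append(row)
    return block
```

An encoder following the same scheme would choose the motion vector that makes the error data small, so that little data is needed to describe the block.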
Displaying an image may require more data than generating the accompanying audio, since an image may have varying resolutions, may include motion, and may be shown at a high frame rate. Thus, video data packets such as 303-305 may occur more frequently within the MPEG data stream than audio data packets such as 311. The infrequent interspersion of audio data packets between video data packets may cause an image frame to be displayed before or after its accompanying audio has been reproduced. Time stamps are provided within the encoded/compressed data stream to facilitate the synchronization of audio and video.
Video presentation time stamps (VPTS) 300-302 are provided at various intervals 306-308 of a given system time clock 316. Audio presentation time stamps (APTS), exemplified by 310, are also provided at various intervals 312 of the MPEG data stream. Additionally, a system clock reference (SCR) 317 may be provided at various intervals 318. Each SCR, VPTS, and APTS is a 33-bit value representing a time. The MPEG standard recommends that the MPEG decoder use the 33-bit VPTS as the starting time of the video display sequence and the 33-bit APTS as the starting time of the audio playback sequence. The APTS and VPTS may jointly be referred to as presentation time stamps (PTS). The MPEG standard requires that an APTS, a VPTS, and an SCR appear in the bitstream at least once every seven tenths (0.7) of a second.
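Because each SCR, VPTS, and APTS counts ticks of the 90 kHz system clock in a 33-bit field, converting a time stamp to seconds and measuring the spacing between successive stamps is simple modular arithmetic. The following minimal Python sketch assumes the 90 kHz units used by MPEG-1 and applies the 0.7 second spacing limit to successive stamps of one kind; the helper names are hypothetical.

```python
CLOCK_HZ = 90_000        # SCR/PTS units: ticks of the 90 kHz system clock
WRAP = 1 << 33           # 33-bit values wrap around after 2**33 ticks
MAX_GAP_TICKS = 63_000   # 0.7 s expressed in 90 kHz ticks


def ticks_to_seconds(ticks: int) -> float:
    """Convert a 90 kHz tick count to seconds."""
    return ticks / CLOCK_HZ


def tick_delta(later: int, earlier: int) -> int:
    """Forward distance from earlier to later, allowing for 33-bit wraparound."""
    return (later - earlier) % WRAP


def gaps_within_limit(stamps: list) -> bool:
    """True if successive time stamps of one kind are at most 0.7 s apart."""
    return all(tick_delta(b, a) <= MAX_GAP_TICKS for a, b in zip(stamps, stamps[1:]))
```

A 33-bit counter at 90 kHz wraps after about 26.5 hours, which is why the sketch computes differences modulo 2^33 rather than by plain subtraction.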
In the prior art, the 33-bit system clock reference (SCR) has been used as the reference time for both video and audio display to minimize the deviation between video and audio playback. The SCR was loaded into a counter, referred to as the system counter, which was incremented by a 90 kilohertz system clock (SCLK). The output of the system counter was compared with the VPTS within video decoder 111 and with the APTS within audio decoder 112 to determine by how much the audio or video playback was out of sync. If the difference reached a threshold level, the video would jump to be correctly in sync with the audio. Thus, the SCR may be used to resynchronize the video playback with the audio playback.
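The prior-art comparison can be sketched in Python as a 33-bit counter that is loaded from the SCR, advanced once per 90 kHz SCLK tick, and compared with a VPTS or APTS. This is a minimal illustration under stated assumptions, not the circuitry of any particular decoder: the class and function names and the threshold value are hypothetical, and the signed difference assumes the two values are less than half the 33-bit range apart.

```python
WRAP = 1 << 33  # SCR, PTS, and the system counter are 33-bit values


class SystemCounter:
    """33-bit system counter loaded from the SCR and advanced by SCLK ticks."""

    def __init__(self) -> None:
        self.value = 0

    def load(self, scr: int) -> None:
        self.value = scr % WRAP

    def tick(self, n: int = 1) -> None:
        self.value = (self.value + n) % WRAP


def pts_offset(counter: int, pts: int) -> int:
    """Signed ticks from the counter to the PTS, in [-2**32, 2**32).

    Positive: the presentation time has not yet been reached.
    Negative: the presentation time has already passed.
    """
    d = (pts - counter) % WRAP
    return d - WRAP if d >= WRAP // 2 else d


def needs_resync(counter: int, pts: int, threshold_ticks: int) -> bool:
    """True when playback is out of sync by more than the threshold."""
    return abs(pts_offset(counter, pts)) > threshold_ticks
```

As an illustration only, a threshold of 3,000 ticks corresponds to one frame period at 30 frames per second (90,000 / 30).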
In some decoding systems, a video clock or decoding clock may be generated without reference to the SCR and may not be locked or corrected, so that a time drift (lead or lag) may appear in the synthesized VPTS derived from the video or decoding clock. Over one second, or 90,000 system clock cycles, this drift may produce time errors on the order of 50 parts per million, equivalent to the synthesized VPTS values differing from the actual VPTS values by 44 to 67 μs. In systems that do not correct for out-of-sync conditions, the time error may accumulate and cause the video image to lead or lag the audio playback by one frame every 5 to 6 minutes. The frame lead or lag may continue to accumulate over longer periods of time if the video display and the audio playback are not occasionally resynchronized.
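The per-second figures follow from the 90 kHz clock period of about 11.1 μs. A small Python sketch of the arithmetic, with hypothetical helper names: a 50 ppm drift amounts to 4.5 clock periods, or 50 μs, per second, and four to six clock periods correspond to roughly 44 to 67 μs, matching the stated range.

```python
CLOCK_HZ = 90_000  # system clock rate in Hz


def drift_ticks_per_second(ppm: float) -> float:
    """90 kHz clock periods gained or lost per second for a drift given in ppm."""
    return CLOCK_HZ * ppm / 1_000_000


def ticks_to_microseconds(ticks: float) -> float:
    """Convert 90 kHz clock periods to microseconds (one period is about 11.1 us)."""
    return ticks * 1_000_000 / CLOCK_HZ
```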