Personal computers have been adapted to run multimedia software applications which include audio and video information. Several multimedia specification committees have established and proposed standards for encoding/compressing and decoding/decompressing audio and video information. MPEG I and II, established by the Moving Picture Experts Group, are the most widely accepted international standards in the field of multimedia PC software applications. Other standards are JPEG and Motion JPEG, established by the Joint Photographic Experts Group.
FIG. 1A illustrates an MPEG audio and video decoding system 120 which decompresses video and/or audio data compressed and coded according to the MPEG algorithm. The system decoder 110 reads the encoded MPEG data stream 101, which may include interspersed compressed video and/or audio data, and generates timing information in the form of a Video Presentation Time Stamp (VPTS) 104, a System Clock Reference (SCR) 105, and an Audio Presentation Time Stamp (APTS) 106. The video decoder 111 decodes and decompresses the video data stream 102 and generates a decoded/decompressed video signal 107. The audio decoder 112 decodes and decompresses the audio data stream 103 and generates a decoded/decompressed audio signal 108. The decoded/decompressed video signal 107 is coupled to a PC monitor or other type of display, while the decoded/decompressed audio signal 108 is coupled to an audio speaker or other audio generation means (not shown).
FIG. 1B, from page 49 of the ISO/IEC 11172-1:1993(E) International Standard specification for MPEG, incorporated herein by reference, illustrates in detail how the data stream of encoded/compressed data may be encapsulated and communicated using packets. The data stream 160 may have different layers such as an ISO layer and a Pack layer. In the ISO layer, a series of packages 161 are communicated until an ISO end code 164 is reached. Each package 161 may be defined as having a Pack Start Code 162 and Pack Data 163. At the pack layer, each package 161 may be defined as having a pack start code 162, a system clock reference 117, a system header 180, and packets of data 165-168. The ellipsis 167 represents a number of packets. The system clock reference 117 may be further defined to be a bit pattern of 0010, three bits of X 185, a bit pattern of 1, fifteen bits of Y 186, a bit pattern of 1, fifteen bits of Z 187, a bit pattern of 11, a multiplexer rate 188, and a bit pattern of 1. The three bits of X 185, the fifteen bits of Y 186, and the fifteen bits of Z 187 together make up a 33-bit pattern representing the system clock reference (SCR). The system clock reference represents the referenced system time. The multiplexer rate 188 represents how often audio packets are interspersed between video packets. Each of the packets 165-168 may have a structure similar to that of packet 166. Packet 166 has a three-byte packet start code prefix 170A, a one-byte stream ID 170B, a two-byte packet length 171, h bytes of other header data 172, and N bytes of packet data 173. The N bytes of packet data 173 may represent audio or video data. When a compression/encoding method such as MPEG I, MPEG II, or JPEG is used, the data packets are encoded accordingly.
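By way of illustration only, the bit layout of the system clock reference described above may be unpacked as in the following sketch. The function names `encode_pack_header` and `decode_pack_header` are hypothetical, and the sketch assumes that the fields occupy the eight bytes immediately following the pack start code and that the multiplexer rate is 22 bits wide; the field width is an assumption of this sketch and is not stated in the description above.

```python
MARKER_SHIFTS = (56, 40, 24, 23, 0)  # positions of the five single '1' bits

def encode_pack_header(scr: int, mux_rate: int) -> bytes:
    """Pack a 33-bit SCR and an (assumed) 22-bit mux rate into 8 bytes.

    Layout, most significant bit first:
    0010 | X(3) | 1 | Y(15) | 1 | Z(15) | 1 | 1 | mux_rate(22) | 1
    """
    if not 0 <= scr < (1 << 33):
        raise ValueError("SCR must fit in 33 bits")
    if not 0 <= mux_rate < (1 << 22):
        raise ValueError("mux rate must fit in 22 bits")
    x = (scr >> 30) & 0x7       # top three bits of the SCR
    y = (scr >> 15) & 0x7FFF    # middle fifteen bits
    z = scr & 0x7FFF            # low fifteen bits
    bits = (0b0010 << 60) | (x << 57) | (y << 41) | (z << 25) | (mux_rate << 1)
    for shift in MARKER_SHIFTS:
        bits |= 1 << shift
    return bits.to_bytes(8, "big")

def decode_pack_header(fields: bytes) -> tuple:
    """Return (scr, mux_rate) from the 8 bytes after the pack start code."""
    if len(fields) < 8:
        raise ValueError("need 8 bytes after the pack start code")
    bits = int.from_bytes(fields[:8], "big")
    if bits >> 60 != 0b0010:
        raise ValueError("missing 0010 prefix")
    if any(((bits >> shift) & 1) != 1 for shift in MARKER_SHIFTS):
        raise ValueError("marker bit not set")
    x = (bits >> 57) & 0x7
    y = (bits >> 41) & 0x7FFF
    z = (bits >> 25) & 0x7FFF
    mux_rate = (bits >> 1) & 0x3FFFFF
    # X, Y, and Z concatenate into the 33-bit system clock reference.
    return (x << 30) | (y << 15) | z, mux_rate
```

For example, `decode_pack_header(encode_pack_header(90000, 1234))` recovers the pair `(90000, 1234)`, showing that the three groups X, Y, and Z reassemble into the original 33-bit value.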
The h bytes of other header data 172 may comprise one to sixteen stuffing bytes 140, code bits 01 141, one bit flagging the standard buffer scale 142, thirteen bits indicating the standard buffer size 143, and one, five, or ten bytes of Time Stamp information 150 respectively representing nothing, a presentation time stamp (PTS), or a presentation time stamp (PTS) with a decoding time stamp (DTS). The presentation time stamp may be an audio presentation time stamp (APTS) if the following data packet 173 contains audio information. Alternatively, it may be a video presentation time stamp (VPTS) if the following data packet 173 contains video information. In either case, the APTS or the VPTS may be represented by five bytes, of which 33 bits carry the time stamp value and 7 bits are unused.
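By way of illustration only, the other header data may be walked as in the following sketch. The function names are hypothetical. Several details are assumptions drawn from a reading of ISO/IEC 11172-1 rather than from the description above: that stuffing bytes have the value 0xFF, that a lone PTS begins with the bits 0010, that a PTS accompanied by a DTS begins with 0011 (the DTS itself beginning with 0001), that the single byte carrying no time stamp is 0x0F, and that the 33 time stamp bits are arranged in groups of three, fifteen, and fifteen bits, each followed by a marker bit.

```python
def read_33bit(field: bytes) -> int:
    """Extract a 33-bit time stamp from 5 bytes.

    Assumed layout: 4 prefix bits | 3 bits | 1 | 15 bits | 1 | 15 bits | 1.
    The 4 prefix bits and 3 marker bits are the 7 bits that carry no time value.
    """
    v = int.from_bytes(field[:5], "big")
    high = (v >> 33) & 0x7
    mid = (v >> 17) & 0x7FFF
    low = (v >> 1) & 0x7FFF
    return (high << 30) | (mid << 15) | low

def parse_other_header(buf: bytes, pos: int = 0) -> dict:
    """Parse the header bytes that follow the two-byte packet length."""
    out = {"stuffing": 0, "std_scale": None, "std_size": None,
           "pts": None, "dts": None, "data_start": None}
    # Skip stuffing bytes (assumed value 0xFF).
    while pos < len(buf) and buf[pos] == 0xFF:
        out["stuffing"] += 1
        pos += 1
    if pos >= len(buf):
        raise ValueError("header truncated")
    # Optional two-byte field: code bits 01, one scale bit, thirteen size bits.
    if buf[pos] >> 6 == 0b01:
        word = int.from_bytes(buf[pos:pos + 2], "big")
        out["std_scale"] = (word >> 13) & 1
        out["std_size"] = word & 0x1FFF
        pos += 2
    prefix = buf[pos] >> 4
    if prefix == 0b0010:            # five bytes: PTS only
        out["pts"] = read_33bit(buf[pos:pos + 5])
        pos += 5
    elif prefix == 0b0011:          # ten bytes: PTS followed by DTS
        out["pts"] = read_33bit(buf[pos:pos + 5])
        out["dts"] = read_33bit(buf[pos + 5:pos + 10])
        pos += 10
    else:                           # one byte: no time stamp
        pos += 1
    out["data_start"] = pos         # index of the first packet data byte
    return out
```

Whether the returned `pts` is treated as an APTS or a VPTS would depend on the stream ID that precedes the header, which this sketch does not examine.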
FIG. 3A illustrates a simplified example 315 of the encoded/compressed data stream 101 as compared to FIG. 1B. An encoded/compressed data stream such as this may contain a plurality of encoded/compressed video data packets or blocks and a plurality of encoded/compressed audio data packets or blocks. MPEG encodes/compresses the video packets based on video frames, which may also be referred to as pictures. Three types of video frames may be used. An intra-frame, or I-type frame or picture, is a frame of video data which is coded using only information from the frame itself. Only one given noncompressed video frame is encoded/compressed into one I-type frame of encoded/compressed video data. A predictive-frame, or P-type frame or picture, is a frame which is encoded/compressed using motion compensated prediction from a past reference frame. A previous encoded/compressed frame, such as an I-type or P-type frame, is used to encode/compress a current noncompressed frame of video data into a P-type frame of encoded/compressed video data. A bi-directional-frame, or B-type frame or picture, is a frame which is encoded/compressed using motion compensated prediction from a past and a future reference frame, from a past reference frame alone, or from a future reference frame alone. A reference frame may be either an I-type frame or a P-type frame. B-type frames are usually inserted between I-type frames, P-type frames, or combinations of either when there is fast motion within an image across frames. Motion compensation refers to using motion vectors from one frame to the next to improve the efficiency of predicting pixel values for encoding/compression and decoding/decompression. The method of prediction uses the motion vectors to provide offset values and error data which refer to a past or a future frame of video data having decoded pixel values, which may be used with the error data to compress/encode or decompress/decode a given frame of video data.
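By way of illustration only, the use of a motion vector offset together with error data to reconstruct pixel values may be sketched as follows. The function names are hypothetical, frames are modeled as plain lists of rows of 8-bit pixel values, and motion vectors are whole-pixel (row, column) offsets. The rounded averaging of the past and future predictions for a B-type block is an assumption of this sketch; the description above does not specify how the two predictions are combined.

```python
def clamp(value: int) -> int:
    """Keep a reconstructed pixel within the 8-bit range."""
    return max(0, min(255, value))

def predict_block(ref, top, left, mv, size):
    """Copy a size x size block from the reference frame, displaced by mv."""
    dy, dx = mv
    return [[ref[top + dy + r][left + dx + c] for c in range(size)]
            for r in range(size)]

def reconstruct_p(past, top, left, mv, error, size):
    """P-type block: prediction from a past reference frame plus error data."""
    pred = predict_block(past, top, left, mv, size)
    return [[clamp(pred[r][c] + error[r][c]) for c in range(size)]
            for r in range(size)]

def reconstruct_b(past, future, top, left, mv_past, mv_future, error, size):
    """B-type block: average of past and future predictions plus error data."""
    p = predict_block(past, top, left, mv_past, size)
    f = predict_block(future, top, left, mv_future, size)
    return [[clamp((p[r][c] + f[r][c] + 1) // 2 + error[r][c])
             for c in range(size)]
            for r in range(size)]

# A 4 x 4 past frame holding the values 0..15 and a flat future frame.
past = [[r * 4 + c for c in range(4)] for r in range(4)]
future = [[10] * 4 for _ in range(4)]
error = [[1, -1], [0, 2]]
p_block = reconstruct_p(past, 1, 1, (-1, 1), error, 2)
b_block = reconstruct_b(past, future, 1, 1, (-1, 1), (0, 0),
                        [[0, 0], [0, 0]], 2)
```

Because only the motion vector and the (typically small) error values need to be communicated for each block, a P-type or B-type frame generally requires less data than an I-type frame of the same image.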
Because the amount of data required to display an image, which may display motion and have varying resolutions and frame rates, is greater than the amount of data required to reproduce audio sounds, video data packets such as 303-305 occur more frequently within the MPEG data stream than audio data packets such as 311. The infrequent interspersion of audio data packets between the video data packets may cause an image frame to be displayed before or after the corresponding audio has been reproduced. Time stamps are provided within the encoded/compressed data stream to facilitate the synchronization of audio and video. The video presentation time stamps 300-302 are provided at various intervals 306-308 of a given system time clock 316. The audio presentation time stamps, exemplified by 310, are also provided at various intervals 312 of the MPEG data stream. Additionally, there is a system clock reference (SCR) 317 provided at various intervals 318. Each of the SCR, VPTS, and APTS is a 33-bit value representing a time value. The MPEG standard recommends that the MPEG decoder use the 33-bit VPTS as the starting time of the video display sequence and the 33-bit APTS as the starting time of the audio playback sequence. The APTS and VPTS may jointly be referred to as presentation time stamps (PTS). The MPEG standard requires that an APTS, a VPTS, and an SCR each appear in the bitstream at least once every seven tenths (0.7) of a second.
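By way of illustration only, the 0.7 second requirement may be checked as in the following sketch. The names are hypothetical, and the sketch assumes that the 33-bit time stamp values count cycles of a 90 kilohertz clock, so that 0.7 second corresponds to 63,000 counts, and that the 33-bit values wrap around to zero after reaching their maximum.

```python
CLOCK_HZ = 90_000                     # assumed time stamp clock rate
MASK_33 = (1 << 33) - 1               # time stamps are 33-bit values
MAX_GAP_TICKS = CLOCK_HZ * 7 // 10    # 0.7 second expressed in clock counts

def ticks_to_seconds(ticks: int) -> float:
    """Convert a count of clock cycles to seconds."""
    return ticks / CLOCK_HZ

def gap_ticks(earlier: int, later: int) -> int:
    """Counts elapsed between two stamps, allowing for 33-bit wraparound."""
    return (later - earlier) & MASK_33

def find_late_stamps(stamps):
    """Return the (previous, current) pairs spaced more than 0.7 second apart."""
    late = []
    for previous, current in zip(stamps, stamps[1:]):
        if gap_ticks(previous, current) > MAX_GAP_TICKS:
            late.append((previous, current))
    return late

# Successive gaps of 45,000, 63,000, and 92,000 counts; only the last is too long.
violations = find_late_stamps([0, 45_000, 108_000, 200_000])
```

The same check could be applied separately to the sequence of APTS values, the sequence of VPTS values, and the sequence of SCR values extracted from a given stream.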
In the prior art, the 33-bit system clock reference (SCR) has been used as the reference time for both video and audio display to minimize the deviation between video and audio playback. The SCR was loaded into a counter, referred to as the system counter, and incremented by a 90 kilohertz system clock (SCLK). The output of the system counter was compared with the VPTS within the video decoder 111 and the APTS within the audio decoder 112 to determine by how much the audio or video playback was out of sync. If a threshold level was reached, the video would jump to be correctly in sync with the audio. Thus, the SCR was used to resynchronize the video playback with the audio playback. In some decoding systems, a video clock or decoding clock is generated without reference to the SCR and is not locked or corrected, such that a time drift (lead or lag) may appear in the synthesized VPTS derived from the video or decoding clock. This time drift may cause, in one second, or 90,000 system clock cycles, time errors on the order of 50 parts per million. This is equivalent to the synthesized VPTS values differing from actual VPTS values by 44 to 67 microseconds. In systems which do not correct for out-of-sync conditions, the time error may accumulate and cause the video image to lead or lag the audio playback by one frame every 5 to 6 minutes. The frame lead or lag may also accumulate over longer periods of time if the video display and the audio playback are not occasionally resynchronized.
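By way of illustration only, the prior art comparison of the system counter with a presentation time stamp may be sketched as follows. The class and function names are hypothetical, and the default threshold of 3,003 counts (approximately one frame period at 29.97 frames per second) is an assumed value chosen for the example and is not taken from the description above. The helper `ticks_to_usec` reflects one reading of the 44 to 67 microsecond range, namely four to six cycles of the 90 kilohertz clock.

```python
CLOCK_HZ = 90_000              # system clock (SCLK) rate
MASK_33 = (1 << 33) - 1        # counter and time stamps are 33 bits wide

class SystemCounter:
    """Counter loaded with the SCR and incremented by the system clock."""

    def __init__(self) -> None:
        self.value = 0

    def load(self, scr: int) -> None:
        self.value = scr & MASK_33

    def tick(self, cycles: int = 1) -> None:
        self.value = (self.value + cycles) & MASK_33

def sync_error(counter_value: int, pts: int) -> int:
    """Signed difference (pts - counter) in clock counts, wraparound-aware."""
    diff = (pts - counter_value) & MASK_33
    if diff >= 1 << 32:
        diff -= 1 << 33
    return diff

def needs_resync(counter_value: int, pts: int, threshold: int = 3003) -> bool:
    """True once the out-of-sync amount reaches the (assumed) threshold."""
    return abs(sync_error(counter_value, pts)) >= threshold

def ticks_to_usec(ticks: float) -> float:
    """Convert 90 kHz clock counts to microseconds."""
    return ticks * 1_000_000 / CLOCK_HZ

counter = SystemCounter()
counter.load(1_000)
counter.tick(CLOCK_HZ)                     # one second of system clock cycles
drift_per_second = CLOCK_HZ * 50 / 1_000_000   # 50 ppm expressed in counts
```

In this sketch a drift of 50 parts per million amounts to 4.5 counts per second of the 90 kilohertz clock, which lies within the four to six count range noted above.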