Audio and video data have been used to supplement the capabilities of many known systems. For example, video data has been used to supplement the capability of the telephone to produce a multimedia system which may be referred to as a teleconferencing system. The teleconferencing system provides video data to supplement the audio data. The inclusion of video data in a multimedia system, however, may increase the likelihood of data bandwidth problems. Specifically, the size of the audio and video data may make it difficult for the system to transmit all of the required data in the necessary time. Consequently, some multimedia systems utilize a form of encoding and decoding of the audio and video data to reduce the required bandwidth. Several multimedia specification committees have established and proposed standards for encoding and decoding audio and video information. MPEG1 and MPEG2 (the MPEG standard), established by the Moving Picture Experts Group, are examples of such standards.
A system using the MPEG standard may compress real time audio and video data for transmission over a network where the audio and video may be decompressed and reproduced. The system may compress each video frame to reduce the amount of data required to reproduce the equivalent frame for display. Video frames may be compressed in three ways according to the MPEG standards. An intra or I-type frame may comprise a frame of video data coded using information about itself. One given non-compressed video frame may be encoded into one I-type frame of encoded video data. A predictive or P-type frame may comprise a frame of video data encoded using motion compensated prediction from a past frame. A previously encoded frame such as I-type or P-type may be used to encode a current non-compressed frame of video data into a P-type frame of encoded video data. A bi-directional or B-type frame may comprise a frame of video data encoded using a motion compensated prediction from a past and future reference frame, or a past, or a future reference frame of video data. A reference frame may be either an I-type frame or a P-type frame. If a reference frame were to be lost or discarded, the decoder may slow down when decompressing subsequent video frames which were compressed with reference to the discarded frame.
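The dependency between frame types described above may be illustrated with a minimal sketch. The sketch assumes a simplified model that is not taken from the MPEG standard itself: frames are listed in display order, each P-type frame depends on the nearest preceding reference frame, and each B-type frame depends on the nearest preceding and nearest following reference frames where they exist. The function names `dependencies` and `affected_by_loss` are hypothetical.

```python
def dependencies(frame_types):
    """Map each frame index to the reference frames it is predicted from.

    Simplified model (an assumption): display order, nearest references only.
    """
    refs = [i for i, t in enumerate(frame_types) if t in ("I", "P")]
    deps = {}
    for i, t in enumerate(frame_types):
        past = [r for r in refs if r < i]
        future = [r for r in refs if r > i]
        if t == "I":
            deps[i] = set()                      # coded from itself only
        elif t == "P":
            deps[i] = {past[-1]} if past else set()
        else:                                    # B-type frame
            deps[i] = set(past[-1:]) | set(future[:1])
    return deps


def affected_by_loss(frame_types, lost):
    """Return the indices of frames that directly or indirectly need `lost`."""
    deps = dependencies(frame_types)
    affected = {lost}
    changed = True
    while changed:                               # propagate until stable
        changed = False
        for i, needed in deps.items():
            if i not in affected and needed & affected:
                affected.add(i)
                changed = True
    return sorted(affected - {lost})


sequence = list("IBBPBBP")
print(affected_by_loss(sequence, 3))  # loss of the first P-type frame
print(affected_by_loss(sequence, 1))  # loss of a B-type frame
```

Under this model, the loss of a B-type frame affects no other frame, whereas the loss of a reference frame may affect every frame predicted from it.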
FIG. 1A is a block diagram of a realtime multimedia system using encoding and decoding. The encoder 100 may accept analog audio and video information to produce a multimedia data stream which may be transmitted across a connection 120. The decoder 150 may then decode the multimedia data stream for presentation to the end user.
The analog audio and video information may be processed by encoder 100 using video encoder 105 and audio encoder 110. The video encoder 105 and the audio encoder 110 compress the video and audio information according to the MPEG1 or MPEG2 standard. A specified frame rate may be used to predict the arrival time of each video frame from the real time video source. For example, if the specified frame rate were 30 frames per second, a prediction may be made that a video frame will arrive approximately every 33 ms. An arriving video frame may then be stamped with the time which corresponds to the predicted arrival time for that frame. The time stamp may be associated with the encoded video information (a video presentation time stamp or VPTS) and the audio information (an audio presentation time stamp).
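The prediction of arrival times from the specified frame rate may be sketched as follows. The function name `predicted_pts_ms` is hypothetical, and the truncation to whole milliseconds is an assumption chosen to match the 33 ms, 66 ms, 100 ms figures used in this description.

```python
def predicted_pts_ms(frame_number, frames_per_second=30):
    """Time stamp in ms for the frame_number-th frame (1-based).

    The stamp is the predicted arrival time, not the measured one.
    Integer division truncates to whole milliseconds (an assumption).
    """
    return frame_number * 1000 // frames_per_second


for n in range(1, 6):
    print(n, predicted_pts_ms(n))
```

Because the stamp is derived only from the frame count and the specified rate, it may be accurate only if the source actually delivers frames at that rate.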
More data may be required to display an image than to generate accompanying audio as an image may have varying resolutions and include motion and frame rates not included in the audio data. Thus, video data may occur more frequently within the MPEG data stream. The infrequent interspersion of audio data between video data may cause an image frame to be displayed before or after the audio has been reproduced. The time stamps may be used to synchronize the presentation of audio and video at the decoder 150.
In general, the audio and video data may be synchronized by the decoder 150 by comparing the audio and video presentation time stamps with the system clock reference (SCR) 125. The SCR 125 may indicate how much time has elapsed since the start of the multimedia data stream. Ideally, the time stamps within the multimedia data stream correspond to the SCR 125. By comparing the SCR 125 with the presentation time stamps associated with audio and video data, the decoder 150 may determine whether the video and audio data are leading or lagging SCR 125. The decoder 150 may then compensate for the lead or lag. For example, if the decoder 150 determines that the presentation time stamps are lagging the SCR 125, the decoder 150 may discard the data associated with the lagging presentation time stamps.
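The comparison performed by the decoder 150 may be sketched as a simple decision function. This is an illustrative sketch only: the function name `sync_action`, the `tolerance_ms` parameter, and the choice to wait when a time stamp leads the SCR are assumptions; the description above states only that lagging data may be discarded.

```python
def sync_action(pts_ms, scr_ms, tolerance_ms=0):
    """Decide what to do with data stamped pts_ms when the clock reads scr_ms."""
    if pts_ms < scr_ms - tolerance_ms:
        return "discard"   # time stamp lags the SCR: presentation time has passed
    if pts_ms > scr_ms + tolerance_ms:
        return "wait"      # time stamp leads the SCR (assumed handling)
    return "present"       # time stamp corresponds to the SCR


print(sync_action(66, 66))
print(sync_action(133, 166))
print(sync_action(166, 133))
```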
FIG. 1B illustrates an exemplary arrangement of video frames, corresponding SCR times and the presentation time stamps associated with the video frames. SCR times 33 ms through 166 ms correspond to the arrival times for video frames provided at a particular theoretical frame rate. Frame numbers #1 through #5 correspond to the encoding time of video frames provided by the real time video source. PTS times 33 ms through 166 ms indicate the time stamp on the corresponding video frame. For example, video frame #2 is encoded at SCR time 66 ms and is stamped with a PTS of 66 ms.
As multimedia systems proliferate, consumer electronic devices may be used to supply audio and video input. For example, a multimedia system may include a video cassette recorder (VCR) to supply realtime video input. The encoder 100 could then use the frame rate specified in the MPEG header to time stamp the associated video and audio data. There may, however, be problems associated with the use of various consumer electronic devices. For example, some consumer electronic devices may produce video frames at a rate which is not in accord with the specified rate. Similarly, the video output provided by the consumer electronic device may not be in accord with known standards associated with video signals. In other words, the specified video frame rate may be inaccurate, causing video frames to be stamped inaccurately. Moreover, many consumer electronic devices may produce a comparatively low quality video output. For example, some spurious signals may occur in the video supplied by the consumer electronic device which may lead to an inaccurate frame rate.
FIGS. 2 and 3 illustrate how video frames may be stamped with inaccurate time stamps when the specified frame rate is incorrect. FIG. 2 illustrates the time stamps associated with encoded video frames when a real time video source provides video frames faster than the specified rate. A faster than specified frame rate may cause video frames to be time stamped with times which do not correspond to SCR 125. For example, frame #1 is encoded and stamped with a PTS of 33 ms. Frame #2, which according to the specified video frame rate should be encoded at approximately 66 ms, is encoded some time earlier. Despite the actual encoding time, frame #2 is stamped with a PTS of 66 ms. Frame #3, which according to the specified video frame rate should be encoded at approximately 100 ms, is encoded some time earlier and time stamped with a PTS of 100 ms. Frame #4, which according to the specified video frame rate should be encoded at approximately 133 ms, is stamped with a PTS of 133 ms. Frame #5, which according to the specified video frame rate should be encoded at approximately 166 ms, is encoded at an SCR time of 133 ms and stamped with a PTS of 166 ms. In other words, frame #5 should have been encoded at approximately an SCR time of 166 ms because the specified frame rate indicates that the fifth video frame should be encoded at that time. Frame #5, however, is actually encoded at an SCR time of 133 ms (i.e., a full frame time earlier than it should have been encoded). Despite the actual time, frame #5 is stamped with a PTS of 166 ms (i.e., an inaccurate time stamp).
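The growth of the time stamp error for a source that is faster than specified may be sketched as follows. The encode times for frames #1 and #5 (33 ms and 133 ms) are taken from the description above; the intermediate encode times are assumed, evenly spaced values, since the description states only that those frames are encoded "some time earlier". The function name `pts_errors_ms` is hypothetical.

```python
def pts_errors_ms(encode_times_ms, frames_per_second=30):
    """For each frame, return (stamped PTS) minus (actual SCR encode time)."""
    errors = []
    for number, encoded_at in enumerate(encode_times_ms, start=1):
        pts = number * 1000 // frames_per_second  # stamp from the specified rate
        errors.append(pts - encoded_at)
    return errors


# Frames #1 and #5 per the description; #2 through #4 are assumed values.
fast_source_ms = [33, 58, 83, 108, 133]
print(pts_errors_ms(fast_source_ms))  # [0, 8, 17, 25, 33]
```

Under these assumed encode times the error grows with each frame, and by frame #5 the stamp is ahead of the actual encoding time by about one full frame time.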
The situation described above may produce problems in decoding the actual video frames. When the video frames are produced at a rate which is faster than that specified, the input buffer of the decoder 150 may be completely filled. The decoder 150 may then discard a number of frames to catch up, which may cause the display to appear discontinuous. Moreover, if any of the discarded frames are reference frames, decoding subsequent frames may be delayed, which may cause additional display artifacts.
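The filling of the input buffer may be illustrated with a rough simulation. Every number here is an assumption made for illustration: frames arrive every 25 ms, the decoder removes one frame every 33 ms, and the buffer holds two frames. The function name `first_overflow_ms` is hypothetical, and an actual decoder buffer is sized in bytes rather than frames.

```python
def first_overflow_ms(arrival_period_ms, drain_period_ms,
                      capacity_frames, horizon_ms):
    """Return the first time (ms) a frame arrives at a full buffer, or None."""
    occupancy = 0
    for now in range(1, horizon_ms + 1):
        # The decoder removes one frame per presentation period.
        if now % drain_period_ms == 0 and occupancy > 0:
            occupancy -= 1
        # The source delivers one frame per arrival period.
        if now % arrival_period_ms == 0:
            if occupancy == capacity_frames:
                return now
            occupancy += 1
    return None


print(first_overflow_ms(25, 33, 2, 1000))  # source faster than specified
print(first_overflow_ms(33, 33, 2, 1000))  # source at the specified rate
```

In this model a source running at the specified rate never fills the buffer, while a faster source fills it after a delay that depends on the rate mismatch and the buffer capacity.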
FIG. 3 shows the time stamps associated with video frames when the video frames are provided at a frame rate which is slower than the specified rate. SCR times 33 ms through 200 ms represent the time stamps normally associated with video frames provided at the specified rate. PTS times 33 ms through 166 ms represent the time stamps associated with encoded video frames. Frame #1 through frame #5 represent the actual encoding time for the corresponding frame. If the video frames are provided less frequently than the specified rate, the decoder 150 may experience difficulty in decoding the actual video frames. For example, frame #1 is encoded at approximately SCR time 33 ms and stamped with a PTS of 33 ms. Frame #2 is encoded some time after SCR time 66 ms yet stamped with a PTS of 66 ms. Because of the difference between the actual frame rate and the specified frame rate, as time passes the actual frames arrive increasingly later than the theoretical time. For example, frame #5 is encoded at approximately SCR time 200 ms, yet the frame is stamped with a PTS of 166 ms. The PTS for frame #5 is therefore inaccurate.
The inaccurate PTS may cause difficulty for the decoder 150. For example, the decoder 150 may discard frame #4 because the PTS of 133 ms indicates that the time for displaying frame #4 has passed in relation to SCR time. In other words, frame #4 is stamped with a PTS which indicates that the time for displaying frame #4 has passed. Consequently, the decoder 150 may discard frame #4 and resort to displaying frame #3. When frame #5 is decoded, the PTS of 166 ms indicates that frame #5 is too late as well and may be discarded. Again the decoder 150 may resort to displaying frame #3. This type of behavior may result in detectable visual artifacts such as repeating and skipping subsequent video frames.
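The discarding behavior described above may be reproduced with a small simulation. The arrival times for frames #1 and #5 (approximately 33 ms and 200 ms) come from the description of FIG. 3; the intermediate arrival times are assumed, evenly spaced values. The 20 ms lateness tolerance is a hypothetical parameter chosen so that the sketch shows frame #3 still being displayed while frames #4 and #5 are discarded; the description does not specify how late a frame may be before it is discarded. The function name `replay_decoder` is hypothetical.

```python
def replay_decoder(arrivals_ms, frames_per_second=30, tolerance_ms=20):
    """Return, for each arriving frame, the frame number actually displayed."""
    displayed = []
    last_shown = None
    for number, scr in enumerate(arrivals_ms, start=1):
        pts = number * 1000 // frames_per_second  # stamp from the specified rate
        if scr - pts <= tolerance_ms:
            last_shown = number   # on time (within tolerance): display it
        # Otherwise the frame is discarded and the previous frame is repeated.
        displayed.append(last_shown)
    return displayed


# Frames #1 and #5 per the description; #2 through #4 are assumed values.
slow_source_ms = [33, 75, 117, 158, 200]
print(replay_decoder(slow_source_ms))  # [1, 2, 3, 3, 3]
```

The repeated 3 at the end of the output corresponds to the decoder 150 resorting to frame #3 after discarding frames #4 and #5.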