The present invention relates to the field of multimedia systems. More particularly, the invention relates to the encoding of video and audio data.
Audio and video data have been used to supplement the capabilities of many known systems. For example, video data has been used to supplement the capability of the telephone to produce a multimedia system which may be referred to as a teleconferencing system. The teleconferencing system provides video data to supplement the audio data. The inclusion of video data in a multimedia system, however, may increase the likelihood of data bandwidth problems. Specifically, the size of the audio and video data may make it difficult for the system to transmit all of the required data in the necessary time. Consequently, some multimedia systems utilize a form of encoding and decoding of the audio and video data to reduce the required bandwidth. Several multimedia specification committees have established and proposed standards for encoding and decoding audio and video information. MPEG1 and MPEG2 (the MPEG standard), established by the Moving Picture Experts Group, are examples of such standards.
A system using the MPEG standard may compress real time audio and video data for transmission over a network, where the audio and video may be decompressed and reproduced. The system may compress each video frame to reduce the amount of data required to reproduce the equivalent frame for display. Video frames may be compressed in three ways according to the MPEG standards. An intra or I-type frame may comprise a frame of video data coded using only information contained in that frame. One given non-compressed video frame may be encoded into one I-type frame of encoded video data. A predictive or P-type frame may comprise a frame of video data encoded using motion compensated prediction from a past reference frame. A previously encoded frame, such as an I-type or P-type frame, may be used to encode a current non-compressed frame of video data into a P-type frame of encoded video data. A bi-directional or B-type frame may comprise a frame of video data encoded using motion compensated prediction from a past reference frame, a future reference frame, or both. A reference frame may be either an I-type frame or a P-type frame. If a reference frame were to be lost or discarded, the decoder may slow down when decompressing subsequent video frames which were compressed with reference to the discarded frame.
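The dependency rule described above can be sketched in a few lines. This is a minimal illustration under simplifying assumptions that are not part of the text: frames are listed in display order as a string such as "IBBPBBP", each P-type frame references only the nearest preceding I-type or P-type frame, each B-type frame references the nearest preceding and following reference frames when both exist, and GOP boundaries and transmission order are ignored. The helper names `reference_map` and `affected_by_loss` are hypothetical.

```python
from typing import Dict, List, Set


def reference_map(frame_types: str) -> Dict[int, List[int]]:
    """Map each frame index (display order) to the reference frames it is predicted from.

    Simplified rule (assumption): I -> none; P -> nearest previous I/P;
    B -> nearest previous I/P and nearest following I/P, where they exist.
    """
    anchors = [i for i, t in enumerate(frame_types) if t in "IP"]
    refs: Dict[int, List[int]] = {}
    for i, t in enumerate(frame_types):
        past = [a for a in anchors if a < i]
        future = [a for a in anchors if a > i]
        if t == "I":
            refs[i] = []
        elif t == "P":
            refs[i] = past[-1:]
        elif t == "B":
            refs[i] = past[-1:] + future[:1]
        else:
            raise ValueError(f"unknown frame type {t!r}")
    return refs


def affected_by_loss(frame_types: str, lost: int) -> Set[int]:
    """Return the frames that can no longer be reconstructed exactly if `lost` is discarded."""
    refs = reference_map(frame_types)
    affected = {lost}
    changed = True
    while changed:  # propagate through chains of references (e.g. I -> P -> B)
        changed = False
        for frame, frame_refs in refs.items():
            if frame not in affected and any(r in affected for r in frame_refs):
                affected.add(frame)
                changed = True
    return affected - {lost}
```

For "IBBPBBP", discarding the first P-type frame (index 3) affects both B-type frames before it and every frame after it, whereas discarding a B-type frame affects nothing else, because no frame is predicted from a B-type frame.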
FIG. 1A is a block diagram of a real-time multimedia system using encoding and decoding. The encoder 100 may accept analog audio and video information to produce a multimedia data stream which may be transmitted across a connection 120. The decoder 150 may then decode the multimedia data stream for presentation to the end user.
The analog audio and video information may be processed by encoder 100 using video encoder 105 and audio encoder 110. The video encoder 105 and the audio encoder 110 compress the video and audio information according to the MPEG1 or MPEG2 standard. A specified frame rate may be used to predict the arrival time of each video frame from the real time video source. For example, if the specified frame rate were 30 frames per second, a prediction may be made that a video frame will arrive approximately every 33 ms. An arriving video frame may then be stamped with the time which corresponds to the predicted arrival time for that frame. The time stamp may be associated with the encoded video information (a video presentation time stamp or VPTS) and the audio information (an audio presentation time stamp).
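The prediction described above amounts to multiplying the frame number by the specified frame period. The sketch below assumes times in milliseconds, frame #1 stamped one period after the start of the stream, and truncation to whole milliseconds, which matches the values used in FIG. 1B; the function name `theoretical_pts_ms` is hypothetical.

```python
def theoretical_pts_ms(frame_number: int, frames_per_second: float = 30.0) -> float:
    """Predicted arrival time, in ms, of the 1-based frame_number at the specified rate."""
    if frame_number < 1:
        raise ValueError("frame_number is 1-based")
    return frame_number * 1000.0 / frames_per_second


# Time stamps for the first five frames, truncated to whole milliseconds
stamps = [int(theoretical_pts_ms(n)) for n in range(1, 6)]
print(stamps)  # [33, 66, 100, 133, 166]
```

Note that the stamp depends only on the frame count and the specified rate, not on when the frame actually arrives; that is the property the later figures show going wrong.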
More data may be required to display an image than to generate the accompanying audio, because an image may have varying resolutions and may include motion and frame rates that have no counterpart in the audio data. Thus, video data may occur more frequently within the MPEG data stream. The comparatively infrequent interspersion of audio data between video data may cause an image frame to be displayed before or after the corresponding audio has been reproduced. The time stamps may be used to synchronize the presentation of audio and video at the decoder 150.
In general, the audio and video data may be synchronized by the decoder 150 by comparing the audio and video presentation time stamps with the system clock reference (SCR) 125. The SCR 125 may indicate how much time has elapsed since the start of the multimedia data stream. Ideally, the time stamps within the multimedia data stream correspond to the SCR 125. By comparing the SCR 125 with the presentation time stamps associated with audio and video data, the decoder 150 may determine whether the video and audio data are leading or lagging SCR 125. The decoder 150 may then compensate for the lead or lag. For example, if the decoder 150 determines that the presentation time stamps are lagging the SCR 125, the decoder 150 may discard the data associated with the lagging presentation time stamps.
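A minimal sketch of the comparison described above, assuming both the PTS and the SCR are expressed in milliseconds. The tolerance window `tolerance_ms` is hypothetical (the text does not say how large a lead or lag a decoder accepts), and holding leading data until its time arrives is an assumed way of compensating for a lead.

```python
from enum import Enum


class Action(Enum):
    PRESENT = "present"  # PTS agrees with the SCR closely enough
    WAIT = "wait"        # PTS leads the SCR: hold the data until its time arrives
    DISCARD = "discard"  # PTS lags the SCR: the presentation time has already passed


def sync_action(pts_ms: float, scr_ms: float, tolerance_ms: float) -> Action:
    """Decide what to do with data stamped pts_ms when the SCR reads scr_ms."""
    lag_ms = scr_ms - pts_ms
    if lag_ms > tolerance_ms:
        return Action.DISCARD
    if lag_ms < -tolerance_ms:
        return Action.WAIT
    return Action.PRESENT
```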
FIG. 1B illustrates an exemplary arrangement of video frames, corresponding SCR times and the presentation time stamps associated with the video frames. SCR times 33 ms through 166 ms correspond to the arrival times for video frames provided at a particular theoretical frame rate. Frame numbers #1 through #5 correspond to the encoding time of video frames provided by the real time video source. PTS times 33 ms through 166 ms indicate the time stamp on the corresponding video frame. For example, video frame #2 is encoded at SCR time 66 ms and is stamped with a PTS of 66 ms.
As multimedia systems proliferate, consumer electronic devices may be used to supply audio and video input. For example, a multimedia system may include a video cassette recorder (VCR) to supply real-time video input. The encoder 100 could then use the frame rate specified in the MPEG header to time stamp the associated video and audio data. There may, however, be problems associated with the use of various consumer electronic devices. For example, some consumer electronic devices may produce video frames at a rate which is not in accord with the specified rate. Similarly, the video output provided by the consumer electronic device may not be in accord with known standards associated with video signals. In other words, the specified video frame rate may be inaccurate, causing video frames to be stamped inaccurately. Moreover, many consumer electronic devices may produce a comparatively low quality video output. For example, spurious signals may occur in the video supplied by the consumer electronic device, which may lead to an inaccurate frame rate.
FIGS. 2 and 3 illustrate how video frames may be stamped with inaccurate time stamps when the specified frame rate is incorrect. FIG. 2 illustrates the time stamps associated with encoded video frames when a real time video source provides video frames faster than the specified rate. A faster than specified frame rate may cause video frames to be time stamped with times which do not correspond to SCR 125. For example, frame #1 is encoded and stamped with a PTS of 33 ms. Frame #2, which according to the specified video frame rate should be encoded at approximately 66 ms, is encoded some time earlier. Despite the actual encoding time, frame #2 is stamped with a PTS of 66 ms. Frame #3, which according to the specified video frame rate should be encoded at approximately 100 ms, is encoded some time earlier and time stamped with a PTS of 100 ms. Frame #4, which according to the specified video frame rate should be encoded at approximately 133 ms, is stamped with a PTS of 133 ms. Frame #5, which according to the specified video frame rate should be encoded at approximately 166 ms, is encoded at an SCR time of 133 ms and stamped with a PTS of 166 ms. In other words, frame #5 should have been encoded at approximately an SCR time of 166 ms because the specified frame rate indicates that the fifth video frame should be encoded at that time. Frame #5, however, is actually encoded at an SCR time of 133 ms (i.e., a full frame time earlier than it should have been encoded). Despite the actual time, frame #5 is stamped with a PTS of 166 ms (i.e., an inaccurate time stamp).
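The drift in FIG. 2 can be reproduced numerically. This sketch assumes frame #1 is encoded exactly on schedule at one specified period after the stream starts and that later frames follow at the actual rate; the actual rate of 40 frames per second (a 25 ms period) is a hypothetical value chosen because it places frame #5 at roughly 133 ms, as in the figure. `stamping_error_ms` is a hypothetical helper.

```python
from typing import List


def stamping_error_ms(actual_fps: float, specified_fps: float = 30.0,
                      count: int = 5) -> List[float]:
    """For frames 1..count, return the PTS minus the actual encoding time, in ms.

    Positive values mean the stamp is later than the moment the frame was encoded.
    """
    spec_period = 1000.0 / specified_fps
    actual_period = 1000.0 / actual_fps
    errors = []
    for n in range(1, count + 1):
        pts = n * spec_period                               # stamp from the specified rate
        encoded_at = spec_period + (n - 1) * actual_period  # when the frame really arrived
        errors.append(pts - encoded_at)
    return errors


fast = stamping_error_ms(actual_fps=40.0)
print([round(e, 1) for e in fast])
```

The error grows by about 8.3 ms per frame and reaches one full frame period (about 33.3 ms) at frame #5, matching FIG. 2. Calling the same helper with a slower hypothetical rate such as 24 frames per second gives errors of the opposite sign, reaching about -33.3 ms at frame #5, which corresponds to the situation of FIG. 3.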
The situation described above may produce problems in decoding the actual video frames. When the video frames are produced at a rate which is faster than that specified, the input buffer of decoder 150 may be completely filled. The decoder 150 may then discard a number of frames to catch up, which may cause the display to appear discontinuous. Moreover, if any of the discarded frames are reference frames, decoding subsequent frames may be delayed, which may cause additional display artifacts.
FIG. 3 shows the time stamps associated with video frames when the video frames are provided at a frame rate which is slower than the specified rate. SCR times 33 ms through 200 ms represent the time stamps normally associated with video frames provided at the specified rate. PTS times 33 ms through 166 ms represent the time stamps associated with the encoded video frames. Frame #1 through frame #5 represent the actual encoding time for the corresponding frame. If the video frames are provided less frequently than the specified rate, the decoder 150 may experience difficulty in decoding the actual video frames. For example, frame #1 is encoded at approximately SCR time 33 ms and stamped with a PTS of 33 ms. Frame #2 is encoded some time after SCR time 66 ms, yet stamped with a PTS of 66 ms. Because of the difference between the actual frame rate and the specified frame rate, the actual frames arrive increasingly later than the theoretical time as time passes. For example, frame #5 is encoded at approximately SCR time 200 ms but is stamped with a PTS of 166 ms. The PTS for frame #5 is therefore inaccurate.
The inaccurate PTS may cause difficulty for the decoder 150. For example, the decoder 150 may discard frame #4 because its PTS of 133 ms indicates that, relative to the SCR, the time for displaying frame #4 has already passed. The decoder 150 may then resort to displaying frame #3 again. When frame #5 is decoded, its PTS of 166 ms likewise indicates that frame #5 is too late, so it may also be discarded, and the decoder 150 may again resort to displaying frame #3. This type of behavior may result in detectable visual artifacts such as repeating and skipping video frames.
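The repeat-and-skip behavior can be simulated with a toy decoder. This sketch makes several assumptions not stated in the text: each frame is decoded at the moment it arrives, the slow source runs at a hypothetical 24 frames per second (which places frame #5 at 200 ms, as in FIG. 3), and the decoder discards a frame whose PTS lags the SCR by more than a hypothetical 20 ms tolerance. `simulate_display` is a hypothetical name.

```python
from typing import List, Optional


def simulate_display(actual_fps: float = 24.0, specified_fps: float = 30.0,
                     count: int = 5, late_tolerance_ms: float = 20.0) -> List[Optional[int]]:
    """Return the frame number on screen after each frame is processed."""
    shown: List[Optional[int]] = []
    on_screen: Optional[int] = None
    first_ms = 1000.0 / specified_fps
    for n in range(1, count + 1):
        arrival = first_ms + (n - 1) * 1000.0 / actual_fps  # SCR when frame n is decoded
        pts = n * 1000.0 / specified_fps                     # stamp from the specified rate
        if arrival - pts <= late_tolerance_ms:
            on_screen = n  # close enough to its presentation time: display it
        # otherwise frame n is discarded and the previous frame stays on screen
        shown.append(on_screen)
    return shown
```

With these assumptions the screen shows frames 1, 2 and 3, then frame #3 twice more because frames #4 and #5 arrive 25 ms and about 33 ms after their stamps. At the specified rate of 30 frames per second, every frame is displayed in turn.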
In view of the above discussion, it is an object of the present invention to reduce visual artifacts in the decoded video when video and audio data are supplied in real time.
It is another object of the present invention to reduce the cost of a system for encoding and decoding audio and video.
It is yet another object of the present invention to produce an output which may be decoded by a standard decoder.
These and other objects of the present invention are achieved by determining a time stamp which compensates for a difference between a video frame rate corresponding to a video frame within the real time multimedia data stream and an oscillator clock. The video frame is stamped with a compensating time stamp which compensates for the difference between the theoretical presentation time stamp corresponding to the video frame and the oscillator clock.
The compensating time stamp may be generated by comparing the oscillator clock to the theoretical PTS. If the difference exceeds a first predetermined threshold value, the compensation may take place in a coarse adjustment mode. If the difference is less than or equal to a second predetermined threshold value, the compensation may be made using a fine adjustment mode. Once the compensation is determined, the video frame may then be time stamped with the adjusted PTS.
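The mode selection described above can be sketched as follows. The threshold values are left as parameters because the text gives none; comparing the absolute difference is an assumption, and so is the behavior when the difference falls between the two thresholds, which the text does not specify (this sketch simply keeps the previous mode). `select_mode` is a hypothetical name.

```python
def select_mode(theoretical_pts_ms: float, oscillator_ms: float,
                first_threshold_ms: float, second_threshold_ms: float,
                previous_mode: str = "fine") -> str:
    """Choose "coarse" or "fine" adjustment from the PTS/oscillator difference."""
    difference = abs(oscillator_ms - theoretical_pts_ms)
    if difference > first_threshold_ms:
        return "coarse"
    if difference <= second_threshold_ms:
        return "fine"
    # Between the two thresholds no mode is specified; keep the previous one (assumption).
    return previous_mode
```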
Fine adjustment may be accomplished by multiplying the theoretical presentation time stamp by a first compensation factor and multiplying the oscillator clock by a second compensation factor. The two products may then be added to provide the adjusted PTS.
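The fine adjustment is a weighted sum of the two values. The factor values below are hypothetical; if the two compensation factors are chosen to sum to 1 (an assumption, not stated in the text), the adjusted PTS is a weighted average that lies between the theoretical PTS and the oscillator clock, so each frame's stamp moves only partway toward the oscillator reading.

```python
def fine_adjust(theoretical_pts_ms: float, oscillator_ms: float,
                first_factor: float, second_factor: float) -> float:
    """Adjusted PTS = theoretical PTS * first factor + oscillator clock * second factor."""
    return theoretical_pts_ms * first_factor + oscillator_ms * second_factor


# Hypothetical factors 0.75 and 0.25: a 10 ms gap is closed by a quarter
adjusted = fine_adjust(100.0, 110.0, 0.75, 0.25)
```

Here `adjusted` is 102.5 ms. Weighting the theoretical PTS heavily keeps the stamps close to the regular spacing the decoder expects while still tracking slow drift in the source.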
Coarse adjustment may be accomplished by incrementing the theoretical PTS and comparing the incremented value with the present value of the oscillator clock. If the comparison indicates that the difference exceeds a third predetermined threshold value, time is added to or subtracted from the theoretical PTS. Alternatively, a coarse mode adjustment may be made by adjusting the oscillator clock so that it equals the closest increment of the theoretical PTS.
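Both coarse-mode variants can be sketched under stated assumptions. The text does not say how much time is added or subtracted, so this sketch assumes one frame period at a specified rate of 30 frames per second; the threshold value is left as a parameter. In the alternative variant, the "closest increment of the theoretical PTS" is taken to be the nearest whole multiple of the frame period. Both function names are hypothetical.

```python
FRAME_PERIOD_MS = 1000.0 / 30.0  # specified rate of 30 fps (assumed)


def coarse_adjust(theoretical_pts_ms: float, oscillator_ms: float,
                  third_threshold_ms: float, step_ms: float = FRAME_PERIOD_MS) -> float:
    """Increment the theoretical PTS by one frame period, compare it with the
    oscillator clock, and add or subtract step_ms if the gap exceeds the threshold."""
    next_pts = theoretical_pts_ms + FRAME_PERIOD_MS
    gap = oscillator_ms - next_pts
    if gap > third_threshold_ms:
        next_pts += step_ms  # source running slow: move the stamp later
    elif gap < -third_threshold_ms:
        next_pts -= step_ms  # source running fast: move the stamp earlier
    return next_pts


def snap_oscillator(oscillator_ms: float, period_ms: float = FRAME_PERIOD_MS) -> float:
    """Alternative coarse mode: set the oscillator clock to the nearest
    increment of the theoretical PTS (a whole multiple of the frame period)."""
    return round(oscillator_ms / period_ms) * period_ms
```

For example, with a theoretical PTS of 100 ms, an oscillator reading of 200 ms and a 20 ms threshold, the incremented PTS of about 133.3 ms is still about 66.7 ms behind, so one period is added, giving about 166.7 ms. Snapping an oscillator reading of 140 ms gives about 133.3 ms, the nearest 30 fps increment.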
Consumer electronic devices may provide video and audio data at a rate which does not correspond to the rate specified in a header associated with the multimedia data stream. The decoder may expect video frames to be delivered at the specified rate, yet the actual video rate may vary. This may result in unpleasant visual artifacts in the decoded video. Visual artifacts may be lessened by reducing the number of encoded video frames that are discarded by the decoder. The MPEG standard allows certain frames to be encoded with reference to other encoded frames, which means that some frames may be decoded by using previously decoded frames. If, however, those reference frames are discarded by the decoder, the dependent video frames may appear distorted or may be skipped. The present invention may reduce the number of discarded frames by adjusting the PTS so that fewer video frames carry time stamps that cause the decoder to discard them.
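The effect on discarded frames can be illustrated by combining the slow-source scenario of FIG. 3 with the fine adjustment described earlier. Every value here is an assumption for illustration: a hypothetical 24 frames per second source, a hypothetical 20 ms decoder tolerance, hypothetical compensation factors of 0.5 and 0.5, an oscillator clock that reads each frame's arrival time, and a decoder whose SCR equals that arrival time when the frame is decoded.

```python
from typing import List

SPEC_PERIOD_MS = 1000.0 / 30.0    # specified rate of 30 fps
ACTUAL_PERIOD_MS = 1000.0 / 24.0  # hypothetical slow source, as in FIG. 3
LATE_TOLERANCE_MS = 20.0          # hypothetical decoder tolerance


def arrival_ms(n: int) -> float:
    """Time at which frame n actually arrives (and is assumed to be decoded)."""
    return SPEC_PERIOD_MS + (n - 1) * ACTUAL_PERIOD_MS


def discarded(stamps: List[float]) -> List[int]:
    """Frame numbers whose stamp lags their arrival by more than the tolerance."""
    return [n for n, pts in enumerate(stamps, start=1)
            if arrival_ms(n) - pts > LATE_TOLERANCE_MS]


frames = range(1, 6)
theoretical = [n * SPEC_PERIOD_MS for n in frames]
# Fine adjustment with hypothetical factors 0.5 and 0.5; the oscillator clock
# is assumed to read each frame's arrival time.
adjusted = [0.5 * pts + 0.5 * arrival_ms(n) for n, pts in zip(frames, theoretical)]

print(discarded(theoretical), discarded(adjusted))
```

With the theoretical stamps, frames #4 and #5 are discarded; with the adjusted stamps, each frame lags its stamp by only half as much (at most about 16.7 ms), so none is discarded under these assumptions.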
System cost may be reduced by reducing the need for special interface hardware to compensate for the inaccurate frame rates produced by some consumer electronic devices. The present invention may address the inaccuracy by providing an adjusted PTS which is less dependent upon the actual frame rate of the video from the consumer electronic device.
A standard encoded output may be provided by complying with the required encoding standard (i.e., MPEG1 or MPEG2). The present invention may provide a multimedia data stream which complies with the encoding standard and, therefore, may allow the system to utilize a decoder which complies with the encoding standard. For example, the system may employ an encoder which utilizes the present invention and a decoder which complies with the MPEG1 or MPEG2 encoding standard. Consequently, the decoder may not be required to understand how the encoder provides the multimedia data stream.
As will further be appreciated by those of skill in the art, the present invention may be embodied as methods, apparatus/systems or computer program products.