1. Technical Field
The present invention relates generally to an improved data processing system and in particular to a method and apparatus for processing a digital data stream. Still more particularly, the present invention relates to a method and apparatus for synchronizing audio and video data in a digital data stream.
2. Description of the Related Art
Multimedia, the presentation or transfer of information through more than one medium at any time, is a fast growing segment of the computer industry, with many applications being developed that incorporate various features of multimedia. Additionally, many businesses are using multimedia to present information to consumers. Multimedia combines different forms of media in the communication of information to a user through a data processing system, such as a personal computer. A multimedia application is an application that uses different forms of communications within a single application. For example, multimedia applications may communicate data to a user through a computer via audio and video simultaneously. Such multimedia applications are usually bit intensive, real time, and very demanding, requiring ample processing power in the data processing system. Users may access multimedia, for example, in the form of video games or movies on a digital video disk (DVD) or through a communications link.
Multimedia presentations often require synchronization of audio and video data within a multimedia data stream. In digital audio/video systems, audio and video decoders require timing information. Where the audio and video streams are compressed, the decoder decompresses them and clocks each frame out to the next stage for playback using the timing information. If the streams are uncompressed, the decoders simply use the timing information to control audio and video buffers and send the frames to the next stage at the appropriate rate. In any case, the decoders in a data processing system must maintain synchronization between the audio and video to ensure that a user perceives a synchronized audio/video presentation.
One well-known standard for synchronized recording and playback of compressed digital audio and video data streams is the MPEG (Moving Picture Experts Group) standard.
Video compression and encoding is typically performed by a video encoder. The video encoder normally implements a selected data compression algorithm that conforms to a recognized standard or specification agreed to among the senders and receivers of digital video signals. One such emerging standard, developed by the Moving Picture Experts Group, is generally referred to as the MPEG International Standard ISO for MPEG-1. The MPEG-1 standard defines a format for compressed digital video which supports data rates of about 1 to 1.8 Mbps (Megabits per second), resolutions of about 352 pixels (picture elements) horizontally by about 288 lines vertically, and picture rates of about 24 to 30 pictures per second.
In order for a video signal to be compressed in MPEG-1, it is typically sampled, digitized, and represented by luminance (Y) and color difference (Cr, Cb) signals. Under the MPEG standard, the color difference signals are sampled at a ratio of two-to-one (2:1) relative to the luminance signal. That is, for every two samples of the Y component, there is one sub-sample each of the Cr and Cb components. It has been determined that the 2:1 sampling ratio is appropriate because the human eye is much more sensitive to luminance (brightness) components (Y) than to color components (Cr, Cb). Video sampling takes place in both the vertical and horizontal directions.
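The 2:1 sub-sampling described above can be illustrated with a minimal sketch. It assumes that each color difference plane is reduced by averaging every 2x2 block of samples; the averaging filter and the function name `subsample_chroma` are illustrative assumptions and are not taken from the MPEG standard.

```python
def subsample_chroma(plane):
    """Halve a chroma plane (Cr or Cb) in both directions.

    Assumption: each output sample is the integer average of a 2x2 block
    of input samples. A trailing odd row or column is ignored.
    """
    rows, cols = len(plane), len(plane[0])
    out = []
    for r in range(0, rows - 1, 2):
        out_row = []
        for c in range(0, cols - 1, 2):
            total = (plane[r][c] + plane[r][c + 1]
                     + plane[r + 1][c] + plane[r + 1][c + 1])
            out_row.append(total // 4)
        out.append(out_row)
    return out


cb = [[10, 20, 30, 40],
      [10, 20, 30, 40]]
print(subsample_chroma(cb))  # one row of two averaged samples
```

The luminance plane would be left at full resolution, so each 2x2 group of Y samples shares one Cr and one Cb sample.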
Once the video signal is sampled, it is reformatted, for example, into a non-interlaced signal. An interlaced signal is one that contains only part of the picture content (i.e., every other horizontal line) for each complete display scan. A non-interlaced signal, in contrast, is one that contains all of the picture content. After the video signal is sampled and reformatted, the encoder may process it further by converting it to a different resolution in accordance with the image area to be displayed. In doing so, the encoder must determine which type of picture is to be encoded. A picture may be considered as corresponding to a single frame of motion video, or to a frame of movie film. However, different picture types may be employed for digital video transmission. The picture types for MPEG video are: I Pictures (Intra-Coded Pictures), which are coded without reference to any other pictures and are often referred to as anchor frames; P Pictures (Predictive-Coded Pictures), which are coded using motion-compensated prediction from the past I or P reference picture, and may also be considered anchor frames; and B Pictures (Bi-directionally Predictive-Coded Pictures), which are coded using motion compensation from a previous and a future I or P Picture.
A typical coding scheme may employ a mixture of I, P, and B Pictures. Typically, an I Picture may occur every half a second, with two B Pictures inserted between each pair of I or P pictures. I Pictures provide random access points within the coded sequence of pictures where decoding can begin, and are coded with only a moderate degree of compression. P Pictures are coded more efficiently using motion compensated prediction from a past I or P Picture and are generally used as a reference for further prediction. B Pictures provide the highest degree of compression but require both past and future reference pictures for motion compensation. B Pictures are generally not used as references for prediction. The organization of the three picture types in a particular video sequence is very flexible.
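As an illustration of such a mixture, the following sketch generates the display-order picture types for one group of pictures. It assumes a picture rate of 30 pictures per second, so that an I Picture every half second corresponds to a group of 15 pictures; the function name `gop_pattern` and its parameters are hypothetical.

```python
def gop_pattern(gop_size=15, b_between=2):
    """Return picture types, in display order, for one group of pictures.

    The group starts with an I Picture; every (b_between + 1)-th picture
    after that is a P Picture, and the pictures in between are B Pictures.
    """
    types = []
    for i in range(gop_size):
        if i == 0:
            types.append("I")
        elif i % (b_between + 1) == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


print(gop_pattern())  # IBBPBBPBBPBBPBB
```

Because the organization of picture types is flexible, other values of `gop_size` and `b_between` are equally valid inputs to this sketch.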
MPEG video decoding is the inverse of MPEG video encoding and is employed to reconstruct a motion picture sequence from a compressed, encoded bitstream. The data in the bitstream is decoded according to the syntax defined in the data compression standard. The decoder must first identify the beginning of a coded picture, identify the type of picture, then decode each individual macroblock within a particular picture. If there are motion vectors and macroblock types (each of the picture types I, P, and B has its own macroblock types) present in the bitstream, they can be used to construct a prediction of the current macroblock based on past and future reference pictures that the decoder has already stored. Coefficient data is then inverse quantized and operated on by an inverse discrete cosine transform (IDCT) process that transforms the macroblock data from the frequency domain to the spatial domain.
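The inverse quantization and IDCT steps can be sketched for a single 8x8 block as follows. This is a simplified illustration: it assumes a single uniform quantizer step rather than the quantization matrices and scale factors of the MPEG standard, and it evaluates the textbook 8x8 IDCT formula directly rather than using a fast algorithm. The names `dequantize` and `idct_8x8` are hypothetical.

```python
import math

N = 8  # block size


def dequantize(levels, step):
    """Simplified inverse quantization: scale every level by one step size."""
    return [[level * step for level in row] for row in levels]


def idct_8x8(coeffs):
    """Direct 2-D inverse DCT of an 8x8 coefficient block."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (c(u) * c(v) * coeffs[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = total / 4
    return out


# A block holding only a DC coefficient decodes to a flat block.
levels = [[0] * N for _ in range(N)]
levels[0][0] = 2
pixels = idct_8x8(dequantize(levels, step=4))
```

In a P or B Picture, the resulting spatial-domain values would be added to the motion-compensated prediction rather than used directly.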
Once all the macroblocks have been processed by the decoder, the picture reconstruction is complete. If the picture just reconstructed is a reference picture (I or P Picture), it replaces the oldest stored reference picture and is used as the new reference for subsequent pictures. Because B Pictures depend on a future reference picture, the pictures may also need to be re-ordered before they are displayed, so that they appear in their display order instead of their coding order. After the pictures are re-ordered, they may be displayed on an appropriate output device.
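The re-ordering from coding order to display order can be sketched as follows. The sketch assumes that pictures are represented by labels whose first character is the picture type, and that each reference picture is held back until the next reference picture arrives; the function name `to_display_order` and the label format are illustrative assumptions.

```python
def to_display_order(coded):
    """Re-order picture labels from coding order to display order.

    B Pictures are displayed as soon as they are decoded. A reference
    picture (I or P) is held until the next reference picture arrives,
    because the B Pictures that precede it in display order follow it
    in coding order.
    """
    displayed = []
    held = None  # most recent reference picture not yet displayed
    for pic in coded:
        if pic[0] in "IP":
            if held is not None:
                displayed.append(held)
            held = pic
        else:
            displayed.append(pic)
    if held is not None:
        displayed.append(held)  # flush the last reference picture
    return displayed


coded = ["I0", "P3", "B1", "B2", "P6", "B4", "B5"]
print(to_display_order(coded))
# ['I0', 'B1', 'B2', 'P3', 'B4', 'B5', 'P6']
```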
For software MPEG playback, MPEG data contains presentation time stamps (PTS) which are meant to be used for synchronization. However, with software playback both audio and video data after decompression sometimes lose the exact position of the PTS. As a result, the PTS values cannot be accurately used for synchronization in some cases. Additionally, because MPEG frames consist of I, P, and B frames, how the video frames are dropped for synchronization is a determinant of video quality on playback.
Therefore, it would be advantageous to have an improved method for synchronizing audio and video data without requiring the use of time stamps.
It is one objective of the present invention to provide an improved data processing system.
It is another objective of the present invention to provide an improved method and apparatus for processing a digital data stream.
It is yet another objective of the present invention to provide a method and apparatus for synchronizing audio and video data in a digital data stream.
The foregoing objectives are achieved as follows.
The present invention provides a method and apparatus for synchronizing a data stream that contains video and audio data in which the video data includes frames. As frames are being processed, a number of frames processed is identified along with a number of frames dropped during processing. A synchronization speed is identified using the number of frames processed and the number of frames dropped. A frame is selectively dropped from the data stream based on the synchronization speed.
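The flow summarized above can be illustrated with a minimal sketch. The summary does not state how the synchronization speed is computed or how it selects frames, so everything specific here is an assumption made for illustration: the drop ratio, the threshold values, the speed names, the mapping from speed to droppable picture types, and the function names `sync_speed` and `should_drop`. The sketch also assumes that the number of frames processed counts every frame handled, including dropped ones.

```python
def sync_speed(processed, dropped, low=0.35, high=0.65):
    """Classify a synchronization speed from frame counts.

    Assumption: the speed is derived from the fraction of processed
    frames that were dropped, using hypothetical thresholds.
    """
    if processed <= 0:
        return "slow"
    ratio = dropped / processed
    if ratio < low:
        return "slow"
    if ratio < high:
        return "medium"
    return "fast"


# Hypothetical policy: drop the least important picture types first.
DROPPABLE = {"slow": "B", "medium": "BP", "fast": "BPI"}


def should_drop(frame_type, video_is_late, speed):
    """Decide whether to drop a frame of the given type (I, P, or B)."""
    if not video_is_late:
        return False
    return frame_type in DROPPABLE[speed]


speed = sync_speed(processed=100, dropped=10)
print(speed, should_drop("B", True, speed), should_drop("P", True, speed))
```

Dropping B frames first is chosen in this sketch because B Pictures are generally not used as references, so removing one does not affect the decoding of other pictures.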
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.