I. Field of the Invention
The present invention relates generally to audio and video synchronization, and more specifically to a system and method for audio and video synchronization using audio signatures.
II. Description of the Related Art
Encoded digital video streams are used in a variety of applications that allow videos to be distributed on a variety of media. For example, movies are commonly stored on Digital Video Disc (DVD).
Several encoding standards and formats, such as MPEG, MPEG-2, MPEG-4, AVI, and QuickTime, have been developed for encoding and distribution of digital video streams. Different standards allow for varying degrees of functionality versus storage requirements. For example, MPEG-2 is primarily designed for encoding movies and other audio-visual works. Similarly, MPEG-4 is designed to handle video streams transmitted over low bandwidth communication channels.
The implementation of encoded digital video and audio streams presents numerous design challenges that were not encountered in the prior processing of analog audio and video signals. For example, with digital signal processing, the audio signals are separated from the video signals, and the audio and video are processed independently. However, the playback of the audio and video must be synchronized, so that there is a coordinated and coherent reproduction of the desired audio and video provided by the source of the program material.
For example, the program source may provide the audio and video data in respective data packets in an “MPEG-2” format. Each of the audio and video data packets is received from the source of video material in a continuous data stream. Each packet of video data includes a header block followed by a data block. The data block may include any number, for example one to twenty, of frames of video data that may include a full field of video data or be a coded group of pictures that includes its own header block identifying the picture type and display order. The header block for a video data packet includes control information, for example, the identity of the format of the video data, the type of compression, if used, picture size, display order, and other global parameters. The audio data packet has a header block that again identifies the format of the audio data with instructions relating to how the audio data is to be decoded and processed to provide desired enhancements, if applicable. Following the header block, the audio data packet includes an audio data block that has any number of blocks or frames of audio data, for example, from one to approximately twenty blocks.
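By way of illustration only, the packet layout described above may be modeled as a header block followed by a data block of frames. The following is a minimal sketch; the class names, field names, and the helper `frame_count_by_stream` are hypothetical and do not reproduce the actual MPEG-2 bitstream syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class HeaderBlock:
    """Control information carried ahead of the data block (illustrative fields)."""
    stream_type: str                               # "audio" or "video"
    data_format: str                               # identity of the data format
    compression: Optional[str] = None              # type of compression, if used
    picture_size: Optional[Tuple[int, int]] = None # video only: (width, height)


@dataclass
class DataPacket:
    """One packet: a header block followed by a data block of frames."""
    header: HeaderBlock
    frames: List[bytes] = field(default_factory=list)


def frame_count_by_stream(packets: List[DataPacket]) -> Dict[str, int]:
    """Count the frames carried for each stream type in a packet sequence."""
    counts = {"audio": 0, "video": 0}
    for packet in packets:
        counts[packet.header.stream_type] += len(packet.frames)
    return counts


# Hypothetical stream: one video packet with three frames, one audio packet with two.
video = DataPacket(HeaderBlock("video", "MPEG-2", "MPEG-2", (720, 480)),
                   [b"f0", b"f1", b"f2"])
audio = DataPacket(HeaderBlock("audio", "PCM"), [b"a0", b"a1"])
counts = frame_count_by_stream([video, audio])
```

Because the audio and video packets arrive interleaved in one continuous stream yet are counted and decoded separately, this separation is what makes explicit synchronization necessary.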
Selected ones of the header blocks of the audio and video data packets include a presentation time stamp (“PTS”) value which is a time stamp that is applicable to that data packet. The PTS value is a time reference to a system time clock that was running during the creation or recording of the audio and video data. A similar system time clock is also running during the playback of the audio and video data, and if the audio and video data are played back at the times represented by their presentation time stamps, the audio and video data will be presented to the user in the desired synchronized manner. Therefore, the PTS is used to synchronize the presentation or playback of the audio and video data.
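The use of the PTS during playback may be sketched as a comparison between the PTS of a data packet and the current value of the playback system time clock. The sketch below assumes PTS values expressed in 90 kHz ticks, as in MPEG-2 systems; the function names and the 20 millisecond tolerance are illustrative assumptions, not part of any described decoder.

```python
PTS_CLOCK_HZ = 90_000  # MPEG-2 PTS values count ticks of a 90 kHz clock


def pts_to_seconds(pts: int) -> float:
    """Convert a PTS value in 90 kHz ticks to seconds."""
    return pts / PTS_CLOCK_HZ


def playback_action(pts: int, system_time_clock: int,
                    tolerance_ticks: int = PTS_CLOCK_HZ // 50) -> str:
    """Decide what to do with a packet given the current system time clock.

    Returns "wait" if the packet is early, "drop" if it is late, and
    "present" if it falls within the tolerance (assumed here: 20 ms).
    """
    delta = pts - system_time_clock
    if delta > tolerance_ticks:
        return "wait"
    if delta < -tolerance_ticks:
        return "drop"
    return "present"


def av_offset_ms(audio_pts: int, video_pts: int) -> float:
    """Signed offset between audio and video time stamps, in milliseconds."""
    return (audio_pts - video_pts) * 1000 / PTS_CLOCK_HZ
```

For instance, `playback_action(93_600, 90_000)` returns `"wait"` because the packet is 40 milliseconds ahead of the clock. This scheme relies on the playback clock tracking the clock that was running during recording.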
The decoding of the video data normally requires decompression, conversion of partial frames into full frames and the recognition of full frames. Simultaneously with the decoding process, the frames of audio and video data are being output, that is, played back to the user; and that playback must be synchronized such that the frames of audio and video present a coordinated and coherent presentation.
A time stamp representing the desired playback time is included in each frame of an encoded video stream. The decoder, in turn, examines the time stamp of each frame to determine whether the timing relationship among the frames in the encoded video stream is preserved during playback or whether playback timing must be adjusted to compensate for variations in the decoding/display process. Digital audio, for its part, is frequently encoded in blocks of digital samples, and each block must be processed as a unit.
However, during audio and video capture, if the audio capture device uses a different clock from the video capture device, the video frames might not synchronize with the corresponding audio samples because of the time shift between the two reference clocks. Moreover, frames may not synchronize if audio or video data are lost due to buffer overflow. In addition, some devices or applications use the sample count embedded inside the audio stream as the basis for synchronizing audio and video. In that case, the quality of the audio/video synchronization of the resulting stream (e.g., AVI, MPEG, WMV, etc.) will be very poor if audio samples are dropped or if the audio sample rate is not equal to the video sample rate. Accordingly, what is needed is a system and method for synchronizing audio and video streams that overcomes the above limitations.
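The magnitude of the synchronization error described above may be estimated with simple arithmetic. The following sketch assumes that audio is captured at an actual rate that differs slightly from its nominal rate and is later played back at the nominal rate; the function names and the example figures are hypothetical.

```python
def drift_seconds(elapsed_s: float, nominal_rate_hz: float,
                  actual_rate_hz: float) -> float:
    """Audio/video offset accumulated by a mismatched audio capture clock.

    Over elapsed_s seconds of video time, the audio device captures
    elapsed_s * actual_rate_hz samples. Played back at the nominal rate,
    those samples last longer (or shorter) than the video they accompany.
    """
    captured_samples = elapsed_s * actual_rate_hz
    return captured_samples / nominal_rate_hz - elapsed_s


def dropped_sample_offset_seconds(dropped_samples: int,
                                  nominal_rate_hz: float) -> float:
    """Offset introduced when samples are lost, e.g. to buffer overflow."""
    return dropped_samples / nominal_rate_hz


# Hypothetical case: a nominal 48 kHz audio clock running 0.1% fast for one hour.
hour_drift = drift_seconds(3600, 48_000, 48_048)

# Hypothetical case: 4,800 samples lost from a 48 kHz stream.
drop_offset = dropped_sample_offset_seconds(4_800, 48_000)
```

Under these assumed figures, the fast clock accumulates roughly 3.6 seconds of offset over the hour, and the dropped samples shift the audio by one tenth of a second. Either error is well beyond what a synchronization scheme based on sample counts alone can absorb.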