A multimedia application brings together audio, video and graphics information in a single presentation. Often this requires the use of a video driver, an audio driver and a graphics driver, each of which, potentially, operates according to its own individual clock generator. Although the individual clock generators are ostensibly operating at the same frequency, imperfections in the various clock generators can lead to one clock generator running faster or slower than another clock generator.
For example, in the Indeo® integrated multimedia controller, sold by Intel Corporation™ of Santa Clara, Calif., the video driver operates according to the system clock, while the audio driver is potentially driven by its own clock signal. With the Indeo® controller, the user may install any one of a number of commercially available sound cards. These sound cards typically contain clock generators of much lower quality than the system clock used by the video driver. Consequently, the sound card's clock generator can be off by as much as twelve percent from the system clock.
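To give a sense of scale, the following minimal sketch computes how far the audio track drifts from the video track for a given clock error. The function name, the 60-second duration, and the 30 frames-per-second rate are illustrative assumptions and do not come from the controller described above; only the twelve percent figure is taken from the text.

```python
def audio_drift_seconds(elapsed_s: float, clock_error: float) -> float:
    """Return how many seconds of audio content the audio track has gained
    (positive) or lost (negative) relative to the video track.

    elapsed_s   -- time elapsed according to the system (video) clock
    clock_error -- fractional error of the audio clock, e.g. 0.12 for a
                   sound card clock running twelve percent fast
    """
    return elapsed_s * clock_error


if __name__ == "__main__":
    # Hypothetical scenario: one minute of playback, audio clock 12% fast.
    drift = audio_drift_seconds(60.0, 0.12)
    fps = 30.0  # assumed video frame rate, for illustration only
    print(f"drift: {drift:.1f} s, about {drift * fps:.0f} video frames")
```

Under these assumptions, one minute of playback leaves the audio roughly 7.2 seconds ahead of the video, which is why some form of correction is needed.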
One prior art method for addressing this problem synchronizes the video track to the audio track by discarding video information when the audio track leads the video track. When the audio track is behind the video track, the presentation of the next frame of video data is delayed. This method leads to discontinuities in the video images presented to the audience, which the audience perceives as jerkiness in the video track. Another disadvantage of synchronizing the video track to the audio track is that hundreds of thousands of bytes of video data may be discarded in the process.
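The prior art approach described above can be sketched roughly as follows. This is only an illustration of the drop-or-delay idea, not the implementation of any particular product; the names `SyncDecision` and `sync_video_to_audio`, the half-frame tolerance, and the frame rates used are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncDecision:
    action: str              # "present", "drop", or "delay"
    frames_dropped: int = 0  # frames discarded before presenting
    delay_s: float = 0.0     # time to wait before presenting


def sync_video_to_audio(audio_time_s: float, next_frame_index: int,
                        fps: float,
                        tolerance_s: Optional[float] = None) -> SyncDecision:
    """Decide what to do with the next video frame, treating the audio
    position as the master clock (hypothetical helper)."""
    frame_period = 1.0 / fps
    if tolerance_s is None:
        tolerance_s = frame_period / 2  # assumed tolerance: half a frame
    frame_time = next_frame_index * frame_period
    error = audio_time_s - frame_time   # > 0 means the audio leads

    if error > tolerance_s:
        # Audio leads: discard whole frames until the video catches up.
        return SyncDecision("drop", frames_dropped=int(error / frame_period))
    if error < -tolerance_s:
        # Audio lags: hold the next frame back until the audio catches up.
        return SyncDecision("delay", delay_s=-error)
    return SyncDecision("present")


if __name__ == "__main__":
    # Audio is at 1.0 s but the next frame belongs at 0.25 s (4 fps).
    print(sync_video_to_audio(1.0, 1, 4.0))
    # Audio is at 0.5 s but the next frame belongs at 1.0 s.
    print(sync_video_to_audio(0.5, 4, 4.0))
```

Every "drop" decision discards complete frames, and every "delay" decision stalls the picture, which is the source of the jerkiness and data loss noted above.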
The discontinuities and data loss resulting from synchronizing the video track to the audio track are tolerated in large part due to the belief that synchronizing the audio track to the video track will result in undesirable and intolerable harmonics and shifts in pitch. This is because the audio data samples of the audio track are in the time domain, not the frequency domain. It is believed that it would be necessary to transform the data from the time domain into the frequency domain in order to accurately process the audio data. This would result in additional overhead in both time and hardware, which is undesirable.
As will be described, the methods of the present invention synchronize the audio track to the video track while reducing harmonic dissonance and pitch shift. This is done without transforming the time-domain audio samples into the frequency domain for processing. The methods of the present invention further reduce the amount of data that must be discarded in order to synchronize the video and audio tracks.