1. Field of the Invention
The present invention relates to the field of information presentation in a data processing system. More particularly, the present invention relates to a method and apparatus for synchronizing the presentation of media (i.e., time dependent data) to an arbitrary time reference in a data processing system.
2. Art Background
Computer systems and more generally data processing systems that utilize time dependent data, such as audio data or video data, to produce a time dependent presentation require a synchronization mechanism to synchronize the processing and display of time dependent data. Without synchronization, time dependent data cannot be retrieved, processed, and used at the appropriate time. As a result, the time dependent presentation would have a discontinuity due to the unavailability of the data. In the case of video data, there is typically a sequence of images ("frames") which, when the images are displayed in rapid sequence (e.g., each frame being displayed for 1/30th of a second immediately after displaying a prior frame), creates the impression of a motion picture. The human eye is particularly sensitive to discontinuities in a motion picture. With the video images, the discontinuity would manifest itself as a stutter in the video image or a freeze-up (e.g., a single frame being displayed considerably longer than 1/30th of a second) in the video image. With audio presentations, this discontinuity would manifest itself as a period of silence, a pause, clicks, or stuttering.
FIG. 1A illustrates one prior technique for synchronizing digital audio information (i.e., samples) with digital video information (i.e., samples), which is to make the audio clock the master clock and to slave the digital video information to this audio clock. This approach is employed by Microsoft in Windows 3.1, Video for Windows, and by IBM in MMPM for OS/2. In this approach, the audio clock of the sound subsystem (e.g., a Sound Blaster card) is the master clock to which digital audio samples and digital video samples are synchronized. This audio clock determines the rate at which the digital audio samples are sent to a digital to analog converter (DAC) and subsequently played back through a transducer (e.g., a speaker). This audio clock is also used to synchronize the digital video information (i.e., the timing of the frames of video).
Specifically, this approach employs a buffer 2 for receiving audio samples 4 and providing a sample count 6. The sample count 6 is simply the number of samples sent from the buffer 2 to the digital to analog (D/A) converter 8. The D/A converter 8 receives the samples from the buffer 2 and provides the samples to a transducer 12 (e.g., a speaker). The D/A converter 8 has an input for receiving an audio clock 10.
A graphics controller 14 that processes a video stream of samples is coupled to the buffer 2. The graphics controller 14 has a first input for receiving video samples 16, and a second input for receiving the sample count 6 from buffer 2. The graphics controller 14 provides the video samples 16 to a display 18 at a rate determined indirectly, by determining the sample count (i.e., (ΔT × Rate) modulo 2048).
Synchronizing both the digital audio and video information to the audio clock has several disadvantages. First, because the audio clock is typically generated by a low quality, low precision oscillator, the audio clock does not stay in synchronization with real time. Being out of synchronization with real time causes two distinct problems. With respect to audio information, it causes the pitch of a sound to differ from the desired, true frequency or tone. In addition, in an environment where one workstation is sending time dependent information to another workstation, video and audio continuity and quality are compromised when the video and audio information are not synchronized to a high-quality, high-precision clock. For example, workstation A may be sending samples at a first frequency (rate of transmission) dependent on the audio clock disposed in workstation A, and workstation B may be presenting these samples at a rate of reception dependent on the audio clock of workstation B. If the frequency of the audio clock of workstation A differs by any amount from that of the audio clock of workstation B, a delay or overload of information occurs, causing buffering issues (too much information) or static/noise (too little information).
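By way of illustration only, the effect of a mismatch between the audio clocks of workstation A and workstation B may be sketched as follows. The function name, the 22,050 Hz nominal rate, and the 0.1% error are assumptions chosen for the example and are not taken from any particular system.

```python
def backlog_after(sender_rate_hz: float, receiver_rate_hz: float, seconds: float) -> float:
    """Samples accumulated (positive) or missing (negative) at the receiver.

    Hypothetical helper: workstation A produces samples at sender_rate_hz
    and workstation B consumes them at receiver_rate_hz.
    """
    return (sender_rate_hz - receiver_rate_hz) * seconds


# Assumed example: both clocks are nominally 22,050 Hz, but A runs 0.1% fast.
surplus = backlog_after(22050.0 * 1.001, 22050.0, 60.0)
print(f"surplus after one minute: {surplus:.0f} samples")
```

A positive result corresponds to the "too much information" case described above, and a negative result to the "too little information" case. Under these assumptions, even a small frequency difference produces a backlog that grows without bound over time.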
Moreover, a second disadvantage of utilizing the audio clock as the master clock for synchronization purposes is that one cannot directly read the audio clock in the sound subsystem. For example, in the Sound Blaster card, although one can observe how many samples are currently queued in the FIFO (i.e., a FIFO buffer flag indicates the number of samples in the buffer), one is not provided the number of samples played through the D/A converter to the speakers. When one cannot read the audio clock and is only provided with the status of the FIFO buffer, an error is injected into the system with respect to the actual number of samples already played (e.g., resolution of the data). For example, if the data is 22 kHz, stereo, with 16-bit samples, the data rate is approximately 88K bytes per second, so a 2K FIFO buffer corresponds to 2K over 88K, or approximately 1/40th of a second. This is the granularity or error of estimating how many samples have been played through the DAC to the speakers by using a sample count from the buffer (i.e., an approximation error of the exact audio playback time). Because the video information is slaved to the audio sample count, this error affects the synchronization of the video samples as well.
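The granularity arithmetic above may be worked through as in the following sketch. The function name is hypothetical, and the exact rate of 22,050 Hz and the 2,048-byte buffer are assumptions standing in for the "22KHz" and "2K" figures of the text.

```python
def buffer_granularity_seconds(buffer_bytes: int, sample_rate_hz: int,
                               channels: int, bytes_per_sample: int) -> float:
    """Playback time represented by one full FIFO buffer."""
    byte_rate = sample_rate_hz * channels * bytes_per_sample  # bytes per second
    return buffer_bytes / byte_rate


# Assumed values: 2K FIFO, 22,050 Hz, stereo, 16-bit (2-byte) samples.
granularity = buffer_granularity_seconds(2048, 22050, 2, 2)
print(f"byte rate: {22050 * 2 * 2} bytes/s, granularity: {granularity * 1000:.1f} ms")
```

With these assumed values the byte rate is 88,200 bytes per second and the granularity is slightly more than 23 milliseconds, which is on the order of the 1/40th of a second figure given above.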
The Windows 3.1, Video for Windows, approach is not interrupt driven and essentially performs the following sequence of steps. First, the program reads an audio count (i.e., the audio frame number). Next, it determines whether the audio count is far enough along for the next video frame. If so, it processes or draws the next video frame. In this prior art scheme, the video information is always presented at times approximating those of the audio information and so is synchronized with the audio information. However, at lower audio rates, the video information is jerky (e.g., a stuttering video) due to the error in the sample count.
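The polling sequence described above may be sketched as follows. This is a minimal illustration under stated assumptions and is not the actual Video for Windows code: the class name, the convention that frame k becomes due when the count reaches k times the samples per frame, and the simulated counter that advances only in steps of 2048 are all hypothetical.

```python
from typing import Callable, List


class PollingPlayer:
    """Hypothetical polling loop that slaves video frames to an audio count."""

    def __init__(self, read_audio_count: Callable[[], int], samples_per_frame: int):
        self.read_audio_count = read_audio_count
        self.samples_per_frame = samples_per_frame
        self.next_frame = 0
        self.drawn: List[int] = []

    def poll(self) -> bool:
        """Read the audio count and draw the next frame if the count has reached it."""
        count = self.read_audio_count()
        if count >= self.next_frame * self.samples_per_frame:
            self.drawn.append(self.next_frame)  # stands in for drawing the frame
            self.next_frame += 1
            return True
        return False


# Simulated coarse counter: the reported count jumps from 0 to 2048 in one step.
readings = iter([0, 0, 2048, 2048, 2048, 2048])
player = PollingPlayer(lambda: next(readings), samples_per_frame=735)
results = [player.poll() for _ in range(6)]
print(results, player.drawn)
```

In this simulation two frames become due at the same instant when the count jumps, and no frame is drawn on the polls in between, which is one way the stutter attributed to the coarse sample count can arise.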
FIG. 1B illustrates a graph of the audio clock versus the real time clock for the prior art approach discussed above. It is clear that the prior art demonstrates a low resolution timebase and drift with respect to real time. For example, compare the Real Time line with the Implied Time line. The height of each step of the Implied Time line is simply the size of the FIFO buffer. For example, with a 2K FIFO buffer, the granularity may be calculated using the expression (number of bytes-(number of bytes modulo 2048)).
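The stepped Implied Time line may be illustrated with the expression given above. In this sketch the function name and the sample byte counts are hypothetical; only the expression itself and the 2K buffer size come from the text.

```python
FIFO_BYTES = 2048  # 2K FIFO buffer, as in the text


def implied_bytes_played(actual_bytes_played: int, fifo_bytes: int = FIFO_BYTES) -> int:
    """Quantize the true byte count down to the last whole FIFO buffer.

    Implements: number of bytes - (number of bytes modulo fifo_bytes).
    """
    return actual_bytes_played - (actual_bytes_played % fifo_bytes)


for actual in (2047, 2048, 5000):
    print(actual, "->", implied_bytes_played(actual))
```

The implied count stays constant until another full buffer has been consumed and then jumps by the buffer size, which yields the staircase shape described for FIG. 1B.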
FIG. 2A illustrates an alternative prior art approach to synchronizing digital audio samples with digital video samples. The improvement of this approach over the approach illustrated in FIG. 1A is a reduction of the 2K FIFO buffer granularity discussed previously. This approach employs a sample count estimation unit 24 for reducing the granularity. The sample count estimation unit 24 has a first input for receiving the sample count 6 from buffer 2, a second input for receiving a clock 26 that has a greater precision than the audio clock 10, and a third input for receiving a programmed audio rate. For example, clock 26 may be the real time clock of the computer system. Based on these inputs, the sample count estimation unit 24 generates an estimated sample count 28 and provides this count 28 to the graphics controller 14.
The sample count estimation unit 24 is responsible for performing the following. First, the sample count 6 is recorded at buffer 2 with a high precision time (provided by the second clock 26) to provide a baseline for the subsequent measurements. Second, the time elapsed since the buffer 2 was last filled is calculated. Third, this elapsed time is multiplied by the sample rate of the audio clock 10 (interpolation step). The result of this calculation is added to the number of samples recorded in the baseline record. Accordingly, this approach provides an estimate of the sample count played through the D/A converter at any given time.
This prior art technique attempts to eliminate the 2K FIFO buffer granularity in the audio subsystem of the first approach by recording the time at which the FIFO buffer is filled with a sample, and the number of samples stored in the FIFO at that time. This time stamping is accomplished by another higher resolution clock (e.g., a real time system clock). This prior art approach gives the number of samples at any given time. By measuring the elapsed time since the FIFO buffer was filled, multiplying the elapsed time by the audio rate of the samples, and adding the number of samples present at an initial time, this prior art technique estimates the number of samples played back through the DAC.
The following steps are employed by this prior art technique. First, a reference sample count is recorded at a first time. Second, additional audio samples are queued. Third, the sample count is recorded, along with the time read from a high precision clock (time stamp). The audio information is then synchronized by (1) reading the high precision time and subtracting the FIFO buffer time stamp, (2) multiplying this difference by the audio rate, and (3) adding the result, together with the audio FIFO buffer counter, to the audio reference sample count.
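The interpolation performed by this second prior art technique may be sketched as follows. The function and parameter names are hypothetical, and the sketch folds the reference sample count and the FIFO buffer counter into a single baseline count for simplicity.

```python
def estimate_sample_count(baseline_count: int, baseline_time_s: float,
                          now_s: float, audio_rate_hz: float) -> float:
    """Estimate samples played by interpolating from a time-stamped baseline.

    baseline_count  : sample count recorded when the FIFO was last filled
    baseline_time_s : high precision time stamp taken at that moment
    now_s           : current reading of the same high precision clock
    audio_rate_hz   : programmed (nominal) audio sample rate
    """
    elapsed_s = now_s - baseline_time_s
    return baseline_count + elapsed_s * audio_rate_hz


# Assumed example: baseline of 4096 samples at t = 10.0 s, queried at t = 10.5 s.
print(estimate_sample_count(4096, 10.0, 10.5, 22050.0))
```

Because the estimate multiplies by the programmed rate rather than the rate actually produced by the audio oscillator, any error in the audio clock is carried directly into the estimate.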
The term samples is used to denote one or more digital samples of data. These samples may be represented by a digital value having different sizes (e.g., bytes, words, etc.).
FIG. 2B illustrates a graph of the audio clock versus the real time clock for this second prior art approach. Although demonstrating finer granularity synchronization, it still has drift with respect to real time. For example, compare the Implied Time line with the Real Time line. Although better than the first prior art approach in that it reduces the granularity via interpolation and better approximates the ideal one-to-one correspondence between the audio master time and the real time, this second prior art approach has several disadvantages.
For example, this second prior art approach still synchronizes the video sample to the audio clock, which is not a high precision clock. Thus, the other problems noted above with respect to an imprecise audio clock (i.e., pitch and queuing problems) still exist in this approach.
An alternative way to synchronize video data and audio data is to ensure that the audio clock and the real time clock never get out of phase (i.e., are always synchronized). For example, in Macintosh computers, although the audio information was slaved to one clock, and the video information was slaved to another clock, these two clocks were very precise in that they were manufactured in such a way that they never got out of phase. Thus, this approach controls the manufacturing process of the audio and real time clocks (RTCs) to synchronize both the audio and video information. This approach is costly and highly process dependent. For example, for this approach to be effective, generally one manufacturer must provide both the audio capabilities and the real time clock (RTC), and that manufacturer must use the high accuracy manufacturing process on both oscillator parts to ensure synchronization. Moreover, this technique does not readily allow a user to expand the audio capabilities of a computer system with audio cards from other manufacturers.
FIG. 3 illustrates yet another alternative prior art approach to synchronizing digital audio samples to digital video samples. This technique is employed by the Assignee of the present invention in Quicktime for Windows version 1.1. In this approach, a rate conversion unit 30 has a first input for receiving audio samples 32. The rate conversion unit 30 includes a second input for receiving a second clock rate 50, hereinafter described. The rate conversion unit 30 has an output for generating samples at a second clock rate 36. The samples at the second clock rate 36 are provided to the D/A converter 8.
The D/A converter 8 has an input for receiving a second clock rate 38, generated by a second oscillator 40 (Ø2). This oscillator is programmed by a programmed value 42.
Typically, a user may select from a limited number of frequencies and program the oscillator 40 with a desired frequency. However, the actual frequency generated by the oscillator 40 is not identical to the programmed desired frequency. Thus, the discrepancy between the desired programmed frequency value 42 and the actual second clock rate 38 is one source of synchronization errors in the system.
This prior art approach employs a database 46 that is coupled to the rate conversion unit 30 to provide a second clock rate 50 to that rate conversion unit 30. The database 46 includes an input for receiving various hardware configuration parameters 48, and an input for receiving a programmed value 42 (i.e., a user programmed value for the audio clock). The database includes a second clock rate 50 (the best approximation of the actual frequency of the audio clock) for different sets of specific hardware configuration parameters 48 and different programmed values 42. Upon initialization, a user, or computer system, provides the database with its required inputs, and in response the database generates a second clock rate 50 (i.e., the best approximation of the audio clock frequency). The hardware configuration parameters 48 specify the particular hardware and software components of the computer system. The second clock rate 50 is a predicted frequency of the second clock corresponding to a particular programmed value 42, selected by the user, and the hardware configuration parameters 48. The rate conversion unit 30 uses the same second clock rate 50 thereafter during playback. The first oscillator 35 generates a first clock frequency 34, which is used to time presentation of video information and other types of media information (not shown). The first clock may be the real time clock of the computer system.
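The database lookup described above may be sketched as follows. This is an illustration under stated assumptions only: the table layout, the hardware names, the predicted frequencies (which are made-up placeholder values, not measurements), and the function names are all hypothetical and do not describe the actual implementation.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table: (hardware configuration, programmed rate in Hz)
# -> predicted actual audio clock rate in Hz. All values are placeholders.
RATE_TABLE: Dict[Tuple[str, int], float] = {
    ("example-card-a", 22050): 22222.0,
    ("example-card-b", 22050): 21930.0,
}


def predicted_rate(hardware: str, programmed_hz: int) -> Optional[float]:
    """Look up the predicted clock rate once, at initialization."""
    return RATE_TABLE.get((hardware, programmed_hz))


def conversion_ratio(source_rate_hz: float, predicted_hz: float) -> float:
    """Output samples to generate per input sample during rate conversion."""
    return predicted_hz / source_rate_hz


rate = predicted_rate("example-card-a", 22050)
if rate is not None:
    print(f"conversion ratio: {conversion_ratio(22050.0, rate):.4f}")
print(predicted_rate("unlisted-card", 22050))
```

In this sketch the lookup happens once and the same ratio is used thereafter, mirroring the fixed second clock rate 50 described above. A configuration that is absent from the table yields no prediction at all.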
This approach has several disadvantages. First, if a database does not contain a second clock rate (Ø2) corresponding to a known hardware configuration (e.g., a new sound card employing a new oscillator), the rate conversion unit 30 is not necessarily provided with a second clock rate 50 that approximates the actual frequency of the second clock. Second, with the advent of clocks being implemented with simple resistor-capacitor circuits (RC circuits) that generate frequencies susceptible to temperature fluctuations of the circuit (i.e., the frequency of these RC clocks varies as a function of temperature), this approach cannot compensate or adjust for clocks having a dependence on environmental factors.
Accordingly, a method and apparatus for synchronizing the presentation of audio, video, and other media information based upon an arbitrary external reference is desired.