1. Field of the Invention
This invention relates to the field of digital video; specifically, this invention is a method, apparatus, and system for synchronizing presentation of video data at a receiver with serving of data at a server.
2. Background
In digital video, a receiver/client can receive digital video data that is served by a server over a communication channel. Digital video data includes a video component and an audio component. The audio component has a fixed audio time interval. The video component typically has a fixed number of frames per second. The data is typically sent in a standard digital video format such as the MPEG format; however, the invention also applies to time-stamped information that is in a format other than MPEG.
The server typically has MPEG encoding capability, though this is not necessary when pre-encoded files are being served. The receiver is a client to the server. The receiver includes a video interface that is capable of decoding MPEG data. The terms “receiver”, “receiver/client”, and “receiver/decoder” all refer to the receiver.
MPEG data includes timing information in the form of time stamps. The time stamps drive the presentation devices so that the data is presented smoothly, and they are also used to synchronize the audio and video presentations. A time stamp indicates to a decoder/receiver when a specific event should occur. For the video component, the time stamp tells the decoder/receiver when a frame should be displayed. For the audio component, the time stamp tells the decoder/receiver the specific moment in time when a sound should be played. The amount of data necessary to provide a specific time interval of presentation, such as 1 second, can vary widely.
The receiver must process the MPEG data before that data can be used to drive a presentation device such as a monitor and/or speakers. The processing includes demultiplexing the MPEG data into an audio stream and a video stream, synchronizing the playback of the separated data streams, and converting the digital data to analog signals. Processing can be accomplished in software or hardware, although hardware is usually used because of its speed advantage.
The server serves video data in real-time; that is, the data is served at approximately the rate at which it should be presented. The server knows when it should deliver the data based on the time stamps embedded in the MPEG stream. Time stamps in MPEG are included periodically, at least once every 0.7 seconds. Time measurement at the server governs the rate at which the server serves data. (The server can send pre-encoded files or can send real-time data. In the case of a pre-encoded file, the clock of the server processor determines the rate. In the case of a real-time feed, the clock inside the encoder at the server determines the rate. In this application, the term “server clock” is used generically to indicate whatever clock is determining the serving rate of the video data.)
The receiver consumes data in real-time. If time at the server were measured exactly equal to time at the receiver, the receiver would consume data at the same rate as it is served, and presentation of the data would be smooth.
(The receiver typically buffers an amount of data prior to beginning display, so a temporary drop in the rate of reception of the data due to interruption of the communication link or server is usually not an issue. A buffer is usually included in both the receiver processor and the decoder/video interface. The size of the buffer is not critical, but it should be big enough that such network jitter is not an issue.)
However, in practical application, the server and the receiver do not measure time exactly the same. As a result, data is served at a different rate than it is consumed, and eventually buffer underflow or overflow occurs at the receiver. Underflow or overflow results in undesirable effects such as jumpiness of the picture.
This example uses exaggerated numbers to illustrate the problem resulting from the two clocks measuring time differently: Suppose a server sends 1 byte every 1 second (according to the server's clock), and a receiver consumes 1 byte every 1 second (according to the receiver's clock). The receiver has a 5-byte buffer. Suppose the server's clock is perfectly accurate. The receiver's clock is flawed. “Time” moves slower in this clock. For every 2 “real” seconds that pass, the receiver's clock counts 1 second passing. The receiver therefore consumes only 1 byte for every 2 bytes served. Starting from an empty buffer, the buffer of the receiver will overflow after roughly 10 real seconds (5 or 6 seconds as counted by the receiver's clock) due to the different rates of serving and consuming the data.
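The exaggerated example can be checked with a short simulation. The following is a minimal sketch, not part of the described system; the function name and its parameters are hypothetical. It assumes that the buffer starts empty, that in each real second the server's byte arrives before the receiver consumes, and that overflow means the buffer holds more than its capacity at the end of a real second.

```python
def real_seconds_until_overflow(buffer_size=5,
                                real_seconds_per_receiver_second=2,
                                max_real_seconds=1000):
    """Return the real second at which the receiver buffer first overflows.

    Returns None if no overflow occurs within max_real_seconds.
    All parameters are illustrative assumptions taken from the example.
    """
    buffered = 0   # bytes currently held in the receiver buffer
    consumed = 0   # bytes the receiver has consumed so far
    for real_second in range(1, max_real_seconds + 1):
        # The server's clock is accurate: one byte arrives per real second.
        buffered += 1
        # The receiver's clock counts one second per N real seconds,
        # so by now it believes this many bytes are due for consumption.
        due = real_second // real_seconds_per_receiver_second
        take = min(due - consumed, buffered)
        buffered -= take
        consumed += take
        if buffered > buffer_size:
            return real_second
    return None


if __name__ == "__main__":
    t = real_seconds_until_overflow()
    print("overflow at real second:", t)
    print("receiver clock reads:", t / 2, "seconds")
```

Under these assumptions the overflow occurs at real second 11, when the receiver's clock reads 5.5 seconds. If the two clocks agree (`real_seconds_per_receiver_second=1`), the function returns `None` because the buffer never fills.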
In practical application, the clocks used in typical servers and receivers are much more accurate than in the previous example, but typically there is about a 50 parts per million (ppm) variance. Assuming a 50 ppm variance, there will be a difference of around 1 byte for every 20,000 bytes. A commonly used audio sampling rate is 48,000 samples per second, and commonly there are 4 bytes per audio sample, giving 192,000 bytes per second. This results in a potential discrepancy of about 10 bytes per second (roughly 2.4 samples per second).
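The drift implied by a given clock variance follows from simple arithmetic, which can be sketched as below. This is an illustrative calculation only; the function names are hypothetical, and the variance is assumed to be the relative difference in rate between the server clock and the receiver clock.

```python
def bytes_per_byte_of_drift(variance_ppm):
    """Number of bytes served for each 1 byte of accumulated discrepancy."""
    return 1_000_000 / variance_ppm


def drift_bytes_per_second(sample_rate_hz=48_000,
                           bytes_per_sample=4,
                           variance_ppm=50):
    """Discrepancy, in bytes per second, between serving and consumption."""
    byte_rate = sample_rate_hz * bytes_per_sample  # 192,000 bytes/s by default
    return byte_rate * variance_ppm / 1_000_000


if __name__ == "__main__":
    print(bytes_per_byte_of_drift(50))       # bytes served per byte of drift
    drift = drift_bytes_per_second()
    print(drift, "bytes per second")
    print(drift / 4, "samples per second")   # 4 bytes per audio sample
```

With 48,000 samples per second, 4 bytes per sample, and a 50 ppm variance, the data rate is 192,000 bytes per second and the drift is 9.6 bytes per second, or 2.4 samples per second, since 50 ppm is one part in 20,000. For comparison, a variance of 5 ppm would give a drift of about 1 byte per second.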
A further source of error from the “true” time is that the server clock and the receiver clock can each experience internal variation; that is, each can speed up at some times and slow down at others. Thus at times the server clock may be counting time faster than the receiver clock, and at other times the opposite may happen.
It should be noted that this lack of synchronization is not a problem when the digital video data comes from a local source. For instance, synchronization of the serving rate and the consumption rate is not a problem when the data is on a DVD and is played on a local DVD player, because the receiver can access the data as it requires it.
It is known to use a phase-locked loop circuit implemented using a voltage-controlled oscillator to match the data presentation rate with the serving rate. However, these hardware components are relatively expensive.
Thus, it would be advantageous to synchronize presentation of video data at a receiver with the rate at which the data is served by a video server without the need for a voltage-controlled oscillator. This is achieved through real-time adjustments to the audio stream and subsequent synchronization of the video stream with the adjusted audio stream.