Television broadcasts have become a powerful and pervasive source of information and entertainment. A television receiver, commonly called a “television set”, receives a television signal that was previously broadcast by a television station. More recently, some computers have been adapted to receive television signals and present the corresponding television program on the computer monitor. Regardless of the receiver type or display device, these television signals typically include a video component, from which a moving image is formed, and an audio component, which represents the sound conventionally associated with the moving image. Digital television broadcasts are segmented into digital packets of information, some of which contain video information (hereinafter, “digital video packets”) and some of which contain audio information (hereinafter, “digital audio packets”).
The video component of the digital television signal represents a sequence of “frames”, each frame representing a screenful of image data. In full-motion video, this sequence of frames is displayed at such a rate that the average human viewer cannot distinguish individual frames or tell one frame from the next. Instead, the viewer perceives a single continuous moving image. This perceptual effect can be achieved with a display rate of at least 24 frames per second. In the NTSC (National Television System Committee) television standard, frames are transmitted at approximately 29.97 frames per second (exactly 30,000/1,001).
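To put these rates in concrete terms, the short Python sketch below converts a frame rate into the time each frame stays on screen. It assumes the nominal NTSC rate of 30,000/1,001 frames per second; the names `FILM_RATE`, `NTSC_RATE`, and `frame_period_ms` are illustrative only.

```python
from fractions import Fraction

# Frame rates discussed above, in frames per second.
FILM_RATE = Fraction(24)            # minimum rate for perceived continuous motion
NTSC_RATE = Fraction(30000, 1001)   # nominal NTSC rate, about 29.97 fps


def frame_period_ms(rate: Fraction) -> float:
    """Return how long each frame stays on screen, in milliseconds."""
    return float(1000 / rate)


for name, rate in [("24 fps", FILM_RATE), ("NTSC", NTSC_RATE)]:
    print(f"{name}: {float(rate):.3f} fps -> {frame_period_ms(rate):.3f} ms per frame")
```

At 24 frames per second each frame lasts about 41.667 ms; at the NTSC rate each lasts about 33.367 ms.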
The audio component of a digital television signal includes a sequence of “samples”, each sample representing the amplitude of the sound wave at a particular point in time. If each sample is represented by one byte (8 bits) of memory, the measured sound amplitude may be digitized to 2^8 (i.e., 256) different amplitude levels, thereby fairly accurately representing the amplitude of the actual measured sound. If each sample is represented by 2 bytes (16 bits) of memory, the measured sound amplitude may be digitized to 2^16 (i.e., 65,536) different amplitude levels, thereby giving the sample a higher degree of fidelity to the amplitude of the actual measured sound. Digital television stations typically transmit audio samples for a given program at a sampling rate of 48,000 samples per second. Because a sampled signal can represent frequencies up to half its sampling rate (the Nyquist frequency, here 24,000 Hz), this rate permits fairly accurate representation of all sounds within the audible frequency range of a human being, which extends to roughly 20,000 Hz.
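The effect of sample width can be illustrated with a minimal quantizer. This sketch assumes amplitudes normalized to the range -1.0 to 1.0 and a signed integer code range (for example -128 to 127 for 8 bits); that coding convention and the helper names `quantize` and `dequantize` are assumptions for illustration, not taken from any broadcast specification.

```python
SAMPLE_RATE_HZ = 48_000  # sampling rate quoted in the text


def quantization_levels(bits: int) -> int:
    """Number of distinct amplitude levels an n-bit sample can encode."""
    return 2 ** bits


def quantize(amplitude: float, bits: int) -> int:
    """Scale an amplitude in [-1.0, 1.0] to a signed n-bit integer code.

    Assumed convention: codes run from -(2**(bits-1)) to 2**(bits-1) - 1,
    and out-of-range amplitudes are clamped.
    """
    max_code = 2 ** (bits - 1) - 1
    min_code = -(2 ** (bits - 1))
    return max(min_code, min(max_code, round(amplitude * max_code)))


def dequantize(code: int, bits: int) -> float:
    """Convert a code back to a normalized amplitude."""
    return code / (2 ** (bits - 1) - 1)


amplitude = 0.3  # an arbitrary measured amplitude
for bits in (8, 16):
    code = quantize(amplitude, bits)
    error = abs(dequantize(code, bits) - amplitude)
    print(f"{bits}-bit: {quantization_levels(bits)} levels, code {code}, error {error:.6f}")

# Highest frequency a 48 kHz stream can represent (the Nyquist frequency).
print(f"Nyquist frequency: {SAMPLE_RATE_HZ // 2} Hz")
```

The 16-bit code reproduces the amplitude with a much smaller rounding error than the 8-bit code, which is the fidelity difference described above.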
Thus, in digital television, video data for a given program is transmitted in the form of frames at a certain frame rate, and audio data for the program is transmitted in the form of samples at a certain sampling rate. On average, video data and audio data are received at the same rate at which they are transmitted.
It is critical that the video and audio data be presented at the same rate at which the data is transmitted. If the video and audio are presented too quickly, the buffer within the receiver will run out of video and audio data, forcing the receiver to wait for more data to arrive. Meanwhile, each image frame or audio sample should be presented at a predetermined time after the previous one to maintain a relatively constant frame and sample presentation rate. If the next image frame or audio sample has not been received by its appointed presentation time, the last received image frame and audio sample may be repeated, often resulting in noticeable presentation degradation. If the video and audio are presented too slowly, the receiver's buffer will overflow, causing image frames and audio samples to be dropped. The presentation may then skip image frames or audio samples, which also degrades the presentation.
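The underflow and overflow cases described above can be reproduced with a toy simulation. It assumes integer clock ticks, a fixed-capacity first-in, first-out buffer, and that a frame arriving at a full buffer is discarded; `simulate_receiver` and its parameters are hypothetical and model only frame timing, not a real decoder.

```python
from collections import deque


def simulate_receiver(arrival_period: int, present_period: int,
                      capacity: int, ticks: int) -> dict:
    """Toy receiver buffer (hypothetical model).

    Frames arrive every `arrival_period` ticks (the transmission rate) and are
    presented every `present_period` ticks (the presentation rate). Returns how
    many frames were shown, repeated (buffer empty at a presentation instant)
    and dropped (buffer full when a frame arrived).
    """
    buffer = deque()
    next_frame = 0
    shown = repeated = dropped = 0
    last = None
    for tick in range(ticks):
        if tick % arrival_period == 0:      # a frame arrives
            if len(buffer) < capacity:
                buffer.append(next_frame)
            else:
                dropped += 1                # overflow: no room, frame is lost
            next_frame += 1
        if tick % present_period == 0:      # a presentation instant
            if buffer:
                last = buffer.popleft()
                shown += 1
            elif last is not None:
                repeated += 1               # underflow: previous frame shown again
    return {"shown": shown, "repeated": repeated, "dropped": dropped}


print("matched:", simulate_receiver(10, 10, capacity=4, ticks=1000))
print("too fast:", simulate_receiver(10, 9, capacity=4, ticks=1000))
print("too slow:", simulate_receiver(9, 10, capacity=4, ticks=1000))
```

With matched rates nothing is repeated or dropped; presenting faster than frames arrive produces repeats, and presenting slower eventually fills the buffer and produces drops.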
Thus, there is a need to ensure that image frames and audio samples are presented at the receiver at the same rate at which they are transmitted by the broadcaster, so as to avoid overflowing or depleting the buffer at the receiver. To address this problem, the transmitter typically has a local clock, hereinafter referred to as the transmitter clock. Likewise, the receiver has a single clock, hereinafter referred to as the receiver clock, that controls the presentation speed of both the image frames and the audio samples. Since the presentation speed of the image frames (hereinafter, “the video presentation speed”) and the presentation speed of the audio samples (hereinafter, “the audio presentation speed”) are based on the same clock, the two presentation speeds proportionally speed up or slow down together. For example, if 29.97 image frames and 48,000 audio samples are ideally to be presented each second, then the single receiver clock ensures that an average of approximately 1,601.6 (48,000/29.97) audio samples are sounded for each image frame displayed, regardless of whether the receiver clock, in maintaining synchronization with the transmitter clock, is presenting image frames slightly faster or slower than 29.97 frames per second.
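The proportionality argument can be checked numerically. The sketch below assumes the nominal NTSC frame rate of 30,000/1,001 frames per second and a 48,000 Hz audio rate, and models receiver clock error as a single hypothetical `clock_scale` factor applied to both streams; exact fractions are used so the samples-per-frame ratio comes out without rounding.

```python
from fractions import Fraction

FRAME_RATE = Fraction(30000, 1001)  # nominal NTSC frames per second
SAMPLE_RATE = Fraction(48000)       # audio samples per second


def presented_counts(seconds: Fraction, clock_scale: Fraction) -> tuple[Fraction, Fraction]:
    """Frames and samples presented when one receiver clock runs at
    `clock_scale` times its nominal speed for `seconds` of wall time.

    Both streams are driven by the same clock, so both counts scale together.
    """
    frames = FRAME_RATE * clock_scale * seconds
    samples = SAMPLE_RATE * clock_scale * seconds
    return frames, samples


# Slow, nominal and fast receiver clocks (here -/+ 100 parts per million).
for scale in (Fraction(9999, 10000), Fraction(1), Fraction(10001, 10000)):
    frames, samples = presented_counts(Fraction(10), scale)
    print(f"scale {float(scale):.4f}: {float(frames):.3f} frames, "
          f"{float(samples):.1f} samples, {float(samples / frames)} samples/frame")
```

Each line reports 1601.6 samples per frame: the clock error changes how much is presented in a given time, but not the audio-to-video ratio, which is the coupling between the two presentation speeds described above.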
This method has the advantage of maintaining synchronization between the video presentation and the audio presentation. It also requires only one local receiver clock, which simplifies the synchronization process. However, this method requires that the video and audio presentation speeds be slowed down or sped up proportionally together. What are desired, therefore, are methods and systems that allow more flexible control of the video and audio presentation speeds.