Almost all televisions manufactured today are capable of interfacing with different sources of program material, for example, a VCR, a digital versatile or video disk ("DVD") player, cable, DSS, etc., that provide audio signals for creating sounds and associated video input signals for creating screen displays. Some of those sources provide digital audio and video input signals in accordance with the Moving Picture Experts Group ("MPEG-2") audio/video digital compression standard. Thus, contemporary televisions and/or DVD systems preferably have the capability of processing compressed digital input signals and providing digital output signals representing the desired images. Most often, those digital signals are converted to analog signals for use by known analog television display units.
The implementation of digital signal processing for providing a video display and associated audio from an audio-video source of program material presents numerous design challenges that were not encountered in the prior processing of analog audio and video signals. For example, with digital signal processing, the audio signals are separated from the video signals; and the audio and video are processed independently. However, the playback of the audio and video must be synchronized, so that there is a coordinated and coherent reproduction of the desired audio and video provided by the source of program material.
The program source preferably provides the audio and video data in respective data packets in an "MPEG-2" format. Each of the audio and video data packets is received from the source of program material in a continuous data stream. Each packet of video data includes a header block followed by a data block. The data block may include any number, for example one to twenty, of frames of video data that may include a full field of video data or be a coded group of pictures that includes its own header block identifying the picture type and display order. The header block for a video data packet includes control information, for example, the identity of the format of the video data, the type of compression, if used, picture size, display order, and other global parameters. The audio data packet has a header block that again identifies the format of the audio data with instructions relating to how the audio data is to be decoded and processed to provide desired enhancements, if applicable. Following the header block, the audio data packet includes an audio data block that has any number of blocks or frames of audio data, for example, from one to approximately twenty blocks.
Selected ones of the header blocks of the audio and video data packets include a presentation time stamp ("PTS") value which is a time stamp that is applicable to that data packet. The PTS value is a time reference to a system time clock that was running during the creation or recording of the audio and video data. A similar system time clock is also running during the playback of the audio and video data, and if the audio and video data are played back at the times represented by their presentation time stamps, the audio and video data will be presented to the user in the desired synchronized manner. Therefore, the PTS is used to synchronize the presentation or playback of the audio and video data.
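The role of the PTS in synchronization can be sketched as follows. A playback controller compares a packet's PTS against the running system time clock ("STC") and decides whether the stream is early, late, or in sync. The function and constant names below, and the choice of tolerance, are illustrative assumptions, not values taken from the MPEG-2 standard or from any particular decoder:

```c
#include <stdint.h>

/* Hypothetical sync verdicts for one elementary stream. */
typedef enum { SYNC_OK, SYNC_EARLY, SYNC_LATE } sync_state_t;

/* MPEG-2 time stamps are counted in 90 kHz clock ticks.  A tolerance
 * of roughly one video frame (~3003 ticks at 29.97 fps) is an assumed
 * threshold for this sketch, not a value mandated by the standard. */
#define SYNC_TOLERANCE_TICKS 3003

/* Compare a packet's PTS with the current system time clock (STC).
 * If the PTS is ahead of the STC, the data is early (playback should
 * wait, or a frame may be repeated); if the PTS is behind the STC,
 * the data is late (frames may need to be skipped). */
static sync_state_t check_sync(uint32_t pts, uint32_t stc)
{
    int32_t diff = (int32_t)(pts - stc);  /* tolerates clock wraparound */
    if (diff > SYNC_TOLERANCE_TICKS)  return SYNC_EARLY;
    if (diff < -SYNC_TOLERANCE_TICKS) return SYNC_LATE;
    return SYNC_OK;
}
```

A stream whose PTS matches the STC within the tolerance plays back as recorded; persistent early or late verdicts drive the repeat-frame or skip-frame corrections described later in this section.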
During decoding, the audio data normally must be decompressed, reconstructed and enhanced in a manner consistent with the source of program material and the capabilities of the sound reproduction system. In some applications, audio data packets may contain up to six channels of raw audio data. Depending on the number of channels the sound reproduction system can reproduce, for example, from two to six, the sound reproduction system selectively uses the channels of raw audio data to provide a number of channels of audio which are then stored in an audio FIFO.
The decoding of the video data normally requires decompression, conversion of partial frames into full frames and the recognition of full frames. Simultaneously with the decoding process, the frames of audio and video data are being output, that is, played back to the user; and that playback must be synchronized such that the frames of audio and video present a coordinated and coherent presentation.
As will be appreciated from the foregoing, demultiplexing the audio and video data packets is a complex process of deconstructing the data packets and storing the necessary decoding instructions as well as the content data itself to permit the decoding and playback of the data in a synchronized manner. In accordance with one known technique, the audio and video content data or raw data is stored in respective audio and video first-in, first-out ("FIFO") memories. The FIFOs have write and read pointers that are controlled by a memory controller, which, in turn, is under the general control of a CPU. The write pointers are driven as a function of the requirements of the demultiplexing process, which sequentially delivers data to each of the FIFOs. The read pointers are driven as a function of independent and parallel decoding processes, which sequentially read data from the FIFOs. In addition to loading the raw data into the FIFOs, the demultiplexing process sequentially writes the associated PTS values, if present, into memory locations in respective audio and video PTS tables. To associate the PTS values with data in the FIFOs, in addition to a PTS value, the location in the respective FIFO of the first byte of data received after the PTS is typically written into the respective audio and video PTS table.
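The bookkeeping performed by the demultiplexing process can be sketched as follows. The structure and function names, the FIFO and table sizes, and the use of plain arrays are illustrative assumptions; in an actual decoder the FIFOs reside in memory managed by the hardware memory controller:

```c
#include <stdint.h>
#include <string.h>

#define FIFO_SIZE      4096   /* assumed sizes, for illustration only */
#define PTS_TABLE_SIZE 32

/* One PTS table entry: the time stamp, plus the FIFO location of the
 * first byte of data received after that PTS. */
typedef struct {
    uint32_t pts;
    uint32_t fifo_offset;
} pts_entry_t;

typedef struct {
    uint8_t     data[FIFO_SIZE];
    uint32_t    wr;                      /* write pointer (offset) */
    uint32_t    rd;                      /* read pointer (offset)  */
    pts_entry_t table[PTS_TABLE_SIZE];   /* PTS table              */
    uint32_t    n_entries;
} stream_fifo_t;

/* Demultiplexer side: if the packet carried a PTS, record it together
 * with the current write pointer, then copy the packet's raw payload
 * into the circular FIFO. */
static void demux_write(stream_fifo_t *f, const uint8_t *payload,
                        uint32_t len, int has_pts, uint32_t pts)
{
    if (has_pts && f->n_entries < PTS_TABLE_SIZE) {
        f->table[f->n_entries].pts = pts;
        f->table[f->n_entries].fifo_offset = f->wr;  /* first byte after PTS */
        f->n_entries++;
    }
    for (uint32_t i = 0; i < len; i++)
        f->data[(f->wr + i) % FIFO_SIZE] = payload[i];
    f->wr = (f->wr + len) % FIFO_SIZE;
}
```

One instance of this structure would exist per elementary stream, so the audio and video demultiplexing paths each maintain their own FIFO and PTS table.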
While the audio and video data is being loaded into the FIFO memories by the demultiplexing process, audio and video data is simultaneously and in parallel being read from the respective FIFOs during audio and video decoding and playback processes. While both are occurring, a supervisory process must monitor the time synchronization of the video and audio data being produced by the video and audio decoding processes. In the known technique described above, this is done by relating the read pointers in the FIFOs, as they are driven by the decoding processes, to the memory locations stored in the PTS tables. When the read pointer is sufficiently close to a stored location associated with a PTS, it can be determined that the PTS identifies the current time of the associated decoding process. PTS values identified in this manner may be compared to determine whether one decoding process is ahead or behind of another.
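The association step performed by the supervisory process can be modeled as a lookup: given the current read pointer, find the table entry whose recorded FIFO location is "sufficiently close." Modeling closeness as the greatest stored offset not exceeding the read pointer is one plausible reading of the known technique, not a rule stated by it, and the names below are hypothetical; FIFO wraparound is ignored for clarity:

```c
#include <stdint.h>

/* One PTS table entry, as written by the demultiplexing process. */
typedef struct {
    uint32_t pts;
    uint32_t fifo_offset;
} pts_entry_t;

/* Find the PTS whose recorded FIFO offset is closest to, without
 * exceeding, the current read pointer.  Returns 0 on success and
 * stores the PTS in *pts_out; returns -1 if no entry qualifies. */
static int lookup_pts(const pts_entry_t *table, uint32_t n,
                      uint32_t read_ptr, uint32_t *pts_out)
{
    int best = -1;
    for (uint32_t i = 0; i < n; i++) {
        if (table[i].fifo_offset <= read_ptr &&
            (best < 0 || table[i].fifo_offset > table[best].fifo_offset))
            best = (int)i;
    }
    if (best < 0)
        return -1;
    *pts_out = table[best].pts;
    return 0;
}
```

The PTS values recovered this way for the audio and video streams are then compared to decide whether one decoding process is running ahead of or behind the other.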
Unfortunately, this approach has distinct disadvantages which arise from the fact that, during the audio and video decoding processes, the read pointers for the respective FIFOs are automatically and continuously driven by decoding processes interacting directly with the memory controller, independent of any instructions from the CPU. This must be the case because the entire process of demultiplexing the audio and video data, as well as decoding and outputting the data, must occur continuously in a synchronized manner.
The above-described technique for synchronizing the audio and video decoding and playback processes presents a significant challenge, due to delays inherent in the interaction of the supervisory process with the various decoding processes. Considering, for example, the decoding of audio data, assume that the audio decoder delivers a start audio frame interrupt to the CPU running the supervisory process each time decoding of an audio frame commences. At the start of an audio frame, the supervisory process must associate the data currently being read from the audio FIFO with its corresponding PTS, that is, the PTS value that was loaded in the audio PTS table when the data currently being read was written into the audio FIFO. Theoretically, if the location of the read pointer at the beginning of the audio frame is known, that location can be compared with the write pointer locations that were stored in the PTS table during the demultiplexing process. If a correspondence between the current read pointer location and a stored write pointer location can be found, then the PTS associated with the stored write pointer corresponds to the PTS of the audio data currently being identified by the read pointer. If the PTS value for the data being read can accurately be determined, then the decoding and playback processes may be instructed in the known manner to skip or repeat frames in order to provide a synchronized output of frames of the audio and video.
However, there are two conditions which may result in the above process, on occasion, failing to achieve synchronization in the playback of the audio and video data. The first condition arises because the CPU running the supervisory process must time share between supervision of the audio and video decoding processes and the demultiplexing process. Accordingly, the CPU must respond to each supervised process using a prioritized, interrupt-based communication scheme. Further, the CPU communicates with the memory controller and other functional units over a shared, time-multiplexed communication bus or channel. Therefore, when a start audio frame interrupt is received by the CPU, it may not be processed immediately because the CPU is processing other interrupts of an equal or higher priority. Even if the CPU services the start audio frame interrupt immediately, it must then communicate with the memory controller over the time-multiplexed bus; access to the bus is arbitrated, and the CPU may not have the highest priority. During these two delays, first in processing the start audio frame interrupt and second in communicating with the memory controller, the decoding process continues to read audio data from the FIFO. Consequently, by the time the CPU obtains the current read pointer for the audio FIFO from the memory controller, the pointer no longer has the value it had when the interrupt was generated, and the read pointer location obtained by the CPU is to that extent inaccurate.
The second condition of concern is that the audio packets being read may be small and processed very quickly. Therefore, the PTS table may have two PTS entries with FIFO location values that are very close. Hence, when an inaccurate read pointer location is compared to the write pointer location values in the PTS table, an incorrect PTS entry may be associated with the start of the audio frame being decoded, resulting in a small loss of synchronization between the presentation of the audio and video frames. A single occurrence of this event might not be detectable by the user. However, if the time to service the start audio frame interrupt is longer and the processing time required for the audio data packet is very short, or if several such audio data packets occur successively, the loss of synchronization may be greater. Furthermore, a loss of synchronization in the audio process adds to losses of synchronization in other decoding processes. Thus, losses of synchronization can accumulate to the point where they become perceptible and disturbing to the user.
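This failure mode can be made concrete with a small numerical example. Suppose two short audio packets produce PTS table entries whose FIFO offsets are only 64 bytes apart, and the association step is modeled, hypothetically, as picking the entry with the greatest offset not exceeding the read pointer. The values and names below are invented for illustration:

```c
#include <stdint.h>

typedef struct { uint32_t pts, fifo_offset; } pts_entry_t;

/* Hypothetical model of the association step: select the entry with
 * the greatest recorded offset not exceeding the read pointer. */
static uint32_t nearest_pts(const pts_entry_t *t, uint32_t n, uint32_t rd)
{
    uint32_t best = 0;
    for (uint32_t i = 1; i < n; i++)
        if (t[i].fifo_offset <= rd && t[i].fifo_offset > t[best].fifo_offset)
            best = i;
    return t[best].pts;
}

/* Two short packets: their table entries sit only 64 bytes apart. */
static const pts_entry_t table[2] = {
    { 1000, 0  },   /* PTS of the frame whose decoding just started */
    { 1100, 64 },   /* PTS of the next, very short packet           */
};

/* When the start audio frame interrupt fired, the read pointer was at
 * offset 10, so the correct association is PTS 1000.  By the time the
 * CPU services the interrupt and queries the memory controller, the
 * pointer has advanced to offset 70, past the second entry, and the
 * lookup associates the wrong time stamp, PTS 1100, with the frame. */
static uint32_t pts_at_interrupt(void) { return nearest_pts(table, 2, 10); }
static uint32_t pts_as_observed(void)  { return nearest_pts(table, 2, 70); }
```

The 100-tick discrepancy between the two results is exactly the kind of small, cumulative error described above.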
Consequently, in a system such as that described above, there is a need to improve the association of PTS values stored in PTS tables with the audio and video data being read from respective FIFO memories during the decoding process.