Multimedia programs generally include an audio component and a visual component. The audio is tied to events in the video and should be presented in synchrony with it. The MPEG (Moving Picture Experts Group) specification defines a method for synchronizing the times at which related audio and video data are presented to a decoder. The precise time to present uncompressed data is generally indeterminate relative to the time when the data is received in compressed form. However, presentation time stamps (PTS) make it possible to positively identify specific decoder presentation times for audio, visual, or auxiliary data. Program clock reference (PCR) time stamps, which establish a ‘stream time’, are transmitted in the adaptation field of audio, visual, or auxiliary data packets (depending on which stream is the master) at least ten times every second. Given a stream time and a PTS “stamped” on the data associated with packets, a system can establish a reference for the time at which the data should be given to the audio, video, or auxiliary decoder. The PTS is carried in a packetized elementary stream (PES) header, usually at the start of a video or audio frame in the PES packet payload, where a PES packet is received through a multimedia transport stream as a plurality of transport stream packets.
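As an illustration of how a PTS relates to stream time, the following minimal sketch decodes the 33-bit PTS from the 5-byte field of a PES header and converts the difference between a PTS and the current stream time into seconds. The function names are hypothetical and chosen for this example only. The sketch assumes the stream time is already expressed in 90 kHz ticks, and it ignores 33-bit wraparound and marker-bit validation.

```python
PTS_CLOCK_HZ = 90_000  # PTS values count ticks of a 90 kHz clock


def decode_pts(field: bytes) -> int:
    """Decode a 33-bit PTS from the 5-byte PTS field of a PES header.

    Layout: 4 prefix bits, PTS[32..30], marker, PTS[29..15], marker,
    PTS[14..0], marker. Marker bits are skipped, not checked.
    """
    if len(field) != 5:
        raise ValueError("PTS field must be exactly 5 bytes")
    return (
        (((field[0] >> 1) & 0x07) << 30)
        | (field[1] << 22)
        | (((field[2] >> 1) & 0x7F) << 15)
        | (field[3] << 7)
        | ((field[4] >> 1) & 0x7F)
    )


def encode_pts(pts: int) -> bytes:
    """Inverse of decode_pts, using the '0010' (PTS only) prefix."""
    return bytes([
        0x20 | (((pts >> 30) & 0x07) << 1) | 1,
        (pts >> 22) & 0xFF,
        (((pts >> 15) & 0x7F) << 1) | 1,
        (pts >> 7) & 0xFF,
        ((pts & 0x7F) << 1) | 1,
    ])


def seconds_until_presentation(pts: int, stream_time: int) -> float:
    """Time remaining before the stamped data is due at the decoder.

    Assumption: stream_time is in the same 90 kHz units as the PTS,
    and neither value has wrapped around.
    """
    return (pts - stream_time) / PTS_CLOCK_HZ
```

For example, a PTS of 180000 against a stream time of 90000 means the data is due one second from now.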
However, providing synchronized audio and video data to audio and video decoders does not guarantee that the respective decoders will output audio and video data in a synchronous fashion. Video data is generally more complex than audio data, so the amount of time needed to decode a portion of video data is generally greater than the amount of time needed to decode the associated portion of audio data. Furthermore, in mixed analog/digital audio/video receiving and processing systems, video may be received as interlaced video and shown on a progressive monitor. The human visual system is less sensitive to flickering details than to large-area flicker, and TV displays apply interlacing to profit from this fact. Interlacing works by dividing a video frame into sets of interleaved lines, called fields, of video information. Interlacing can be defined as a type of spatio-temporal sub-sampling. De-interlacing is performed as the reverse operation, in an attempt to remove sub-sampling artifacts for output to a progressive (non-interlaced) display.
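The relationship between frames and fields can be sketched as follows. This is a minimal illustration, not any particular receiver's implementation: a frame is modeled as a list of scan lines (each a list of pixel values), the function names are hypothetical, and an even number of lines per frame is assumed. It shows field splitting, "weave" recombination of two fields, and simple spatial interpolation (line averaging) from a single field.

```python
def split_fields(frame):
    """Split a frame into its top field (even lines) and bottom field (odd lines)."""
    return frame[0::2], frame[1::2]


def weave(top_field, bottom_field):
    """Rebuild a progressive frame by interleaving the lines of two fields.

    Assumes both fields have the same number of lines.
    """
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(list(top_line))
        frame.append(list(bottom_line))
    return frame


def deinterlace_line_average(top_field):
    """Estimate a full frame from the top field alone.

    Each missing line is the average of the field lines above and below it;
    the last missing line repeats the final field line.
    """
    frame = []
    for i, line in enumerate(top_field):
        frame.append(list(line))
        below = top_field[i + 1] if i + 1 < len(top_field) else line
        frame.append([(a + b) / 2 for a, b in zip(line, below)])
    return frame
```

Weaving is exact only when both fields sample the same instant; with motion between fields it produces combing artifacts, which is why interpolating methods exist.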
Many de-interlacing algorithms have been proposed. They range from simple spatial interpolation, through direction-dependent filtering, up to advanced motion-compensated (MC) interpolation. Many de-interlacing and frame rate conversion algorithms require frame storage capable of holding at least two, and in many cases three, fields before producing a progressive output. This video processing appears as a delay in outputting video data when compared to the processing of the digitized audio associated with the video. Audio data may therefore be fully decoded and ready for output before the video data, compromising its synchronization to the video data. In these systems audio will be output from a decoder ahead of graphics, video, or closed captions, and will remain unsynchronized when output. From the above discussion it is apparent that a system for synchronizing the output of decoded audio data to the presentation of decoded video data is needed.
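To give a sense of the size of this offset, the sketch below estimates the delay introduced by buffering fields and shows one naive way to hold audio back by the same amount. It is an illustration under stated assumptions, not the system described here: the names are hypothetical, the video delay is assumed to equal the number of buffered fields divided by the field rate, and decode time is ignored.

```python
from collections import deque


def video_latency_seconds(fields_buffered: int, field_rate_hz: float) -> float:
    """Delay added by holding fields before the first progressive output."""
    return fields_buffered / field_rate_hz


def audio_delay_samples(latency_s: float, sample_rate_hz: int) -> int:
    """Number of audio samples spanning the given latency."""
    return round(latency_s * sample_rate_hz)


class AudioDelayLine:
    """Fixed-length FIFO that emits each sample delay_samples pushes later."""

    def __init__(self, delay_samples: int):
        # Pre-fill with silence so the first outputs are zeros.
        self._buffer = deque([0] * delay_samples)

    def push(self, sample):
        """Queue a new sample and return the one that is now due."""
        self._buffer.append(sample)
        return self._buffer.popleft()
```

Under these assumptions, three buffered fields at 60 fields per second give a 50 ms video delay, which corresponds to 2400 samples of 48 kHz audio.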