Audio and video (A/V) transmission and reception require that the audio and video components be properly synchronized. EIA standard RS-250-B limits the time differential between associated audio and video signals to 25 ms of lead or 40 ms of lag. Film standards limit the time differential of associated audio and video to ±1/2 frame, which corresponds to 20.8 ms. An acceptable goal for source-to-viewer A/V synchronization is therefore a time differential of ±20 ms.
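The 20.8 ms figure can be checked with a short calculation. The sketch below assumes the standard film rate of 24 frames per second, which the text implies but does not state; the function names are illustrative only.

```python
# Assumption: film runs at 24 frames per second (not stated in the text,
# but consistent with its 20.8 ms figure for half a frame).
FILM_FRAME_RATE_HZ = 24.0


def half_frame_ms(frame_rate_hz: float) -> float:
    """Duration of half a frame, in milliseconds."""
    return 0.5 / frame_rate_hz * 1000.0


def within_sync_goal(av_offset_ms: float, goal_ms: float = 20.0) -> bool:
    """True if an audio/video offset lies inside the +/- goal_ms window."""
    return abs(av_offset_ms) <= goal_ms


print(round(half_frame_ms(FILM_FRAME_RATE_HZ), 1))  # 20.8
```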
Digital communication systems typically time-multiplex associated signal components over a single channel. Such multiplexing is common among A/V transmission systems proposed and implemented for cable, fiber, terrestrial and satellite applications. Time multiplexing may destroy the natural time relationships among the signal components between transmission and display of the information. Therefore, time-critical components of the transmitted signal may be associated with a time reference before being multiplexed. This is referred to as "stamping" the information, and the timing samples are referred to as time stamps. The receiver may then output the respective components at times determined by their respective time stamps. To accomplish this, however, the receiver must maintain a very precise local time reference that is synchronous with the encoder time reference.
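As a rough illustration of stamping, the following sketch attaches a time stamp to each unit before multiplexing and lets the receiver restore the original time order from the stamps. The tick units, stream periods, burst-style multiplexer, and all names are hypothetical and chosen only for the example.

```python
from typing import List, NamedTuple


class Stamped(NamedTuple):
    time_stamp: int  # hypothetical unit: ticks of the encoder time reference
    stream: str
    payload: str


def stamp(stream: str, units: List[str], start: int, period: int) -> List[Stamped]:
    """Attach a time stamp to each unit before it is multiplexed."""
    return [Stamped(start + i * period, stream, u) for i, u in enumerate(units)]


def multiplex(*streams: List[Stamped]) -> List[Stamped]:
    """Send each stream as one burst, so arrival order no longer follows capture time."""
    return [item for s in streams for item in s]


def output_order(channel: List[Stamped]) -> List[Stamped]:
    """Receiver side: order units by time stamp rather than by arrival."""
    return sorted(channel, key=lambda s: s.time_stamp)


video = stamp("video", ["V0", "V1"], start=0, period=3000)
audio = stamp("audio", ["A0", "A1", "A2"], start=0, period=2000)
channel = multiplex(video, audio)
print([s.payload for s in channel])                # ['V0', 'V1', 'A0', 'A1', 'A2']
print([s.payload for s in output_order(channel)])  # ['V0', 'A0', 'A1', 'V1', 'A2']
```

Ordering by stamp only recovers the sequence; presenting each unit at the right instant still depends on the receiver's local time reference tracking the encoder's.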
One reason that the receiver must be tightly coupled to the time base of the transmitter is to ensure that the rate at which real-time data is output matches the rate at which it is input to the receiver. If the receiver provides (displays) the data too rapidly, buffers in the receiver may underflow, resulting in an interruption of the output signal. If the receiver outputs the data too slowly, the buffers may overflow (assuming finite rate buffers), resulting in a loss of data.
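The underflow and overflow cases can be illustrated with a toy model. This is a minimal sketch assuming a buffer of fixed capacity and constant input and output rates per time step; the numbers and the function name are hypothetical.

```python
def simulate_buffer(input_rate: int, output_rate: int, capacity: int,
                    initial_fill: int, steps: int) -> str:
    """Track buffer fill over time and report the first failure, if any."""
    fill = initial_fill
    for _ in range(steps):
        fill += input_rate - output_rate  # net change per time step
        if fill < 0:
            return "underflow"  # receiver drained data faster than it arrived
        if fill > capacity:
            return "overflow"   # receiver drained data slower than it arrived
    return "ok"


# Receiver clock 1% fast, 1% slow, and exactly matched (units per step).
print(simulate_buffer(100, 101, capacity=1000, initial_fill=500, steps=1000))  # underflow
print(simulate_buffer(100, 99, capacity=1000, initial_fill=500, steps=1000))   # overflow
print(simulate_buffer(100, 100, capacity=1000, initial_fill=500, steps=1000))  # ok
```

Even a small rate mismatch eventually empties or fills the buffer, which is why the receiver clock has to track the transmitter clock rather than merely approximate it.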
In one proposed system the receiver is synchronized to the transmitter by supplemental time stamps (system clock references, SCR) associated with predetermined packets of transmitted information. The timing of the capture of the SCR time stamps bears no relation to the presentation time stamps (PTS), which are related to the video data, other than by virtue of being derived from the same counter. The SCR codes are generated by sampling a modulo $2^N$ counter ($N \geq 32$) which counts a substantially constant frequency crystal clock at the transmitter. The receiver incorporates a phase locked loop which has a free running frequency substantially equal to the frequency of the clock in the transmitter. The receiver clock (local clock) is also counted modulo $2^N$, and each time an SCR arrives at the receiver the local counter is sampled to provide a local clock reference, or LCR. No attempt is made to force the LCR to equal the SCR. Rather, the local clock is adjusted based upon processing changes in the difference between the LCR and SCR time stamps. An error signal is generated according to the relation

$$\mathrm{ERR} = |\mathrm{SCR}_n - \mathrm{SCR}_{n-1}| - |\mathrm{LCR}_n - \mathrm{LCR}_{n-1}|$$
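The error relation above can be sketched in a few lines of Python. The sample values are invented for illustration, and the function name is hypothetical. Because only successive differences are used, a constant offset between the LCR and SCR counts has no effect on ERR.

```python
from typing import List


def err_sequence(scr_samples: List[int], lcr_samples: List[int]) -> List[int]:
    """ERR_n = |SCR_n - SCR_(n-1)| - |LCR_n - LCR_(n-1)| for n = 1, 2, ..."""
    errs = []
    for n in range(1, len(scr_samples)):
        d_scr = abs(scr_samples[n] - scr_samples[n - 1])
        d_lcr = abs(lcr_samples[n] - lcr_samples[n - 1])
        errs.append(d_scr - d_lcr)
    return errs


# Invented samples: the transmitter counts 27000 ticks between SCRs,
# while the local clock counts 27003 ticks over the same interval.
scr = [1000, 28000, 55000, 82000]
lcr = [500, 27503, 54506, 81509]
print(err_sequence(scr, lcr))                       # [-3, -3, -3]
print(err_sequence(scr, [x + 12345 for x in lcr]))  # [-3, -3, -3]
```

Under this sign convention a negative ERR means the local clock counted more ticks than the transmitter over the interval, that is, the local clock is running fast.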
The signal ERR is utilized to control the local clock frequency. Via this process the local clock frequency can be brought arbitrarily close to the transmitter clock frequency. Note that since both the system and local clocks are counted modulo $2^N$, they periodically wrap around. On these occurrences the respective terms $\mathrm{SCR}_n - \mathrm{SCR}_{n-1}$ and $\mathrm{LCR}_n - \mathrm{LCR}_{n-1}$ will be negative and erroneous. The system monitors the polarity of the respective differences, and when one of the differences is negative that difference is ignored.
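The wrap-around rule and the use of ERR to steer the local clock can be sketched as follows. The text does not say how ERR drives the oscillator, so the proportional correction, the gain of 0.5, the 27 MHz clock rate, and the 0.1 s SCR spacing are all assumptions made for the example, and the function names are hypothetical.

```python
N_BITS = 32            # the text requires N >= 32
MODULUS = 1 << N_BITS  # both counters wrap at 2^N


def err_or_none(scr_prev, scr_now, lcr_prev, lcr_now):
    """Return ERR, or None when either counter wrapped between samples."""
    d_scr = scr_now - scr_prev
    d_lcr = lcr_now - lcr_prev
    if d_scr < 0 or d_lcr < 0:
        return None       # negative difference signals a wrap: ignore it
    return d_scr - d_lcr  # both differences are non-negative here, so |x| == x


def track(f_tx, f_local, interval_s, n_samples, gain=0.5, start_tx_ticks=0):
    """Assumed proportional loop: nudge f_local by gain * ERR / interval_s."""
    tx_ticks = float(start_tx_ticks)
    local_ticks = 0.0
    scr_prev = int(tx_ticks) % MODULUS
    lcr_prev = int(local_ticks) % MODULUS
    skipped = 0
    for _ in range(n_samples):
        tx_ticks += f_tx * interval_s        # transmitter counter advances
        local_ticks += f_local * interval_s  # local counter advances
        scr_now = int(tx_ticks) % MODULUS    # SCR carried in the arriving packet
        lcr_now = int(local_ticks) % MODULUS # LCR sampled on SCR arrival
        err = err_or_none(scr_prev, scr_now, lcr_prev, lcr_now)
        if err is None:
            skipped += 1
        else:
            f_local += gain * err / interval_s  # ERR < 0 slows the local clock
        scr_prev, lcr_prev = scr_now, lcr_now
    return f_local, skipped


# Transmitter counter starts just below 2^N so that one wrap occurs.
f_est, n_skipped = track(27e6, 27_000_500.0, 0.1, 50,
                         start_tx_ticks=MODULUS - 1_000_000)
print(f_est, n_skipped)
```

In this toy run the local frequency estimate should settle close to the transmitter rate, limited by the one-tick quantization of the counters, and the single wrapped sample is skipped rather than fed into the loop.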
A video signal coded according to the MPEG standard includes presentation time stamps, $\mathrm{PTS}_{vid}$, which are synchronized to the input video frames. The respective $\mathrm{PTS}_{vid}$ indicate the relative times at which the respective frames are to be displayed at the receiver, nominally at 30 Hz for NTSC source material. Associated audio is also encoded with presentation time stamps, $\mathrm{PTS}_{aud}$, based on the same time base as the system time; these time stamps are placed in an MPEG system packet layer encompassing the encoded audio data. An audio system packet layer may contain several "frames" of audio data, and each frame represents, in this example, 24 ms of original audio data. Audio frames are approximately six times the duration of a (127 byte) transport packet. (Information to be transmitted, such as audio, video, and data, is segmented into respective transport packets of predetermined size, with a variety of control words appended, to provide an extra layer of error correction/detection and synchronization.) In addition, according to the MPEG protocol, the number of audio frames per MPEG system layer is variable. Hence there may be little or no correlation between the video $\mathrm{PTS}_{vid}$ and audio $\mathrm{PTS}_{aud}$ presentation time stamps for associated audio and video source material. Thus synchronizing the audio and video components is difficult if one attempts to do so by comparing the $\mathrm{PTS}_{vid}$ with the $\mathrm{PTS}_{aud}$. It is an object of the present invention to simplify the process of synchronizing associated audio and video components.