1. Field of the Invention
This invention relates generally to the field of digital transmission or storage of audio and video information. More particularly, this invention relates to a technique for verifying the timing of multiplexed packetized audio and/or video (A/V) information such as that defined by ISO's (International Organization for Standardization) MPEG (Moving Picture Experts Group) standard.
2. Background of the Invention
The term "access unit" as used herein means either a frame of video data or a batch of audio samples or a batch of other data samples. In general, a decoder processes its "access units" and outputs decoded access units at regular intervals. In the case of video, for example, this interval is the picture period (the reciprocal of the picture rate), and for audio it is a constant integer (equal to the number of audio samples in an audio access unit) times the audio sampling period (the reciprocal of the audio sampling rate).
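By way of illustration only, the output interval of each kind of decoder can be computed as in the following minimal sketch. The function names and the example values (a 25 Hz picture rate and 1152 samples per audio access unit at 44.1 kHz) are assumptions chosen for the example and are not taken from the description above.

```python
from fractions import Fraction

def video_interval_s(picture_rate_hz):
    """Seconds between decoded pictures: the reciprocal of the picture rate."""
    return 1 / Fraction(picture_rate_hz)

def audio_interval_s(samples_per_access_unit, sampling_rate_hz):
    """Seconds between decoded audio access units:
    number of samples per access unit divided by the sampling rate."""
    return Fraction(samples_per_access_unit, sampling_rate_hz)

# Illustrative values only (assumed, not from the text).
print(float(video_interval_s(25)))             # picture period in seconds
print(float(audio_interval_s(1152, 44100)))    # audio access unit period in seconds
```

Exact fractions are used so that the intervals can be compared without rounding error.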
In the case of an ISO standard MPEG system application, as defined for example in ISO document number 1-11172 which is hereby incorporated by reference, a "time stamp" indicating the decoding time for an access unit is included in the multiplex syntax for multiplexed video and audio packets as shown in FIG. 1. These time stamps are included in the multiplex syntax, i.e. in packet headers. A time stamp indicates the decoding time of the first access unit header in that packet. Since each packet can include multiple access units, not every access unit is associated with a time stamp.
Although multiplex applications use multiple decoders, ISO's MPEG 1 system standard can also be applied in applications that only have one decoder. Even in a fixed bit rate (e.g. audio) application, perfect clocks do not exist; therefore the digital storage media (or transmission) bit rate and (depending on the sampling clock frequency error) the decoder input bit rate vary. In such a system, the decoder generally reads one access unit at a time. Furthermore, due to the differences in the clocks, the transfer bit rate and the decoder input bit rate do not match exactly. A buffer can be used to compensate for these differences. In the case of video data, consecutive access units (i.e. frames) are compressed to provide variable length compressed (VLC) access units having a length that depends on the picture content. Consequently the video decoder input bit rate has large variations and a relatively large buffer is used.
However, even if a buffer is used, ideal bit rates generally do not exist and therefore "buffer errors" (and even buffer overflow or underflow) can occur. Two methods are generally used to prevent buffer overflow and underflow. With one method (called Digital Storage Media slave) the transfer rate (i.e. buffer input rate) is controlled. The other method (called decoder slave) is realized by controlling the buffer output data rate. In the case of video this can be done by adjusting the frame rate. In the case of audio, this is done by adjusting the sampling rate. The buffer output data rate is thus adjusted. Another decoder slave method skips or repeats access units in order to control the buffer output data rate.
Adjustments of the decoder rate and adjustments of the transfer bit rate are restricted by characteristics of the peripheral hardware. Therefore if the buffer error (i.e. deviation from the ideal buffer fullness) is too large, the appropriate control can become difficult or impossible. When starting playback, a large buffer error can sometimes occur. Therefore generally the decoder starts decoding after an appropriate start up delay in order to reduce the initial buffer error.
In the MPEG 1 system standard, fields have been included in the multiplex syntax which can be used to control the decoder or the transfer rate. In the pack header a value called SCR (System Clock Reference) can be used to control the transfer data rate. Time stamps in the video packet header can be used to control the frame rate, and time stamps in the audio packet header can be used to control the sampling rate. SCR indicates the time when (the first part of the) packet data enters the decoder buffer and time stamps indicate when a certain access unit in the packet data is to be removed from the decoder buffer.
Both SCR and time stamps are absolute values of a clock that increments continuously at a rate of 90 KHz. Therefore the difference between the first read SCR and the first read time stamp can be used as a start up delay.
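The start up delay computation described above can be sketched as follows. This is a minimal illustration, assuming that both values are available as integer counts of the 90 KHz clock and that the counter does not wrap around between the two readings; the function names and the example values are hypothetical.

```python
CLOCK_HZ = 90_000  # rate of the clock that SCR and time stamps are sampled from

def startup_delay_ticks(first_scr, first_time_stamp):
    """Start up delay in 90 KHz ticks: first read time stamp minus first read SCR.
    Assumption: no counter wraparound occurs between the two values."""
    return first_time_stamp - first_scr

def ticks_to_seconds(ticks):
    """Convert a count of 90 KHz ticks to seconds."""
    return ticks / CLOCK_HZ

# Hypothetical values: SCR = 1000, first time stamp = 19000.
delay = startup_delay_ticks(1000, 19000)
print(delay, ticks_to_seconds(delay))
```

With these assumed values the decoder would wait 18000 ticks, i.e. 0.2 seconds, after the first pack data enters the buffer before decoding the first access unit.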
Unfortunately, (except for the first audio and video time stamps for initial start up) it is difficult to use consecutive time stamps. The problem is that, after demultiplexing, the time stamps are separated from their related access units. I.e., since the decoding system's demultiplex switch separates time stamps from the packet data and then stores the packet data (without their time stamps) in the respective buffers, it becomes difficult to keep track of which time stamp belongs to which access unit header. For example: a decoding system processes a certain MPEG multiplex stream. The first SCR (System Clock Reference) is detected, and the system uses this value to initialize a local (90 KHz) clock. From now on this clock increments automatically at a 90 KHz rate. Then the system detects the first video DTS. It indicates (with a 90 KHz clock value) the decoding time for the first following picture header. However generally there is a significant delay before this picture header should be decoded (due to the buffer before the video decoder), and therefore the time stamp must also be delayed or buffered before it can be used. Furthermore, before the first picture is decoded, several more video packets with time stamps can arrive at the demultiplexer switch and these time stamps should also be delayed or buffered somewhere before they can be used.
With the adopted definition for time stamps, it is a significant shortcoming of the MPEG standard that the time stamps were not included in the video and audio syntax specifications.
The problem is that the time stamps are in the "wrong" layer. This might not be so bad if all access unit headers had time stamps. However, this is not the case. Therefore, simply adding an additional time stamp buffer for each decoder and assuming that each access unit has an accompanying time stamp is not possible. Since a packet header contains only one time stamp, and since the packet data can contain several access unit headers, not every access unit header will have a time stamp. Even if the packet data contains an access unit header, inserting a time stamp in the preceding packet header is optional because the only MPEG requirement concerning time stamps is that they occur at least once every 0.7 seconds.
If a particular application uses two or more decoders (e.g. one video and one audio decoder) it is often necessary to synchronize these decoders. According to the MPEG 1 system standard time stamps must be used to perform this synchronization. The standard assumes that the audio and video decoder processing delays are 0 ms. Consequently (in this model) the time when an access unit is decoded is equal to the time when the decoded access unit (i.e. "presentation unit") is output.
Besides the previously mentioned reason for the decoder start-up delay (to minimize initial buffer error), in case of multi-decoder systems, for each decoder an initial start-up delay is also necessary. This is because, for example, the audio and video packets that occur together (in the same pack) are often segments of the audio and video signal that, after decoding, should be output at different times. This is possible, because MPEG has agreed on a certain amount of buffering before each video and audio decoder, which allows a flexible multiplex bitstream structure. A different way to describe the MPEG multiplex standard is: any kind of pack and packet structure is acceptable, as long as the buffers in the reference decoder do not overflow or underflow.
Two kinds of decoding systems exist for synchronization. The first is a locked system wherein the frame rate and sampling rate are locked to a single common clock (e.g. 90 KHz in MPEG). The locked system has the disadvantage that it can only play back bit streams that were generated by an encoding system where the frame rate and sampling rate were also locked to one common clock. Whether the video and audio encoder clocks are locked or not will depend on the application. (In case of CDI-FMV, locking the encoders is mandatory). In this system, (if the transmission error characteristics are limited) after reading the first video and audio time stamps and using them for the respective decoder start, all following time stamps can be ignored. This kind of decoding system is relatively simple, and does not need to keep track of which time stamps belong to which access units. However, if a transmission error causes missing or false access unit headers, a sync error results (and a corresponding buffer error). Such problems can also occur in non-MPEG systems. The invention provides a solution for this problem.
The second kind of decoding system (called non-locked decoding system) can also play back non-locked encoded multiplex bit streams. Non-locked encoded bit streams are generated by encoding systems that have independent encoder frame and sampling rate clocks. In this case there is no relation between the video encoder's frame rate error and the audio encoder's sampling rate error; they vary independently.
Non-locked MPEG decoding systems are used if the multiplex bitstream was generated by a non-locked encoding system, i.e. an encoding system where the picture rate clock and audio sampling rate clock are independent. Whether non-locked encoding systems will be used or not will depend on the application. For example, in case of CDI-FMV (i.e., Philips' Compact Disc Interactive with Full Motion Video extension, which has adopted the MPEG 1 standard), independent video and audio encoder clocks are not allowed. Instead, both these clocks must be locked to a single common clock. However, in the future some applications may use non-locked MPEG systems.
When the MPEG standard is used, non-locked encoder (frame and sampling rate) clock errors are recorded with time stamps and then included in the bit stream. During playback, in order to prevent an AV sync error, at least one decoder must have a PLL mechanism which uses time stamps regularly and makes the actual frame (or sampling) rate match the time stamp values. The video decoder should thus read the video DTSs (Decoding Time Stamps) or the video PTSs (Presentation Time Stamps) and use these time stamps to control the picture rate, or the audio decoder should read the audio DTSs and use these to control the audio sampling rate, or both decoders should use time stamps to control their clocks.
In FIG. 1, an MPEG or similar data stream 20 of packets is shown as a mixture of video packets such as 22 and audio packets 24. Collections of packets 22 and 24 are arranged in a larger pack preceded by a pack header 26. In each case, the actual video data 27 or audio data 28 are preceded by a video packet header which contains (among other data items) time stamp 30 or an audio packet header which contains (among other data items) time stamps 32 respectively. The actual video data 27 are divided into video frames, whereas the audio data 28 are divided into batches of samples as illustrated.
According to one decoding method, the decoding system demultiplexes the incoming packets into an audio bit stream and a video bit stream, takes the time stamps from the packet header, and inserts them just before the related access unit in each elementary stream. This generates the syntax as shown in FIG. 2. In this syntax, for example, frame n+1 contains the value of the video time stamp (VTS) (i.e. a time value from a 90 KHz clock). Similarly, unit m+1 includes the audio time stamp (ATS) just prior to the audio unit m+1.
In order to produce such elementary streams, a decoding system as shown in FIG. 3 can be used. In this system, the multiplexed bit stream 20 is provided to a demultiplexer 50 which separates the bit stream into video data, video time stamps, audio data and audio time stamps. The video data are passed through a video syntax modifier 54 while the audio data are passed through an audio syntax modifier 58. The modified video and audio data are buffered in buffers 60 and 62 respectively prior to decoding by video and audio decoders 66 and 68 respectively under control of picture rate control circuit 74 and sample rate control 76. Video and audio emerge at the outputs 84 and 86 respectively.
This method has the drawback that extra modules are required before each buffer in order to insert the time stamps in the right place of the demultiplexed data streams. If the time stamps were in the correct layer, this problem would not exist.
Also, this non-locked system has a further drawback that both the elementary A/V bit streams are modified (i.e. the bit streams at the input of the decoders do not comply with the respective audio and video standards). Therefore, the decoders of this system cannot directly decode non-multiplexed audio and video bit streams.
Finally, this non-locked system succeeds in maintaining the relation between time stamps and access unit headers. Therefore, it has the ability to detect whether access unit headers are lost or falsely generated (e.g. due to a transmission error). Such an error would cause a large difference between the intended access unit decoding time (i.e. the time stamp's value) and the actual access unit's decoding time, which would be detected by the affected decoder's PLL. Unfortunately, however, the PLLs are designed to correct such differences by adjustments of the decoder's clock, which in the worst case will cause a third drawback. The PLL will try to repair such a large AV sync error by adjusting the decoder rate very slowly (as usual). Such a slow correction procedure requires a long time to repair the AV sync error and the corresponding buffer error. Consequently, the chance of buffer underflow or overflow increases. Also, since a large AV sync error could last for several seconds or longer, the chance that the user will notice the AV sync error increases. If the PLL attempts to repair the AV sync error by quickly adjusting the affected decoder's clock, other audio or video artifacts would be generated (e.g. vertical roll in the video or frequency shift in the audio). Therefore in this system the adjustment of a decoder clock with time stamp values is appropriate for replicating the small encoder clock errors that exist in non-locked encoded composite bit streams. This approach is not always suitable for correcting large sync errors that were caused by a certain number of lost or excess access unit headers.
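The duration of such a slow correction can be estimated with the following minimal sketch. It assumes a simplified model in which the decoder clock runs at a constant fractional deviation from its nominal rate until the error is absorbed; the function name, the 0.1% deviation limit and the 25 Hz picture rate are hypothetical values chosen only for illustration.

```python
from fractions import Fraction

def correction_time_s(sync_error_s, max_rate_deviation):
    """Seconds needed to absorb a sync error when the decoder clock may run
    at most `max_rate_deviation` (a fraction of nominal rate) fast or slow.
    A clock running (1 + d) times nominal gains d seconds per second."""
    return Fraction(sync_error_s) / Fraction(max_rate_deviation)

# Hypothetical case: one lost picture at 25 Hz (a 1/25 s error),
# with the clock allowed to deviate by 0.1% of its nominal rate.
print(correction_time_s(Fraction(1, 25), Fraction(1, 1000)))
```

Under these assumed figures a single lost picture header would take 40 seconds to correct, which is consistent with the observation that a large AV sync error can persist long enough to be noticed.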
The present invention was developed to alleviate these problems of non-locked MPEG decoding systems and to overcome the disadvantage of simple locked (MPEG and non-MPEG) decoding systems as described above.
The present invention was designed to alleviate several problems as described above. The simplest MPEG decoding system (as described above) and the simplest non-MPEG decoding systems are locked decoding systems (systems where the video decoder's picture rate and the audio decoder's sampling rate are locked to a single common clock). These systems have the drawback that they cannot be used in applications that lose or falsely generate access unit headers (for example, due to transmission or storage errors).
The present invention overcomes these shortcomings by using an access unit count value in each elementary stream. With this value each decoder can detect missing or false access unit headers. The affected decoder can then request a system reset or try to "repair" the synchronization error by redecoding access units or by skipping access units. The decoder for such a system is easily implemented. In addition, the access unit count makes a simpler non-locked decoding system possible without the disadvantages of the non-locked decoding system described above.
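The detection step can be illustrated with the following minimal sketch. It assumes that the decoder keeps its own running count of the access unit headers it has seen and compares it with the count value read from the elementary stream. How the count value is carried in the stream, its width, and any wraparound handling are not specified in this section, so the function and its names are hypothetical.

```python
def check_access_unit_count(local_count, stream_count):
    """Compare the decoder's own count of access unit headers seen so far
    with the count value read from the elementary stream.

    Returns a (status, n) pair:
      ("ok", 0)       counts agree
      ("missing", n)  n access unit headers were lost before reaching the decoder
      ("false", n)    n access unit headers were falsely detected by the decoder
    Assumption: both counts are plain integers with no wraparound.
    """
    diff = stream_count - local_count
    if diff > 0:
        return ("missing", diff)
    if diff < 0:
        return ("false", -diff)
    return ("ok", 0)

# Hypothetical readings: the stream says 12 units, the decoder has seen 10.
print(check_access_unit_count(10, 12))
```

On a result other than "ok", the affected decoder could request a system reset or attempt a repair by redecoding or skipping the indicated number of access units, as described above.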