Proper reproduction of a recorded and/or transmitted multimedia program, consisting of compressed digitized video data accompanied by associated compressed digitized audio data, requires combining two independent digital data bitstreams into a single, synchronized, serial system data stream that includes both video and audio data. Missing or improper synchronization of the video and audio data, either when assembling the data into the system data stream or when decoding and presenting an assembled system data stream, frequently causes a visible image to appear out of synchronization with the accompanying sound. For example, images showing the lip movements of a person speaking may not be synchronized with the audible sound of the spoken words.
To address this issue, Part 1 of the Moving Picture Experts Group ("MPEG") standard ISO/IEC 11172, issued jointly by the International Organization for Standardization ("ISO") and the International Electrotechnical Commission ("IEC"), defines a framework that permits combining bitstreams of digitized video and audio data into a single, synchronized, serial system data stream. Once combined into a single digital data stream, the data is in a form well suited for digital storage, such as on a hard disk or CD-ROM included in a digital computer, or for transmission, such as over a community antenna television ("CATV") system or a high bit rate digital telephone system, e.g. a T1, ISDN Primary Rate, or ATM digital telecommunications access. A system data stream assembled in accordance with the ISO/IEC 11172 standard may be decoded by an MPEG decoder to obtain decoded pictures and/or decoded audio samples.
The ISO/IEC 11172 standard defining MPEG compression specifies that packets of data, extracted from the compressed video bitstream and from the compressed audio bitstream, are to be interleaved in assembling the system data stream. Furthermore, in accordance with the ISO/IEC 11172 standard, a system data stream may include private, reserved, and padding streams in addition to the compressed video and compressed audio bitstreams. Although the properties of the system data stream defined by the MPEG standard impose functional and performance requirements on MPEG encoders and decoders, the standard does not define an architecture for, or an implementation of, MPEG encoders or decoders. In fact, considerable freedom exists in designing and implementing encoders and decoders that operate in accordance with the ISO/IEC 11172 standard.
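Because the standard requires interleaving but leaves the scheduling policy to the implementer, the following is only a minimal sketch of one possible policy. It assumes each packet carries a hypothetical `deadline` key in 90 kHz ticks (for example, when its data must reach the decoder) and that each elementary stream's packets are already in deadline order; the multiplexer then merges the two lists. The `Packet` type, the `deadline` field and the sample values are illustrative assumptions, not part of ISO/IEC 11172.

```python
import heapq
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    stream: str      # illustrative label, e.g. "video" or "audio"
    deadline: int    # hypothetical scheduling key, in 90 kHz ticks
    payload: bytes


def interleave(video: List[Packet], audio: List[Packet]) -> List[Packet]:
    """Merge two deadline-ordered packet lists into one system-stream order.

    heapq.merge behaves like a stable sort over the concatenated inputs,
    so on equal deadlines the video packet (first argument) goes first.
    """
    return list(heapq.merge(video, audio, key=lambda p: p.deadline))


# Illustrative deadlines: 3003 ticks per video frame, 2160 per audio AU.
video = [Packet("video", t, b"") for t in (0, 3003, 6006)]
audio = [Packet("audio", t, b"") for t in (0, 2160, 4320, 6480)]
print([(p.stream, p.deadline) for p in interleave(video, audio)])
```

A real multiplexer would also weigh decoder buffer occupancy and mux rate; merging by a single key simply shows how two independent bitstreams become one serial stream.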
A system data stream in accordance with Part 1 of the ISO/IEC 11172 standard includes two layers of data: a system layer and a compression layer, with the system layer enveloping the digital data of the compression layer. The ISO/IEC 11172 system layer is itself divided into two sub-layers: one for multiplex-wide operations, identified as the "pack layer," and one for stream-specific operations, identified as the "packet layer." Packs, which belong to the pack layer of a system data stream in accordance with the ISO/IEC 11172 standard, include headers that specify a system clock reference ("SCR"). The SCR is expressed in ticks of a 90 kilohertz ("kHz") clock and fixes the intended times for commencing decompression of the digitized video and audio data included in the compression layer.
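To make the 90 kHz units concrete, the sketch below decodes and encodes the 33-bit time-stamp field as it appears in an ISO/IEC 11172-1 pack header: five bytes carrying a 4-bit prefix, the value split into 3, 15 and 15 bit pieces, and a marker bit after each piece. The function names are hypothetical, and the bit layout should be checked against the standard before being relied on; the same five-byte form is used for PTS and DTS, with different prefixes.

```python
SYSTEM_CLOCK_HZ = 90_000  # SCR ticks per second


def decode_timestamp(field: bytes) -> int:
    """Decode a 5-byte, marker-separated 33-bit time-stamp (e.g. the SCR).

    Layout: prefix(4) ts[32..30] marker ts[29..15] marker ts[14..0] marker.
    """
    if len(field) != 5:
        raise ValueError("time-stamp field must be exactly 5 bytes")
    b0, b1, b2, b3, b4 = field
    return (((b0 >> 1) & 0x07) << 30
            | b1 << 22
            | ((b2 >> 1) & 0x7F) << 15
            | b3 << 7
            | ((b4 >> 1) & 0x7F))


def encode_timestamp(ts: int, prefix: int = 0b0010) -> bytes:
    """Inverse of decode_timestamp; 0b0010 is the pack-header SCR prefix."""
    ts &= (1 << 33) - 1                         # SCR is a 33-bit counter
    return bytes([
        (prefix << 4) | ((ts >> 29) & 0x0E) | 1,  # ts[32..30] + marker
        (ts >> 22) & 0xFF,                        # ts[29..22]
        ((ts >> 14) & 0xFE) | 1,                  # ts[21..15] + marker
        (ts >> 7) & 0xFF,                         # ts[14..7]
        ((ts << 1) & 0xFE) | 1,                   # ts[6..0] + marker
    ])


scr = decode_timestamp(encode_timestamp(5 * SYSTEM_CLOCK_HZ))
print(scr / SYSTEM_CLOCK_HZ)  # 5.0 seconds
```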
To effect synchronized presentation of digitized video and audio data, the packet layer defined by the ISO/IEC 11172 standard provides for "presentation time-stamps" ("PTS") and optional "decoding time-stamps" ("DTS"). The PTS and DTS specify the timing of the video and audio data relative to the SCR specified in the pack layer. The packet layer, which may contain both the PTS and the DTS, is independent of the data contained in the compression layer defined by the ISO/IEC 11172 standard. For example, a video packet may start at any byte in the video stream. However, a PTS and optional DTS, when encoded into a packet's header, apply to the first "access unit" ("AU") that begins within that packet.
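The rule that a packet's time-stamp applies to the first AU that *begins* inside the packet can be sketched as follows. The sketch assumes a hypothetical representation: AU start positions as sorted byte offsets in the elementary stream, and packets as `(first_byte, length, pts_or_None)` tuples. AUs that start mid-packet after another AU, or that are split across packets, receive no explicit time-stamp here.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple


def assign_pts(au_starts: List[int],
               packets: List[Tuple[int, int, Optional[int]]]) -> Dict[int, int]:
    """Map each packet's PTS onto the first AU that begins inside that packet.

    Returns {au_index: pts}; AUs missing from the result carry no explicit
    time-stamp and a decoder would have to infer their presentation times.
    """
    result = {}
    for first, length, pts in packets:
        i = bisect_left(au_starts, first)  # first AU starting at or after `first`
        if pts is not None and i < len(au_starts) and au_starts[i] < first + length:
            result[i] = pts
    return result


# AU 1 starts inside packet 0 but is not the first AU there, so it gets no PTS.
print(assign_pts([0, 100, 250, 400],
                 [(0, 120, 1000), (120, 180, 4000), (300, 150, 7000)]))
# {0: 1000, 2: 4000, 3: 7000}
```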
The MPEG standard ISO/IEC 11172 defines an AU to be the coded representation of a "presentation unit" ("PU"). The ISO/IEC 11172 standard further defines a PU as a decoded audio AU or a decoded picture. The standard also defines three (3) different methods, called "Layers" in the standard, for compressing and decompressing an audio signal. For two of these methods, the standard defines an audio AU as the smallest part of the encoded audio bitstream which can be decoded by itself. For the third method, the standard defines an audio AU as the smallest part of the encoded audio bitstream that is decodable with the use of previously acquired side and main information.
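Since an audio PU is a decoded audio AU, its duration follows from the number of samples per frame: 384 samples for Layer I and 1152 samples for Layers II and III. The sketch below, with a hypothetical function name, computes that duration exactly in 90 kHz clock ticks; exact fractions show that at some sampling rates an audio AU does not span a whole number of ticks.

```python
from fractions import Fraction

SYSTEM_CLOCK_HZ = 90_000
SAMPLES_PER_AU = {1: 384, 2: 1152, 3: 1152}  # MPEG-1 audio Layers I, II, III


def au_duration_ticks(layer: int, sampling_rate_hz: int) -> Fraction:
    """Exact duration of one decoded audio AU (one PU) in 90 kHz ticks."""
    return Fraction(SAMPLES_PER_AU[layer] * SYSTEM_CLOCK_HZ, sampling_rate_hz)


print(au_duration_ticks(2, 48_000))  # 2160
print(au_duration_ticks(2, 44_100))  # 115200/49, about 2351.02 ticks
```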
Part 1 of the ISO/IEC 11172 standard suggests that during synchronized presentation of compressed video and audio data, the reproduction of the video images and audio sounds may be synchronized by adjusting the playback of both compressed digital data streams to a master time base called the system time-clock ("STC") rather than by adjusting the playback of one stream, e.g. the video data stream, to match the playback of another stream, e.g. the audio data stream. The ISO/IEC 11172 standard suggests that an MPEG decoder's STC may be one of the decoder's clocks (e.g. the SCR, the video PTS, or the audio PTS), the digital storage media ("DSM") or channel clock, or it may be some external clock. End-to-end synchronization of a multimedia program encoded into an MPEG system data stream occurs:
a. if an encoder embeds time-stamps during assembly of the system data stream;
b. if video and audio decoders receive the embedded time-stamps together with the compressed data; and
c. if the decoders use the time-stamps in scheduling presentation of the multimedia program.
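Step c can be sketched as a decoder loop that compares every pending unit's PTS with a single master STC, so video and audio are both slaved to the same clock rather than one stream to the other. This is a minimal sketch under stated assumptions: the heap-of-tuples queue and the `present_due` name are hypothetical, the STC is assumed to be sampled in 90 kHz ticks, and wraparound of the 33-bit counter is ignored.

```python
import heapq
from typing import Any, List, Tuple


def present_due(queue: List[Tuple[int, int, Any]], stc: int) -> List[Tuple[int, Any]]:
    """Pop and return every (pts, unit) whose PTS the STC has reached.

    queue is a heap of (pts, seq, unit) tuples; seq breaks PTS ties so the
    units themselves never need to be comparable.
    """
    due = []
    while queue and queue[0][0] <= stc:
        pts, _, unit = heapq.heappop(queue)
        due.append((pts, unit))
    return due


queue: List[Tuple[int, int, Any]] = []
arrivals = [(3003, "video 1"), (0, "video 0"), (2160, "audio 1"), (0, "audio 0")]
for seq, (pts, unit) in enumerate(arrivals):
    heapq.heappush(queue, (pts, seq, unit))

for stc in (0, 2500, 3003):  # the decoder samples its master clock
    print(stc, present_due(queue, stc))
```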
To inform an MPEG decoder that an encoded bitstream has an exact relationship to the SCR, a "system header" ("SH"), which occurs at the beginning of a system data stream and which may be repeated within the stream, includes a "system_audio_lock_flag" and a "system_video_lock_flag." Setting the system_audio_lock_flag to one (1) indicates that a specified, constant relationship exists between the audio sampling rate and the SCR. Setting the system_video_lock_flag to one (1) indicates that a specified, constant relationship exists between the video picture rate and the SCR. Setting either of these flags to zero (0) indicates that the corresponding relationship does not exist.
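A decoder could read both lock flags directly from the system header. The sketch below assumes the ISO/IEC 11172-1 system header syntax: after the 4-byte start code 0x000001BB come header_length (16 bits), a marker bit, rate_bound (22 bits), a marker bit, audio_bound (6 bits), fixed_flag, CSPS_flag, and then the two lock flags as the two most significant bits of byte 10. These offsets and the `lock_flags` name are assumptions to verify against the standard.

```python
from typing import Tuple

SYSTEM_HEADER_START_CODE = b"\x00\x00\x01\xbb"


def lock_flags(header: bytes) -> Tuple[bool, bool]:
    """Return (system_audio_lock_flag, system_video_lock_flag).

    `header` must begin with the system header start code. Byte 10 holds
    audio_lock, video_lock, a marker bit and the 5-bit video_bound.
    """
    if len(header) < 11 or header[:4] != SYSTEM_HEADER_START_CODE:
        raise ValueError("not an ISO/IEC 11172-1 system header")
    flags = header[10]
    return bool(flags & 0x80), bool(flags & 0x40)


# Synthetic header: audio locked to the SCR, video not locked.
example = SYSTEM_HEADER_START_CODE + bytes([0x00, 0x09, 0x80, 0x00, 0x01, 0x04, 0xA1, 0xFF])
print(lock_flags(example))  # (True, False)
```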
As set forth above, the ISO/IEC 11172 standard specifically provides that the system data stream may include a padding stream. Packets assembled into the system data stream from the padding stream may be used to maintain a constant total data rate, to achieve sector alignment, or to prevent decoder buffer underflow. Since the padding stream is not associated with decoding and presentation, padding stream packets lack both PTS and DTS values.
In addition to the padding stream, "stuffing" of up to 16 bytes is allowed within each packet. Stuffing is used for purposes similar to those of the padding stream, and is particularly useful for providing word (16-bit) or long word (32-bit) alignment in applications where byte (8-bit) alignment is insufficient. Stuffing is the only method of filling a packet when the number of bytes required is less than the minimum size of a padding stream packet.
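The choice between stuffing and a padding stream packet can be sketched as a small decision function. It assumes a hypothetical multiplexer that knows how many bytes it must fill; `min_padding_packet` stands in for the smallest possible padding stream packet and its default of 6 bytes (start code with stream ID plus a 16-bit length) is an assumption, not a value quoted from the standard. A helper also computes the bytes needed for word or long word alignment.

```python
from typing import Tuple

MAX_STUFFING_BYTES = 16  # per-packet stuffing limit


def align_gap(size: int, alignment: int) -> int:
    """Bytes needed to bring `size` up to a multiple of `alignment` (2 or 4)."""
    return -size % alignment


def choose_fill(gap: int, min_padding_packet: int = 6) -> Tuple[str, int]:
    """Return ("none", 0), ("stuffing", n) or ("padding", n) for a gap of n bytes.

    Gaps smaller than the (assumed) minimum padding packet size can only be
    filled with stuffing, which is itself limited to 16 bytes per packet.
    """
    if gap < 0:
        raise ValueError("gap must be non-negative")
    if gap == 0:
        return ("none", 0)
    if gap < min_padding_packet:
        if gap > MAX_STUFFING_BYTES:
            raise ValueError("gap cannot be filled by stuffing alone")
        return ("stuffing", gap)
    return ("padding", gap)


print(choose_fill(align_gap(10, 4)))  # ('stuffing', 2) for long word alignment
print(choose_fill(300))               # ('padding', 300)
```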
A bitstream of video data compressed in accordance with Part 2 of the ISO/IEC 11172 standard consists of a succession of frames of compressed video data. The frames in an MPEG compressed video bitstream include intra ("I") frames, predicted ("P") frames, and bidirectional ("B") frames. Decoding the data of an MPEG I frame without reference to any other data reproduces an entire uncompressed frame of video data. An MPEG P frame may be decoded to obtain an entire uncompressed frame of video data only by reference to a prior decoded frame of video data, either a prior decoded I frame or a prior decoded P frame. An MPEG B frame may be decoded to obtain an entire uncompressed frame of video data only by reference both to a prior and to a successive reference frame, i.e. to decoded I or P frames. The ISO/IEC 11172 specification defines as a group of pictures ("GOP") one or more I frames together with all of the P frames and B frames for which the I frame or frames serve as a reference.
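Because a B frame needs its successive reference frame already decoded, frames cannot be transmitted in display order. The sketch below, using a hypothetical `coded_order` function over a string of frame types, moves each I or P frame ahead of the B frames that precede it in display order. It checks only that every B frame has a successive reference; it does not check for a prior one (for example, leading B frames that refer to a previous GOP).

```python
from typing import List


def coded_order(display_types: str) -> List[int]:
    """Reorder frame indices from display order to a valid decoding order.

    Each I or P reference frame is emitted before the B frames that precede
    it in display order, since those B frames need it as a successive reference.
    """
    order: List[int] = []
    pending_b: List[int] = []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        elif frame_type in ("I", "P"):
            order.append(index)
            order.extend(pending_b)
            pending_b = []
        else:
            raise ValueError(f"unknown frame type {frame_type!r}")
    if pending_b:
        raise ValueError("trailing B frames have no successive reference frame")
    return order


print(coded_order("IBBPBBP"))  # [0, 3, 1, 2, 6, 4, 5]
```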
In assembling a system data stream, a real-time MPEG encoder must include a system header at the beginning of each system data stream, and that system header must set each of the system_audio_lock_flag and the system_video_lock_flag to either zero (0) or one (1). If a real-time MPEG encoder sets either or both of these flags to one (1), it must ensure that, throughout the entire system data stream, the specified constant relationship exists between the audio sampling rate and the SCR, and between the video picture rate and the SCR, respectively. If a compressed audio bitstream encoder operates independently of the rate at which frames of video occur, there can be no assurance that these constant relationships will exist in the encoded data that is to be interleaved into the system data stream.
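A short calculation shows why a free-running audio encoder breaks the locked relationship. Assume, hypothetically, that the audio sampling clock runs 50 parts per million fast while the time-stamps are computed from the nominal 44.1 kHz rate. The audio time-stamps then drift away from the SCR by a steadily growing amount. The `drift_ticks` name and the 50 ppm figure are illustrative assumptions.

```python
from fractions import Fraction

SYSTEM_CLOCK_HZ = 90_000


def drift_ticks(nominal_hz: int, actual_hz: Fraction, seconds: int) -> Fraction:
    """SCR ticks of drift after `seconds` of operation when audio is sampled
    at actual_hz but time-stamped as if it were sampled at nominal_hz."""
    samples_produced = actual_hz * seconds
    # Time-stamps derived from the nominal rate advance by this many ticks ...
    stamped_ticks = samples_produced * SYSTEM_CLOCK_HZ / nominal_hz
    # ... while the SCR itself advances by exactly seconds * 90 kHz.
    return stamped_ticks - seconds * SYSTEM_CLOCK_HZ


fast_clock = Fraction(44_100) * (1 + Fraction(50, 10**6))  # hypothetical +50 ppm
print(drift_ticks(44_100, fast_clock, 3600))  # 16200 ticks, i.e. 0.18 s per hour
```

With the lock flag set, the encoder would instead derive the sampling clock from the same reference as the SCR, so the drift stays at zero.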