Within the past decade, the advent of world-wide electronic communications systems has enhanced the way in which people can send and receive information. Moreover, the capabilities of real-time video and audio systems have greatly improved in recent years. Real-time video and audio systems require a large bandwidth. In order to provide services such as video-on-demand and videoconferencing to subscribers, an enormous amount of network bandwidth is required. In fact, network bandwidth is often the main inhibitor to the effectiveness of such systems.
In order to minimize the effects of the constraints imposed by the limited bandwidths of telecommunications networks, compression systems and standards have evolved. These standards prescribe the compression of video and audio data and the delivery of several programs and control data in a single bit stream transmitted in a bandwidth that would heretofore only accommodate one analog program.
One video and audio compression standard is the Moving Picture Experts Group ("MPEG") standard. Within the MPEG-2 standard, video compression is defined both within a given picture, i.e., spatial compression, and between pictures, i.e., temporal compression. Video compression within a picture is accomplished by conversion of the digital image from the spatial domain to the frequency domain by a discrete cosine transform, quantization, variable length coding, and Huffman coding. Video compression between pictures is accomplished via a process referred to as motion compensation, in which a motion vector is used to describe the translation of a set of picture elements (pels) from one picture to another. Audio compression is as defined in the standard.
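The transform and quantization steps can be illustrated with a minimal sketch. It assumes a one-dimensional, 8-sample DCT-II for brevity, whereas video coding applies the transform in two dimensions over blocks of pels, and it uses a single uniform quantizer step. The function names `dct_1d` and `quantize` and the step size are hypothetical choices made for illustration only.

```python
import math

def dct_1d(samples):
    """Orthonormal DCT-II of a list of samples (illustrative, not optimized)."""
    n = len(samples)
    out = []
    for k in range(n):
        # Scale factors that make the transform orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(samples)
        )
        out.append(scale * total)
    return out

def quantize(coefficients, step):
    """Uniform quantization: divide by a step size and round to an integer."""
    return [round(c / step) for c in coefficients]

# A flat run of samples concentrates all of its energy in the first (DC) term,
# so every other quantized coefficient is zero and codes very compactly.
flat = [8] * 8
levels = quantize(dct_1d(flat), step=4.0)
```

The run of zero-valued levels produced for smooth image regions is what the subsequent variable length coding stage exploits.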
ISO 13818-1 specifies the procedure for transporting the compressed bitstream from the transmitting end to the receiving end of the system, and for thereafter decompressing the bitstream at the receiving end, so that one of the many picture sequences is decompressed and may be displayed in real-time. ISO 13818-1 is the systems or transport layer portion of the MPEG-2 standard. This portion of the standard specifies packetization of audio and video elementary bitstreams into packetized elementary streams (PES), the combination of one or more audio and video packetized elementary streams into a single time division or packet multiplexed bitstream for transmission, and the subsequent demultiplexing of the single bitstream into multiple bitstreams for decompression and display. The single time division or packet multiplexed bit stream is shown from various architectural and logical perspectives in the FIGURES, especially FIGS. 1 to 5, where many packets make up a single bitstream.
The concept of packetization and the mechanism of packet multiplexing are shown in FIG. 1, denominated "Prior Art", where elementary streams are formed in an audio encoder, a video encoder, a source of other data, and a source of systems data. These elementary streams are packetized into packetized elementary streams, as described hereinbelow. The packetized elementary streams of audio data, and video data, and the packets of other data and systems data are packet multiplexed by the multiplexor into a system stream.
The time division or packet multiplexed bitstream is shown, for example, in FIGS. 2 and 5, both denominated "Prior Art", which give an overview of the time division or packet multiplexed bitstream. The bitstream is made up of packets, as shown in FIG. 5. Each packet, as shown in FIG. 2, is, in turn, made up of a packet header, an optional adaptation field, and packet data bytes, i.e., payload.
The MPEG-2 System Layer has the basic task of facilitating the multiplexing of one or more programs made up of related audio and video bitstreams into a single bitstream for transmission through a transmission medium, and thereafter to facilitate the demultiplexing of the single bitstream into separate audio and video program bitstreams for decompression while maintaining synchronization. By a "Program" is meant a set of audio and video bitstreams having a common time base and intended to be presented simultaneously. To accomplish this, the System Layer defines the data stream syntax that provides for timing control and the synchronization and interleaving of the video and audio bitstreams. The system layer provides capability for (1) video and audio synchronization, (2) stream multiplex, (3) packet and stream identification, (4) error detection, (5) buffer management, (6) random access and program insertion, (7) private data, (8) conditional access, and (9) interoperability with other networks, such as those using asynchronous transfer mode (ATM).
An MPEG-2 bitstream is made up of a system layer and compression layers. Under the MPEG-2 Standard (ISO/IEC 13818-1) a time division or packet multiplexed bit-stream consists of two layers, (1) a compression layer, also referred to as an inner layer, a payload layer, or a data layer, and (2) a system layer, also referred to as an outer layer or a control layer. The compression layer or inner layer contains the data fed to the video and audio decoders, and defines the coded video and audio data streams, while the system layer or outer layer provides the controls for demultiplexing the interleaved compression layers, and in doing so defines the functions necessary for combining the compressed data streams. This is shown in FIG. 3, denominated "Prior Art." As shown there, a bitstream made up of, for example, a system layer and a compression layer is the input to a system decoder. In the system decoder the system layer data is demultiplexed into the compressed audio layer, the compressed video layer, and control data. The control data is shown in FIG. 3, denominated Prior Art, as the PCR (Program Clock Reference) data, enable data, and start up values. The compressed data is sent to the respective audio and video data buffers, and through decoder control to the respective audio and video decoders.
The system layer supports a plurality of basic functions, (1) time division or packet multiplexing and demultiplexing of the time division or packet multiplexed multiple bit-streams, (2) synchronous display of the multiple coded bit streams, (3) buffer management and control, and (4) time recovery and identification. The system layer also supports (5) random access, (6) program insertion, (7) conditional access, and (8) error tracking.
For MPEG-2, the standard specifies two types of layer coding, a program stream (PS), for relatively lossless environments, such as CD-ROMs, DVDs, and other storage media, and a transport stream (TS), for lossy media, such as cable television, satellite television, and the like. The transport stream (TS), shown in FIG. 2 and denominated Prior Art, consists of a stream of transport stream packets, each of which consists of 188 bytes, divided into 4 bytes of packet header, an optional adaptation field, and up to 184 bytes of the associated packet data, that is, payload. The relationship of the layering of the access units, the PES packets, and the Transport Stream (TS) packets is shown in FIG. 5, denominated Prior Art.
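The fixed packet size makes it straightforward to cut a received byte sequence into transport stream packets. The following is a minimal sketch under stated assumptions: the input is already packet aligned, and the first header byte of each packet is the sync byte value 0x47 defined by the standard. The function names are hypothetical.

```python
TS_PACKET_SIZE = 188   # bytes per transport stream packet
TS_HEADER_SIZE = 4     # bytes of fixed packet header
SYNC_BYTE = 0x47       # first byte of every transport stream packet

def split_ts_packets(data: bytes) -> list:
    """Split an aligned byte string into 188-byte packets, checking sync bytes."""
    if len(data) % TS_PACKET_SIZE != 0:
        raise ValueError("length is not a multiple of 188 bytes")
    packets = []
    for offset in range(0, len(data), TS_PACKET_SIZE):
        packet = data[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"missing sync byte at offset {offset}")
        packets.append(packet)
    return packets

def max_payload_size(adaptation_field_bytes: int = 0) -> int:
    """Payload bytes left after the header and an optional adaptation field."""
    return TS_PACKET_SIZE - TS_HEADER_SIZE - adaptation_field_bytes
```

With no adaptation field present, `max_payload_size()` gives the 184 payload bytes noted above, and every byte of adaptation field reduces the payload by one byte.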
The transport stream (TS) is used to combine programs made up of PES-coded data with one or more independent time bases into a single stream. Note that under the MPEG-2 standard, an individual program does not have to have a unique time base, but that if it does, the time base is the same for all of the elements of the individual program.
The packetized elementary stream (PES) layer is an inner layer portion of the MPEG-2 time division or packet multiplexed stream upon which the transport or program streams are logically constructed. It provides stream specific operations, and supports the following functions: (1) a common base of conversion between program and transport streams, (2) time stamps for video and audio synchronization and associated timing, especially for associated audio and video packets making up a television channel, presentation, or program, and having a common time base, (3) stream identification for stream multiplexing and demultiplexing, and (4) such services as scrambling, VCR functions, and private data.
As shown in FIG. 5, denominated Prior Art, video and audio elementary streams (ES) must be PES-packetized before being inserted into a transport stream (TS). Elementary streams (ES) are continuous. PES packets containing an elementary stream (ES) are generally of fixed lengths. Typically, video PES packets are on the order of tens of thousands of bytes, and audio PES packets are on the order of thousands of bytes. However, video PES packets can also be specified as being of undefined length.
The MPEG-2 packetized elementary stream (PES) packet structure is shown in FIG. 4. To be noted is that all of the fields after the PES packet length are optional. The PES (packetized elementary stream) packet has a PES header, an optional header, and payload. The PES header has a start code, a packet length field, a 2 bit "10" field, a scramble control field, a priority field, a data alignment field, a copy field, a PTS/DTS (Presentation Time Stamp/Decoding Time Stamp) field, a field for other flags, and a header length field.
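The PES header fields listed above can be read out of the first nine bytes of a PES packet. The sketch below assumes the byte layout of ISO/IEC 13818-1 for stream types that carry the flag bytes: a three-byte start code prefix 0x000001, a one-byte stream ID, a two-byte packet length, two flag bytes, and a one-byte header length. The function name and the dictionary keys are hypothetical.

```python
def parse_pes_header(data: bytes) -> dict:
    """Parse the fixed part of a PES header (sketch; assumes flag bytes exist)."""
    if len(data) < 9:
        raise ValueError("need at least 9 bytes of PES header")
    if data[0:3] != b"\x00\x00\x01":
        raise ValueError("missing PES start code prefix")
    flags1, flags2 = data[6], data[7]
    if flags1 >> 6 != 0b10:
        raise ValueError("expected the 2 bit '10' field")
    return {
        "stream_id": data[3],
        "packet_length": (data[4] << 8) | data[5],
        "scrambling_control": (flags1 >> 4) & 0x3,
        "priority": (flags1 >> 3) & 0x1,
        "data_alignment": (flags1 >> 2) & 0x1,
        "copyright": (flags1 >> 1) & 0x1,
        "original_or_copy": flags1 & 0x1,
        "pts_dts_flags": (flags2 >> 6) & 0x3,   # 2 = PTS only, 3 = PTS and DTS
        "other_flags": flags2 & 0x3F,
        "header_data_length": data[8],          # bytes of optional header that follow
    }
```

A packet length of zero corresponds to the undefined-length case that the standard permits for video PES packets carried in a transport stream.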
The "Optional Header" field includes a Presentation Time Stamp field, a Decoding Time Stamp field, an elementary stream clock reference field, an elementary stream rate field, a trick mode field, a copy info field, a Prior Packetized Elementary Stream Clock Recovery field, an extension, and stuffing.
The packet start code provides packet synchronization. The stream ID field provides packet identification. Payload identification is also provided by the stream ID. The PTS/DTS flag fields and the PTS/DTS fields provide presentation synchronization. Data transfer is provided through the packet/header length, payload, and stuffing fields. The scramble control field facilitates payload descrambling, and the extension/private flag fields and the private data fields provide private information transfer.
A transport stream (TS) may contain one or more independent, individual programs, such as individual television channels or television programs, where each individual program can have its own time base, and each stream making up an individual program has its own PID. Each separate individual program has one or more elementary streams (ES) generally having a common time base. To be noted, is that while not illustrated in the FIGURES, different transport streams can be combined into a single system transport stream. Elementary stream (ES) data, that is, access units (AU), are first encapsulated into packetized elementary stream (PES) packets, which are, in turn, inserted into transport stream (TS) packets, as shown in FIG. 5, denominated Prior Art.
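The insertion of a PES packet into transport stream packets can be sketched as follows. This is an illustrative packetizer, not a complete multiplexor: it assumes each 188-byte packet has a 4-byte header with sync byte 0x47, carries up to 184 payload bytes, and pads a short final packet with adaptation-field stuffing bytes (0xFF). The function name `packetize_pes` and its parameters are hypothetical.

```python
def packetize_pes(pes: bytes, pid: int, first_cc: int = 0) -> list:
    """Split one PES packet into 188-byte transport stream packets (sketch)."""
    packets = []
    cc = first_cc
    for offset in range(0, len(pes), 184):
        chunk = pes[offset:offset + 184]
        # Payload unit start indicator is set only where the PES packet begins.
        pusi = 0x40 if offset == 0 else 0x00
        missing = 184 - len(chunk)
        if missing:
            # Adaptation field: one length byte, then a flags byte and stuffing.
            af_len = missing - 1
            adaptation = bytes([af_len])
            if af_len > 0:
                adaptation += b"\x00" + b"\xFF" * (af_len - 1)
            afc = 0b11   # adaptation field followed by payload
        else:
            adaptation = b""
            afc = 0b01   # payload only
        header = bytes([
            0x47,
            pusi | ((pid >> 8) & 0x1F),
            pid & 0xFF,
            (afc << 4) | (cc & 0x0F),
        ])
        packets.append(header + adaptation + chunk)
        cc = (cc + 1) % 16   # 4-bit continuity counter wraps around
    return packets
```

For example, a 400-byte PES packet yields three transport stream packets carrying 184, 184, and 32 payload bytes, with the last packet padded out to 188 bytes by stuffing.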
The architecture of the transport stream (TS) packets under the MPEG-2 specifications is such that the following operations are enabled: (1) demultiplexing and retrieving elementary stream (ES) data from one program within the transport stream, (2) remultiplexing the transport stream with one or more programs into a transport stream (TS) with a single program, (3) extracting transport stream (TS) packets from different transport streams to produce another transport stream (TS) as output, (4) demultiplexing a transport stream (TS) packet into one program and converting it into a program stream (PS) containing the same program, and (5) converting a program stream (PS) into a transport stream (TS) to carry it over a lossy medium to thereafter recover a valid program stream (PS).
At the transport layer, the transport sync byte provides packet synchronization. The Packet Identification (PID) field data provides packet identification, demultiplexing, and sequence integrity data. The PID field is used to collect the packets of a stream and reconstruct the stream. The continuity counters and error indicators provide packet sequence integrity and error detection. The Payload Unit start indicator and Adaptation Control are used for payload synchronization, while the Discontinuity Indicator and Program Clock Reference (PCR) fields are used for playback synchronization. The transport scramble control field facilitates payload descrambling. Private data transfer is accomplished through the Private Data Flag and Private Data Bytes. The Data Bytes are used for private payload data transfer, and the Stuffing Bytes are used to round out a packet.
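The header fields named above occupy the first four bytes of each transport stream packet. The following sketch assumes the bit layout of ISO/IEC 13818-1 (a 13-bit PID and a 4-bit continuity counter) and uses hypothetical names. The continuity check is deliberately simplified and ignores the allowances the standard makes for duplicate packets and for packets without payload.

```python
from typing import NamedTuple

class TsHeader(NamedTuple):
    transport_error: bool
    payload_unit_start: bool
    pid: int
    scrambling_control: int
    adaptation_field_control: int
    continuity_counter: int

def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the 4-byte transport stream packet header."""
    if len(packet) < 4 or packet[0] != 0x47:
        raise ValueError("not aligned on a transport sync byte")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return TsHeader(
        transport_error=bool(b1 & 0x80),
        payload_unit_start=bool(b1 & 0x40),
        pid=((b1 & 0x1F) << 8) | b2,
        scrambling_control=(b3 >> 6) & 0x3,
        adaptation_field_control=(b3 >> 4) & 0x3,
        continuity_counter=b3 & 0x0F,
    )

def continuity_ok(previous: int, current: int) -> bool:
    """True when the 4-bit counter advanced by exactly one (modulo 16)."""
    return current == (previous + 1) % 16
```

A demultiplexor would typically keep the last continuity counter seen for each PID and flag a lost packet when `continuity_ok` returns false.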
A transport stream is a collection of transport stream packets, linked by standard tables. These tables carry Program Specific Information (PSI) and are built when a transport stream is created at the multiplexor. These tables completely define the content of the stream. Two of the tables of the transport stream are the Program Association Table (PAT) and the Program Map Table (PMT).
The Program Association Table is a table of contents of the transport stream. It contains an ID that uniquely identifies the stream, a version number to allow dynamic changes of the table and the transport stream, and an association table of pairs of values. The pairs of values, PN, and PMT-PID, are the Program Number (PN) and the PID of the tables containing the program.
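The pairs of values in the Program Association Table can be read with a short loop. The sketch below assumes the section layout of ISO/IEC 13818-1: an 8-byte section header beginning with table ID 0x00, a sequence of 4-byte entries each holding a 16-bit Program Number and a 13-bit PID, and a trailing 4-byte CRC. The CRC is not verified here, and the function name is hypothetical.

```python
def parse_pat_section(section: bytes) -> dict:
    """Extract the stream ID, version, and (PN -> PMT-PID) pairs from a PAT."""
    if len(section) < 12 or section[0] != 0x00:
        raise ValueError("not a Program Association Table section")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    if len(section) < 3 + section_length:
        raise ValueError("section is truncated")
    transport_stream_id = (section[3] << 8) | section[4]
    version = (section[5] >> 1) & 0x1F
    # Entries start after the 8-byte header and stop before the 4-byte CRC.
    end = 3 + section_length - 4
    programs = {}
    for i in range(8, end, 4):
        program_number = (section[i] << 8) | section[i + 1]
        pid = ((section[i + 2] & 0x1F) << 8) | section[i + 3]
        programs[program_number] = pid
    return {
        "transport_stream_id": transport_stream_id,
        "version": version,
        "programs": programs,
    }
```

In the standard, Program Number 0 is reserved to point at the network information rather than at a Program Map Table, so a receiver would normally treat that entry separately.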
The Program Map Table is a complete description of all of the streams contained in a program. Each entry in the Program Map Table is related to one and only one program. The role of the Program Map Table is to provide a mapping between packets and programs. The program map table contains a program number that identifies the program within the stream, a descriptor that can be used to carry private information about the program, the PID of the packets that contain the synchronization information, and a number of pairs of values (ST, Data-PID) which, for each stream, specify the stream type (ST) and the PID of the packets containing the data of that stream or program (Data-PID).
There is also a Network Information Table used to provide a mapping between the transport streams and the network, and a Conditional Access Table that is used to specify scrambling/descrambling control and access.
In use, the tables are used to select and reconstruct a particular program. At any point in time, each program has a unique PID in the Program Map Table. The Program Map Table provides the PIDs for the selected program's audio, video, and control streams. The streams with the selected PIDs are extracted and delivered to the appropriate buffers and decoders for reconstruction and decoding.
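Program selection can be sketched with the tables modeled as plain dictionaries. The table contents, PID values, and function names below are hypothetical and chosen only for illustration. The sketch looks up the selected program in the Program Association Table, collects the PIDs listed in that program's Program Map Table, and keeps only the packets carrying those PIDs.

```python
# Hypothetical table contents: Program Number -> PID of that program's map table.
pat = {1: 0x0100, 2: 0x0200}

# Hypothetical map tables, keyed by their own PID.
pmts = {
    0x0100: {"pcr_pid": 0x0101, "streams": [("video", 0x0101), ("audio", 0x0102)]},
    0x0200: {"pcr_pid": 0x0201, "streams": [("video", 0x0201), ("audio", 0x0202)]},
}

def pids_for_program(program_number: int, pat: dict, pmts: dict) -> set:
    """Return every PID needed to reconstruct the selected program."""
    pmt = pmts[pat[program_number]]
    wanted = {pid for _stream_type, pid in pmt["streams"]}
    wanted.add(pmt["pcr_pid"])   # packets carrying the synchronization information
    return wanted

def packet_pid(packet: bytes) -> int:
    """13-bit PID from bytes 1 and 2 of a transport stream packet."""
    return ((packet[1] & 0x1F) << 8) | packet[2]

def filter_packets(packets: list, wanted: set) -> list:
    """Keep only the packets whose PID belongs to the selected program."""
    return [p for p in packets if packet_pid(p) in wanted]
```

The filtered packets would then be routed by PID to the appropriate audio and video buffers for decoding.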
Achieving and maintaining clock recovery and synchronization is a problem, especially with audio and video bitstreams. The MPEG-2 model assumes an end-to-end constant delay timing model in which all digital image and audio data take exactly the same amount of time to pass through the system from encoder to decoder. The system layer contains timing information that requires constant delay. The clock reference is the Program Clock Reference (PCR), and the time stamps are the Presentation Time Stamp and the Decoding Time Stamp (PTS/DTS).
The decoder employs a local system clock having approximately the same 27 Megahertz frequency as the encoder. However, the decoder clock cannot be allowed to free run. This is because it is highly unlikely that the frequency of the decoder clock would be exactly the same as the frequency of the encoder clock.
Synchronization of the two clocks is accomplished by the Program Clock Reference (PCR) data field in the Transport Stream adaptation field. The Program Clock Reference values can be used to correct the decoder clock. Program Clock Reference, or PCR, is a 42 bit field. It is coded in two parts, a PCR Base having a 33-bit value in units of 90 kHz, and a PCR extension having a 9-bit extension in units of 27 MHz, where 27 MHz is the system clock frequency.
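The two-part coding of the Program Clock Reference can be illustrated with a short decoding sketch. It assumes the six-byte coding used in the adaptation field by ISO/IEC 13818-1, in which the 33-bit base and the 9-bit extension are separated by six reserved bits. The function names are hypothetical.

```python
SYSTEM_CLOCK_HZ = 27_000_000

def decode_pcr(field: bytes) -> tuple:
    """Split a 6-byte PCR field into its 33-bit base and 9-bit extension."""
    if len(field) < 6:
        raise ValueError("PCR field is 6 bytes long")
    value = int.from_bytes(field[:6], "big")
    base = value >> 15          # top 33 bits, in units of 90 kHz
    extension = value & 0x1FF   # bottom 9 bits, in units of 27 MHz
    return base, extension

def pcr_ticks(base: int, extension: int) -> int:
    """Full PCR value expressed in cycles of the 27 MHz system clock."""
    return base * 300 + extension

def pcr_seconds(base: int, extension: int) -> float:
    """PCR value converted to seconds."""
    return pcr_ticks(base, extension) / SYSTEM_CLOCK_HZ
```

Because one unit of the 90 kHz base equals 300 cycles of the 27 MHz clock, the extension counts from 0 to 299 between successive values of the base.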
As a general rule, the first 33 bits of the first PCR received by the decoder initialize the counter used for clock generation, and subsequent PCR values are compared to clock values for fine adjustment. The difference between the PCR and the local clock can be used to drive a voltage controlled oscillator, or a similar device or function, for example, to speed up or slow down the local clock.
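The adjustment described above can be modeled in software as a simple proportional control loop. The sketch below is an illustration under stated assumptions, not a description of any particular decoder: the class name, the gain value, and the 40 millisecond PCR interval are hypothetical, and the counter is initialized from the full PCR value expressed in 27 MHz ticks rather than from the 33-bit base alone.

```python
NOMINAL_HZ = 27_000_000.0

class RecoveredClock:
    """Local clock whose frequency is steered by the PCR error (sketch)."""

    def __init__(self, gain_hz_per_tick: float = 5.0):
        self.gain = gain_hz_per_tick   # hypothetical loop gain
        self.counter = None            # local count of 27 MHz ticks
        self.frequency = NOMINAL_HZ

    def advance(self, seconds: float) -> None:
        """Let the local clock run for the given time at its current frequency."""
        if self.counter is not None:
            self.counter += self.frequency * seconds

    def on_pcr(self, pcr_ticks: float) -> float:
        """Initialize on the first PCR, then steer the frequency by the error."""
        if self.counter is None:
            self.counter = float(pcr_ticks)
            return 0.0
        error = pcr_ticks - self.counter
        # A positive error means the local clock is behind, so speed it up.
        self.frequency = NOMINAL_HZ + self.gain * error
        return error

def simulate(encoder_hz: float, interval_s: float = 0.04, count: int = 200) -> float:
    """Feed periodic PCR values from an encoder clock; return the locked frequency."""
    clock = RecoveredClock()
    encoder_ticks = 0.0
    for _ in range(count):
        clock.on_pcr(encoder_ticks)
        encoder_ticks += encoder_hz * interval_s
        clock.advance(interval_s)
    return clock.frequency
```

In this model the local frequency converges toward the encoder frequency while a small constant counter offset remains. A practical design would typically add filtering to reject jitter in the arrival times of the PCR values.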
Audio and video synchronization is typically accomplished through the Presentation Time Stamp (PTS) inserted in the Packetized Elementary Stream (PES) header. The Presentation Time Stamp is a 33-bit value in units of 90 kHz, where 90 kHz is the 27 MHz system clock divided by 300. The PTS value indicates the time that the presentation unit should be presented to the user.
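A Presentation Time Stamp can be recovered from the PES header with a few shifts. The sketch below assumes the five-byte coding of ISO/IEC 13818-1, in which the 33 bits are carried as groups of 3, 15, and 15 bits, each group followed by a marker bit, after a 4-bit prefix. The function names are hypothetical.

```python
PTS_CLOCK_HZ = 90_000   # the 27 MHz system clock divided by 300

def decode_pts(field: bytes) -> int:
    """Reassemble the 33-bit time stamp from its 5-byte coded form."""
    if len(field) < 5:
        raise ValueError("time stamp field is 5 bytes long")
    return (
        (((field[0] >> 1) & 0x07) << 30)   # bits 32..30
        | (field[1] << 22)                 # bits 29..22
        | ((field[2] >> 1) << 15)          # bits 21..15
        | (field[3] << 7)                  # bits 14..7
        | (field[4] >> 1)                  # bits 6..0
    )

def pts_seconds(pts: int) -> float:
    """Convert a time stamp in 90 kHz units to seconds."""
    return pts / PTS_CLOCK_HZ
```

A decoder would present a unit when its recovered system clock, divided down to 90 kHz, reaches the decoded PTS value.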
Digital broadcast systems, for example, digital cable television, Digital Video Broadcasting (DVB), and DSS (Digital Satellite Systems) require a significant amount of time to change channels, for example, up to half a second or longer. This is directly perceptible to users. This latency is due to complex interactions between the system host, the transport medium, and the decoder. Thus, a clear need exists to reduce the perception of channel change to the user, for example, through the provision and use of a set of specialized flags that are passed between the transport and the decoder. The flags coordinate the change of channels with the least amount of perceptible or apparent latency.