This invention is described in the context of audio-video programs, which include at least one audio signal or one video signal. However, those of ordinary skill in the art will appreciate the applicability of this invention to other types of program signals.
The MPEG-2 Systems specification, ISO/IEC 13818-1, describes a standardized method and data format for packetizing and multiplexing compressed digital audio-visual information for serial transmission applications. This format is called the transport stream format and can be used to multiplex compressed data from one or more audio-visual programs into a single stream. It exhibits a hierarchical structure in which the compressed audio-visual data is present at the lower, compression level, and the packetization and multiplexing of this information is carried out at the higher, systems level. The raw compressed representation of one audio or video signal is referred to as an elementary stream (ES). Compression formats for elementary streams include, but are not restricted to, MPEG-1 Video (ISO/IEC 11172-2), MPEG-2 Video (ISO/IEC 13818-2), MPEG-4 Video (Part 2 or 10), H.263++, H.26L and the draft H.264/MPEG-4 Part 10 for encoding video data, and MPEG-1 Audio (ISO/IEC 11172-3), MPEG-2 Audio (ISO/IEC 13818-3), MPEG-4 Audio and Dolby AC-3 for encoding audio data.
MPEG-2 PES streams and transport streams encapsulating MPEG-2 video will be used herein as a model for illustrating the invention and as a specific example of the systems layer. Those skilled in the art will appreciate that other types of elementary streams, such as encoded audio, MPEG-4 video, etc., may be encapsulated in the PES and transport streams rather than MPEG-2 video.
Audio-visual programs are obtained by using an appropriate combination of one or more elementary streams for storage or transmission of data. For example, one audio elementary stream and one video elementary stream may be combined, or one video elementary stream and multiple audio elementary streams may be combined. The transport stream format enables both single program transport streams (SPTS) in which the elementary streams of a single audio-visual program are multiplexed together into a serial stream, and multiple program transport streams (MPTS), in which the component elementary streams of multiple audio-visual programs are all multiplexed together into a single serial stream.
Referring to FIG. 1, to form a transport stream, each of N elementary streams 100 (including ES1, ES2, through ESN) is first packetized into one of N streams of packetized elementary stream (PES) packets 110, independent of its underlying compression format. Each PES packet comprises a PES packet header and, as a payload, a segment of data from only a single elementary stream. However, a PES packet may contain data for more than one decoding unit (e.g., data for more than one compressed picture or for more than one compressed audio frame). A variety of packetization strategies for forming PES packets from an elementary stream are permitted.
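By way of illustration only, one of the many permitted packetization strategies can be sketched as follows: the elementary stream is cut into fixed-size segments and each segment is prefixed with a minimal PES packet header. The function name `packetize_pes`, the default `chunk_size`, and the choice of stream_id 0xE0 (a video stream) are assumptions made for this sketch; the header byte layout follows ISO/IEC 13818-1 and is not taken from the text above. No time stamps are written in this minimal form.

```python
import struct
from typing import Iterator

PES_START_CODE_PREFIX = b"\x00\x00\x01"
VIDEO_STREAM_ID = 0xE0          # assumption: first video stream
MINIMAL_OPTIONAL_HEADER = bytes([
    0x80,  # '10' marker bits, no scrambling, no priority/alignment flags
    0x00,  # PTS_DTS_flags = '00' and all other optional-field flags cleared
    0x00,  # PES_header_data_length = 0 (no optional fields follow)
])


def packetize_pes(es: bytes,
                  stream_id: int = VIDEO_STREAM_ID,
                  chunk_size: int = 2048) -> Iterator[bytes]:
    """Yield PES packets, each carrying one segment of a single ES.

    Hypothetical fixed-size strategy; real packetizers often align
    packets to picture or audio-frame boundaries instead.
    """
    # PES_packet_length is 16 bits and counts every byte after itself.
    if not 0 < chunk_size <= 0xFFFF - len(MINIMAL_OPTIONAL_HEADER):
        raise ValueError("chunk_size does not fit in PES_packet_length")
    for offset in range(0, len(es), chunk_size):
        payload = es[offset:offset + chunk_size]
        length = len(MINIMAL_OPTIONAL_HEADER) + len(payload)
        yield (PES_START_CODE_PREFIX
               + bytes([stream_id])
               + struct.pack(">H", length)
               + MINIMAL_OPTIONAL_HEADER
               + payload)
```

Each yielded packet is 9 header bytes followed by its payload, and concatenating the payloads in order reproduces the original elementary stream.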
PES packets from each elementary stream are further packetized into fixed-size (188-byte) transport stream (TS) packets 120. Each TS packet 120, as shown in FIG. 2, consists of a fixed 4-byte packet header 121, an optional adaptation field 122 of variable length, and the remaining bytes containing the PES packet data as payload 123. The fixed packet header 121 contains a field called Packet IDentifier (PID), which is a unique numeric identifier or tag for each elementary stream 100 carried in a transport stream 120. For example, one PID is assigned to a video ES of a particular program, a second, different PID is assigned to the audio ES of a particular program, etc.
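As a minimal sketch of the packet structure just described, the following reads the fixed 4-byte header of one TS packet and reports the PID together with whether an adaptation field and a payload are present. The bit positions are those of ISO/IEC 13818-1 rather than anything stated above, and the names `TsHeader` and `parse_ts_header` are hypothetical.

```python
from typing import NamedTuple

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


class TsHeader(NamedTuple):
    pid: int                    # 13-bit Packet IDentifier
    payload_unit_start: bool    # set when a PES packet begins in this packet
    has_adaptation_field: bool
    has_payload: bool
    continuity_counter: int     # 4-bit per-PID counter


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the fixed 4-byte header of a single 188-byte TS packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a TS packet must be exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing 0x47 sync byte")
    # PID: low 5 bits of byte 1 followed by all 8 bits of byte 2.
    pid = ((packet[1] & 0x1F) << 8) | packet[2]
    # adaptation_field_control: bit 1 = adaptation field, bit 0 = payload.
    afc = (packet[3] >> 4) & 0x03
    return TsHeader(
        pid=pid,
        payload_unit_start=bool(packet[1] & 0x40),
        has_adaptation_field=bool(afc & 0x02),
        has_payload=bool(afc & 0x01),
        continuity_counter=packet[3] & 0x0F,
    )
```

A demultiplexer could use the returned `pid` to route each packet to the elementary stream it belongs to.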
TS packets 120 from multiple underlying elementary streams 100 are then multiplexed together according to the rules for transport streams set forth in the MPEG-2 Systems specification. This includes insertion of special TS packets 130 containing System Information (SI), which includes tables specifying the different programs within the transport stream as well as the PIDs which belong to each program. Thus, the transport stream format consists of a lower compression layer, comprising the component elementary streams, and a higher system layer, comprising the PES and TS packets.
The system layer contains important timing information which enables the receiver to play back the audio-visual information in a time-synchronized manner. This includes a Presentation Time Stamp (PTS) in the PES packet header which indicates the time instant at which the associated audio or video presentation unit (an audio or video frame) of a given audio-visual program should be decoded and presented to the user. This PTS is relative to the System Time Clock used by the transmitting encoder. The TS packets also carry samples of this encoder clock, called Program Clock References (PCR), in a quasi-periodic manner to enable the receiver to synchronize its system time clock to that of the encoder. This enables the receiver to decompress and present the audio and video data at the correct times, thereby recreating the original presentation.
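For illustration, the PTS can be sketched as a 33-bit count of a 90 kHz clock that is spread over five bytes of the PES packet header with interleaved marker bits. That encoding comes from ISO/IEC 13818-1, not from the description above, and the helper names `encode_pts`, `decode_pts` and `pts_to_seconds` are assumptions of this sketch.

```python
PTS_CLOCK_HZ = 90_000
PTS_MAX = (1 << 33) - 1


def encode_pts(pts: int, prefix: int = 0b0010) -> bytes:
    """Pack a 33-bit PTS into its 5-byte PES header form.

    prefix is the 4-bit code preceding the value ('0010' when only a
    PTS is present); every marker bit is written as 1.
    """
    if not 0 <= pts <= PTS_MAX:
        raise ValueError("PTS must fit in 33 bits")
    return bytes([
        (prefix << 4) | (((pts >> 30) & 0x07) << 1) | 1,  # PTS[32..30]
        (pts >> 22) & 0xFF,                               # PTS[29..22]
        (((pts >> 15) & 0x7F) << 1) | 1,                  # PTS[21..15]
        (pts >> 7) & 0xFF,                                # PTS[14..7]
        ((pts & 0x7F) << 1) | 1,                          # PTS[6..0]
    ])


def decode_pts(field: bytes) -> int:
    """Recover the 33-bit PTS from its 5-byte form, skipping marker bits."""
    if len(field) != 5:
        raise ValueError("a PTS field is exactly 5 bytes")
    return ((((field[0] >> 1) & 0x07) << 30)
            | (field[1] << 22)
            | ((field[2] >> 1) << 15)
            | (field[3] << 7)
            | (field[4] >> 1))


def pts_to_seconds(pts: int) -> float:
    """Convert a PTS tick count to seconds on the 90 kHz time base."""
    return pts / PTS_CLOCK_HZ
```

Under these assumptions a PTS of 90000 corresponds to a presentation time one second after the zero point of the encoder's System Time Clock.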
A requirement for MPEG-2 transport streams is that the PCR for each program must be sent at least once every 100 ms. In the case of the DVB extension (Specification of Service Information (SI) in DVB Systems, ETSI Standard EN 300 468, May 2000) to MPEG-2, these PCR packets are to be sent at least once every 40 ms. PCR information, along with other optional information, is carried in the TS packet inside the adaptation field 122. The PCRs for a given program can be carried in the TS packets carrying any one of the component elementary streams 100 of that program (as identified by its PID), or they can be carried in separate TS packets with a unique PCR PID. Typically, PCRs are carried in the video PID of a program.
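The following sketch shows how a receiver might read a PCR from the adaptation field of a TS packet and check a sequence of PCR samples against the 100 ms (or 40 ms) limit. It assumes the ISO/IEC 13818-1 layout, in which the PCR is a 33-bit base counted at 90 kHz plus a 9-bit extension counted at 27 MHz; `extract_pcr` and `pcr_gaps_within` are hypothetical names, and the gap check compares PCR values only, not packet arrival times.

```python
from typing import Optional, Sequence

PCR_CLOCK_HZ = 27_000_000
PCR_MODULUS = (1 << 33) * 300   # the PCR wraps after this many 27 MHz ticks


def extract_pcr(packet: bytes) -> Optional[int]:
    """Return the PCR in 27 MHz ticks, or None if the packet carries none."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("not a valid 188-byte TS packet")
    if not packet[3] & 0x20:            # no adaptation field present
        return None
    adaptation_length = packet[4]
    if adaptation_length < 7 or not packet[5] & 0x10:   # PCR_flag clear
        return None
    b = packet[6:12]
    base = (b[0] << 25) | (b[1] << 17) | (b[2] << 9) | (b[3] << 1) | (b[4] >> 7)
    extension = ((b[4] & 0x01) << 8) | b[5]
    return base * 300 + extension


def pcr_gaps_within(pcrs: Sequence[int], limit_ms: int = 100) -> bool:
    """True if every consecutive pair of PCRs is at most limit_ms apart."""
    limit_ticks = limit_ms * PCR_CLOCK_HZ // 1000
    return all(
        (later - earlier) % PCR_MODULUS <= limit_ticks
        for earlier, later in zip(pcrs, pcrs[1:])
    )
```

Passing `limit_ms=40` applies the stricter DVB interval mentioned above.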
In the MPEG-2 context, there are many applications that require one or more audio-visual programs carried inside an MPEG-2 transport stream to be modified at the elementary stream level, using stream processing devices. The prior art teaches a number of “stream processors” or devices, such as transcoders, editors and splicers, that process previously generated transport streams. A transcoder receives an already encoded elementary stream and re-encodes it, e.g., at a different bit rate, according to a different encoding standard, at a different resolution, using different encoding options, changing the audio sampling rate or video frame rate, etc., while maintaining the underlying content with as much fidelity as possible. A splicer is a device that appends one signal to another, inserts that signal in the middle of the first, or replaces part of the signal at a given instant. For example, a splicer may append one encoded elementary stream at the end of another elementary stream in a program so that they will be presented seamlessly and in sequence. Alternatively, the splicer could insert one program in the middle of another, e.g., in the case of inserting a commercial in the middle of a television show. An editor is a device that edits (modifies) an elementary stream and produces an edited encoded elementary stream. Examples of these devices are described in U.S. Pat. Nos. 6,141,447, 6,038,256, 6,094,457, 6,192,083, 6,005,621, 6,229,850, 6,310,915, and 5,859,660.
In such stream processing, the underlying bit positions of various parts of the elementary stream are changed. For instance, video or audio transcoding tends to change the amount of information (number of bits) needed to represent each presentable portion of the video or audio. This is especially true for a transcoder that changes the bit rate of the output signal, but is also true of a transcoder which, for example, re-encodes the elementary stream according to a standard different from the one under which it was originally prepared. Likewise, a splice or edit tends to change the relative location of two points (namely, the end point of the original encoded video signal portion that precedes the inserted elementary stream information and the beginning point of the original encoded video signal portion that follows the inserted elementary stream information) in the originally encoded video signal. Therefore, the modified elementary streams must be re-packetized and re-multiplexed into a syntax-compliant transport stream for serial transmission.
One of the critical requirements in transport stream output packetization and delivery is that the inherent information content in the outgoing elementary streams retain the same timing relationship as that of the input. This is required to enable the receiver to play back the underlying audio-visual presentation in a time-synchronized manner. Since stream processing invalidates the relationship between input and output elementary stream bits, the output packetization process must somehow re-create the original timing relationship.
Existing approaches address this problem by using a full-fledged multiplexer at the output. This involves first recovering the original encoder clock for each modified program using clock recovery techniques such as phase locked loops. Thereafter, the presentation times and decoding times of each outgoing audio or video frame are determined, re-stamped, and inserted into the PES packets, and each outgoing TS packet is emitted in a manner that complies with the T-STD buffer model. Finally, PCR values are inserted into the emitted TS packets at the required frequency by looking up the recovered encoder clock at the instant of departure of the PCR-bearing TS packets. Since the timing information is completely regenerated and inserted, even non-modified elementary streams in any processed program need to be de-packetized to their elementary stream levels, re-packetized, and re-transmitted. All these tasks, especially the need to obey T-STD buffer model requirements, impose a large implementation overhead, thereby increasing the complexity and cost of the stream processing system.
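The final PCR re-stamping step of such a multiplexer can be sketched under a simplifying assumption that is not part of the description above: the output is emitted at a constant bit rate, so the departure instant of a packet follows directly from its index in the output stream. Clock recovery by a phase locked loop and T-STD buffer modeling are omitted, the departure instant is taken at the start of the packet rather than at the exact PCR byte, and the names `pcr_at_departure` and `encode_pcr` are hypothetical.

```python
PCR_CLOCK_HZ = 27_000_000
PCR_MODULUS = (1 << 33) * 300   # the PCR wraps after this many 27 MHz ticks
TS_PACKET_BITS = 188 * 8


def pcr_at_departure(packet_index: int,
                     mux_rate_bps: int,
                     first_pcr: int = 0) -> int:
    """PCR value (27 MHz ticks) for the packet at packet_index.

    Assumes a constant output rate of mux_rate_bps and that packet 0
    departs when the recovered encoder clock reads first_pcr.
    """
    if mux_rate_bps <= 0:
        raise ValueError("mux_rate_bps must be positive")
    elapsed_ticks = packet_index * TS_PACKET_BITS * PCR_CLOCK_HZ // mux_rate_bps
    return (first_pcr + elapsed_ticks) % PCR_MODULUS


def encode_pcr(pcr: int) -> bytes:
    """Pack a PCR into the 6-byte adaptation-field form.

    Layout: 33-bit base (90 kHz), 6 reserved bits set to 1,
    9-bit extension (27 MHz remainder, 0..299).
    """
    if not 0 <= pcr < PCR_MODULUS:
        raise ValueError("PCR out of range")
    base, extension = divmod(pcr, 300)
    return bytes([
        (base >> 25) & 0xFF,
        (base >> 17) & 0xFF,
        (base >> 9) & 0xFF,
        (base >> 1) & 0xFF,
        ((base & 0x01) << 7) | 0x7E | (extension >> 8),
        extension & 0xFF,
    ])
```

For example, at an assumed output rate of 1,504,000 bit/s each 188-byte packet occupies exactly 1 ms, so consecutive packets are 27,000 PCR ticks apart.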