This invention is described in the context of audio-video programs, which include at least one audio signal or one video signal. However, those of ordinary skill in the art will appreciate the applicability of this invention to other types of program signals.
A program signal is composed of one or more component signals referred to herein as elementary streams. An example of an elementary stream can be one (natural or synthetic) audio signal, one (natural or synthetic) video signal, one closed captioning text signal, one private data signal, etc. Several techniques are known for compressing, formatting, storing and conveying such elementary streams. For example, the MPEG-1, MPEG-2, MPEG-4, H.263, H.263++, H.26L, and H.264/MPEG-4 AVC standards provide well-known techniques for encoding (compressing and formatting) video. Likewise, MPEG-1 (including the so-called “MP3”), MPEG-2, MPEG-4 and Dolby AC-3, provide techniques for encoding audio.
In addition, there are several known techniques for combining elementary streams for storage or transmission. MPEG-2 defines a technique for segmenting each elementary stream into packetized elementary stream (“PES”) packets, where each PES packet includes a PES packet header and a segment of the elementary stream as the payload. PES packets, in turn, may be combined with “pack headers” and other pack specific information to form “packs”. Alternatively, the PES packets may be segmented into transport packets of a transport stream, where each transport packet has a transport packet header and a portion of a PES packet as payload. These transport packets, as well as others (e.g., transport packets carrying program specific information or DVB systems information, entitlement management messages, entitlement control messages, other private data, null transport packets, etc.) are serially combined to form a transport stream.
In another known technique according to MPEG-4 systems, elementary streams may be divided into “sync-layer” (or “SL”) packets, including SL packet headers. SL packets may be combined with PES packet headers, to form PES packets, and these PES packets may be segmented and combined with transport packet headers to form transport packets. According to another technique, transport packets are not used. Rather, elementary stream data is segmented and real-time protocol (“RTP”) packet headers are appended to each segment to form RTP packets. In addition, or instead, user datagram protocol (“UDP”) or transmission control protocol (“TCP”) packet headers may be appended to segmented data to form UDP or TCP packets. Many combinations of the above are possible including formatting the elementary streams into SL packets first and then formatting the SL packets into RTP packets, encapsulating transport packets into TCP packets according to the so-called multi-protocol encapsulation(“MPE”), etc.
The MPEG-2 PES and transport streams encapsulating MPEG-2 video will be used herein as a model for illustrating the invention. Also, this invention is illustrated using a hierarchical signal, wherein elementary streams are carried as segments in packets or cells of one or more higher layers. The term “systems layer” is herein used to refer to such higher layers. The MPEG-2 PES streams and transport streams will be used as a specific example of the systems layer. However, those skilled in the art will appreciate that other kinds of hierarchical layers may be used interchangeably as the systems layer for the elementary stream, such as the SL layer, the RTP layer, etc. Furthermore, “systems layer” need not be restricted to the “transport layer” according to the OSI seven layer model but can, if desired, include other layers such as the network layer (e.g., internet protocol or “IP”), the data link layer (e.g., ATM, etc.) and/or the physical layer. Also, other types of elementary streams, such as encoded audio, MPEG-4 video, etc. may be used. In addition, the term “transmission” is used herein but should be understood to mean the transfer of information under appropriate circumstances via a communications medium or storage medium to another device, such as an intermediate device or a receiver/decoder.
Audio-visual programs are obtained by using an appropriate combination of one or more elementary streams for storage or transmission of data. For example, one audio elementary stream and one video elementary stream may be combined, or one video elementary stream and multiple audio elementary streams may be combined. The transport stream format enables both single program transport streams (SPTS) in which the elementary streams of a single audio-visual program are multiplexed together into a serial stream, and multiple program transport streams (MPTS), in which the component elementary streams of multiple audio-visual programs are all multiplexed together into a single serial stream.
Referring to FIG. 1, to form a transport stream, each of N elementary streams 100 (including ES1, ES2, through ESN) is first packetized into N packetized elementary streams of (PES) packets 110, independent of its underlying compression format. Each PES packet is comprised of a PES packet header and a segment of a single elementary stream as a payload, which contains data for only a single elementary stream. However, a PES packet may contain data for more than one decoding unit (e.g., data for more than one compressed picture or for more than one compressed audio frame). A variety of packetization strategies for forming PES packets from an elementary stream are permitted.
PES packets from each elementary stream are further packetized into fixed size (188 byte) Transport Stream (TS) packets 120. Each TS packet 120, as shown in FIG. 2, consists of a fixed 4-byte Packet Header 121, an optional Adaptation Field 122 of variable length, and the remaining bytes containing the PES packet data as Payload 123. The fixed Packet Header 121 contains a field called Packet IDentifier (PID), which is a unique numeric identifier or tag for each Elementary Stream 100 carried in a Transport Stream 120. For example, one PID is assigned to a video ES of a particular program, a second, different PID is assigned to the audio ES of a particular program, etc.
TS packets 120 from multiple underlying elementary streams 100 are then multiplexed together according to the rules for transport streams set forth in the MPEG-2 Systems specification. This includes insertion of special TS packets 130 containing System Information (SI), which include tables specifying the different programs within the transport stream as well the PIDs which belong to each program. Thus the transport stream format consists of a lower compression layer, comprising the component elementary streams, and a higher systems layer, comprising the PES and TS packets.
The systems layer contains important timing information which enables the receiver to play back the audio-visual information in a time-synchronized manner. The PES packet header contains a Presentation Time Stamp (PTS) in the PES packet header which indicates the time instants at which the associated audio or video presentation unit (an audio or video frame) of a given audio-visual program should be decoded and presented to the user. This PTS is relative to the System Time Clock used by the transmitting encoder. The TS packets also carry samples of this encoder clock called Program Clock References (PCR) in a quasi-periodic manner to enable the receiver to synchronize its clock to that of the encoder. This enables the receiver to decompress and present the audio and video data at the correct times, thereby recreating the original presentation.
A requirement for MPEG-2 transport streams is that the PCR for each program must be sent at least once every 100 ms. In the case of the DVB extension (Specification of Service Information (SI) in DVB Systems, ETSI Standard EN 300 468, May 2000) to MPEG-2, these PCR packets are to be sent at least once every 40 ms. PCR information, along with other optional information, is carried in the TS packet inside the Adaptation Field 122. The PCRs for a given program can be carried in the TS packets carrying any one of the component elementary streams 100 of that program (as identified by its PID), or they can be carried in separate TS packets with a unique PCR PID. Typically PCRs are carried in the video PID of a program.
In the MPEG-2 context, there are many applications that require one or more audio-visual programs carried inside a MPEG-2 transport stream to be modified at the elementary stream level, using stream processing devices. The prior art teaches a number of “stream processors” or devices, such as transcoders, editors and splicers, that process previously generated transport streams. A transcoder receives an already encoded elementary stream and re-encodes it, e.g., at a different bit rate, according to a different encoding standard, at a different resolution, using different encoding options, changing the audio sampling rate or video frame rate, etc. while maintaining the underlying content with as much fidelity as possible. A splicer is a device that appends one signal to another, inserts that signal in the middle of the first, or replaces part of the signal at a given instant. For example, a splicer may append one encoded elementary stream at the end of another elementary stream in a program so that they will be presented seamlessly and in sequence. Alternatively, the splicer could insert one program in the middle of another, e.g., in the case of inserting a commercial in the middle of a television show. An editor is a device that edits (modifies) an elementary stream and produces an edited encoded elementary stream. Examples of these devices are described in U.S. Pat. Nos. 6,141,447, 6,038,256, 6,094,457, 6,192,083, 6,005,621, 6,229,850, 6,310,915, and 5,859,660.
In such stream processing, the underlying bit positions of various parts of the elementary stream have been changed. For instance, video or audio transcoding tends to change the amount of information (number of bits) needed to represent each presentable portion of the video or audio. This is especially true for a transcoder that changes the bit rate of the output signal but is also true of a transcoder which, for example, re-encodes the elementary stream according to a different standard than it was originally prepared. Likewise, a splice or edit tends to change the relative location of two points (namely, the end point of the original encoded video signal portion that precedes the inserted elementary stream information and the beginning point of the original encoded video signal portion that follows the inserted elementary stream information) in the originally encoded video signal. Therefore, the modified elementary streams must be re-packetized and re-multiplexed into a syntax-compliant transport stream for serial transmission.
One of the critical requirements in transport stream output packetization and delivery is that the inherent information content in the outgoing elementary streams retain the same timing relationship as that of the input. This is required to enable the receiver to play back the underlying audio-visual presentation in a time-synchronized manner. Since the relationship between input and output elementary stream bits is invalidated by the process of stream processing, the output packetization process must somehow re-create the original timing relationship.
Existing approaches to this problem address this by using a full-fledged multiplexer at the output. This involves first recovering the original encoder clock for each modified program using clock recovery techniques like phase locked loops. Thereafter, the presentation times and decoding times of each outgoing audio or video frame are determined and re-stamped and inserted into the PES packets, and each outgoing TS packet is emitted in a manner that complies with the T-STD buffer model. Finally, PCR values are inserted into the emitted TS packets at the required frequency by looking up the recovered encoder clock at the instant of departure of the PCR-bearing TS packets. Since the timing information is completely regenerated and inserted, non-modified elementary streams in any processed program need to be de-packetized to their elementary stream levels, re-packetized, and re-transmitted. All these tasks, especially the need to obey T-STD buffer model requirements, impose a large implementation overhead, thereby increasing the complexity and cost of the stream processing system.