This invention is described in the context of audio-video programs, which include at least one audio signal or one video signal. However, those of ordinary skill in the art will appreciate the applicability of this invention to other types of program signals.
A program signal is composed of one or more component signals referred to herein as elementary streams. An example of an elementary stream can be one (natural or synthetic) audio signal, one (natural or synthetic) video signal, one closed captioning text signal, one private data signal, etc. Several techniques are known for compressing, formatting, storing and conveying such elementary streams. For example, the MPEG-1, MPEG-2, MPEG-4, H.263, H.263++, H.26L, and H.264/MPEG-4 AVC standards provide well-known techniques for encoding (compressing and formatting) video. Likewise, MPEG-1 (including the so-called “MP3”), MPEG-2, MPEG-4 and Dolby AC-3 provide techniques for encoding audio.
In addition, there are several known techniques for combining elementary streams for storage or transmission. MPEG-2 defines a technique for segmenting each elementary stream into packetized elementary stream (“PES”) packets, where each PES packet includes a PES packet header and a segment of the elementary stream as the payload. PES packets, in turn, may be combined with “pack headers” and other pack specific information to form “packs”. Alternatively, the PES packets may be segmented into transport packets of a transport stream, where each transport packet has a transport packet header and a portion of a PES packet as payload. These transport packets, as well as others (e.g., transport packets carrying program specific information or DVB systems information, entitlement management messages, entitlement control messages, other private data, null transport packets, etc.) are serially combined to form a transport stream.
In another known technique according to MPEG-4 systems, elementary streams may be divided into “sync-layer” (or “SL”) packets, including SL packet headers. SL packets may be combined with PES packet headers, to form PES packets, and these PES packets may be segmented and combined with transport packet headers to form transport packets. According to another technique, transport packets are not used. Rather, elementary stream data is segmented and real-time protocol (“RTP”) packet headers are appended to each segment to form RTP packets. In addition, or instead, user datagram protocol (“UDP”) or transmission control protocol (“TCP”) packet headers may be appended to segmented data to form UDP or TCP packets. Many combinations of the above are possible including formatting the elementary streams into SL packets first and then formatting the SL packets into RTP packets, encapsulating transport packets into TCP packets according to the so-called multi-protocol encapsulation (“MPE”), etc.
Herein, the MPEG-2 PES and transport streams encapsulating MPEG-2 video will be used as a model for illustrating the invention. Also, this invention is illustrated using a hierarchical signal, wherein elementary streams are carried as segments in packets or cells of one or more higher layers. The term “systems layer” is herein used to refer to such higher layers. The MPEG-2 PES streams and transport streams will be used as a specific example of the systems layer. However, those skilled in the art will appreciate that other kinds of hierarchical layers may be used interchangeably as the systems layer for the elementary stream, such as the SL layer, the RTP layer, etc. Furthermore, “systems layer” need not be restricted to the “transport layer” according to the OSI seven layer model but can, if desired, include other layers such as the network layer (e.g., internet protocol or “IP”), the data link layer (e.g., ATM, etc.) and/or the physical layer. Also, other types of elementary streams, such as encoded audio, MPEG-4 video, etc. may be used. In addition, the term “transmission” is used herein but should be understood to mean the transfer of information under appropriate circumstances via a communications medium or storage medium to another device, such as an intermediate device or a receiver/decoder.
FIG. 1 illustrates the hierarchical nature of the transport stream. A video elementary stream is shown which contains multiple compressed pictures or video images I0, B1, B2, P3, B4, B5, P6, B7, B8, I9. It should be noted that each picture is presented over an integer multiple of a fixed interval of time (e.g., 1, 2 or 3 field periods), but can have a variable amount of information.
Next, the video elementary stream is segmented into payloads for PES packets. PES packets can contain a fixed length segment of elementary stream information or a variable length segment of elementary stream information. In the illustration of FIG. 1, each PES packet encapsulates the encoded information of the video elementary stream representing precisely one encoded video picture. This is not a strict requirement of MPEG-2 but is a requirement of other standards such as ATSC. However, other strategies can be used for segmenting the elementary stream into PES packets. For example, each PES packet may be restricted to have a fixed number of bytes in its payload and/or a fixed total number of bytes (i.e., the sum of the number of bytes in the PES packet header and the number of bytes in the PES payload may be a fixed number). This is especially true for different kinds of elementary streams (e.g., audio, synthetic images) or for different encoded formats (e.g., for MPEG-4). Note also that the headers of PES packets can vary in size, depending on the presence or absence of other PES layer information in the PES header such as: time stamps (e.g., presentation time stamps and/or decoding time stamps), trick mode control information, copyright information, and PES extension data.
The PES packets themselves are segmented and placed into transport packets. All MPEG-2 transport packets have a fixed length of 188 bytes. A transport packet has a minimum sized header of 4 bytes followed by a payload. The PES packets are divided into segments and each segment is placed in a payload of a transport packet. However, transport packet headers can also be of variable length depending on whether or not the transport header is also carrying: program clock reference (PCR) time stamps, discontinuity information, bit rate information, splice point information, padding, etc. For example, PCR's must be delivered at least at a certain frequency. However, PCR's can be delivered more frequently and need not be delivered at precise moments in time. Therefore, under normal circumstances, PCR's may be found in transport packets, nominally at a certain frequency, or more frequently, but not precisely at any frequency. Indeed, transport packets containing PCR's are often moved relative to other transport packets containing PCR's for the same program as a result of remultiplexing.
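By way of illustration only, the fixed 188-byte packet length and the variable header length described above can be sketched in Python as follows. The function name `parse_ts_header` and the keys of the returned dictionary are hypothetical names chosen for this sketch. The field positions used (sync byte 0x47, 13-bit PID, two-bit adaptation field control, PCR flag in the adaptation field flags byte) follow the MPEG-2 systems layout. The sketch only reports whether a PCR is present and does not decode its value.

```python
TS_PACKET_SIZE = 188   # all MPEG-2 transport packets are 188 bytes long
MIN_HEADER_SIZE = 4    # minimum transport packet header size
SYNC_BYTE = 0x47


def parse_ts_header(packet: bytes) -> dict:
    """Return header length, payload length and PCR presence (illustrative)."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("expected a 188-byte packet starting with 0x47")

    payload_unit_start = bool(packet[1] & 0x40)
    pid = ((packet[1] & 0x1F) << 8) | packet[2]
    adaptation_field_control = (packet[3] >> 4) & 0x03

    header_len = MIN_HEADER_SIZE
    has_pcr = False
    if adaptation_field_control & 0x02:       # adaptation field present
        adaptation_len = packet[4]
        header_len += 1 + adaptation_len      # length byte plus field body
        if adaptation_len > 0:
            has_pcr = bool(packet[5] & 0x10)  # PCR flag

    # A payload follows only if the low bit of the control field is set.
    has_payload = bool(adaptation_field_control & 0x01)
    payload_len = TS_PACKET_SIZE - header_len if has_payload else 0

    return {
        "pid": pid,
        "payload_unit_start": payload_unit_start,
        "header_len": header_len,
        "payload_len": payload_len,
        "has_pcr": has_pcr,
    }
```

A packet with only the minimum header leaves 184 bytes of payload, whereas a packet carrying a PCR in its adaptation field leaves fewer, which is why the systems layer overhead per packet is not constant.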
In the case of MPEG-2 video or audio elementary streams, PES packets can only contain data of one elementary stream. Also, transport packets can only contain data from one elementary stream. Moreover, when carrying MPEG-2 video or MPEG-2 audio, no transport packet can contain data from two PES packets (even if such PES packets carry data of the same elementary stream). Rather, the first byte of every PES packet must be aligned with the first byte of the payload of the transport packet that carries the beginning of the PES packet. But even in those cases where PES packets need not start precisely at the first payload byte of a transport packet, there can be no guarantee that the entire stream of PES packets will divide precisely into the total capacity of the payloads of the transport packets that carry the stream of PES packets. Thus, padding data is often placed within transport packets to align variable sized PES packets with the start of their respective payloads.
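The effect of this alignment requirement on the number of transport packets can be sketched, under simplifying assumptions, as follows. The function name `packets_for_pes` is hypothetical, and the sketch assumes that every transport packet carries only the minimum 4-byte header apart from whatever padding is needed. Real streams may carry longer headers (e.g., with PCRs), which would increase the packet count.

```python
TS_PACKET_SIZE = 188
MIN_HEADER_SIZE = 4


def packets_for_pes(pes_len: int, header_len: int = MIN_HEADER_SIZE) -> tuple:
    """Return (transport packet count, padding bytes) for one PES packet.

    Assumes each PES packet starts at the first payload byte of a transport
    packet and that no transport packet carries data of two PES packets.
    """
    if pes_len <= 0:
        raise ValueError("PES packet length must be positive")
    payload_capacity = TS_PACKET_SIZE - header_len   # 184 bytes by default
    packet_count = -(-pes_len // payload_capacity)   # ceiling division
    padding = packet_count * payload_capacity - pes_len
    return packet_count, padding


if __name__ == "__main__":
    count, padding = packets_for_pes(1000)
    print(count, padding)   # 6 packets, 104 padding bytes
```

In this example a 1000-byte PES packet occupies 6 transport packets (1128 bytes in total), of which 104 bytes are padding and 24 bytes are transport headers.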
Many standards for encoding and transmitting elementary streams have requirements for maintaining a strict schedule for delivering the elementary stream data, the systems layer stream carrying it, or both. These requirements are intended to ensure that the information is delivered in a timely fashion to enable seamless presentation. Often, such requirements are analogous to a “just-in-time” inventory system; streamed information is controlled so that it is delivered at just the right time and at just the right rate to make sure that enough stream information is available for decoding and presentation without interruption or delay. However, the delivery requirements also are intended to ensure that no more information is delivered than there is available storage capacity to hold it pending decoding and presentation. Furthermore, the systems layer signal often must meet certain bit rate requirements of the channel that carries the systems layer stream, such as a maximum bit rate, a minimum bit rate or even a certain constant bit rate. As such, bit rates of both elementary and systems layer streams must be carefully controlled from production to consumption.
In the MPEG-2 context, the prior art teaches a number of useful devices that operate on program signals, including encoders, editors, transcoders, splicers and remultiplexers. An encoder is a device that compresses and formats a raw unencoded elementary stream to produce an encoded elementary stream. Often, the encoder outputs the encoded elementary stream in a transport stream, possibly with other encoded elementary streams of the same program. An editor is a device that edits (modifies) an elementary stream and produces an edited encoded elementary stream. An editor can receive encoded or unencoded elementary streams at its input. A transcoder receives an already encoded elementary stream and re-encodes it, e.g., at a different bit rate, according to a different encoding standard, at a different resolution, using different encoding options, etc. A splicer is a device that appends one signal to another or inserts one signal in the middle of the first. For example, a splicer may append one encoded elementary stream at the end of another elementary stream in a program so that they will be presented seamlessly and in sequence. Alternatively, the splicer could insert one program in the middle of another, e.g., in the case of inserting a commercial in the middle of a television show. A remultiplexer is a device that combines or removes programs, substitutes one or more component streams for others in a program, or modifies systems layer information of a systems layer stream. Examples of these devices are described in U.S. Pat. Nos. 6,141,447, 6,002,687, 6,038,256, 6,094,457, 6,192,083, 6,005,621, 6,229,850, 6,310,915, 5,717,464, 5,859,660 and 5,861,919.
Some of the above devices have been specifically adapted to operate in an environment where the bit rate assigned to each program varies. Herein, the term “rate shaper” is used to refer to an encoder, transcoder, splicer, editor or remultiplexer which is specifically designed to produce a systems layer signal that meets a certain bit rate constraint. For example, during each of multiple successive intervals, the total bit rate available for carrying information (e.g., in a channel) may be divided into fractions and each fraction may be allocated to a respective one of multiple programs to be output in a systems layer stream, in the form of a transport stream, during that interval. Such a fraction may be set to guarantee a certain quality of service for its respective program.
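As a minimal sketch of such a per-interval allocation, the following Python fragment divides the bits available in one interval among several programs. The function name `allocate_interval_bits`, the program names and the numeric values are hypothetical and are chosen only for illustration. How the fractions themselves are chosen is outside the scope of the sketch.

```python
def allocate_interval_bits(total_bit_rate: float,
                           interval_seconds: float,
                           fractions: dict) -> dict:
    """Split the bits available in one interval among programs.

    `fractions` maps a program name to its share of the total bit rate.
    The shares must be non-negative and must not sum to more than 1.
    """
    if any(share < 0 for share in fractions.values()):
        raise ValueError("fractions must be non-negative")
    if sum(fractions.values()) > 1.0 + 1e-9:
        raise ValueError("fractions exceed the total available bit rate")
    total_bits = total_bit_rate * interval_seconds
    return {name: round(total_bits * share)
            for name, share in fractions.items()}


if __name__ == "__main__":
    # Hypothetical 20 Mbit/s channel, 0.5 s interval, three programs.
    budget = allocate_interval_bits(
        20_000_000, 0.5, {"news": 0.5, "sports": 0.25, "movie": 0.25})
    print(budget)
```

Each resulting figure is the budget for all transport packets of that program in the interval, not for the elementary stream alone.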
Each allocated fraction of the bit rate is actually the bit rate allocated for all of the transport packets carrying all of the information specific to that program, including the information of the encoded video and audio elementary streams, the PES packet header information and the transport packet header information. Such information is typically provided to a video encoder circuit of the rate shaper. The video encoder, in turn, encodes, or transcodes, a video signal to produce a certain number of bits for transmission over that interval which is approximately equal to a target number of bits. Preferably, the encoding or transcoding is also performed in an attempt to minimize the amount of encoding noise or distortion introduced by such encoding or transcoding. In theory, the target number of bits is derived from the bit rate fraction allocated to the rate shaper containing the encoder with the intention that, if the target number of bits were produced exactly, the transport stream carrying that program over the interval should be transmitted at, or near, the allocated bit rate fraction communicated to the device.
Many techniques have been described for either allocating a fraction of the total bit rate to each of multiple programs or transport streams or for controlling video encoding or transcoding to most closely generate a number of bits equal to a given target number of bits. However, very little study has been made pertaining to an optimal technique for allocating a target systems layer stream bit rate between the elementary stream itself and the requisite systems layer information to carry it. Rather, the only known technique is to use a fixed, unvarying ratio to divide the transport stream bit rate between the systems layer information of that transport stream and the encoded video elementary stream of that transport stream. Indeed, the techniques employed are quite crude and require many assumptions such as: (1) the transport header will typically only be 4 bytes long, (2) time stamps will be inserted at a very regular schedule, (3) PES packet headers will be of fixed length, (4) the PES packetization technique is fixed, etc. Based on such information, a ratio is predetermined and is never varied in operation.
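The fixed-ratio approach described above can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of any particular prior art device. The function names, the nominal picture size and the 14-byte PES header length are hypothetical. The sketch also assumes one PES packet per picture and a 4-byte header on every transport packet, and it ignores PCR-bearing headers entirely.

```python
TS_PACKET_SIZE = 188


def predetermined_ratio(nominal_picture_bytes: int,
                        pes_header_bytes: int = 14,
                        ts_header_bytes: int = 4) -> float:
    """Elementary stream share of transport stream bytes under fixed assumptions."""
    pes_len = nominal_picture_bytes + pes_header_bytes
    payload_capacity = TS_PACKET_SIZE - ts_header_bytes
    packet_count = -(-pes_len // payload_capacity)   # ceiling division
    return nominal_picture_bytes / (packet_count * TS_PACKET_SIZE)


def es_target_bits(ts_target_bits: int, ratio: float) -> int:
    """Apply the ratio, which is computed once and never varied in operation."""
    return int(ts_target_bits * ratio)


if __name__ == "__main__":
    ratio = predetermined_ratio(20_000)      # computed once, offline
    for ts_bits in (1_000_000, 750_000, 1_200_000):
        print(ts_bits, es_target_bits(ts_bits, ratio))
```

Because the ratio is derived from a single nominal case, it does not track changes in header sizes, time stamp placement or packetization from one interval to the next.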
The problem with this technique is that it is too rigid. Time stamps are rarely located in the transport stream at precise times according to an unvarying, fixed length period. This is especially true if the transport stream has been remultiplexed, in which case PCR bearing transport packets may have been moved relative to other transport packets carrying information for the same program. PES packetization strategies can also vary according to content, standards or pure choice. For instance, PES packets are not even used in some packetization techniques, such as the technique above where encoded video is formatted into SL packets and the SL packets, in turn, are formatted into RTP packets. The crude bit rate allocation techniques do not adequately address all PES packetization strategies or dynamic variance in packetization. Overestimating the number of bits needed for the systems layer relative to the number of bits needed for the elementary stream layer results in a lower elementary stream bit count target for a given number of systems layer stream bits than could otherwise be accommodated. Under such circumstances, the compressed elementary stream tends to have a lower fidelity than would have been possible with a more accurate estimate. Likewise, underestimating the number of bits needed for the systems layer relative to the number of bits needed for the elementary stream layer results in a higher elementary stream bit count target for a given number of systems layer stream bits than can be accommodated. Under these circumstances, the total number of bits for the systems layer signal tends to exceed the maximum number of bits that can be transmitted for that systems layer signal.
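The two failure cases can be illustrated numerically with the following sketch. The function name `split_outcome` and the example ratios are hypothetical. Here a "ratio" denotes the share of systems layer stream bits occupied by the elementary stream, so overestimating the systems layer overhead corresponds to an assumed ratio lower than the actual one.

```python
def split_outcome(ts_budget_bits: float,
                  assumed_ratio: float,
                  actual_ratio: float) -> dict:
    """Compare a fixed-ratio target with what the stream actually needs.

    assumed_ratio: elementary stream share assumed when setting the target.
    actual_ratio:  elementary stream share that the packetization yields.
    """
    if not (0 < assumed_ratio <= 1 and 0 < actual_ratio <= 1):
        raise ValueError("ratios must lie in (0, 1]")
    es_target = ts_budget_bits * assumed_ratio
    ts_needed = es_target / actual_ratio
    return {
        "es_target_bits": es_target,
        "ts_bits_needed": ts_needed,
        # Positive: budget exceeded. Negative: capacity left unused.
        "surplus_bits": ts_needed - ts_budget_bits,
    }


if __name__ == "__main__":
    # Overhead overestimated: capacity is wasted and fidelity is lower.
    print(split_outcome(1_000_000, assumed_ratio=0.90, actual_ratio=0.95))
    # Overhead underestimated: the systems layer stream exceeds its budget.
    print(split_outcome(1_000_000, assumed_ratio=0.95, actual_ratio=0.90))
```

In the first call the systems layer stream needs fewer bits than its budget, so bits that could have gone to the elementary stream go unused. In the second call it needs more than its budget.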
It is desirable to provide an improvement to the rate allocation strategies that specifically accommodates a variation in the allocation of bits or bit rate between one or more elementary streams and the systems layer stream(s) in which they are carried.