Digital compressed video based on the MPEG set of standards is now a popular choice for delivering and storing broadcast-quality video. The MPEG-2 suite, specified by ISO/IEC 13818, is already widely deployed, while the newer MPEG-4 suite, specified by ISO/IEC 14496, is rapidly gaining acceptance. In general, MPEG distinguishes between a “compression” layer, responsible for coding the raw video and associated audio signals, and a “systems” layer, responsible for the carriage, synchronization and timing of multiple such compressed signals. The units generated by the systems layer are referred to generically as “transport stream” (TS) packets. TS packets are transmitted over fixed bandwidth links.
In a typical MPEG-2 encoder, the compression layer receives a periodic sequence of frames of uncompressed digital video, and converts it into an “elementary stream” of compressed frames. While the compression layer output consists of a sequence of frames with a fixed inter-frame interval, the sizes of the frames may vary widely depending upon the quality settings of the encoder and the efficiency gained by removing spatial (i.e., within a single frame) and temporal (i.e., across frames) redundancy from the uncompressed input. The systems layer multiplexes several such elementary streams (e.g., video, audio and data), belonging to one or more video programs, into a single transport stream, consisting of a sequence of TS packets, suitable for storage and network transmission of the program(s). Several, possibly interleaved, TS packets put together comprise a single compressed frame.
In addition to multiplexing and packetization, the systems layer performs several other roles including, for example, clocking, stream synchronization and timing control. The MPEG-2 encoder communicates a time-base referred to as a Program Clock Reference (PCR) to the receiver of the stream via a field in the TS packet header. The encoder tightly controls the timing of each packet by appropriately choosing its PCR value. A PCR value denotes the relative departure time of the packet at the sender. The systems layer assumes a constant-delay transmission network and relies on higher layers to compensate for delay jitter in the network, if any. Consequently, the PCR also denotes the relative arrival time of the packet at the receiver. The tight control of the departure (and arrival) timing ensures that, as long as each TS packet arrives at the decoder at the time indicated by its PCR value, the decoder can re-create the original periodic sequence of frames without danger of underflow (which results in, e.g., “freeze frame” artifacts) or overflow (which results in, e.g., visual glitches) of its buffers.
Additionally, in open-loop networks, the PCR may also be used by the decoder to lock its clock to that of the sender, so as to maintain the same frame period as at the input to the encoder. In order to control and synchronize the decoding process and the display time of each video and audio frame, the encoder communicates a Decode Time-Stamp (DTS) and a Presentation Time-Stamp (PTS), respectively, for every frame. A compliant MPEG-2 receiver essentially receives TS packets belonging to a frame at their indicated PCR values and buffers them temporarily. A frame is removed from the buffer and decoded at its specified DTS value, and is presented to the viewer at its PTS value. In some transport networks, multiple TS packets may be encapsulated into a single Real-time Transport Protocol (RTP) packet or a Transmission Control Protocol (TCP) packet, leading to additional “PCR jitter” caused by such packetization.
The MPEG-2 systems layer strictly times the departure (and arrival) of each TS packet by taking into account the available bit-rate for the transport stream, the buffer size available in compliant decoders, and a tolerable initial latency of presentation. Given widely varying frame sizes, the encoder essentially shapes the transmission through a fixed-size buffer and constrains the peak bit-rate of the transport stream. Larger frames are transmitted over a longer time interval as opposed to smaller ones, leading to a variable frame rate of departure. By assigning the PTS/DTS values as TS packets ingress the fixed-sized buffer and assigning a PCR value as the packets egress the buffer, the encoder controls a model of the receiver buffer, called the Video Buffering Verifier (VBV). The decoder is expected to maintain a buffer, known as the VBV buffer, which is at least as large as the maximum difference between the PCR and DTS values. The amount of time each frame spends in the decoder buffer is referred to as its VBV delay. As long as a decoder adheres to the VBV delay of the first frame (i.e., initial latency), the presentation can then proceed at the frame rate of the original video source. A small VBV buffer size is beneficial both in terms of the amount of memory required at the decoder and the initial latency, though it conflicts with the efficiency of shaping and thus the utilization of the available bit-rate. Relying on the VBV model, traditional video set-top boxes (STBs) only provide for a very small amount of buffering.
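The relationship between PCR, DTS and the VBV buffer described above can be sketched in Python. This is only an illustration under stated assumptions, not the encoder's actual procedure: the function names `vbv_delay` and `required_vbv_bits`, the millisecond time unit, the sample time-stamps and the peak rate are all hypothetical. The sketch treats the VBV delay of a frame as the gap between its arrival time (PCR) and its decode time (DTS), and bounds the buffer size by the largest such gap filled at the peak bit-rate.

```python
def vbv_delay(pcr_ms: int, dts_ms: int) -> int:
    """Time a frame waits in the decoder buffer: decode time minus arrival time."""
    if dts_ms < pcr_ms:
        raise ValueError("frame would be decoded before it arrives (underflow)")
    return dts_ms - pcr_ms


def required_vbv_bits(frames, peak_rate_bits_per_ms: int) -> int:
    """Upper bound on buffer size: the largest PCR-to-DTS gap, filled at peak rate."""
    return max(vbv_delay(pcr, dts) for pcr, dts in frames) * peak_rate_bits_per_ms


# Hypothetical (PCR, DTS) pairs in milliseconds for three frames.
frames = [(0, 500), (200, 750), (600, 1000)]

delays = [vbv_delay(pcr, dts) for pcr, dts in frames]
# 4000 bits per millisecond corresponds to an assumed 4 Mbit/s peak rate.
buffer_bits = required_vbv_bits(frames, peak_rate_bits_per_ms=4000)

print(delays)       # VBV delay of each frame; the first entry is the initial latency
print(buffer_bits)  # buffer size bound in bits
```

In this sketch the first frame's delay plays the role of the initial latency, and shrinking the largest PCR-to-DTS gap shrinks the buffer bound in proportion.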
The systems layer uses a feedback signal to the compression layer whenever its fixed-size encoder buffer becomes full due to the bit-rate limitation at its egress. This is typically used by the compression layer to reduce the quality of the compressed frames, thereby reducing the number of bits that ingress into the buffer at that instant. If the encoder fixes a maximum quality level, to be used whenever the buffer is not full, the output of the buffer is a variable bit-rate (VBR) transport stream, whose peak rate equals the rate limit of the buffer. Such “capped-quality” peak-rate limited streams are commonly referred to generically as VBR streams. Even though such streams, by definition, do not always transmit at peak rate, a simple multiplexing system that dispatches several such streams on a shared transmission link needs to reserve the peak rate for each stream (i.e., the sum of the peak rates cannot exceed the bandwidth capacity of that transmission link) in order to account for the eventuality that during some instants each stream might need its peak rate so as to adhere to its VBV model. This can lead to a higher transmission cost by under-utilizing the available bandwidth.
Statistical multiplexing of VBR video streams is sometimes used to better utilize the transmission link bandwidth. Statistical multiplexing involves admitting a relatively higher number of transport streams into the link, thus reducing the transmission cost per stream. The sum of the peak rates is allowed to exceed the bandwidth capacity of the link, with the expectation that not every stream would require its peak rate at the same instant. In other words, the “peaks” of some streams coincide with the “valleys” of others in a statistical fashion, resulting in a lower (i.e. lower than peak) “effective bit-rate” for each stream. Special mechanisms are usually provided in the multiplexing system to address the infrequent instants during which the sum of the transmission rates does exceed the available link bandwidth, while still ensuring that the VBV model is not violated. The main issues in designing a statistical multiplexer revolve around the implementation of such mechanisms in a cost-effective fashion. Today, statistical multiplexers are commercially available as stand-alone systems, or as embedded parts of specialty encoders that originate multi-program transport streams (MPTS) and certain quadrature amplitude modulation (QAM) transmission equipment in cable TV networks.
FIG. 1 illustrates an exemplary statistical multiplexing system 100 as illustrated in the art. The multiplexer 102 is in communication with several pre-encoded video transport streams, stream one 104, stream two 106, and stream three 108. The video transport streams are dispatched on a shared output link as multiplexed stream 110. Each stream (e.g., stream one 104) includes multiple TS packets (e.g., the one or more TS packets at time slot one 112A through the one or more TS packets at time slot N 112E). The TS packets are placed in a small buffer (not shown), and a scheduler 114 in communication with the buffers transmits each packet at its specified PCR value. The scheduler 114 can use a wall clock 116 to determine the transmission times of the packets. The wall clock 116 is a counter used by the multiplexer 102 to track the progression of real time, at the desired resolution. Some amount of scheduling jitter is considered acceptable, the size of the small buffer essentially compensating for this jitter. We refer to such a scheduler as a just-in-time (JIT) scheduler. A JIT scheduler can be used to ensure that the VBV model of each stream is not violated as a result of the multiplexing operation.
For each packet J of a multiplexed stream K, a traditional multiplexer determines a departure time D(K,J). In order to adhere to the VBV model, D(K,J) typically equals the normalized PCR value of the packet, calculated by adding its absolute PCR value to the difference between D(K,1) and the PCR value of the first packet. The packet is dispatched to the shared output link when the wall clock equals D(K,J), plus some jitter introduced by the scheduler. This JIT discipline ensures that the decoder buffer does not overflow or underflow as a result of the multiplexing operation. Due to the lack of flexibility in the choice of the departure time, there remain several time slots during which the output link bandwidth remains under-utilized despite the statistical multiplexing gain due to the VBR nature of the multiplexed streams (e.g., time slots 118A, 118C, and 118D). Conversely, several time slots arise during which the output bandwidth becomes over-subscribed, due to conflicting departure times, thus requiring provisions to decrease the number of bits using transrating (e.g., time slot 120).
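The normalization described above amounts to D(K,J) = PCR(K,J) + (D(K,1) − PCR(K,1)). A minimal Python sketch of that calculation follows. The function name `jit_departure_times` and the tick values are hypothetical, and the sketch assumes PCR values that increase monotonically, ignoring counter wrap-around and scheduler jitter.

```python
def jit_departure_times(pcrs, first_departure):
    """Return D(K,J) for every packet J of one stream K.

    Each departure time is the packet's PCR shifted by the constant offset
    between the first packet's departure time D(K,1) and its PCR value.
    """
    offset = first_departure - pcrs[0]
    return [pcr + offset for pcr in pcrs]


# Hypothetical PCR values, in clock ticks, for four packets of one stream.
pcrs = [1000, 1040, 1100, 1130]

# Assume the multiplexer dispatches the first packet at wall-clock tick 5000.
departures = jit_departure_times(pcrs, first_departure=5000)
print(departures)  # [5000, 5040, 5100, 5130]
```

Because the offset is constant, the spacing between departures equals the spacing between PCR values, which is why the scheduler has no freedom to move a packet to a less congested time slot.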
When the sum of the encoded transmission rates (tightly controlled by the PCR values) exceeds the output link bandwidth, the multiplexer uses a technique referred to as “transrating” (or more generically “transcoding”) on selected streams to adequately reduce the number of bits and hence the actual transmission rates. Referring again to FIG. 1, each stream is transmitted at a different rate: stream one 104 is transmitted at rate 122A, stream two 106 is transmitted at rate 122B, and stream three 108 is transmitted at rate 122C. The rate of each stream varies over time. For example, the one or more TS packets at slot one 112A of stream one 104 have an encoded transmission rate 122A of one, the one or more TS packets at slot two 112B have an encoded transmission rate 122A of two, and there is no data to be transmitted at slot three 112C. The peak bit-rate of each stream (i.e., stream one 104 through stream three 108) is two units. The multiplexed stream 110 has a total available bit-rate 124 of four units.
Although the sum of the peak rates can be as high as six units (i.e., because each stream can potentially have a rate of two, so the cumulative rate for a given time slot can equal six), during most time slots the sum of the rates does not exceed the output link bandwidth, because the peaks of some streams coincide with the valleys of others. For example, at time slot 118A of the multiplexed stream 110, the one or more TS packets of stream one 104 at slot one 112A with a rate 122A of one are multiplexed with the one or more TS packets at slot one 112A from stream three 108 with a rate 122C of one (there is no data for stream two at slot one 112A). The resulting multiplexed stream 110 at slot 118A has a cumulative rate 124 of two. Similarly, the multiplexed stream at slot 118B has a cumulative rate 124 of four (i.e., the sum of rate 122A of two from stream one 104, rate 122B of one from stream two 106, and rate 122C of one from stream three 108 at slot two 112B).
However, at the transrating interval 120, the cumulative rate of the three streams is five, which is one too large for the total allowance of a rate 124 of four for the multiplexed stream 110. Consequently, transrating is performed at the transrating interval 120. The transrating interval shows the time slot during which the available bit-rate cannot accommodate the encoded rates.
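The per-slot arithmetic of this example can be sketched in Python. Only the rates for slots one and two, and the empty slot three of stream one, are taken from the description above. Every other rate, and the position of the over-subscribed slot, is a made-up placeholder chosen so that one slot sums to five, since the figure itself is not reproduced here. The variable names are likewise hypothetical.

```python
LINK_RATE = 4  # total available bit-rate of the multiplexed stream, in units

# Encoded rate of each stream per time slot (peak rate of each stream is 2).
# Slots 1 and 2, and the 0 in slot 3 of stream one, follow the text;
# all remaining values are illustrative placeholders.
stream_one = [1, 2, 0, 2, 1]
stream_two = [0, 1, 1, 2, 0]
stream_three = [1, 1, 2, 1, 0]

# Cumulative rate requested in each slot.
cumulative = [sum(rates) for rates in zip(stream_one, stream_two, stream_three)]

# Slots (1-based) where the requested rate exceeds the link rate, i.e. where
# a JIT multiplexer would have to transrate.
oversubscribed = [slot for slot, rate in enumerate(cumulative, start=1)
                  if rate > LINK_RATE]

# Number of rate units that must be removed in each over-subscribed slot.
excess = {slot: cumulative[slot - 1] - LINK_RATE for slot in oversubscribed}

print(cumulative)      # [2, 4, 3, 5, 1]
print(oversubscribed)  # [4]
print(excess)          # {4: 1}
```

With these placeholder values, one slot is over-subscribed by one unit while three other slots leave link capacity unused, which mirrors the mix of under-utilized slots and a transrating interval described for FIG. 1.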
Transrating is extremely compute-intensive, as it typically requires re-compressing the frames, by adjusting the instantaneous quality level. Quality level adjustments can be handled by video encoders, usually at no additional cost, making them suitable for creating such statistical multiplexes. Unfortunately, however, there are no provisions in the art to cost-effectively create multiplexes of arbitrarily selected streams (e.g., a mix of live TV and on-demand streams) in the network, from pre-encoded video, at high scale. Moreover, video content owners prefer not to adversely change video quality once the stream leaves the encoder.
Other approaches for statistically multiplexing video streams involve smoothing the rate changes of each individual stream at the input to the network to make their statistical combination on shared network resources more desirable. This approach offers some improvement in network resource efficiency but does not avail itself of the opportunity to trade off the specific delivery timing requirements of individual packets from different streams. A larger VBV buffer could be added to the decoder to reduce the required peak bit-rate of the transport stream. However, for the same quality level, this approach still produces a VBR stream (with valleys) and does not address the efficient sharing of the available bit-rate of the shared link.
Statistical multiplexing is commonly found in data networks as a means to allocate “average” or “sustained” bit-rates to data flows, as opposed to peak bit-rates, so as to better utilize network bandwidth. Scheduling techniques such as Head of Line Priority, Weighted Round Robin (WRR), Earliest Deadline First (EDF) and Weighted Fair Queueing (WFQ) are used to efficiently multiplex data flows onto output links of switches, routers and other networking equipment, without having to reserve the peak bit-rate of each flow. Such schemes scale relatively better, since statistical multiplexing becomes an integral part of the scheduling discipline itself without provisions for any special mechanisms to compensate for instants when the sum of the desired transmission rates exceeds the output link bandwidth. For example, when a video file is downloaded using the file transfer protocol (FTP), as opposed to being streamed in a just-in-time fashion, such a download benefits from data network statistical multiplexing and better utilizes network bandwidth. However, such a download service represents the opposite extreme, typically plagued by frequent underflows and/or excessive start-up delay at the receive buffer due to its lack of regard to the presentation timing requirements of video. There are no adequate provisions to systematically apply such scheduling disciplines to time-sensitive delivery of video transport streams.
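As a point of reference for the data-network disciplines named above, Earliest Deadline First can be sketched in a few lines of Python using a priority queue. This is a generic textbook-style illustration, not a scheduler taken from any particular router or from the art described here. The function name `edf_order`, the flow labels and the deadline values are hypothetical.

```python
import heapq


def edf_order(packets):
    """Return packet names in Earliest Deadline First order.

    `packets` is an iterable of (deadline, name) pairs. Ties on deadline are
    broken by name, because tuples are compared element by element.
    """
    heap = list(packets)
    heapq.heapify(heap)
    order = []
    while heap:
        _deadline, name = heapq.heappop(heap)
        order.append(name)
    return order


# Hypothetical queued packets from three data flows, with deadlines in ticks.
queue = [(30, "flow-B/1"), (10, "flow-A/1"), (20, "flow-C/1"), (15, "flow-A/2")]

order = edf_order(queue)
print(order)  # ['flow-A/1', 'flow-A/2', 'flow-C/1', 'flow-B/1']
```

A discipline of this kind only orders packets relative to one another. Nothing in it ties a packet's dispatch to a specific PCR value, which is the gap noted above for time-sensitive video transport streams.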