Video services are provided by a wide array of video content suppliers. For example, residential digital video services may include digital television, video on demand, Internet video, etc., with each service having hundreds of programs. A program refers to one or more bit streams that are used to represent the video content and associated audio content.
A target receiver for the programs, such as a set-top box (STB) located in a residential home, receives video programs from a number of different video content suppliers via assorted transmission channels. Typically, the “last mile” of transmission between the video content suppliers and the target receiver is along the same transmission channel, requiring the channel to carry multiple video programs from the wide array of suppliers, and often simultaneously.
There are presently a variety of different communication channels for transmitting or transporting video data. For example, communication channels such as coaxial cable distribution networks, digital subscriber loop (DSL) access networks, ATM networks, satellite, terrestrial, or wireless digital transmission facilities are all well known. Many standards have been developed for transmitting data on the communication channels. For the purposes herein, a channel is defined broadly as a connection facility to convey properly formatted digital information from one point to another.
A channel includes some or all of the following elements: 1) physical devices that generate and receive the signals (modulator/demodulator); 2) the medium that carries the actual signals; 3) mathematical schemes used to encode and decode the signals; 4) proper communication protocols used to establish, maintain, and manage the connection created by the channel; and 5) storage systems used to store the signals, such as magnetic tapes and optical disks. The concept of a channel is not limited to a physical channel; it also includes logical connections established on top of different network protocols, such as xDSL, ATM, IP, wireless, HFC, coaxial cable, Ethernet, and Token Ring.
The channel is used to transport a bit stream, or a continuous sequence of binary bits used to digitally represent compressed video, audio, and/or data. A bit rate is the number of bits per second that is required to transport the bit stream. A bit error rate is the statistical ratio between the number of bits in error due to transmission and the total number of bits transmitted. A channel capacity is the maximum bit rate at which a given channel can convey digital information with a bit error rate no more than a given value.
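By way of illustration only, the definitions above can be expressed as a short Python sketch. The function names `bit_error_rate` and `within_capacity` are hypothetical and are introduced here solely for the example.

```python
def bit_error_rate(bits_in_error: int, bits_transmitted: int) -> float:
    """Ratio of bits received in error to the total number of bits transmitted."""
    if bits_transmitted <= 0:
        raise ValueError("bits_transmitted must be positive")
    return bits_in_error / bits_transmitted


def within_capacity(bit_rate_bps: float, capacity_bps: float) -> bool:
    """True when a bit stream's bit rate does not exceed the channel capacity."""
    return bit_rate_bps <= capacity_bps
```

For instance, three bits in error out of one million transmitted corresponds to a bit error rate of three in one million.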
Since the amount of video data to be transmitted with existing communication channels is often excessive, compression is an approach that has been used to make digital video images more transportable. Digital video compression allows digitized video data to be represented in a much more efficient manner and makes it possible to transmit the compressed video data using a channel at a fraction of the bandwidth required to transmit the uncompressed video data. For example, digitized video data having an uncompressed bit rate of roughly 120 million bits per second (Mbps) can be represented by a compressed bit stream having a bit rate of 4-6 Mbps. Compression represents significant data savings, which results in much more efficient use of channel bandwidth and storage media.
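The arithmetic behind the figures above may be checked with the following sketch. The helper name `compression_ratio` is an assumption made for this example; the bit rates are the ones quoted in the preceding paragraph.

```python
def compression_ratio(uncompressed_bps: float, compressed_bps: float) -> float:
    """How many times smaller the compressed bit rate is than the uncompressed one."""
    if compressed_bps <= 0:
        raise ValueError("compressed_bps must be positive")
    return uncompressed_bps / compressed_bps


UNCOMPRESSED_BPS = 120e6  # roughly 120 Mbps, as quoted in the text

for compressed_bps in (4e6, 6e6):
    ratio = compression_ratio(UNCOMPRESSED_BPS, compressed_bps)
    share = compressed_bps / UNCOMPRESSED_BPS
    print(f"{compressed_bps / 1e6:.0f} Mbps: {ratio:.0f}:1, "
          f"{share:.1%} of the uncompressed bandwidth")
```

The quoted range thus corresponds to compression ratios of about 30:1 and 20:1.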
International standards have been created on video compression schemes. Examples of these standards include MPEG-1, MPEG-2, MPEG-4, H.261, H.262, H.263, etc. These compression standards rely on several algorithm schemes such as motion compensated transform coding, quantization of the transform coefficients, and variable length coding (VLC). In general, the number of bits used to represent a given image determines the quality of the encoded picture. The more bits used to represent a given image, the better the image quality. The system that is used to compress digitized video sequences using the above-described standards is typically called an encoder or encoding apparatus.
When the digital video is first compressed, the encoder assumes a particular bit rate profile, whether a constant bit rate (CBR) or a variable bit rate (VBR). The word “profile” refers to the fact that the transport bit rate may not be constant, but variable under certain constraints, such as peak bit rate, average bit rate, minimum bit rate, etc. For example, a constant bit rate stream at 4 Mbps does not have the same bit rate profile as a variable bit rate stream that averages 4 Mbps but has a larger maximum bit rate and a smaller minimum bit rate.
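The distinction between two profiles sharing the same average can be sketched as follows. The `BitRateProfile` class, the `profile_of` helper, and the sample rates are hypothetical and serve only to illustrate the constraints named above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BitRateProfile:
    peak_bps: float
    average_bps: float
    minimum_bps: float

    @property
    def is_constant(self) -> bool:
        # A CBR stream has identical peak and minimum bit rates.
        return self.peak_bps == self.minimum_bps


def profile_of(rates_bps: Sequence[float]) -> BitRateProfile:
    """Summarize a series of per-interval bit rates as a profile."""
    if not rates_bps:
        raise ValueError("at least one rate sample is required")
    return BitRateProfile(
        peak_bps=max(rates_bps),
        average_bps=sum(rates_bps) / len(rates_bps),
        minimum_bps=min(rates_bps),
    )


cbr = profile_of([4e6, 4e6, 4e6, 4e6])  # constant 4 Mbps
vbr = profile_of([2e6, 6e6, 3e6, 5e6])  # averages 4 Mbps, varies from 2 to 6 Mbps
```

Both profiles report an average of 4 Mbps, yet only the first is constant.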
The VBR representation of compressed video data allows a video encoder to generate compressed bit streams that, when decoded, produce consistent video quality. However, as a result of the compression process, the number of bits required to represent the compressed data differs widely from picture to picture. The specific VBR characteristics of the compressed bit stream depend on many factors including: the complexity of the video image and amount of motion in the video sequence; changes made in post-generation, such as scene cuts, fades, wipes, picture-in-picture, etc.; and the amount of stuffing bits/bytes inserted into the bit stream. As channel capacities are often expressed as constant bit rates, the variable nature of a VBR compressed bit stream often poses a problem for video transmission.
One potential consequence of exceeding channel capacity for a VBR compressed bit stream on a particular channel is compromised video quality. Commonly, if one or more bit streams contain too much data to fit within a channel, video data may be dropped from the bit stream or simplified to allow transmission, thus sacrificing end user video quality. Due to the real-time nature of compressed video transmission, dropped packets are not re-transmitted. Also, it is important to point out that compressed bit streams are usually generated by either real-time encoders or pre-compressed video server storage systems. Both are likely to be in a remote site, away from the network itself. This increases the difficulty in encoding the video signal with a resulting bit rate profile sensitive to the connection bandwidth available for a particular channel or target receiver.
To further reduce the excessive amount of video transmission, bit streams are frequently combined for transmission within a channel to make digital video data more transportable. A multiplex is a scheme used to combine bit stream representations of multiple signals, such as audio, video, or data, into a single bit stream representation.
One important benefit of VBR compression is achieved through statistical multiplexing. Statistical multiplexing is an encoding and multiplexing process which takes advantage of the VBR nature of multiple compressed video signals. When a statistical multiplexer combines multiple bit streams, an algorithm may be used to adapt the bit rate of each VBR video signal while the total bit rate of the output multiplex is kept at a constant value. Statistical multiplexing encompasses multiplexing architectures having a reverse message path from the multiplexer to the encoders. This is also often referred to as closed-loop statistical multiplexing.
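The benefit of combining VBR signals can be illustrated with a small numerical sketch. The helper `aggregate_rates` and the sample rates (in Mbps) are assumptions chosen for the example; they show only that the peaks of different streams need not coincide, so the multiplex may need less capacity than the sum of the individual peaks.

```python
from typing import List, Sequence


def aggregate_rates(streams: Sequence[Sequence[float]]) -> List[float]:
    """Sum the per-interval bit rates of several streams covering the same intervals."""
    if len({len(stream) for stream in streams}) != 1:
        raise ValueError("streams must be non-empty and cover the same intervals")
    return [sum(interval) for interval in zip(*streams)]


# Hypothetical per-interval bit rates in Mbps for three programs.
streams_mbps = [
    [2, 6, 3, 5],
    [6, 2, 5, 3],
    [4, 4, 4, 4],
]

sum_of_peaks = sum(max(stream) for stream in streams_mbps)
peak_of_sum = max(aggregate_rates(streams_mbps))
```

With these sample values, provisioning each program for its own peak would require 16 Mbps, whereas the combined signal never exceeds 12 Mbps.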
In applications such as video on demand, digital cable headend systems, and digital advertisement insertion systems, the requirement for the closed-loop feedback system may not yield optimal system efficiency, as program encoders from different providers may not be co-located. Thus, an open loop statistical multiplexing architecture, termed statistical remultiplexing, is typically used to multiplex the compressed signals, improve overall system efficiency, provide better bandwidth usage, and reduce transmission cost. Statistical remultiplexing is a process which accepts multiple VBR bit streams and remultiplexes them together to output a single CBR bit stream that fits within an available channel.
FIG. 1 illustrates a prior art example of a compressed bit stream 100 having an MPEG-2 format. The MPEG-2 compression standard consists of two layers: a system layer 101 and an elementary stream layer 102. The elementary stream layer 102 typically contains the coded video and audio data and defines how compressed video (or audio) data are sampled, motion compensated (for video), transform coded, quantized and represented by different variable length coding (VLC) tables. The basic structure of coded video picture data is a block, which is an 8 pixel by 8 pixel array. Multiple blocks form a macroblock, which in turn forms part of a slice. A coded picture consists of multiple slices. Multiple coded pictures form a group of pictures.
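To give a sense of scale for the block and macroblock hierarchy, the following sketch counts the units in one picture. It assumes 16 by 16 pixel macroblocks and 4:2:0 chroma sampling, in which a macroblock carries four luminance blocks and two chrominance blocks; the 720 by 480 picture size and the function name `picture_unit_counts` are likewise assumptions for illustration.

```python
MACROBLOCK_SIZE = 16  # pixels per macroblock side (assumed)


def picture_unit_counts(width: int, height: int, blocks_per_macroblock: int = 6):
    """Return (macroblocks, blocks) for one picture.

    blocks_per_macroblock defaults to 6, i.e. 4 luminance plus 2 chrominance
    8x8 blocks, which is an assumption corresponding to 4:2:0 sampling.
    """
    mb_columns = (width + MACROBLOCK_SIZE - 1) // MACROBLOCK_SIZE   # round up
    mb_rows = (height + MACROBLOCK_SIZE - 1) // MACROBLOCK_SIZE     # round up
    macroblocks = mb_columns * mb_rows
    return macroblocks, macroblocks * blocks_per_macroblock


macroblocks, blocks = picture_unit_counts(720, 480)
```

Under these assumptions a single picture contains 1350 macroblocks and 8100 blocks.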
Each block contains variable length codes (VLC) for transform coefficients. In the MPEG-2 syntax, the picture data section contains the bulk of the compressed video images. This is where the transform coefficients are encoded as VLCs. For a typical bit stream, this portion of the data takes somewhere between 70%-90% of the total bit usage of a coded picture, depending on the coded bit rate. The MPEG-2 syntax also specifies private user data fields within the elementary stream layer 102. The private user data fields may be either of variable length or fixed length.
The system layer 101 is defined to allow an MPEG-2 decoder to correctly decode audio and video data, and present the decoded result to the video screen in a time continuous manner. The system layer 101 comprises two sub-layers: a packetized elementary stream (PES) layer 104 and a transport layer 106 above the PES layer 104.
The PES layer 104 defines how the elementary stream layer is encapsulated into variable length packets called PES packets. In addition, the PES layer 104 includes presentation and decoding time stamps for the PES packets, which are used by a decoder to determine the timing to decode and display the video images from the decoding buffers.
The transport layer 106 defines how the PES packets are further packetized into fixed-size transport packets, e.g., packets of 188 bytes, to produce a transport stream. Additional timing information and multiplexing information may be added in the transport layer 106. For example, transport packets may contain program clock reference (PCR) values, presentation time stamps (PTS) and decoding time stamps (DTS). PCR values are related to the encoder system time clock for a particular program. A PTS indicates the time when a video picture or audio frame should be displayed or presented relative to the PCR. A DTS indicates the time when a video picture should be decoded relative to the PCR. The transport layer 106 may be utilized as a transport stream or a program stream.
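As an illustrative sketch only, the fixed 188-byte transport packet can be inspected by reading its four-byte header. The field layout used below (a 0x47 sync byte, a 13-bit packet identifier, and a 4-bit continuity counter) follows the MPEG-2 systems specification; the function name `parse_transport_header` and the sample packet are hypothetical.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_transport_header(packet: bytes) -> dict:
    """Decode selected fields from the 4-byte header of one transport packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport packet must be exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing 0x47 sync byte")
    return {
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "has_adaptation_field": bool(packet[3] & 0x20),
        "has_payload": bool(packet[3] & 0x10),
        "continuity_counter": packet[3] & 0x0F,
    }


# Hypothetical packet: PID 0x100, payload only, continuity counter 7.
sample_packet = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(TS_PACKET_SIZE - 4)
header = parse_transport_header(sample_packet)
```

The packet identifier is what allows a demultiplexer to separate the packets of one elementary stream from those of another within the same transport stream.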
The transport stream is optimized for use in environments where errors are likely, such as transmission over lossy or noisy media. Applications using the transport stream 106 include Direct Broadcast Service (DBS), digital or wireless cable services, broadband transmission systems, etc.
The program stream is designated for use in relatively error free environments and is suitable for applications that may involve software processing of system information, such as interactive multimedia applications. Applications using the program stream include Digital Versatile Disks (DVD) and video servers.
FIG. 2 illustrates a prior art example of an MPEG elementary video bit stream. The MPEG elementary video bit stream 200 includes start codes indicating processing parameters for the bit stream 200 such as a sequence start code 202, a sequence extension including a user data header 203, a Group of Pictures (GOP) header 204, a user data header 205, a picture header 206, and a picture coding extension that includes a user data extension 207. Picture data 208 follows the picture header 206. The bit stream 200 includes a second picture header 210 preceding picture data 212.
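A parser can locate the headers described above by scanning for the three-byte start code prefix 0x000001 and reading the byte that follows it. The sketch below is a minimal illustration; the code values are those of the MPEG-2 video syntax, while the function name `find_start_codes` and the sample bytes are hypothetical and do not reproduce bit stream 200.

```python
from typing import List, Tuple

START_CODE_PREFIX = b"\x00\x00\x01"
START_CODE_NAMES = {
    0x00: "picture",
    0xB2: "user_data",
    0xB3: "sequence_header",
    0xB5: "extension",
    0xB7: "sequence_end",
    0xB8: "group_of_pictures",
}


def find_start_codes(data: bytes) -> List[Tuple[int, str]]:
    """Return (byte offset, name) for every start code found in data."""
    found = []
    position = data.find(START_CODE_PREFIX)
    while position != -1 and position + 3 < len(data):
        code = data[position + 3]
        if 0x01 <= code <= 0xAF:
            name = "slice"
        else:
            name = START_CODE_NAMES.get(code, "other")
        found.append((position, name))
        # Continue searching after this prefix.
        position = data.find(START_CODE_PREFIX, position + 3)
    return found


# Hypothetical byte sequence with filler bytes between the start codes.
sample = (b"\x00\x00\x01\xB3" + b"\xAA" * 4 +
          b"\x00\x00\x01\xB8" + b"\x11" * 2 +
          b"\x00\x00\x01\x00" + b"\x22" +
          b"\x00\x00\x01\x01" + b"\x33")
codes = find_start_codes(sample)
```

Because start codes are byte aligned and unique within the video syntax, this scan does not require decoding any of the variable length coded picture data.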
Information in an MPEG-2 compressed bit stream also indicates the relationship between various frames within a video sequence. The access unit level information relates to coded pictures and may specify whether a picture is an intra frame (I frame), a predicted frame (P frame), or a bi-directional frame (B frame). An I frame contains full picture information. A P frame is constructed using a past I frame or P frame. A B frame is bi-directionally constructed using both a past and a future I or P frame, which are also called anchor frames.
FIG. 3 illustrates a prior art exemplary frame sequence 300 included in a compressed bit stream. The sequence 300 corresponds to a group of pictures in an MPEG-2 bit stream. The sequence 300 includes an initial I frame 302, P frames 304a-d and ten B frames 306a-j. The I frame 302 contains full picture information. The P and B frames are constructed from other frames as illustrated by arrows 308. Each P frame 304a-d is constructed using the I frame 302 or a previous P frame 304a-d, whichever immediately precedes the P frame (e.g., the P frame 304b uses the P frame 304a). The B frames 306a-j are bi-directionally constructed using the nearest past and future reference pictures. A reference picture is either an I or a P picture. For example, the B frames 306a and 306b are constructed using the past I frame 302 and future P frame 304a. 
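The dependency rules described above can be sketched in a few lines. The function `reference_frames` and the shortened pattern "IBBPBBP" are hypothetical; the pattern is given in display order and is not a reproduction of sequence 300.

```python
from typing import List, Tuple


def reference_frames(display_order: str) -> List[Tuple[int, ...]]:
    """For each frame (by display index), list the indices of its reference frames."""
    anchors = [i for i, kind in enumerate(display_order) if kind in ("I", "P")]
    references = []
    for i, kind in enumerate(display_order):
        past = [a for a in anchors if a < i]
        future = [a for a in anchors if a > i]
        if kind == "I":
            references.append(())  # full picture information, no references
        elif kind == "P":
            if not past:
                raise ValueError("a P frame needs a preceding I or P frame")
            references.append((past[-1],))  # nearest past anchor
        elif kind == "B":
            if not past or not future:
                raise ValueError("this sketch requires a past and a future anchor")
            references.append((past[-1], future[0]))  # nearest anchors on both sides
        else:
            raise ValueError(f"unknown frame type {kind!r}")
    return references


refs = reference_frames("IBBPBBP")
```

In this pattern the first two B frames depend on the I frame at index 0 and the P frame at index 3, which mirrors the relationship described for B frames 306a and 306b.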
Some statistical remultiplexers rely on information solely contained in the pre-compressed bit streams for re-encoding. The information is usually obtained by decoding the signal back to the spatial domain (baseband). When the statistical remultiplexer is configured within a network device, such as a router or headend, decoding increases the complexity of the network device, slows transmission of the video data, and decreases transmission efficiency. Thus, in some compressed bit streams, a transport packet containing bit rate information and/or other data associated with the bit stream may be included in the bit stream by an encoder for extraction by a receiving statistical remultiplexer. An example of such a system is described in pending U.S. patent application Ser. No. 09/684,623, entitled “Methods and Apparatus for Efficient Scheduling and Multiplexing”, filed Oct. 5, 2000, and which is hereby incorporated by reference.
FIG. 4 illustrates a prior art example of an MPEG elementary video bit stream 400 having embedded bit rate and/or other video related information 407. The MPEG elementary video bit stream 400 includes start codes indicating processing parameters for the bit stream 400 such as a sequence start code 402, a sequence extension including a user data header 403, a Group of Pictures (GOP) header 404, a user data header 405, a picture header 406, and a picture coding extension that includes a user data extension 408. Picture data 412 follows the picture header 410.
The embedded data 407 may include bit rate data or other information associated with the bit stream 400. In other examples, the bit rate data and/or other video related information packet 407 may be located in different layers of the bit stream 400. When the compressed and multiplexed channels are received, for example, by a cable operator, the channels are usually “groomed” to remove unwanted or redundant programs. The groomed channels are then remultiplexed and output as a bit stream to a customer.
Generally, a statistical remultiplexer dynamically multiplexes the various VBR channels, groomed and/or ungroomed, into a single bit stream that can be output over a channel of a fixed bandwidth which may be viewed as a CBR channel. This is usually done by varying the bandwidth allocated to each VBR channel based on its current demands to maximize utilization of the allocated bandwidth.
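One simple way to vary the bandwidth given to each channel according to its current demand is proportional scaling, sketched below. This is an assumption chosen for illustration and is not asserted to be the allocation rule of any particular statistical remultiplexer; the function name `allocate_bandwidth` is hypothetical.

```python
from typing import List, Sequence


def allocate_bandwidth(demands_bps: Sequence[float], channel_bps: float) -> List[float]:
    """Share a fixed output bandwidth among channels in proportion to demand.

    When the total demand fits, every channel receives what it asks for.
    Otherwise each demand is scaled by the same factor so the total equals
    the output bandwidth.
    """
    total = sum(demands_bps)
    if total <= channel_bps:
        return list(demands_bps)
    scale = channel_bps / total
    return [demand * scale for demand in demands_bps]


allocation = allocate_bandwidth([3e6, 5e6, 4e6], channel_bps=10e6)
```

Here the demands total 12 Mbps against a 10 Mbps output, so every channel receives five sixths of its request.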
Typically, the input channels to the remultiplexer have been multiplexed according to a known pattern, for example, time division multiplexing, so that the ordering of the associated channel packets can be delayed in a queue or buffer, and then resent. Most statistical remultiplexers first attempt to shift the bit rates of the selected channels in time to achieve a bit rate within the allowable output bandwidth and within an allowable time period. According to the MPEG-2 standard, there is a limitation on the size of a receiving decoder buffer. Thus, the time shift for a compressed bit stream is limited so that it shall not underflow or overflow a receiving decoder buffer after the time shifting. After the time shifting, if the bit rate is still larger than the allowable bandwidth of the output channel, the excess bits are either dropped out of the transmission, which usually results in a poorer quality transmission, or the statistical remultiplexer utilizes a bit reduction scheme to try to retain as much of the excessive bit rate transmission as possible to maintain the transmission quality. There are many bit reduction schemes utilized by various statistical remultiplexers.
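The decoder buffer constraint on time shifting can be illustrated with a deliberately simplified model: bits arrive at a constant rate, and one coded picture is removed from the buffer at each frame interval. The function `check_decoder_buffer`, the buffer size, and the picture sizes below are hypothetical, and the model is a sketch rather than the buffer verifier defined by the MPEG-2 standard.

```python
from typing import Optional, Sequence, Tuple


def check_decoder_buffer(
    picture_bits: Sequence[int],
    delivery_bps: float,
    frame_rate: float,
    buffer_bits: int,
    initial_bits: float,
) -> Tuple[str, Optional[int]]:
    """Return ("ok", None), or the violation type and the picture index where it occurs."""
    fullness = initial_bits
    bits_per_frame_period = delivery_bps / frame_rate
    for index, size in enumerate(picture_bits):
        if fullness > buffer_bits:
            return ("overflow", index)   # more data arrived than the buffer holds
        if size > fullness:
            return ("underflow", index)  # picture not fully received at decode time
        fullness -= size                   # decoder removes the picture
        fullness += bits_per_frame_period  # delivery continues until the next decode
    return ("ok", None)


result = check_decoder_buffer(
    picture_bits=[400_000, 100_000, 100_000],
    delivery_bps=3_000_000,
    frame_rate=30,
    buffer_bits=1_800_000,  # illustrative buffer size
    initial_bits=500_000,
)
```

Delaying a large picture too long produces an underflow in this model, while sending data too early produces an overflow, which is why the permissible time shift is bounded.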
Basically, the statistical remultiplexer bit rate reduction schemes process portions of a bit stream to reduce the bit rate so that the overall output bandwidth is within the allocated bandwidth. These bit rate reduction schemes typically require the decoding, bit rate reduction, and then re-encoding of the bit stream. The processing step can usually be repeated until the output is within the allocated output bandwidth. The obvious goal of the bit rate reduction schemes is to reduce the overall output bit rate to within the allowable bandwidth of the output channel and still produce an output transmission that is as close to input quality as possible and with as little delay as possible.
FIG. 5 illustrates a general high level architecture of a statistical remultiplexer in the prior art. Generally, a statistical remultiplexer and its components perform steps to ensure that the bandwidth of the output transport medium is fully utilized. An example of a statistical remultiplexer is described in co-pending U.S. patent application Ser. No. 09/514,577, entitled “System and Method For Multiple Channel Statistical Re-Multiplexing”, filed Feb. 28, 2000, and which is hereby incorporated by reference.
Typically, a plurality of compressed bit streams 502 and 504 are input to the statistical remultiplexer (statremux) 506 after being initially processed, such as by demultiplexing, decoding and splitting. The bit streams 502 and 504 are input to a bit stream analyzer 508, which parses the bit streams to determine the bit usage of each of the channels for some pre-determined amount of time, T, ahead of what is currently being sent, in order to decide the incoming and outgoing bit rate for each bit stream. This is termed a look-ahead windowing technique. This technique involves buffering of the bit stream data up to some amount of time, T. The bit stream analyzer 508 inspects portions of the input compressed bit streams 502 and 504 held in the buffer(s), e.g., look-ahead window buffer(s), to extract information that can be used to assist the re-encoding process and bandwidth allocation. The information may include: bit rate, average bandwidth, peak bandwidth, instantaneous bandwidth, rate of change of instantaneous bandwidth, long term average bandwidth, long term variance of bandwidth, and video specific data, such as the decoding time stamp (DTS) of each picture, the number of bits used for each of the coded pictures, the picture coding type, the average quantizer scale value for each of the coded pictures, whether the coded picture is coded due to fade or scene cuts in the original video sequence, etc.
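A minimal sketch of the look-ahead windowing technique follows. It assumes that each buffered picture is described by its DTS in 90 kHz clock ticks and its size in bits; the function name `lookahead_statistics` and the sample values are hypothetical, and only a few of the quantities listed above are computed.

```python
from typing import Optional, Sequence, Tuple

TICKS_PER_SECOND = 90_000  # time stamp resolution assumed for this sketch


def lookahead_statistics(
    pictures: Sequence[Tuple[int, int]],  # (dts_ticks, size_bits) per coded picture
    now_ticks: int,
    window_ticks: int,
) -> Optional[dict]:
    """Summarize bit usage of the pictures whose DTS lies in [now, now + T)."""
    if window_ticks <= 0:
        raise ValueError("window_ticks must be positive")
    sizes = [bits for dts, bits in pictures
             if now_ticks <= dts < now_ticks + window_ticks]
    if not sizes:
        return None
    total_bits = sum(sizes)
    return {
        "pictures": len(sizes),
        "total_bits": total_bits,
        "average_bps": total_bits * TICKS_PER_SECOND / window_ticks,
        "largest_picture_bits": max(sizes),
    }


# Hypothetical pictures spaced 0.1 s apart; T is 0.25 s (22 500 ticks).
buffered = [(0, 400_000), (9_000, 100_000), (18_000, 100_000),
            (27_000, 200_000), (36_000, 100_000)]
stats = lookahead_statistics(buffered, now_ticks=0, window_ticks=22_500)
```

A longer window T yields statistics over more pictures, at the cost of holding more data in the look-ahead buffer before it can be sent.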
The bit stream analyzer 508 provides this information to the rate controller 510 to determine the amount and type of compression that the re-encoder 512 should perform on the bit streams 502 and 504, including bit rate reduction, if needed. The rate controller 510 uses this information to shuffle the portions of the bit streams present in the bit stream analyzer 508 to attempt to maximize the use of the output channel before resorting to a bit rate reduction process, if needed. Shuffling of the bit streams is possible due to knowledge of the DTS of all the coded pictures and the current decoder buffer state for every channel.
The bit stream analyzer 508 outputs the bit streams to the re-encoder 512. The re-encoder 512 compresses the bit streams using conventional methods, such as motion compensation encoding, transform coding, quantization, and variable length encoding. The re-encoder 512 may also be able to perform any of the various types of compression in response to control signals from the rate controller 510.
The rate controller 510 specifies a bit rate for the bit stream output by the re-encoder 512. The rate controller 510 receives data from the bit stream analyzer 508, specifies what bit rates are possible, and in response to signals from the scheduler/multiplexer 514, provides commands to the re-encoder 512 to perform compression that will achieve a desired bit rate, which may include implementation of various bit rate reduction processes.
The scheduler/multiplexer 514 serves as a bandwidth allocation mechanism and controls the bandwidth allocated to each of the bit stream channels to ensure that it fully utilizes the capacity of the output channel 530. The first-in/first-out (FIFO) buffers 518 and 520 temporarily store the data from the re-encoder 512 until the data can be scheduled for output through the multiplexer 522. The controller/scheduler 516 sends signals to the multiplexer 522 to control data flow from the FIFO buffers 518 and 520 into the multiplexer 522. Additionally, the filler packet adder 526 may be used to add null packets, packets containing stuffing bytes, or opportunistic data, such as program identification numbers and user information, for the times when the compressed data is insufficient to maintain a constant output bit rate over the output channel. The multiplexer 522 multiplexes the bit streams and outputs the resultant single bit stream to the decoder buffer model 524. The decoder buffer model 524 models a receiving decoder buffer and assists in regulating the bit streams to prevent an underflow or overflow of the bit stream to a receiving decoder. Also, the decoder buffer model can feed back the future bit rate requirements to the rate controller to prevent a decoder buffer model violation.
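The role of filler packets in holding the output at a constant bit rate can be sketched as follows. The example assumes 188-byte transport packets and uses the null packet identifier 0x1FFF of the MPEG-2 systems specification; the function names, the 10 ms interval, and the channel rate are hypothetical and do not describe the filler packet adder 526 itself.

```python
from typing import List, Sequence

TS_PACKET_SIZE = 188
# Null transport packet: sync byte, PID 0x1FFF, payload only, stuffing bytes.
NULL_PACKET = bytes([0x47, 0x1F, 0xFF, 0x10]) + bytes([0xFF]) * (TS_PACKET_SIZE - 4)


def slots_per_interval(channel_bps: int, interval_ms: int) -> int:
    """Whole transport packets the output channel carries in one interval."""
    bits = channel_bps * interval_ms // 1000
    return bits // (TS_PACKET_SIZE * 8)


def fill_output_interval(ready_packets: Sequence[bytes], slots: int) -> List[bytes]:
    """Pad the packets ready for output with null packets up to the slot count."""
    if len(ready_packets) > slots:
        raise ValueError("more packets than the interval can carry")
    return list(ready_packets) + [NULL_PACKET] * (slots - len(ready_packets))


slots = slots_per_interval(channel_bps=3_008_000, interval_ms=10)
output = fill_output_interval([b"a" * TS_PACKET_SIZE, b"b" * TS_PACKET_SIZE], slots)
```

Every interval therefore carries the same number of packets regardless of how much compressed data was available, which is what keeps the output bit rate constant.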
In theory, by increasing the look-ahead window buffer(s) size, a statistical remultiplexer can improve its performance by more accurately allocating the bandwidth. However, increasing the look-ahead window buffer(s) size increases the amount of data that is stored and processed. As a result, improving the accuracy with which bandwidth is allocated introduces an undesirable delay into the statistical remultiplexing system.
In view of the above, it would be desirable to improve the statistical remultiplexer performance without increasing the delay typically present in a statistical remultiplexing system.