Video services are provided by a wide array of video content suppliers. For example, residential digital video services may include digital television, video on demand, Internet video, etc.—each service having hundreds of programs. A program refers to one or more bit streams that are used to represent the video content and associated audio content. A target receiver for the programs, such as a set-top box (STB) located in a residential home, receives video programs from a number of different video content suppliers via assorted transmission channels. Typically, the “last mile” of transmission between the video content suppliers and the target receiver is along the same transmission channel, requiring the channel to carry multiple video programs from the wide array of suppliers, and often simultaneously.
There are presently a variety of different communication channels for transmitting or transporting video data. For example, communication channels such as coaxial cable distribution networks, digital subscriber loop (DSL) access networks, ATM networks, satellite, terrestrial, or wireless digital transmission facilities are all well known. Many standards have been developed for transmitting data on the communication channels. For the purposes herein, a channel is defined broadly as a connection facility to convey properly formatted digital information from one point to another.
A channel includes some or all of the following elements: 1) physical devices that generate and receive the signals (modulator/demodulator); 2) the medium that carries the actual signals; 3) mathematical schemes used to encode and decode the signals; 4) proper communication protocols used to establish, maintain, and manage the connection created by the channel; and 5) storage systems used to store the signals, such as magnetic tapes and optical disks. The concept of a channel is not limited to a physical channel; it also includes logical connections established on top of different network protocols, such as xDSL, ATM, IP, wireless, HFC, coaxial cable, Ethernet, and Token Ring.
The channel is used to transport a bit stream, or a continuous sequence of binary bits used to digitally represent compressed video, audio, and/or data. A bit rate is the number of bits per second that is required to transport the bit stream. A bit error rate is the statistical ratio between the number of bits in error due to transmission and the total number of bits transmitted. A channel capacity is the maximum bit rate at which a given channel can convey digital information with a bit error rate no more than a given value.
Since the amount of video data to be transmitted with existing communication channels is often excessive, compression is an approach that has been used to make digital video images more transportable. Digital video compression allows digitized video data to be represented in a much more efficient manner and makes it possible to transmit the compressed video data using a channel at a fraction of the bandwidth required to transmit the uncompressed video data. For example, digitized video data having an uncompressed bit rate of roughly 120 million bits per second (Mbps) can be represented by a compressed bit stream having a bit rate of 4–6 Mbps. Compression represents significant data savings, which results in much more efficient use of channel bandwidth and storage media.
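By way of illustration only, the arithmetic behind the figures above can be sketched as follows. The 120 Mbps and 4–6 Mbps values come from the text; the 36 Mbps channel capacity and the function names are hypothetical and are chosen only to make the example concrete.

```python
def compression_ratio(uncompressed_bps, compressed_bps):
    """Return how many times smaller the compressed bit rate is."""
    return uncompressed_bps / compressed_bps


def programs_per_channel(channel_capacity_bps, program_bps):
    """Return how many whole programs of a given bit rate fit in a channel."""
    return int(channel_capacity_bps // program_bps)


UNCOMPRESSED_BPS = 120e6      # uncompressed bit rate quoted in the text
CHANNEL_CAPACITY_BPS = 36e6   # hypothetical channel capacity (assumption)

for compressed_bps in (4e6, 6e6):
    ratio = compression_ratio(UNCOMPRESSED_BPS, compressed_bps)
    count = programs_per_channel(CHANNEL_CAPACITY_BPS, compressed_bps)
    print(f"{compressed_bps / 1e6:.0f} Mbps: ratio {ratio:.0f}:1, "
          f"{count} programs per channel")
```

Under these assumptions, the compressed stream is 20 to 30 times smaller than the uncompressed data, and a channel that cannot carry even one uncompressed program can carry several compressed programs.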
International standards have been created on video compression schemes. Examples of these standards include MPEG-1, MPEG-2, MPEG-4, H.261, H.262, H.263, etc. These compression standards rely on several algorithm schemes such as motion compensated transform coding, quantization of the transform coefficients, and variable length coding (VLC). In general, the number of bits used to represent a given image determines the quality of the encoded picture. The more bits used to represent a given image, the better the image quality. The system that is used to compress digitized video sequences using the above-described standards is typically called an encoder or encoding apparatus.
When the digital video is first compressed, the encoder assumes a particular bit rate profile, whether it is a constant bit rate (CBR) or a variable bit rate (VBR). The word “profile” refers to the fact that the transport bit rate may not be constant, but variable under certain constraints, such as peak bit rate, average bit rate, minimum bit rate, etc. For example, a constant bit rate stream at 4 Mbps does not have the same bit rate profile as a variable bit rate stream that averages 4 Mbps but has a larger maximum bit rate and a smaller minimum bit rate.
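The distinction between the two profiles can be sketched as follows. The per-second rate samples and the function name are hypothetical; only the 4 Mbps figure is taken from the text.

```python
from statistics import mean


def bit_rate_profile(rates_bps):
    """Summarize a series of bit rate samples by peak, average, and minimum."""
    return {
        "peak": max(rates_bps),
        "average": mean(rates_bps),
        "minimum": min(rates_bps),
    }


# Hypothetical per-second samples (assumption): both streams average 4 Mbps.
cbr_samples = [4_000_000, 4_000_000, 4_000_000, 4_000_000]
vbr_samples = [2_000_000, 6_000_000, 5_000_000, 3_000_000]

cbr_profile = bit_rate_profile(cbr_samples)
vbr_profile = bit_rate_profile(vbr_samples)
print("CBR:", cbr_profile)
print("VBR:", vbr_profile)
```

Both streams have the same average bit rate, yet a channel sized for that average would be exceeded whenever the VBR stream approaches its peak.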
The VBR representation of compressed video data allows a video encoder to generate compressed bit streams that, when decoded, produce consistent video quality. However, as a result of the compression process, the number of bits required to represent the compressed data differs widely from picture to picture. The specific VBR characteristics of the compressed bit stream depend on many factors including: the complexity of the video image and amount of motion in the video sequence; changes made in post-generation, such as scene cuts, fades, wipes, picture-in-picture, etc.; and the amount of stuffing bits/bytes inserted into the bit stream. Stuffing bits/bytes are code words that may be inserted into the coded bit stream to increase the bit rate to some desired bit rate and are discarded during decoding of the bit stream. As channel capacities are often expressed as constant bit rates, the variable nature of a VBR compressed bit stream often poses a problem for video transmission.
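As a simplified sketch of the role of stuffing, the following computes the stuffing rate needed in each interval to raise a VBR stream to a desired constant rate. The sample rates and the function name are hypothetical, and the sketch ignores the buffering that an actual encoder would also rely on.

```python
def stuffing_rates(vbr_rates_bps, target_bps):
    """Return the stuffing rate needed in each interval to reach target_bps.

    Stuffing can only add bits, so an interval whose rate already exceeds
    the target cannot be handled by stuffing and is reported as an error.
    """
    rates = []
    for rate in vbr_rates_bps:
        if rate > target_bps:
            raise ValueError("stuffing can only raise the bit rate, not lower it")
        rates.append(target_bps - rate)
    return rates


# Hypothetical VBR samples (assumption), padded up to a 4 Mbps constant rate.
vbr_samples = [2_500_000, 4_000_000, 3_200_000]
padding = stuffing_rates(vbr_samples, 4_000_000)
print(padding)
```

The stuffing carries no picture information and is discarded by the decoder, which is why a VBR stream that exceeds the channel rate cannot be fixed by stuffing and instead requires the techniques discussed below.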
One potential consequence of exceeding channel capacity for a VBR compressed bit stream on a particular channel is compromised video quality. Commonly, if one or more bit streams contain too much data to fit within a channel, video data may be dropped from the bit stream or simplified to allow transmission, thus sacrificing end user video quality. Due to the real-time nature of compressed video transmission, dropped packets are not re-transmitted. Also, it is important to point out that compressed bit streams are usually generated by either real-time encoders or pre-compressed video server storage systems. Both are likely to be in a remote site, away from the network itself. This increases the difficulty in encoding the video signal with a resulting bit rate profile sensitive to the connection bandwidth available for a particular channel or target receiver.
To further reduce the excessive amount of video transmission, bit streams are frequently combined for transmission within a channel to make digital video data more transportable. A multiplex is a scheme used to combine bit stream representations of multiple signals, such as audio, video, or data, into a single bit stream representation.
One important benefit of VBR compression is achieved through statistical multiplexing. Statistical multiplexing is an encoding and multiplexing process which takes advantage of the VBR nature of multiple compressed video signals. When a statistical multiplexer combines multiple bit streams, an algorithm may be used to adapt the bit rate of each VBR video signal while the total bit rate of the output multiplex is kept at a constant value. Statistical multiplexing encompasses a multiplexing architecture having a reverse message path from the multiplexer to the encoders. This is also often referred to as closed-loop statistical multiplexing.
FIG. 1 illustrates a prior art example of a high level architecture for a closed-loop statistical multiplexer 100. The closed-loop statistical multiplexer (statmux) 100 has a closed-loop signal path 101 between a statmux rate controller 102 and program encoders 104 and 106. The signal path 101 provides the rate controller 102 with a global view of the bit rate requirements for each of the program encoders 104 and 106 and allows communication between the rate controller 102 and each encoder. The encoders 104 and 106 provide compressed video, audio and data bit streams to a multiplexer 105, which schedules the compressed bit streams to output a multiplexed compressed bit stream 108. Neither of the encoders 104 and 106 has knowledge of the bandwidth requirements of data being encoded by the other encoder; each therefore relies on messages sent by the rate controller 102.
Based on these messages received from the statmux rate controller 102, the program encoders 104 and 106 adjust their encoding bit rate. Since the closed-loop statmux 100 relies on prompt delivery of the messages between the statmux rate controller 102 and the encoders 104 and 106, the closed-loop statmux 100 usually requires co-location of all program encoders 104 and 106, the rate controller 102 and multiplexer 105.
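One possible form of the rate control decision is sketched below, in which each encoder reports a complexity measure and the rate controller divides a fixed total bit rate in proportion to those reports. The proportional rule, the complexity values, and the function name are assumptions made for illustration; the text does not specify the algorithm used by the rate controller 102.

```python
def allocate_bit_rates(complexities, total_bps):
    """Split a constant total bit rate among encoders in proportion to the
    complexity each one reports (hypothetical allocation rule)."""
    total_complexity = sum(complexities.values())
    if total_complexity <= 0:
        # No encoder reports any demand: fall back to an equal split.
        share = total_bps / len(complexities)
        return {name: share for name in complexities}
    return {
        name: total_bps * value / total_complexity
        for name, value in complexities.items()
    }


# Hypothetical complexity reports sent over the closed-loop signal path.
reports = {"encoder_104": 3.0, "encoder_106": 1.0}
allocation = allocate_bit_rates(reports, 8_000_000)
print(allocation)
```

Whatever the split, the allocations sum to the constant output rate, which is the defining property of the multiplex described above.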
In applications such as video on demand, digital cable headend systems, and digital advertisement insertion systems, the closed-loop feedback requirement may prevent optimum system efficiency because program encoders from different providers may not be co-located. Thus, an open-loop statistical multiplexing architecture, termed statistical remultiplexing, is typically used to multiplex the compressed signals, improve the overall system efficiency, provide better bandwidth usage, and reduce transmission cost. Statistical remultiplexing is a process which accepts multiple VBR bit streams and remultiplexes them together to output a single CBR bit stream that fits within an available channel.
FIG. 2 illustrates a prior art example of a statistical remultiplexer architecture 200. The statistical remultiplexer (statremux) architecture 200 includes a statremux 201 that accepts compressed digital bit streams consisting of multiple video/audio/data programs from encoders 202 and 204. An example of a statistical remultiplexer is described in co-pending U.S. patent application Ser. No. 09/514,577, entitled “System and Method For Multiple Channel Statistical Re-Multiplexing”, filed Feb. 28, 2000, and which is hereby incorporated by reference.
As earlier discussed, international standards have been created for various video compression schemes. These include MPEG-1, MPEG-2, MPEG-4, H.261, H.262, H.263, H.263+, etc. These standardized compression schemes rely on several algorithm schemes, such as motion compensated transform coding (for example, discrete cosine transform or wavelet/sub-band transforms), quantization of the transform coefficients, and variable length coding (VLC). The motion compensated encoding removes the temporally redundant information in video sequences. The transform coding enables orthogonal spatial frequency representation of spatial domain video data. Quantization of the transformed coefficients reduces the number of levels required to represent a given digitized video. The other factor contributing to the compression is the use of variable length coding (VLC) so that most frequently used symbols are represented by the shortest code word. In general, the number of bits used to represent a given image determines the quality of the decoded picture. The more bits used to represent a given image, the better the image quality. The system that is used to compress digitized video sequence using the above-described schemes is called an encoder or encoding apparatus.
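The principle behind variable length coding can be sketched with a Huffman code, which is one well-known way of assigning shorter code words to more frequent symbols. This is an illustration of the principle only: the standards named above define their own fixed VLC tables, and the symbol sequence below is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code word length in bits} for a Huffman code."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries are (count, tie_breaker, {symbol: depth}); the unique
    # tie_breaker keeps the dictionaries from ever being compared.
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical quantized values (assumption): zero is by far the most common.
levels = [0] * 8 + [1] * 4 + [2] * 2 + [3] + [-1]
lengths = huffman_code_lengths(levels)
total_bits = sum(lengths[s] for s in levels)
print(lengths, total_bits)
```

For this sequence the most frequent symbol receives a one-bit code word, and the 16 symbols occupy 30 bits in total, compared with 48 bits for a fixed three-bit code over the same five symbols.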
Commonly, transmission of video data is intended for real-time playback. This implies that all of the information required to represent a digital picture must be delivered to the destination in time for decoding and display. The channel must be capable of making such a delivery. However, a channel imposes a bit rate constraint for data being sent through the channel. This bit rate constraint often falls below the bit rate required to transport the compressed video bit stream. Thus, there is often a need to scale the transmission bandwidth required for the video data in order to fit within the available bandwidth of a network connection, or channel. This is often accomplished through a compression scheme, such as MPEG-2.
Currently, MPEG-2-based video/audio and data programming is the preferred choice of most cable operators as it is one of the simpler ways to make the conversion to digital video. In this manner, MPEG-2 transport streams can be packaged to create custom lineups.
FIG. 3 illustrates a prior art example of a compressed bit stream 300 having an MPEG-2 format. The MPEG-2 compression standard consists of two layers: a system layer 301 and an elementary stream layer 302. The elementary stream layer 302 typically contains the coded video and audio data and defines how compressed video (or audio) data are sampled, motion compensated (for video), transform coded, quantized, and represented by different variable length coding (VLC) tables. The basic structure for a coded video picture data is a block that is an 8 pixel by 8 pixel array. Multiple blocks form a macroblock, which in turn forms part of a slice. A coded picture consists of multiple slices. Multiple coded pictures form a group of pictures.
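The block and macroblock hierarchy can be made concrete with a short calculation. The 8 by 8 block size is stated above; the 16 by 16 luminance macroblock, the six blocks per macroblock (4:2:0 chroma sampling), and the 720 by 480 picture size are assumptions that are not stated in the text and are used here only as a typical MPEG-2 configuration.

```python
import math

BLOCK_SIZE = 8              # pixels per block side, as stated in the text
MACROBLOCK_SIZE = 16        # assumption: luminance macroblock side in pixels
BLOCKS_PER_MACROBLOCK = 6   # assumption: 4 luma + 2 chroma blocks (4:2:0)


def picture_structure(width, height):
    """Count macroblocks and blocks in one coded picture."""
    columns = math.ceil(width / MACROBLOCK_SIZE)
    rows = math.ceil(height / MACROBLOCK_SIZE)
    macroblocks = columns * rows
    return {
        "macroblock_columns": columns,
        "macroblock_rows": rows,
        "macroblocks": macroblocks,
        "blocks": macroblocks * BLOCKS_PER_MACROBLOCK,
    }


structure = picture_structure(720, 480)  # hypothetical picture size
print(structure)
```

Under these assumptions a single picture contains 1350 macroblocks and 8100 blocks, each block carrying its own set of coded transform coefficients.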
Each block contains variable length codes (VLC) for transform coefficients. In the MPEG-2 syntax, the picture data section contains the bulk of the compressed video images. This is where the transform coefficients are encoded as VLCs. For a typical bit stream, this portion of the data takes somewhere between 70%–90% of the total bit usage of a coded picture, depending on the coded bit rate. The MPEG-2 syntax also specifies private user data fields within the elementary stream layer 302. The private user data fields may be either of variable length or fixed length.
The system layer 301 is defined to allow an MPEG-2 decoder to correctly decode audio and video data, correctly synchronize audio and video data, and present the decoded result to the video screen in a time continuous manner. The system layer 301 comprises two sub-layers: a packetized elementary stream (PES) layer 304 and a transport layer 306 above the PES layer 304.
The PES layer 304 defines how the elementary stream layer is encapsulated into variable length packets called PES packets. In addition, the PES layer 304 includes presentation and decoding timestamps for the PES packets, which are used by a decoder to determine the timing to decode and display the video images from the decoding buffers.
The transport layer 306 defines how the PES packets are further packetized into fixed sized transport packets, e.g., packets of 188 bytes, to produce a transport stream. Additional timing information and multiplexing information may be added by the transport layer 306. For example, transport packets may contain program clock reference (PCR) values, presentation time stamps (PTS) and decoding time stamps (DTS). PCR values are related to the encoder system time clock for a particular program. A PTS indicates the time when a video picture or audio frame should be displayed or presented relative to the PCR. A DTS indicates the time when a video picture should be decoded relative to the PCR. The transport layer 306 may be utilized as a transport stream or a program stream.
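As a sketch of how a receiver might inspect a fixed-size transport packet, the following reads selected fields of the four-byte header that begins each 188-byte packet. The field layout follows the MPEG-2 systems specification (ISO/IEC 13818-1) rather than the text above, and the function name and sample packet are hypothetical.

```python
TS_PACKET_SIZE = 188   # fixed transport packet size, as stated in the text
SYNC_BYTE = 0x47       # first byte of every transport packet


def parse_ts_header(packet):
    """Decode selected fields of the 4-byte transport packet header."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport packet must be exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    return {
        # Set when a new PES packet begins in this packet's payload.
        "payload_unit_start": bool(packet[1] & 0x40),
        # 13-bit packet identifier telling which stream the payload belongs to.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        # The adaptation field is where PCR values are carried when present.
        "has_adaptation_field": bool(packet[3] & 0x20),
        "has_payload": bool(packet[3] & 0x10),
        # 4-bit counter used to detect lost packets for a given PID.
        "continuity_counter": packet[3] & 0x0F,
    }


# Hypothetical packet: PID 0x100, start of a PES packet, payload only.
sample_packet = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
header = parse_ts_header(sample_packet)
print(header)
```

Because these fields sit at fixed positions in every packet, a remultiplexer can route, reorder, and count packets by PID without decoding the video they carry.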
The transport stream is optimized for use in environments where errors are likely such as transmission in a lossy or noisy media. Applications using the transport stream include Direct Broadcast Service (DBS), digital or wireless cable services, broadband transmission systems, etc.
The program stream is designated for use in relatively error free environments and is suitable for applications that may involve software processing of system information, such as interactive multimedia applications. Applications using the program stream include Digital Versatile Disks (DVD) and video servers.
FIG. 4 illustrates a prior art example of an MPEG elementary video bit stream. The MPEG elementary video bit stream 400 includes start codes indicating processing parameters for the bit stream 400, such as a sequence start code 402, a sequence extension, including a user data header 403, a Group of Pictures (GOP) header 404, a user data header 405, a picture header 406, and a picture coding extension that includes a user data extension 407. Picture data 408 follows the picture header 406. The bit stream 400 includes a second picture header 410 preceding picture data 412.
Information in an MPEG-2 compressed bit stream also indicates the relationship between various frames within a picture. The access unit level information relates to coded pictures and may specify whether a picture is an intra frame (I frame), a predicted frame (P frame), or a bi-directional frame (B frame). An I frame contains full picture information. A P frame is constructed using a past I frame or P frame. A B frame is bi-directionally constructed using both a past and a future I or P frame, which are also called anchor frames.
FIG. 5 illustrates a prior art exemplary frame sequence 500 included in a compressed bit stream. The sequence 500 corresponds to a group of pictures in an MPEG-2 bit stream. The sequence 500 includes an initial I frame 502, P frames 504a–d and ten B frames 506a–j. The I frame 502 contains full picture information. The P and B frames are constructed from other frames as illustrated by arrows 508. Each P frame 504a–d is constructed using the I frame 502 or a previous P frame 504a–d, whichever immediately precedes the P frame (e.g., the P frame 504b uses the P frame 504a). The B frames 506a–j are bi-directionally constructed using the nearest past and future reference picture. A reference picture is either an I or a P picture. For example, the B frames 506a and 506b are constructed using the past I frame 502 and future P frame 504a. 
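The dependency rules described for the sequence 500 can be sketched as follows. The display-order pattern string is an assumption consistent with one I frame, four P frames, and ten B frames; the exact layout of FIG. 5 is not reproduced in the text, and the function name is hypothetical.

```python
def reference_frames(display_order):
    """Map each frame index to the indices of the frames it is built from.

    display_order is a string of 'I', 'P', and 'B' characters. A value of
    None means the reference lies outside this group of pictures.
    """
    anchors = [i for i, kind in enumerate(display_order) if kind in "IP"]
    references = {}
    for i, kind in enumerate(display_order):
        past = max((a for a in anchors if a < i), default=None)
        future = min((a for a in anchors if a > i), default=None)
        if kind == "I":
            references[i] = ()               # full picture, no references
        elif kind == "P":
            references[i] = (past,)          # nearest preceding anchor
        else:
            references[i] = (past, future)   # nearest past and future anchors
    return references


# Assumed layout: I frame, then four groups of two B frames around a P frame.
GOP_PATTERN = "IBBPBBPBBPBBPBB"
refs = reference_frames(GOP_PATTERN)
for index, kind in enumerate(GOP_PATTERN):
    print(index, kind, refs[index])
```

In this assumed layout the first two B frames depend on the I frame and the first P frame, matching the example given for the B frames 506a and 506b, while the last two B frames need an anchor from the following group of pictures.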
Some statistical remultiplexers rely on information solely contained in the pre-compressed bit streams for re-encoding. The information is usually obtained by decoding the signal back to the spatial domain (baseband). When the statistical remultiplexer is configured within a network device, such as a router or headend, decoding increases complexity of the network device, slows transmission of the video data, and decreases transmission efficiency. Thus, in some compressed bit streams, a transport packet containing bit rate information and/or other data associated with the bit stream may be included in the bit stream by an encoder for extraction by a receiving statistical remultiplexer. Some statistical remultiplexers contain a mechanism for extracting this information, and can obtain information on the video signals without having to decode the signal as earlier described. An example of such a system is described in pending U.S. patent application Ser. No. 09/684,623, entitled “Methods and Apparatus for Efficient Scheduling and Multiplexing”, filed Oct. 5, 2000, and which is hereby incorporated by reference.
FIG. 6 illustrates an example of an MPEG elementary video bit stream 600 having embedded bit rate and/or other video related information 607. The MPEG elementary video bit stream 600 includes start codes indicating processing parameters for the bit stream 600, such as a sequence start code 602, a sequence extension including a user data header 603, a Group of Pictures (GOP) header 604, a user data header 605, a picture header 606, and picture data 608. Picture data 612 follows a second picture header 610.
The embedded data 607 may include bit rate data or other information associated with the bit stream 600. In other examples, the bit rate data and/or other video related information packet 607 may be located in different layers of the bit stream 600.
When the compressed and multiplexed channels are received, for example, by a cable operator, the channels are usually “groomed” to remove unwanted or redundant programs. The groomed channels are then remultiplexed and output as a bit stream to a customer.
Generally, a statistical remultiplexer dynamically multiplexes the various VBR channels, groomed and/or ungroomed, into a single bit stream that can be output over a channel of a fixed bandwidth which may be viewed as a CBR channel. This is usually done by varying the bandwidth allocated to each VBR channel based on its current demands to maximize utilization of the allocated bandwidth.
Typically, the input channels to the remultiplexer have been multiplexed according to a known pattern, for example, time division multiplexing, so that the associated channel packets can be delayed in a queue or buffer and then resent. Most statistical remultiplexers first attempt to shift the bit rates of the selected channels in time to achieve a bit rate within the allowable output bandwidth and within an allowable time period. According to the MPEG-2 standard, there is a limitation on the size of a receiving decoder buffer. Thus, the time shift for a compressed bit stream is limited so that it does not underflow or overflow a receiving decoder buffer after the time shifting. After the time shifting, if the bit rate is still larger than the allowable bandwidth of the output channel, the excess bits are either dropped from the transmission, which usually results in a poorer quality transmission, or the statistical remultiplexer utilizes a bit reduction scheme to try to retain as much of the excess data as possible to maintain the transmission quality. There are many bit reduction schemes utilized by various statistical remultiplexers.
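The two-stage behavior described above, time shifting first and bit reduction only for what remains, can be sketched as follows. The backlog limit stands in for the decoder buffer constraint in a highly simplified way, and all names and numbers are hypothetical; an actual remultiplexer would track a decoder buffer model for each program.

```python
def smooth_and_reduce(demand_bits, capacity_bits, max_backlog_bits):
    """Delay excess bits to later slots, up to a backlog limit.

    Returns (sent, reduced, final_backlog), where sent[i] is the number of
    bits output in slot i and reduced[i] is the number of bits that could
    not be delayed and must be dropped or removed by bit rate reduction.
    """
    backlog = 0
    sent, reduced = [], []
    for demand in demand_bits:
        pending = backlog + demand
        out = min(pending, capacity_bits)
        backlog = pending - out
        # Bits beyond the backlog limit cannot be time shifted any further.
        excess = max(0, backlog - max_backlog_bits)
        backlog -= excess
        sent.append(out)
        reduced.append(excess)
    return sent, reduced, backlog


# Hypothetical aggregate demand per time slot, in arbitrary bit units.
demand = [8, 12, 15, 6, 4]
sent, reduced, backlog = smooth_and_reduce(demand, capacity_bits=10,
                                           max_backlog_bits=3)
print(sent, reduced, backlog)
```

In this example the second slot's overshoot is absorbed entirely by delay, while the third slot's overshoot exceeds the backlog limit and leaves four units that must be removed by dropping or bit rate reduction.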
Basically, statistical remultiplexer bit rate reduction schemes process portions of a bit stream to reduce the bit rate so that the overall output bandwidth is within the allocated bandwidth. These bit rate reduction schemes typically require the decoding, bit rate reduction, and then re-encoding of the bit stream. The processing steps can usually be repeated until the output is within the allocated output bandwidth. The obvious goal of the bit rate reduction schemes is to reduce the overall output bit rate to within the allowable bandwidth of the output channel and still produce an output transmission that is as close to input quality as possible and with as little delay as possible.
FIG. 7 is a block diagram of a prior art example of the processing required for complete re-encoding. Re-encoding begins by receiving compressed video data 701. The video data 701 is then decoded, which may include variable length decoding 702, de-quantization 704, inverse transform coding 706, and motion compensation 708. Variable length coding (VLC) allows the most frequently used symbols to be represented by the shortest code word. Quantization of the transformed coefficients reduces the number of levels required to represent a given digitized video. Quantization has a direct effect on the compressed bit usage and decoded video quality. The transform coding (DCT) enables orthogonal spatial frequency representation of spatial domain video data. Motion compensation 708 includes an iterative process where P and B frames are reconstructed using a framestore memory 710.
The decoded data is then encoded by processing the video data with transform coding 712, re-quantization 714, and VLC encoding 716. After transform coding 712 and re-quantization 714, each image is decoded by de-quantization 718 and inverse transform coding 720 before motion compensation 722 with motion vectors provided by motion estimation 726. Motion estimation 726 is applied to generate motion vectors on a frame-by-frame basis. More particularly, a motion vector indicates an amount of movement of a macroblock in an X or Y direction. Motion compensation 722 includes an iterative process where P and B frames are reconstructed using a framestore memory 724. Motion compensation 722 produces a predicted picture that is extracted 730 from the next decoded picture 728, and the residue is encoded by transform coding 712, re-quantization 714, and VLC encoding 716. This iterative process of motion compensation 722, including generation of motion vectors by motion estimation 726, produces compressed video data 732 having a lower bit rate than the received compressed video data 701. A lower bit rate is typically achieved by increasing quantizer values and/or changing the mode of macroblocks, etc.
For many compressed video bit stream schemes, it is possible to change the bit rate of a bit stream by simply changing the quantization step value. This approach is called re-quantization. For bit rate reduction of the video data, re-quantization 714 is performed with a larger quantization step value. The re-quantized compressed video data 732 may then be combined with other re-quantized compressed video data and transmitted onto a channel. If the resolution must also be altered to achieve a significantly lower bit rate, this simple method cannot be used, and the FIG. 7 scheme of fully decoding and re-encoding is typically used instead. As the amount of data in the compressed bit stream is reduced, the bit rate required for transmitting the low resolution bit stream is also reduced, thus enabling the bit stream to fit within the output channel.
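The effect of re-quantization with a larger step value can be sketched with a uniform quantizer. The coefficient values, the step sizes, and the truncating quantizer are assumptions made for illustration; MPEG-2 itself uses weighting matrices and a quantizer scale rather than a single uniform step.

```python
def quantize(coefficients, step):
    """Map transform coefficients to integer levels, truncating toward zero."""
    return [int(value / step) for value in coefficients]


def dequantize(levels, step):
    """Reconstruct approximate coefficient values from quantized levels."""
    return [level * step for level in levels]


def requantize(levels, old_step, new_step):
    """Re-quantize existing levels with a different step value."""
    return quantize(dequantize(levels, old_step), new_step)


# Hypothetical transform coefficients for part of one block (assumption).
coefficients = [620, -95, 43, 18, -7, 3, 0, 0]
original_levels = quantize(coefficients, 8)
coarser_levels = requantize(original_levels, old_step=8, new_step=32)
print(original_levels)
print(coarser_levels)
```

With the larger step, the levels become smaller in magnitude and more of them become zero, so the variable length codes that follow need fewer bits, at the cost of a coarser reconstruction of the picture.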
Ideally, a statistical remultiplexer would allocate as much bandwidth as each channel demands while minimizing bit rate reductions. Unfortunately, as most statistical remultiplexers perform some combination of packet shuffling in their buffers with some bit rate reduction process, as well as bit rate regulation using decoder buffer models, a variety of outputs result from different statistical remultiplexers. The outputs frequently have varying quality and delays.
Currently, there are no industry standards or specifications to evaluate the performance of a statistical remultiplexer. Techniques currently used to evaluate the performance of a statistical remultiplexer include visually observing the video quality at the output. Unfortunately, such evaluation methods are highly subjective, difficult to reproduce, and expensive (due to the many hours of an individual's time that they require).
In view of the above, it would be desirable to implement methods and/or apparatus that would provide a more objective evaluation of the performance of a statistical remultiplexer, as well as enable comparisons of different statistical remultiplexers.