1. Field of the Invention
The present invention relates generally to systems and methods for controlling the statistical remultiplexing process. More specifically, the present invention relates to measuring the complexity of compressed digital video signals and the application of such measurements to the statistical remultiplexing process.
2. Description of the Related Art
Video services are provided by a wide array of video content suppliers. For example, residential digital video services may include digital television, video on demand, Internet video, etc.—each service having hundreds of programs. A program refers to one or more bitstreams that are used to represent the video content and associated audio content. A target receiver for the programs, such as a set-top box (STB) located in a residential home, receives video programs from a number of different video content suppliers via assorted transmission channels. Typically, the ‘last mile’ of transmission between the video content suppliers and the target receiver is along the same transmission channel, requiring the channel to carry multiple video programs from the wide array of suppliers—and often simultaneously.
There are presently a variety of different communication channels for transmitting or transporting video data. For example, communication channels such as coaxial cable distribution networks, digital subscriber loop (DSL) access networks, ATM networks, satellite, terrestrial, or wireless digital transmission facilities are all well known. In fact, many standards have been developed for transmitting data on the communication channels. For the purposes herein, a channel is defined broadly as a connection facility to convey properly formatted digital information from one point to another. A channel includes some or all of the following elements: 1) physical devices that generate and receive the signals (modulator/demodulator); 2) a medium that carries the actual signals; 3) mathematical schemes used to encode and decode the signals; 4) proper communication protocols used to establish, maintain and manage the connection created by the channel; and 5) storage systems used to store the signals, such as magnetic tapes and optical disks. The concept of a channel is not limited to a physical channel; it also includes logical connections established on top of different network protocols, such as xDSL, ATM, IP, wireless, HFC, coaxial cable, Ethernet, Token Ring, etc.
The channel is used to transport a bitstream, or a continuous sequence of binary bits used to digitally represent compressed video, audio and/or data. A bit rate is the number of bits per second that is required to transport the bitstream. A bit error rate is the statistical ratio between the number of bits in error due to transmission and the total number of bits transmitted. A channel capacity is the maximum bit rate at which a given channel can convey digital information with a bit error rate no more than a given value.
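The three quantities defined above can be illustrated with a minimal sketch. The function names and the sample figures are hypothetical and chosen only for illustration; they do not come from any standard or from a particular system.

```python
def bit_rate_bps(total_bits: int, duration_s: float) -> float:
    """Bits per second needed to transport `total_bits` in `duration_s` seconds."""
    return total_bits / duration_s


def bit_error_rate(bits_in_error: int, bits_transmitted: int) -> float:
    """Ratio of bits received in error to the total number of bits transmitted."""
    return bits_in_error / bits_transmitted


def fits_channel(stream_bps: float, channel_capacity_bps: float) -> bool:
    """True if the stream's bit rate does not exceed the channel capacity."""
    return stream_bps <= channel_capacity_bps


# Hypothetical example: 40 million bits delivered over 10 seconds.
rate = bit_rate_bps(40_000_000, 10.0)
ber = bit_error_rate(3, 1_000_000)
ok = fits_channel(rate, 6_000_000)
```

In this sketch the channel capacity is simply passed in as a number; in practice it is the maximum bit rate at which the channel meets a given bit error rate target.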
Since the bandwidth required to transmit uncompressed video data often exceeds what existing communication channels can carry, compression has been used to make digital video images more transportable. Digital video compression allows digitized video data to be represented in a much more efficient manner and makes it possible to transmit the compressed video data using a channel at a fraction of the bandwidth required to transmit the uncompressed video data. For example, digitized video data having an uncompressed bit rate of roughly 120 million bits per second (Mbps) can be represented by a compressed bitstream having a bit rate of 4–6 Mbps. Compression yields significant data savings, which results in much more efficient use of channel bandwidth and storage media.
When the digital video is first compressed, the encoder assumes a particular bit rate profile, whether it is a constant bit rate (CBR) or a variable bit rate (VBR). The word “profile” refers to the fact that the transport bit rate may not be constant, but variable under certain constraints, such as peak bit rate, average bit rate, minimum bit rate, etc. For example, a constant bit rate stream at 4 Mbps does not have the same bit rate profile as a variable bit rate stream that averages 4 Mbps but has a larger maximum bit rate and a smaller minimum bit rate.
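The distinction between the two profiles can be sketched as follows. The `BitRateProfile` type, the `profile` helper, and the sample rates are assumptions made for illustration only.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class BitRateProfile:
    minimum: float
    average: float
    peak: float


def profile(rates_mbps: Sequence[float]) -> BitRateProfile:
    """Summarize a series of per-interval bit rates (in Mbps)."""
    return BitRateProfile(min(rates_mbps), mean(rates_mbps), max(rates_mbps))


# Hypothetical measurements: both streams average 4 Mbps.
cbr = profile([4.0, 4.0, 4.0, 4.0])
vbr = profile([2.0, 6.0, 3.0, 5.0])

same_average = cbr.average == vbr.average      # both 4.0 Mbps
different_profile = cbr != vbr                 # peak and minimum differ
```

Two streams with the same average rate can therefore place very different demands on a channel, since the VBR stream needs headroom for its peak.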
The VBR representation of compressed video data allows a video encoder to generate compressed bitstreams that, when decoded, produce consistent video quality. However, as a result of the compression process, the number of bits required to represent the compressed data differs widely from picture to picture. The specific VBR characteristics of the compressed bitstream depend on the complexity of the video image, the amount of motion in the video sequence, as well as changes made in post-generation such as scene cuts, fades, wipes, picture-in-picture, etc. As channel capacities are often expressed as constant bit rates, the variable nature of VBR compressed bitstreams often poses a problem for video transmission.
One potential consequence of exceeding channel capacity for a VBR compressed bitstream on a particular channel is compromised video quality. Commonly, if one or more bitstreams are too large to fit within a channel, video data may be dropped from the bitstream or simplified to allow transmission, thus sacrificing end user video quality. Due to the real-time nature of compressed video transmission, dropped packets are not re-transmitted. Also, it is important to point out that compressed bitstreams are usually generated by either real-time encoders or pre-compressed video server storage systems. Both are likely to be in a remote site, away from the network itself. This increases the difficulty in encoding the video signal with a resulting bit rate profile sensitive to the connection bandwidth available for a particular channel or target receiver.
To make digital video data still more transportable, bitstreams are frequently combined for transmission within a channel. A multiplex is a scheme used to combine bitstream representations of multiple signals, such as audio, video, or data, into a single bitstream representation. A re-multiplex is a scheme used to combine multiple bitstream representations of multiplexed signals into a single bitstream representation.
One important benefit of VBR compression is achieved through so-called ‘statistical multiplexing’. Statistical multiplexing is an encoding and multiplexing process which takes advantage of the VBR nature of multiple compressed video signals. When a statistical multiplexer combines multiple bitstreams, an algorithm may be used to adapt the bit rate of each VBR video signal while the total bit rate of the output multiplex is kept at a constant value. Statistical multiplexing encompasses a multiplexing architecture having a reverse message path from the multiplexer to the encoders. This is also often referred to as closed-loop statistical multiplexing.
FIG. 1A illustrates a high level architecture for a conventional closed-loop statistical multiplexer (statmux) 10. The closed-loop statmux 10 has a closed-loop signal path 11 between a statmux rate controller 12 and program encoders 14 and 16. The signal path 11 provides the rate controller 12 with a global view of the bit rate requirements for each of the program encoders 14 and 16 and allows communication between the rate controller 12 and each encoder. The encoders 14 and 16 provide compressed video, audio and data bitstreams to a multiplexer 15, which schedules the compressed bitstreams to output a multiplexed compressed bitstream 18. Neither of the encoders 14 and 16 has knowledge of the bandwidth requirements of data being encoded by the other encoder, and each hence relies on messages sent by the rate controller 12. Based on the messages received from the statmux rate controller 12, the program encoders 14 and 16 adjust their encoding bit rates. Since the closed-loop statmux 10 relies on prompt delivery of the messages between the statmux rate controller 12 and the encoders 14 and 16, the closed-loop statmux 10 usually requires co-location of all program encoders 14 and 16, the rate controller 12 and the multiplexer 15.
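One simple way a rate controller could divide a constant output bit rate among encoders is sketched below. The proportional rule, the `allocate_rates` name, and the per-encoder complexity scores are assumptions for illustration; the text above does not specify which allocation algorithm a conventional rate controller uses.

```python
from typing import List, Sequence


def allocate_rates(complexities: Sequence[float], total_bps: float) -> List[float]:
    """Split a fixed total bit rate in proportion to each encoder's reported need.

    `complexities` is a hypothetical per-encoder score reported over the
    closed-loop signal path; a larger score requests a larger share.
    """
    total_complexity = sum(complexities)
    if total_complexity <= 0:
        # No encoder reports any need: fall back to an equal split.
        return [total_bps / len(complexities)] * len(complexities)
    return [total_bps * c / total_complexity for c in complexities]


# Two encoders share an 8 Mbps multiplex; the second reports three times the need.
rates = allocate_rates([1.0, 3.0], 8_000_000)
```

Whatever rule is used, the individual rates change over time while their sum stays at the constant output rate of the multiplex.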
Statistical re-multiplexing, also called open-loop statistical multiplexing, or statistical rate re-multiplexing, is a process which performs statistical multiplexing of signals already in compressed format. Thus, statistical re-multiplexing includes accepting multiple VBR bitstreams and outputting a single CBR bitstream that fits within an available channel. In applications such as video on demand, digital cable headend systems, and digital advertisement insertion systems, statistical re-multiplexing may improve the overall system efficiency, resulting in better bandwidth usage and reduced transmission cost.
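The fitting problem faced by a statistical re-multiplexer can be sketched as follows. The `reduction_factors` helper and the sample rates are hypothetical, and uniform scaling is used only as a stand-in for whatever bit rate reduction a real device applies.

```python
from typing import List, Sequence


def reduction_factors(streams: Sequence[Sequence[float]], capacity_bps: float) -> List[float]:
    """For each time interval, return the factor by which the aggregate input
    rate would have to be scaled to fit the channel (1.0 means it already fits).

    `streams` holds one list of per-interval bit rates per input bitstream.
    """
    factors = []
    for rates in zip(*streams):
        aggregate = sum(rates)
        if aggregate <= capacity_bps or aggregate == 0:
            factors.append(1.0)
        else:
            factors.append(capacity_bps / aggregate)
    return factors


# Two hypothetical VBR inputs sharing a 10 Mbps channel over three intervals.
inputs = [
    [3_000_000, 6_000_000, 4_000_000],
    [2_000_000, 6_000_000, 4_000_000],
]
factors = reduction_factors(inputs, 10_000_000)
```

Only the second interval exceeds the channel capacity here, so only that interval would require bit rate reduction before the streams are combined.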
A conventional open-loop statistical re-multiplexer (stat remux) architecture 20 is illustrated in FIG. 1B. The architecture 20 includes an open-loop statistical re-multiplexer 21 that accepts compressed digital bitstreams consisting of multiple video/audio/data programs from encoders 22 and 24. The benefit of the open-loop stat remux architecture 20 is that it does not require reverse signal paths to the program encoders 22 and 24.
Functionally, the statistical re-multiplexer 21 does not control the bit rate output of each of program encoders 22 and 24. Although closed-loop statistical multiplexing can be highly efficient in bandwidth sharing among multiple encoded video/audio programs, it is not well suited for bandwidth sharing of multiple program encoders distributed over a large geographic area, or when multiple program streams are encoded and stored at different times. Even if a reverse signal path to the program encoders exists, it must have low delay, and the program encoders 22 and 24 must be able to receive and correctly interpret the messages. Correct interpretation of the messages is often prevented when program encoders 22 and 24 in different geographic locations are not produced by the same manufacturer and implement different signal interfaces. Thus, network devices transmitting multiple video bitstreams typically use the open-loop stat remux architecture 20.
Unfortunately, the open-loop stat remux architecture 20 relies solely on information contained in the pre-compressed bitstreams for re-encoding. This reliance poses some limitations. One limitation is that the stat remux 21 cannot obtain information on the video signals within each compressed bitstream it receives without completely decoding the signal back to the spatial domain (baseband). When the stat remux 21 is configured within a network device such as a router or headend, this complete decoding increases complexity of the network device, slows transmission of the video data, and decreases transmission efficiency. Any of these may diminish end-user video quality. Accordingly, it would be beneficial if information regarding the underlying video signals and associated pictures could be ascertained from the compressed bitstreams without decoding the signals.
International standards have been created for various video compression schemes. These include MPEG-1, MPEG-2, MPEG-4, H.261, H.262, H.263, H.263+, etc. These standardized compression schemes rely on several algorithmic schemes such as motion compensated transform coding (for example, DCT transforms or wavelet/sub-band transforms), quantization of the transform coefficients, and variable length coding (VLC). The motion compensated encoding removes the temporally redundant information in video sequences. The transform coding enables orthogonal spatial frequency representation of spatial domain video data. Quantization of the transformed coefficients reduces the number of levels required to represent a given digitized video. The other factor contributing to the compression is the use of variable length coding (VLC) so that the most frequently used symbols are represented by the shortest code words. In general, the number of bits used to represent a given image determines the quality of the decoded picture. The more bits used to represent a given image, the better the image quality. The system that is used to compress a digitized video sequence using the above-described schemes is called an encoder or encoding apparatus.
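The principle behind variable length coding can be illustrated with a small Huffman-style sketch. This is an assumption-laden illustration only: the standards named above use predefined VLC tables rather than codes built per sequence, and the `huffman_code_lengths` helper and sample symbols are hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict, Hashable, Sequence


def huffman_code_lengths(symbols: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Return a code length in bits for each distinct symbol, such that
    more frequent symbols never get longer codes than less frequent ones."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (count, tie-breaker, {symbol: depth so far}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical quantized coefficient levels; zero is by far the most common.
levels = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
lengths = huffman_code_lengths(levels)
total_bits = sum(lengths[s] for s in levels)
fixed_bits = 2 * len(levels)  # four symbols would need 2 bits each with a fixed-length code
```

Because the most frequent symbol receives the shortest code word, the variable length representation of this sample uses fewer bits than a fixed two-bit code.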
Commonly, transmission of video data is intended for real-time playback. This implies that all of the information required to represent a digital picture must be delivered to the destination in time for decoding and display. The channel must be capable of making such a delivery. However, a channel imposes a bit rate constraint for data being sent through the channel. This bit rate constraint often falls below the bit rate required to transport the compressed video bitstream. Thus, there is often a need to scale the transmission bandwidth required for the video data in order to fit within the available bandwidth of a network connection, or channel. This is often accomplished through a compression scheme such as MPEG.
1. MPEG Packet Structure
FIG. 2 illustrates a compressed bitstream 60 having an MPEG-2 format. The MPEG-2 compression standard consists of two layers: a system layer 61 and an elementary stream layer 62. The system layer 61 comprises two sublayers: a packetized elementary stream (PES) layer 64 and a packet layer 65 above the PES layer 64. The packet layer 65 can be either a transport stream 66 or a program stream 68. For compressed video and audio data, the data follows a hierarchical structure, namely, the elementary stream is contained in the PES payload, and the PES packets are contained in the packet layer payload.
The elementary stream layer 62 typically contains the coded video and audio data. It also defines how compressed video (or audio) data are sampled, motion compensated (for video), transform coded, quantized and represented by different variable length coding (VLC) tables. The basic structure for a coded video picture data is a block that is an 8 pixel by 8 pixel array. Multiple blocks form a macroblock, which in turn forms part of a slice. A coded picture consists of multiple slices. Multiple coded pictures form a group of pictures. Such hierarchical layering of data structures localizes the most basic processing on the lowest layer, namely blocks and macroblocks.
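The size of this hierarchy for a single picture can be estimated with a short sketch. The 8 by 8 block size comes from the text; the 16 by 16 luminance macroblock, the six blocks per macroblock (4:2:0 chroma format), and the 720 by 480 picture size are assumptions added here for illustration.

```python
from typing import Dict

BLOCK_SIZE = 8             # pixels per block edge (from the text)
MACROBLOCK_SIZE = 16       # assumption: luminance area covered by one macroblock
BLOCKS_PER_MACROBLOCK = 6  # assumption: 4 luminance + 2 chrominance blocks (4:2:0)


def picture_layout(width: int, height: int) -> Dict[str, int]:
    """Count the macroblocks and blocks needed to cover a picture."""
    # Ceiling division so that partial macroblocks at the edges are counted.
    mb_columns = -(-width // MACROBLOCK_SIZE)
    mb_rows = -(-height // MACROBLOCK_SIZE)
    macroblocks = mb_columns * mb_rows
    return {
        "macroblock_columns": mb_columns,
        "macroblock_rows": mb_rows,
        "macroblocks": macroblocks,
        "blocks": macroblocks * BLOCKS_PER_MACROBLOCK,
    }


layout = picture_layout(720, 480)
```

Under these assumptions a single picture contains thousands of blocks, which is why the most basic processing is localized at the block and macroblock level.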
Each block contains variable length codes (VLC) for DCT coefficients. In the MPEG-2 syntax, the picture data section contains the bulk of the compressed video images. This is where the DCT coefficients are encoded as VLCs. For a typical bitstream, this portion of the data takes somewhere between 70%–90% of the total bit usage of a coded picture, depending on the coded bit rate.
The next layer is the system layer 61. The system layer 61 is defined to allow an MPEG-2 decoder to correctly decode audio and video data, and present the decoded result to the video screen in a time-continuous manner. The system layer 61 consists of two sublayers. The first sublayer in the system layer 61 is the PES layer 64. The PES layer 64 defines how the elementary stream layer is encapsulated into variable length packets called PES packets. In addition, the PES layer 64 may include presentation and decoding timestamps for the PES packets, which are used by a decoder to determine the timing to decode and display the video images from the decoding buffers.
The transport layer 65 defines how the PES packets are further packetized into fixed sized transport packets, e.g., packets of 188 bytes, to produce a transport stream. Additional timing information and multiplexing information may be added by the transport layer 65. The transport stream 66 is optimized for use in environments where errors are likely, such as transmission in a lossy or noisy medium. Applications using the transport stream 66 include Direct Broadcast Service (DBS), digital or wireless cable services, broadband transmission systems, etc. The program stream 68 defines how the PES packets are encapsulated into variable sized packets and may also include additional timing and multiplexing information. The program stream 68 is designated for use in relatively error free environments and is suitable for applications that may involve software processing of system information such as interactive multimedia applications. Applications of program stream 68 include Digital Versatile Disks (DVD) and video servers.
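The packetization of PES data into fixed sized transport packets can be sketched as follows. This is a simplified illustration, not a compliant multiplexer: the `packetize` helper is hypothetical, only the 4-byte packet header is modeled, and the final packet is padded with 0xFF bytes, whereas an actual MPEG-2 transport stream carries stuffing in an adaptation field.

```python
from typing import List

TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - TS_HEADER_SIZE
SYNC_BYTE = 0x47


def packetize(pes: bytes, pid: int) -> List[bytes]:
    """Split PES bytes into 188-byte packets, each with a 4-byte header."""
    packets = []
    for index, offset in enumerate(range(0, len(pes), TS_PAYLOAD_SIZE)):
        payload = pes[offset:offset + TS_PAYLOAD_SIZE]
        # Flag the packet that carries the first byte of the PES packet.
        payload_unit_start = 1 if offset == 0 else 0
        header = bytes([
            SYNC_BYTE,
            (payload_unit_start << 6) | ((pid >> 8) & 0x1F),  # upper 5 bits of the 13-bit PID
            pid & 0xFF,                                       # lower 8 bits of the PID
            0x10 | (index & 0x0F),  # payload only, 4-bit continuity counter
        ])
        # Simplification: pad the last payload instead of using an adaptation field.
        packets.append(header + payload.ljust(TS_PAYLOAD_SIZE, b"\xff"))
    return packets


# A hypothetical 400-byte PES packet on PID 0x100.
packets = packetize(bytes(400), pid=0x100)
```

Every packet has the same length and starts with the same sync byte, which lets a receiver regain alignment after transmission errors.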
Video data can be contained in the elementary stream (ES), which means that no PES, transport or program system layer information is added to the bitstream. The video data can also be contained in the PES stream 64, transport stream 66 or program stream 68. For a given video bitstream, the difference between these different layers lies in the timing information, multiplexing information and other information not directly related to the re-encoding process. In one embodiment, the information required to perform re-encoding is contained in the elementary stream layer. However, the present invention is not limited to bitstreams in the elementary stream layer. In other words, the present invention can be extended to the PES stream, transport stream or program stream as one of skill in the art will appreciate.
FIG. 3 is a diagram illustrating an MPEG elementary video bitstream. The MPEG elementary video bitstream 340 includes start codes indicating processing parameters for the bitstream 340 such as a sequence start code 342, a sequence extension including a user data header 343, a Group of Pictures (GOP) header 344, a user data header 345, a picture header 346, and a picture coding extension that includes a user data extension 347. Picture data 348 follows the picture header 346. The bitstream 340 includes a second picture header 350 preceding picture data 352.
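Headers of this kind can be located in an elementary stream without decoding the picture data, as the following sketch illustrates. The `find_start_codes` helper and the sample bytes are hypothetical; the start code values are those defined by the MPEG-2 video syntax, in which each header begins with the byte prefix 0x000001 followed by a one-byte code.

```python
from typing import List, Tuple

START_CODE_PREFIX = b"\x00\x00\x01"
START_CODE_NAMES = {
    0x00: "picture",
    0xB2: "user_data",
    0xB3: "sequence_header",
    0xB5: "extension",
    0xB8: "group_of_pictures",
}


def find_start_codes(data: bytes) -> List[Tuple[int, str]]:
    """Return (byte offset, name) for each start code found in `data`."""
    found = []
    pos = data.find(START_CODE_PREFIX)
    while pos != -1 and pos + 3 < len(data):
        code = data[pos + 3]
        if 0x01 <= code <= 0xAF:
            name = "slice"
        else:
            name = START_CODE_NAMES.get(code, "other")
        found.append((pos, name))
        pos = data.find(START_CODE_PREFIX, pos + 3)
    return found


# Hypothetical fragment: sequence header, GOP header, picture header, one slice.
sample = (
    b"\x00\x00\x01\xb3" + b"\x11" * 4
    + b"\x00\x00\x01\xb8" + b"\x22" * 2
    + b"\x00\x00\x01\x00" + b"\x33" * 2
    + b"\x00\x00\x01\x01" + b"\x44"
)
codes = find_start_codes(sample)
```

A scan of this kind touches only the header structure of the bitstream and leaves the variable length coded picture data untouched.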
2. MPEG Compression Stages
Statistical remultiplexing includes recoding. Recoding is decoding followed by subsequent encoding (usually with a change of some sort). FIG. 4 illustrates an exemplary block diagram of the processing required for complete recoding 160. Recoding in this case begins by receiving compressed video data (161). The video data is then decoded, which comprises variable length decoding 162, de-quantization 164, inverse transform coding 166 and motion compensation 168. Variable length coding (VLC) allows the most frequently used symbols to be represented by the shortest code words. Quantization of the transformed coefficients reduces the number of levels required to represent a given digitized video. Quantization has a direct effect on the compressed bit usage and decoded video quality. The transform coding (DCT) enables orthogonal spatial frequency representation of spatial domain video data. Motion compensation 168 includes an iterative process where I, P and B frames are reconstructed using a framestore memory 170.
Recoding includes encoding by processing the video data with transform coding 172, re-quantization 174, and VLC encoding 176. After transform coding 172 and re-quantization 174, each image is decoded by de-quantization 178 and inverse transform coding 180 before motion compensation 182 with motion vectors provided by motion estimation 186. Motion estimation 186 is applied to generate motion vectors on a frame by frame basis. More particularly, a motion vector indicates an amount of movement of a macroblock in an X or Y direction. Motion compensation 182 includes an iterative process where I, P and B frames are reconstructed using a framestore memory 184. Motion compensation 182 produces a predicted picture that is summed 186 with the next decoded picture 188 and encoded by transform coding 172, re-quantization 174, and VLC encoding 176. This iterative process of motion compensation 182 including generation of motion vectors by motion estimation 186 produces compressed video data 190 having a lower bit rate than received (161).
For many compressed video bitstream schemes, it is possible to change the bit rate of a bitstream by changing the quantization step value. This approach is called re-quantization. For bit rate reduction of the video data, re-quantization 174 is performed with a larger quantization step value. The re-quantized compressed video data 190 may then be combined with other re-quantized compressed video data and transmitted onto a channel. The re-quantization scheme 160 is advantageous if the resolution of the video data is also to be changed, e.g., to further reduce the bit rate. As the amount of data in the compressed bit stream is reduced, the bit rate that is required for transmitting the low resolution bit stream is also reduced.
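The effect of a larger quantization step can be illustrated with a minimal sketch. It assumes plain uniform scalar quantization; the `requantize` helper and the sample levels are hypothetical, and the quantization matrices and scale codes used by actual MPEG-2 quantization are not modeled.

```python
from typing import List, Sequence


def requantize(levels: Sequence[int], old_step: float, new_step: float) -> List[int]:
    """Map quantized levels from `old_step` to a coarser `new_step`.

    Each level is first scaled back to an approximate coefficient value
    (de-quantization) and then quantized again with the new step.
    """
    return [int(round(level * old_step / new_step)) for level in levels]


# Hypothetical quantized coefficients of one block, original step of 4.
original = [12, -7, 4, 3, -2, 1, 1, 0]
coarser = requantize(original, old_step=4, new_step=10)

nonzero_before = sum(1 for level in original if level != 0)
nonzero_after = sum(1 for level in coarser if level != 0)
```

With the larger step, the levels shrink in magnitude and more of them become zero, so fewer bits are needed after variable length coding, at the cost of a coarser approximation of the original coefficients.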
3. MPEG Frame Types
Information in a compressed bit stream also indicates the relationship between various frames within a picture. The access unit level information relates to coded pictures and may specify whether a picture is an intra frame (I frame), a predicted frame (P frame), or a bi-directional frame (B frame). An I frame contains full picture information. A P frame is constructed using a past I frame or P frame. A bi-directional frame (B frame) is bi-directionally constructed using both a past and a future I or P frame, which are also called anchor frames.
FIG. 5 illustrates an exemplary frame sequence 300 included in a compressed bitstream. The sequence 300 corresponds to a group of pictures in an MPEG-2 bitstream. The sequence 300 includes an initial I frame 302, P frames 304a–d and ten B frames 306a–j. The I frame 302 contains full picture information. The P and B frames are constructed from other frames as illustrated by arrows 308. Each P frame 304a–d is constructed using the I frame 302 or a previous P frame 304a–c, whichever immediately precedes the P frame (e.g., the P frame 304b uses the P frame 304a). The B frames 306a–j are bi-directionally constructed using the nearest past and future reference pictures. A reference picture is either an I or a P picture. For example, the B frames 306a and 306b are constructed using the past I frame 302 and future P frame 304a. It would be desirable if such access unit level information could be leveraged during the statistical remultiplexing process.
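The dependencies among frame types can be sketched as follows. The display-order pattern used here (two B frames between consecutive anchor frames) is an assumption that matches the frame counts given above but may not match FIG. 5 exactly, and the `references` helper is hypothetical.

```python
from typing import Dict, Optional, Tuple


def references(display_order: str) -> Dict[int, Tuple[Optional[int], ...]]:
    """Map each frame index to the indices of the anchor frames it depends on.

    I frames depend on nothing, P frames on the preceding anchor, and B frames
    on the nearest past and future anchors. None marks an anchor that lies
    outside the given sequence (e.g., in the next group of pictures).
    """
    anchors = [i for i, kind in enumerate(display_order) if kind in "IP"]
    refs: Dict[int, Tuple[Optional[int], ...]] = {}
    for i, kind in enumerate(display_order):
        past = max((a for a in anchors if a < i), default=None)
        future = min((a for a in anchors if a > i), default=None)
        if kind == "I":
            refs[i] = ()
        elif kind == "P":
            refs[i] = (past,)
        else:
            refs[i] = (past, future)
    return refs


# Assumed pattern: one I frame, four P frames and ten B frames in display order.
gop = "IBBPBBPBBPBBPBB"
deps = references(gop)
```

Since anchor frames are needed to reconstruct other frames while B frames are not, frame type information of this kind indicates how widely a change to one frame propagates through the sequence.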
Based on the foregoing, improved methods and systems for applying information obtained from compressed video signals to a statistical remultiplexer would be desirable.