1. Field of the Invention
The present invention relates generally to systems and methods for processing compressed bitstreams of data. In particular, the present invention relates to a system and a method for multiplexing a plurality of channels for transmission over a single medium. Still more particularly, the present invention relates to a system and method for statistical re-multiplexing of multiple channels.
2. Description of the Related Art
There are presently a variety of different communication channels for transmitting or transporting video data. For example, communication channels such as digital subscriber loop (DSL) access networks, ATM networks, satellite, or wireless digital transmission facilities are all well known. The present invention relates to such communication channels, and for the purposes of the present application a channel is defined broadly as a connection facility to convey properly formatted digital information from one point to another. A channel includes some or all of the following elements: 1) physical devices that generate and receive the signals (modulator/demodulator); 2) a physical medium that carries the actual signals; 3) mathematical schemes used to encode and decode the signals; and 4) proper communication protocols used to establish, maintain and manage the connection created by the channel. The concept of a channel includes not only physical channels, but also logical connections established on top of different network protocols, such as xDSL, ATM, wireless, HFC, coaxial cable, etc. Storage systems, such as magnetic tapes and optical disks, can also be considered part of a channel, although the present invention is not discussed in that context.
The channel is used to transport a bitstream, or a continuous sequence of binary bits used to digitally represent compressed video, audio or data. The bit rate is the number of bits per second that the channel is able to transport. The bit error rate is the statistical ratio between the number of bits in error due to transmission and the total number of bits transmitted. The channel capacity (or channel bandwidth) is the maximum bit rate at which a given channel can convey digital information with a bit error rate no more than a given value. A video channel or video program refers to one or more compressed bit streams that are used to represent the video signal and the associated audio signals. Also included in the video channel are the relevant timing, multiplexing and system information necessary for a decoder to decode and correctly present the decoded video and audio signals to the viewer in a time continuous and synchronous manner. There may be one or more video signals and one or more audio signals per channel. In virtually all practical cases, however, each video channel has one video bit stream, together with one or more compressed audio bit streams. A multiplex is a scheme used to combine bit stream representations of different signals, such as audio, video, or data, into a single bit stream representation. A re-multiplex, in turn, is a scheme used to combine bit stream representations of different multiplexed signals into a single bit stream representation.
A digital video signal is a sequence of digitized images that are obtained from the source and displayed at the destination in a synchronized manner. A digitized video sequence, when left in its original form and transmitted over digital communication channels, requires a significant amount of channel bandwidth. Digital video compression techniques, such as MPEG-1/2/4 and H.26X, can be used to dramatically reduce the channel bandwidth required to transmit the signal. However, compression also introduces significant computational complexity into both the encoding and decoding processes. Specifically, a compressed video bit stream at a given bit rate cannot be altered to a different bit rate without decoding and re-encoding. In addition, the number of bits required to represent digital video pictures varies from picture to picture, and the coded pictures are highly correlated via motion estimation. The problem of delivering a real-time digital video bit stream over a channel of a given bandwidth thus becomes a problem of matching the available bandwidth to the coded video bit stream rate. When a mismatch occurs, re-encoding, or re-compression, must be done.
Digital Video Compression
Digital video compression is the two-dimensional signal processing that allows digitized video frames to be represented digitally in a much more efficient manner. Compression of digital video makes it practical to transmit the compressed signal over digital channels at a fraction of the bandwidth required to transmit the original signal without compression. International standards have been created for video compression schemes, including MPEG-1, MPEG-2, H.261, H.262, H.263, etc. These standardized compression schemes mostly rely on several key algorithmic steps, as shown in FIG. 1: motion compensated encoding, transform coding (DCT transforms or wavelet/sub-band transforms), quantization of the resulting coefficients, and variable length encoding. The motion compensated encoding 10 removes the temporally redundant information inherent in video sequences. The transform coding 12 produces an orthogonal spatial frequency representation of the spatial domain representation of the video sequence. Quantization 14 of the transformed coefficients reduces the number of levels required to represent a given digitized video sample and is the major factor in bit usage reduction in the compression process. The other factor contributing to the compression is the use of variable length coding (VLC) 16, so that the most frequently used symbols are represented by the shortest code words. In general, the number of bits used to represent a given image determines the quality of the decoded picture: the more bits used, the better the image quality. The hardware or software system that compresses a digitized video sequence using the above-described schemes is called an encoder or encoding system. In these compression schemes, quantization is a lossy, or irreversible, process. Specifically, it results in a loss of video textural information that cannot be recovered by further processing at a later stage.
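The lossy nature of the quantization step 14 can be illustrated with a minimal sketch. The coefficient values and step size below are hypothetical, not drawn from any MPEG quantization matrix; the point is only that rounding discards information that de-quantization cannot restore:

```python
# A minimal sketch of quantization: dividing transform coefficients by a
# step size and rounding to the nearest integer level. The rounding is what
# makes the process lossy -- de-quantization cannot recover the originals.
coefficients = [312, -47, 18, 5, -3, 1, 0, 0]   # hypothetical DCT coefficients
step = 16                                       # assumed quantizer step size

quantized = [round(c / step) for c in coefficients]   # fewer levels to code
dequantized = [q * step for q in quantized]           # best possible reconstruction

print(quantized)     # small-magnitude levels -> short VLC code words
print(dequantized)   # differs from the originals: the loss is irreversible
```

Note how the trailing small coefficients quantize to zero, which is precisely where the bit usage reduction comes from once the levels are variable-length coded.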
In addition, the quantization process has a direct effect on the resulting bit usage and decoded video quality of the compressed bit stream. The schemes by which the quantization parameters are adjusted control the resulting bit rate of the compressed bit stream. The resulting bit stream can have either a constant bit rate (CBR) or a variable bit rate (VBR). A CBR compressed bit stream can be transmitted over a channel that requires the input bit rate to the channel to be constant over time. Compressed video bit streams are generally intended for real-time decoded playback at a different time or location. The decoded real-time playback must be done at 30 frames per second for NTSC standard video and 25 frames per second for PAL standard video. Thus, all of the information required to represent a digital picture must be delivered to the destination in time for decoding and timely display; the channel must therefore be capable of making such delivery.
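The delivery constraint above can be made concrete with back-of-the-envelope arithmetic: for a CBR stream, the average bit budget per coded picture is fixed by the channel rate and the display frame rate. The channel rate below is a hypothetical figure chosen for illustration:

```python
# Average per-picture bit budget implied by a CBR channel. Only the frame
# rates (30 fps NTSC, 25 fps PAL) come from the text; the channel rate is
# an assumed example value.
channel_rate = 4_000_000   # bits per second (hypothetical CBR channel)
fps_ntsc = 30              # NTSC display rate, frames per second
fps_pal = 25               # PAL display rate, frames per second

budget_ntsc = channel_rate / fps_ntsc   # average bits available per NTSC picture
budget_pal = channel_rate / fps_pal     # average bits available per PAL picture

print(budget_ntsc)
print(budget_pal)
```

A picture coded with more bits than this average must borrow from the budget of its neighbors via buffering, which is why timely delivery and buffer compliance are channel requirements rather than mere preferences.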
From a different perspective, the transmission channel imposes a bit rate constraint on the compressed bit stream. In general, the prior art adjusts the quantization in the encoding process so that the resulting bit rate can be accepted by the transmission channel. Because both temporal and spatial redundancies are removed by the compression schemes, and because of variable length encoding, the resulting bit stream is very sensitive to bit errors or bit losses in the transmission process compared with transmission of uncompressed video data. In other words, a minor bit error or loss of data in a compressed bit stream typically results in a major loss of video quality or in a complete shutdown of operation of the digital receiver/decoder. Furthermore, real-time multimedia bit streams are highly sensitive to delays. A compressed video bit stream, when transmitted under excessive and jittery delays, causes the real-time decoder buffer to underflow or overflow, causing the decoded video sequence to be jerky, or causing a loss of synchronization between the audio and video signals. Another consequence of the real-time nature of compressed video decoding is that lost compressed data will not be re-transmitted. Because of this sensitivity of compressed bit streams, there is a reluctance in the prior art to change, modify or re-encode compressed bit streams.
Re-Encoding
Re-encoding is the process of decoding an input compressed bit stream and then encoding it back into a compressed bit stream. The prior art includes many ways to apply rate conversion, or re-encoding, to one or multiple compressed bit streams. FIG. 2 shows a block diagram of a prior art system for transmitting video data over a communication channel, showing the encoding and decoding functions in more detail. In particular, as shown, the encoding includes receiving raw video data and processing the raw video data with motion estimation 50, transform coding 52, quantization 54, and VLC encoding 56 to produce a compressed bit stream. The compressed bit stream can then, because of its reduced size, be transmitted over any one of a variety of prior art transportation systems 58. The decoding process is then applied to the compressed bit stream received from the transportation system 58 to recover the raw video images. The decoding includes VLC decoding 60, de-quantization 62, inverse transform coding 64, and motion compensation 66, all in a conventional manner.
For the purpose of rate conversion in the compressed domain, some exemplary prior art procedures are shown in FIG. 3. For the present invention, re-encoding is defined in its broadest sense to include partial decoding, recoding, re-quantization, re-transforming, and complete decoding and recoding. Referring now to FIG. 3, each of these types of re-encoding is defined with more particularity. Some of the elements shown may also be needed for decoding and encoding of the video data; hence, in an actual implementation, these common elements may be shared between the re-encoder 300 and the decoder/encoder. Partial decoding refers to path E, where the bit stream's system syntax and video syntax are partially decoded down to the picture header to perform frame-accurate, flexible splicing. Re-coding refers to path D, where variable length decoding and encoding are performed and the DCT coefficients may be truncated to zero without even going through the inverse quantization steps. This approach requires the least processing, but in general causes the greatest amount of quality degradation. Re-quantization refers to path C, where variable length decoding, de-quantization, quantization and variable length encoding are performed, but no transform coding is used. The transform coefficients (DCT coefficients) are requantized before being VLC encoded again. Re-transformation refers to path B, where variable length decoding, de-quantization, inverse transform coding, forward transform coding, quantization and variable length encoding are performed. The video frames are reconstructed without using motion compensation; in the case of B or P pictures, this means some of the coded blocks are motion-estimated residual errors. Some form of spatial filtering may be applied before forward transform coding is used in the encoding process. Recoding refers to path A, where the bit streams are completely decoded to raw video and then encoded, including the use of motion estimation and compensation.
Each of the paths A, B, C, D and E includes a rate converter for adjusting the rate of the bit stream to ensure buffer compliance. Each of the rate converters may be different. For example, the rate converter on path A may be a spatial filter, the rate converter on path C may perform a quantization step size adjustment, and the rate converter on path D may perform high frequency elimination. Those skilled in the art will also recognize that the components of the re-encoder 300 that are used (e.g., the path through the re-encoder 300) could be variably controlled to provide variable bit rate conversion using the re-encoder 300. In various embodiments, the re-encoder 300 may include all, only some, or any combination of these components, according to which re-encoding, re-quantization, re-transforming and re-coding may be performed.
Generally, motion estimation and compensation is the most computationally expensive step; transform coding and inverse transform coding are also quite expensive. In general, without special hardware to perform these functions, motion estimation and compensation will take 80%-90% of the overall decode-encode computation load. The key to a simplified rate conversion scheme is therefore to bypass some of these expensive steps. For example, in FIG. 3, if we take path B, motion estimation and compensation are avoided. If we take path C, both motion estimation and compensation and transform coding are eliminated. If we take path D, the quantization steps are also eliminated, in addition to motion estimation and compensation and transform coding. Of course, path A performs the entire decoding and encoding process, resulting in the most flexibility and potentially the best quality, but it is computationally the most expensive.
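The cheapest non-trivial path that still preserves picture structure, path C, can be sketched as follows. The step sizes and coefficient levels are hypothetical, and the function name is illustrative; a real re-encoder would operate per-block with MPEG quantization matrices rather than a single scalar step:

```python
# A minimal sketch of re-quantization (path C in FIG. 3): VLC-decoded
# coefficient levels are de-quantized with the original step size and
# re-quantized with a coarser one, reducing bit usage without any inverse
# transform or motion compensation. All values here are illustrative.
def requantize(levels, old_step, new_step):
    """Map quantized levels from old_step to a coarser new_step."""
    dequantized = [lvl * old_step for lvl in levels]   # de-quantization
    return [round(c / new_step) for c in dequantized]  # coarser re-quantization

levels = [20, -3, 1, 1, 0, 0]                 # hypothetical VLC-decoded levels
coarser = requantize(levels, old_step=16, new_step=32)
print(coarser)   # smaller magnitudes -> shorter VLC code words on re-encode
```

The small levels collapsing to zero is the mechanism by which path C trades quality for rate: exactly the textural loss described above, incurred a second time.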
MPEG-2 Bit Stream Syntax
The methods mentioned above can be applied to MPEG-2 program streams, MPEG-1 streams or other video conferencing compression standards. Thus, while the present invention can be applied to any of the various compression techniques and is not limited to any one of them, it will be discussed in the present application in the context of MPEG-2 by way of example. This section provides a brief overview of the MPEG-2 bit stream syntax for a better understanding of the concepts in the invention.
The MPEG-2 compression standard consists of two layers of information, whose relationship is illustrated in FIG. 4. The bottom layer is the elementary stream (ES) layer. This layer defines how compressed video (or audio) signals are sampled, motion compensated, transform coded, quantized, and represented by different variable length coding (VLC) tables. The re-encoding of a pre-compressed MPEG-2 bit stream is a process in which the bit stream signal is redefined in this layer.
The next layer is the system layer. The system layer is defined to allow the MPEG-2 decoder to correctly decode audio and video signals and present the decoded result to the video screen in a time continuous manner. The system layer also includes provisions that allow unambiguous multiplexing and separation of audio and video compressed signals, and of different channels of audio and video compressed signals. The system layer consists of sub-layers. The first is the PES layer, which defines how the ES layer bit stream is encapsulated into variable length packets, called PES packets; in addition, presentation and decoding time stamps are added to the PES packets. There are two different sub-layers above the PES layer: the transport layer and the program system layer.
The transport layer defines how the PES packets are further packetized into fixed-size transport packets of 188 bytes. Additional timing information and multiplexing information are added at the transport layer. The resulting stream of transport packets is called a transport stream. The transport stream is optimized for use in environments where errors are likely, such as storage or transmission in lossy or noisy media. Typical applications of the transport stream include Direct Broadcast Service (DBS), digital or wireless cable services, broadband transmission systems, etc.
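The fixed 188-byte framing can be sketched as follows. The 188-byte packet size, the 4-byte header, and the 0x47 sync byte are the real MPEG-2 transport conventions; the header contents are placeholders, and padding the final packet with 0xFF is a simplification of the adaptation-field stuffing an actual transport multiplexer would use:

```python
# A simplified sketch of transport packetization: a PES packet is split
# across fixed 188-byte transport packets, each beginning with a 4-byte
# header whose first byte is the 0x47 sync byte. Header content beyond the
# sync byte is a placeholder here.
TS_PACKET_SIZE = 188
HEADER_SIZE = 4
PAYLOAD_SIZE = TS_PACKET_SIZE - HEADER_SIZE   # 184 payload bytes per packet

def packetize(pes: bytes) -> list[bytes]:
    packets = []
    for i in range(0, len(pes), PAYLOAD_SIZE):
        chunk = pes[i:i + PAYLOAD_SIZE]
        header = b"\x47" + b"\x00" * 3        # sync byte; rest is placeholder
        # Pad the final chunk so every packet is exactly 188 bytes
        # (a simplification of real adaptation-field stuffing).
        packets.append(header + chunk.ljust(PAYLOAD_SIZE, b"\xff"))
    return packets

packets = packetize(bytes(400))               # a hypothetical 400-byte PES packet
print(len(packets), len(packets[0]))          # 3 packets, each 188 bytes
```

The constant packet size is what makes the transport stream robust in lossy media: a receiver can re-synchronize on the next 0x47 boundary after an error instead of losing the rest of the multiplex.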
The program system layer defines how the PES packets are encapsulated into variable-size packets. Additional timing and multiplexing information are added at the program system layer. The program stream is designed for use in relatively error-free environments and is suitable for applications that may involve software processing of system information, such as interactive multimedia applications. Typical applications of the program stream include Digital Versatile Disks (DVDs) and video servers.
In general, a video bit stream can be in elementary stream (ES) format, which means that no PES, transport or program system layer information is added to the bit stream. The video bit stream can also be represented in the form of a PES stream, a transport stream or a program stream. For a given video bit stream, the difference between these representations in the different layers lies in the timing information, multiplexing information and other information not directly related to the re-encoding process. The information required to perform re-encoding, however, is contained in the elementary stream layer. The ensuing discussion of re-encoding is therefore not limited to bit streams in any one of the layers; in other words, the discussion of how to re-encode bit streams in one layer, say the elementary stream layer, can be straightforwardly extended to PES streams, transport streams or program streams as well.
With the above background, the system and method for multiple channel statistical re-multiplexing will now be discussed.