The present invention relates to methods of and systems for processing video during compression, specifically MPEG-4 video compression, wherein enhancement layers are added to the base layer using activity-based frequency weighting methods in an adaptive procedure, so that the more visually sensitive components of a video frame are encoded in higher-priority bit-planes, thus providing high visual quality at decompression time.
"MPEG" generally represents an evolving set of standards for video and audio compression developed by the Moving Picture Experts Group. The need for compression of motion video for digital transmission becomes apparent with even a cursory look at uncompressed bitrates in contrast with the bandwidths available. MPEG-1 was designed for coding progressive video at a transmission rate of about 1.5 million bits per second. It was designed specifically for Video-CD and CD-i media. MPEG-2 was designed for coding interlaced images at transmission rates above 4 million bits per second. The MPEG-2 standard is used for various applications, such as digital television (DTV) broadcasts, digital versatile disk (DVD) technology, and video storage systems. MPEG-4 is designed for very low bit-rate applications, using a more flexible coding standard to target internet video transmission and the wireless communications market.
The MPEG-4 video compression standard allows content-based access or transmission of an arbitrarily-shaped video object plane (VOP) at various temporal and spatial resolutions. MPEG-4 supports both object and quality scalability. Fine granularity scalability ("FGS") is one type of scalable coding scheme that is adopted by the MPEG-4 standard. The FGS encoding scheme allows an MPEG-4 bitstream to be encoded in two layers: the base layer, which encodes each frame with a fixed lower-bound bit-rate; and the enhancement layer, which encodes the difference between the original picture and the reconstructed base-layer picture. The enhancement layer is encoded via a bit-plane coding scheme; therefore, enhancement layer bitstreams are scalable in the sense that an arbitrary (fine-grained) number of bit-planes of the enhancement layer can be transmitted to the decoder depending on the transmission bandwidth. The FGS coding scheme has been finalized by MPEG-4 version 4.
In the standardized FGS scheme, frequency weighting is a feature used for visual quality improvement. By giving different weights to the elements of each coding block, the enhancement layer residuals are weighted and encoded relative to their importance to the visual output quality.
The MPEG-4 decoder may decode only the base layer, or the base layer and any subset of the FGS enhancement layer. This is useful when the decoding device has limited or variable bandwidth, and for storage purposes.
In some cases, the base layer alone is decoded, allowing a less-detailed video image to be viewed. When the bandwidth between encoder and decoder varies, or when the space for bitstream storage is limited, the base layer is decoded and as much of the FGS enhancement layer is added on top of the base layer as bandwidth or storage space allows.
The MPEG-4 standard operates by first encoding a base layer of the scene being compressed. This base layer is a lower-quality, low-bandwidth, compressed image. The base layer is represented by a plurality of coding blocks, e.g., discrete cosine transform ("DCT") encoded blocks. The FGS enhancement layer is represented by a plurality of residual blocks. Next, the FGS enhancement layer generates a bitstream in addition to the base layer bitstream. Depending on the bandwidth of the transmission channel and complexity of the decoder, a truncated bitstream of the FGS layer will necessarily be decoded.
In the MPEG-4 coding standard, two quality improvement methods are standardized for FGS enhancement layer encoding. These two methods are frequency weighting and selective enhancement. Herein, only the frequency weighting method is addressed.
The FGS enhancement layer is used to code the quantization residuals from the base layer; therefore, the overall quality of the coded sequence is the combination of the base layer information and the transmitted FGS enhancement layer information. In theory, the FGS method codes the residuals of the base layer without loss. However, it is often the case that only part of the FGS enhancement layer can pass through the transmission channel and arrive at the decoder, due to limited transmission bandwidth. When bandwidth variation occurs, the number of bits of the FGS enhancement layer transmitted from the encoder side to the decoder side varies with the bandwidth available at the moment of transmittal. Also, due to the nature of FGS enhancement layer coding, the layer can be truncated for storage at any desired quality level. Hence, the visual quality of the transmitted or stored signal is heavily affected by the amount of the FGS layer that is decoded.
To improve the visual quality of the output sequence, frequency weighting allows the elements of the residual block to be weighted unevenly before the bit-plane coding (which is the method used for the FGS layer coding). Since certain frequency components are visually more important, they should be enhanced more (i.e., they should be coded with high accuracy by being given large frequency weights), thereby improving the subjective image quality.
Objects being encoded by bit-plane encoding are ordered from most significant bit ("MSB") to least significant bit ("LSB"). A bit-plane shift describes the operation of shifting the bit-planes corresponding to a particular value in a block by one or more bits towards the MSB. This has the effect of increasing, or boosting, the priority of the objects encoded, in this case the residual block.
When the base-layer coefficients are encoded or "quantized," the quantization function has an associated loss. Thus, the accuracy of the quantized data depends on the quantization steps. Quantization residuals are left out as a non-encoded part of the base layer and are not recoverable from the base layer on the decoder side.
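The relationship between quantization step and residual can be illustrated with a minimal sketch. It assumes a plain uniform quantizer with round-to-nearest, which is a simplification and not the MPEG-4 quantization rule; the function name and the sample values are hypothetical.

```python
def quantize_with_residual(coefficient, step):
    """Uniform quantizer (an assumption, not the MPEG-4 rule).

    Returns (level, reconstruction, residual), where the residual is the
    part of the coefficient that the base layer cannot represent.
    """
    level = round(coefficient / step)        # value carried by the base layer
    reconstruction = level * step            # what a base-layer-only decoder sees
    residual = coefficient - reconstruction  # left for the enhancement layer
    return level, reconstruction, residual


# Hypothetical DCT coefficient quantized with a coarse and a fine step.
coarse = quantize_with_residual(37, 8)
fine = quantize_with_residual(37, 4)
```

With these sample values the coarse step leaves a residual of magnitude 3 and the fine step a residual of magnitude 1, which is the effect discussed for DC and low-frequency coefficients: a smaller quantization step in the base layer leaves a smaller residual for the enhancement layer.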
Fine granularity refers to a coding method where the video data is encoded in a progressive way (bit-plane by bit-plane), from MSB to LSB. Consequently the encoded bitstream can be truncated at any bit-plane level, while always ensuring the more significant data is more likely to be sent.
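The progressive MSB-to-LSB ordering and the effect of truncation can be sketched as follows. This is an illustration under simplifying assumptions: it handles non-negative magnitudes only and omits the sign handling and entropy coding of a real FGS encoder. The function names and sample values are hypothetical.

```python
def to_bitplanes(magnitudes, num_planes):
    """Split non-negative magnitudes into bit-planes, MSB plane first."""
    return [[(v >> p) & 1 for v in magnitudes]
            for p in range(num_planes - 1, -1, -1)]


def from_bitplanes(planes, num_planes, count):
    """Rebuild `count` magnitudes from the planes received so far.

    Planes that were truncated away are treated as all zeros.
    """
    values = [0] * count
    for i, plane in enumerate(planes):
        p = num_planes - 1 - i               # bit position of this plane
        values = [v | (bit << p) for v, bit in zip(values, plane)]
    return values


residuals = [13, 2, 7, 0]                    # hypothetical residual magnitudes
planes = to_bitplanes(residuals, 4)
full = from_bitplanes(planes, 4, len(residuals))
truncated = from_bitplanes(planes[:2], 4, len(residuals))  # two MSB planes only
```

When only the two most significant planes arrive, the large value 13 is approximated as 12 while the small value 2 is lost entirely. This is why a small but visually important residual benefits from being moved to a higher bit-plane.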
Frequency weighting ("FW") uses a FW matrix to selectively re-weight the importance of each enhancement layer coefficient within each coding block, so that the significance of each coefficient for bit-plane encoding is re-prioritized by the weighting matrix. Each element of the FW matrix indicates the number of bit-plane shifts of the corresponding FGS coefficient within the block. A bit-plane shift of one is equivalent to multiplying the FGS coefficient by two; more generally, a shift of n is equivalent to multiplication by two to the power n. While MPEG-4 does standardize the FGS tool, it does not provide an appropriate FW matrix. The FW matrix definition is left as an encoder optimization parameter to be set by each manufacturer individually.
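A minimal sketch of how a FW matrix might be applied as per-coefficient bit-plane shifts is given below. The 2x2 block, the weights, and the function names are hypothetical and chosen only for brevity; an actual encoder would operate on 8x8 blocks and follow the bitstream syntax of the standard.

```python
def apply_fw(block, fw):
    """Shift each residual up by the number of bit-planes given in fw."""
    return [[(1 if c >= 0 else -1) * (abs(c) << s) for c, s in zip(row, shifts)]
            for row, shifts in zip(block, fw)]


def remove_fw(block, fw):
    """Undo the shift on the decoder side (exact when nothing was truncated)."""
    return [[(1 if c >= 0 else -1) * (abs(c) >> s) for c, s in zip(row, shifts)]
            for row, shifts in zip(block, fw)]


residual_block = [[3, -1],                   # hypothetical: small DC residual,
                  [1, 9]]                    # large high-frequency residual
fw_matrix = [[2, 1],                         # hypothetical: two shifts for DC,
             [1, 0]]                         # none for the highest frequency

weighted = apply_fw(residual_block, fw_matrix)
restored = remove_fw(weighted, fw_matrix)
```

After weighting, the DC residual (3, shifted to 12) has a larger magnitude than the high-frequency residual (9), so its bits occupy a more significant bit-plane and are more likely to survive truncation.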
Using a DCT-based codec as an example, for an 8×8 DCT block, the DC coefficient and the lower frequency components usually contribute more to the visual quality. Thus, the lower frequency components and the DC coefficient should be encoded with high priority. However, the FGS codec is designed in such a way that the enhancement layer encodes the residuals bit-plane by bit-plane with regard to the amplitude of the residual only, rather than the importance of the frequency components. On the other hand, the base layer coding, which codes the DC and lower-frequency components with a higher accuracy by using smaller quantization parameters, will result in smaller residuals for the enhancement layer. Consequently, in contrast to the characteristics of the base layer DCT coefficients, the important DC and lower frequency components may have smaller values in the enhancement layer, and will not be encoded by FGS in a more significant bit-plane. When the targeted number of transmitted bit-planes is low, the important frequency components may be lost due to bitstream truncation. To prevent this, the more important coefficients should be encoded in a higher bit-plane with higher priority. This can be achieved by giving higher weights at those frequency locations in the FW matrix. The FW matrix is designed to lift the more important frequency components to a higher bit-plane.
One problem with current FW implementations is that the FW method is conducted in such a way that the whole sequence uses the same weighting matrix. As observed from tested sequences, each sequence may have multiple scenes, which may contain different motion activities and brightness information. In slow motion or tranquil scenes, high frequency loss becomes more annoying. Moreover, blockiness and flickering noise are more annoying on brighter pictures. Pictures with more motion activities tend to have bigger residuals in the enhancement layer, especially for the higher frequency part. This is because of motion prediction errors. For a picture containing more detailed information, high frequency residuals are too significant to be ignored.
The present invention provides methods of and systems for addressing the needs of the prior art. These methods and systems provide the ability to determine the FW matrix that will provide the best image quality during encoding, and to adapt the weighting matrix to changes in the scene characteristics, thereby optimizing the resulting output picture quality, especially in bandwidth-deprived applications.
To address the problem of using a single fixed FW matrix for each sequence in the prior art, the FW matrix is designed to be changed during encoding in accordance with the change of scene characteristics as explained below.
The present invention, which addresses the needs of the prior art, provides, in an embodiment, a method of processing a video stream containing one or more video frames, in which the video stream is encoded by creating a base layer for each frame, including a plurality of encoded blocks, and adding an enhancement layer, where the quantization residuals of the base layer form a residual block to be further encoded to increase the fine granularity.
In this method a plurality of frequency weighting matrices are defined, each of which specifies the number of bit-plane shifts to apply to the coefficients of the residual blocks, in which one or more of the matrices specifies a high weight and high width. Weight is related to the number of bit-plane shifts, while width is the range from the top left corner of the frequency weighting matrix to the last non-zero weight of the frequency weighting matrix along a zigzag line. An additional one or more of said matrices specifies a high weight and medium width, one or more of said matrices specifies a low weight and low width, one or more of said matrices specifies a medium weight and high width, and one or more of said matrices specifies a medium weight and medium width.
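The notion of width along a zigzag line can be made concrete with a short sketch. It assumes the conventional zigzag scan used for DCT blocks and interprets width as the number of scan positions up to and including the last non-zero weight; both the interpretation and the 4x4 sample matrix are assumptions made for illustration.

```python
def zigzag_order(n):
    """Return (row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):               # s indexes the anti-diagonals
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run bottom-left to top-right, odd ones the reverse.
        order.extend(diagonal[::-1] if s % 2 == 0 else diagonal)
    return order


def fw_width(fw):
    """Scan positions from the top left corner to the last non-zero weight."""
    width = 0
    for position, (r, c) in enumerate(zigzag_order(len(fw)), start=1):
        if fw[r][c] != 0:
            width = position
    return width


sample_fw = [[2, 1, 0, 0],                   # hypothetical 4x4 weighting matrix
             [1, 0, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
sample_width = fw_width(sample_fw)
```

For the sample matrix the last non-zero weight is the third position in scan order, so the width under this interpretation is 3. A high-width matrix would keep non-zero weights further along the scan, reaching higher frequencies.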
Next, the base layer and enhancement layer of the video frame are encoded. The enhancement layer is represented by a plurality of encoded residual blocks. The encoded residual blocks are frequency-weighted with the chosen frequency weighting matrix.
If the video frame contains a high amount of activity, a high weight and high width frequency weighting matrix (HH), an example of which is depicted in FIG. 3a, is chosen to be the frequency matrix used for bit-plane shifting.
Otherwise, if the video frame contains a high amount of motion, a high weight and medium width frequency weighting matrix (HM), an example of which is depicted in FIG. 3b, is chosen to be the frequency matrix used for bit-plane shifting.
Otherwise, if the video frame contains a low amount of motion and a low amount of activity, a low weight, low width frequency weighting matrix (LL), an example of which is depicted in FIG. 3e, is chosen to be the frequency matrix used for bit-plane shifting.
Otherwise, if the video frame contains a low amount of brightness, a medium weight, high width frequency weighting matrix (MH), an example of which is depicted in FIG. 3c, is chosen to be the frequency matrix used for bit-plane shifting.
Otherwise, the medium weight and medium width matrix (MM), an example of which is depicted in FIG. 3d, is used to determine the bit-plane shift to be applied to the blocks of the video frame.
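The selection cascade described above can be sketched as a sequence of ordered tests. The sketch assumes that each frame has already been classified as "low", "medium", or "high" for activity, motion, and brightness; how those classifications are measured and thresholded is not specified here, and the function name and labels are hypothetical.

```python
def select_fw_matrix(activity, motion, brightness):
    """Return the label of the FW matrix chosen for a frame.

    Each argument is one of "low", "medium", "high" (a hypothetical
    classification). The tests are applied in the order given in the text.
    """
    if activity == "high":
        return "HH"                          # high weight, high width
    if motion == "high":
        return "HM"                          # high weight, medium width
    if motion == "low" and activity == "low":
        return "LL"                          # low weight, low width
    if brightness == "low":
        return "MH"                          # medium weight, high width
    return "MM"                              # medium weight, medium width


choice = select_fw_matrix(activity="medium", motion="high", brightness="low")
```

Because the tests are ordered, a frame with both high activity and high motion selects HH, and the brightness test is reached only when none of the activity and motion tests apply.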
The invention also relates to a system for processing a video stream, in which the video stream contains a plurality of video frames. This system includes a video signal source of the video stream, a processor operatively coupled to the video signal source, and an output for encoded video.
The processor is configured to define a plurality of frequency weighting matrices, each of which specifies the number of bit-plane shifts to apply to the coefficients of the residual blocks, in which one or more of the matrices specifies a high weight and high width, one or more of said matrices specifies a high weight and medium width, one or more of said matrices specifies a low weight and low width, one or more of said matrices specifies a medium weight and high width, and one or more of said matrices specifies a medium weight and medium width. Next, the base layer and enhancement layer of the video frame are encoded. The encoded residual blocks of the enhancement layer are frequency-weighted with the chosen frequency weighting matrix. If the video frame contains a high amount of activity, a high weight and high width frequency weighting matrix is used to determine the bit-plane shift to be applied to the blocks of the video frame. Otherwise, if the video frame contains a high amount of motion, a high weight and medium width frequency weighting matrix is used to determine the bit-plane shift to be applied to the blocks of the video frame. Otherwise, if the video frame contains a low amount of motion and a low amount of activity, a low weight and low width frequency weighting matrix is used to determine the bit-plane shift to be applied to the blocks of the video frame. Otherwise, if the video frame contains a low amount of brightness, a medium weight and high width frequency weighting matrix is used to determine the bit-plane shift to be applied to the blocks of the video frame. Otherwise, the medium weight and medium width frequency weighting matrix is used to determine the bit-plane shift to be applied to the blocks of the video frame.
Other improvements which the present invention provides over the prior art will be identified as a result of the following description which sets forth the preferred embodiments of the present invention. The description is not in any way intended to limit the scope of the present invention, but rather only to provide a working example of the present preferred embodiments. The scope of the present invention will be pointed out in the appended claims.