Digital video consumes large amounts of storage and transmission capacity. A typical raw digital video sequence includes 15 or 30 frames per second. Each frame can include tens or hundreds of thousands of pixels (also called pels), where each pixel represents a tiny element of the picture. In raw form, a computer commonly represents a pixel as a set of three samples totaling 24 bits. For instance, a pixel may include an eight-bit luminance sample (also called a luma sample, as the terms “luminance” and “luma” are used interchangeably herein) that defines the grayscale component of the pixel and two eight-bit chrominance samples (also called chroma samples, as the terms “chrominance” and “chroma” are used interchangeably herein) that define the color component of the pixel. Pixels of greater color depth can be represented by three samples totaling 48 bits or more. Thus, the number of bits per second, or bit rate, of a typical raw digital video sequence may be 5 million bits per second or more.
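As a rough illustration of the arithmetic behind such bit rates, the following sketch multiplies out frame size, sample depth, and frame rate. The 176×144 frame dimensions are an assumption chosen for illustration only, and the function name raw_bit_rate is hypothetical.

```python
def raw_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Return the raw (uncompressed) bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


# Assumed example: 176x144 pixels, 24 bits per pixel, 15 frames per second.
example_rate = raw_bit_rate(176, 144, 24, 15)
print(example_rate)  # 9123840
```

Even at this small, assumed frame size, the raw bit rate exceeds 9 million bits per second.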
Many computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also called coding or encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video by converting the video into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original video from the compressed form. A “codec” is an encoder/decoder system. Compression can be lossless, in which the quality of the video does not suffer, but decreases in bit rate are limited by the inherent amount of variability (sometimes called entropy) of the video data. Or, compression can be lossy, in which the quality of the video suffers, but achievable decreases in bit rate are more dramatic. Lossy compression is often used in conjunction with lossless compression—the lossy compression establishes an approximation of information, and the lossless compression is applied to represent the approximation.
In general, video compression techniques include “intra-picture” compression and “inter-picture” compression, where a picture is, for example, a progressively scanned video frame, an interlaced video frame (having alternating lines for video fields), or an interlaced video field. Intra-picture compression techniques compress individual pictures (typically called I-pictures or key pictures), and inter-picture compression techniques compress pictures (typically called predicted pictures, P-pictures, or B-pictures) with reference to one or more other pictures (typically called reference or anchor pictures).
A frame (or other video picture) is typically represented as one or more arrays of pixel samples. For example, a YUV video data frame is represented as three planes of samples: a luma (Y) plane of luma samples and two chroma (U, V) planes of chroma samples.
Often in inter-picture compression, motion compensation is used to exploit temporal redundancy between pictures. To exploit spatial redundancy in intra-picture compression, blocks of pixel or spatial domain video data are transformed into frequency domain (i.e., spectral) data. The resulting blocks of spectral coefficients may be quantized and entropy encoded. When the video is decompressed, a decoder typically performs the inverse of various compression operations (e.g., performs entropy decoding, inverse quantization, and an inverse transform) as well as motion compensation.
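The lossy effect of quantization can be sketched as follows. This is a generic uniform scalar quantizer with rounding, assumed purely for illustration; it is not the quantizer of any particular product or standard, and the coefficient values and function names are hypothetical.

```python
def quantize(coeff, step):
    """Map a coefficient to an integer level (round to nearest, sign preserved)."""
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) + step // 2) // step)


def dequantize(level, step):
    """Reconstruct an approximation of the coefficient from its level."""
    return level * step


# Hypothetical block of spectral coefficients.
coeffs = [183, -47, 12, -5, 3, 0]

for step in (4, 16):
    levels = [quantize(c, step) for c in coeffs]
    recon = [dequantize(level, step) for level in levels]
    max_error = max(abs(c - r) for c, r in zip(coeffs, recon))
    print(step, levels, recon, max_error)
```

A larger step size yields smaller levels, which are cheaper to entropy encode, at the cost of a larger reconstruction error.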
Numerous companies have produced video codecs. For example, Microsoft Corporation has produced a video encoder and decoder released for Windows Media Video 8. Aside from these products, numerous international standards specify aspects of video decoders and formats for compressed video information. These standards include the H.261, MPEG-1, H.262, H.263, MPEG-4, and JVT/AVC standards. Directly or by implication, these standards also specify certain encoder details, but other encoder details are not specified. These products and standards use (or support the use of) different combinations of compression and decompression techniques. In particular, these products and standards offer various techniques to trade off quality against bit rate for video, including adjusting quantization, adjusting resolution (i.e., dimensions) of pictures, and frame dropping (i.e., temporal scalability).
While the compression techniques implemented in these products (or in compliance with these standards) are effective in many scenarios, it may be desirable to compress video data further than is allowable by a particular compression technique. For example, an upper limit on a quantization factor may limit the amount of compression that can be achieved by quantization. Moreover, in practice, extreme forms of quality degradation associated with particular techniques often limit how far video data may be effectively compressed with those techniques. For example, large quantization step sizes often result in visible distortions such as blocking or ringing in displayed images. Excessive frame dropping typically leads to choppy video on playback.
Microsoft Corporation has also produced a video encoder and decoder released for Windows Media Video 9 [“WMV9”]. In the WMV9 encoder and decoder, range reduction can provide for additional compression and/or help limit extreme forms of quality degradation for progressive video frames. The use of range reduction is signaled by a combination of sequence-layer and frame-layer bitstream elements.
A sequence header contains sequence-level parameters used in decoding a sequence of frames. In particular, the element PREPROC in the sequence-layer header is a one-bit element that indicates whether range reduction is used for the sequence. If PREPROC=0, range reduction is not used for any frame in the sequence. If PREPROC=1, there is a one-bit range reduction flag PREPROCFRM in the frame header for each progressive frame in the sequence. If PREPROCFRM=0 for a frame, range reduction is not used for the frame. If PREPROCFRM=1, range reduction is used for the frame.
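The two-level signaling can be summarized with a minimal sketch. It assumes the PREPROC and PREPROCFRM values have already been parsed from the bitstream; the function name and the use of None for an absent PREPROCFRM element are illustrative choices, not part of the WMV9 syntax.

```python
def range_reduction_used(preproc, preprocfrm=None):
    """Decide whether range reduction applies to a progressive frame.

    preproc: value of the one-bit sequence-layer PREPROC element.
    preprocfrm: value of the one-bit frame-layer PREPROCFRM element,
        present in the frame header only when PREPROC == 1.
    """
    if preproc == 0:
        # Range reduction is disabled for every frame in the sequence.
        return False
    if preprocfrm is None:
        raise ValueError("PREPROCFRM is expected when PREPROC == 1")
    return preprocfrm == 1


print(range_reduction_used(0))     # False
print(range_reduction_used(1, 0))  # False
print(range_reduction_used(1, 1))  # True
```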
When PREPROCFRM signals that range reduction is used for a frame, then the decoder scales up the reconstructed frame prior to display. The decoder also stores intact the actual reconstructed frame that has not been scaled up in value, for possible use in future motion compensation. A frame is represented using samples in the Y, U, and V planes within the range of 0 to 255 per sample. When range reduction has been used for a frame, samples have been scaled down by a factor of two and mapped to a range of 64 to 191. The decoder thus scales up each of the samples in the Y, U, and V planes according to the following formulas:
Y[n] = CLIP(((Yr[n] − 128) << 1) + 128)  (1),
U[n] = CLIP(((Ur[n] − 128) << 1) + 128)  (2), and
V[n] = CLIP(((Vr[n] − 128) << 1) + 128)  (3),
where Yr[n], Ur[n], and Vr[n] represent the range-reduced values of the samples at different locations in the Y, U, and V planes, respectively. Y[n], U[n], and V[n] represent the scaled up values of the samples in the Y, U, and V planes, respectively. CLIP(n) equals 255 if n>255 and 0 if n<0; otherwise, CLIP(n) equals n. The operator <<x is a bitwise operator for a left shift by x bits with zero fill. The same scaling parameter (namely, a scaling factor of 2) is applied to the luma and chroma components of a frame.
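Equations (1) to (3) apply the same operation to each plane, which can be sketched as follows. The function names clip, scale_up, and scale_up_plane are hypothetical, and a plane is assumed to be a flat list of integer samples.

```python
def clip(n):
    """CLIP(n): clamp n to the range 0..255."""
    return max(0, min(255, n))


def scale_up(sample):
    """Scale up one range-reduced sample per equations (1) to (3)."""
    return clip(((sample - 128) << 1) + 128)


def scale_up_plane(plane):
    """Apply the same scaling to every sample of a Y, U, or V plane."""
    return [scale_up(s) for s in plane]


print(scale_up_plane([64, 128, 191]))  # [0, 128, 254]
```

The mid-point 128 is unchanged, and the reduced range of 64 to 191 expands to 0 to 254. CLIP only takes effect for inputs outside the reduced range.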
When a reference frame is used for motion compensation, the decoder may scale it prior to using it for motion compensation. This is done when the current frame and the reference frame are operating at different ranges. More specifically, there are two cases that involve scaling the reference frame. First, if the current frame is range reduced but the reference frame is not, the decoder scales down the reference frame prior to motion compensation as follows:
Yr[n] = ((Y[n] − 128) >> 1) + 128  (4),
Ur[n] = ((U[n] − 128) >> 1) + 128  (5), and
Vr[n] = ((V[n] − 128) >> 1) + 128  (6),
where the operator >>x is a bitwise operator for shift right by x bits with sign extension. Second, if the current frame is not range reduced but the reference frame is, the decoder scales up the reference frame prior to motion compensation, per equations (1) to (3).
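The two reference-scaling cases can be sketched as a single decision function. The name prepare_reference and its boolean arguments are hypothetical, and a plane is assumed to be a flat list of integer samples. Python's >> on negative integers is an arithmetic shift, which matches the sign-extension behavior required by equations (4) to (6).

```python
def clip(n):
    """CLIP(n): clamp n to the range 0..255."""
    return max(0, min(255, n))


def scale_up(sample):
    """Equations (1) to (3): expand a range-reduced sample."""
    return clip(((sample - 128) << 1) + 128)


def scale_down(sample):
    """Equations (4) to (6): reduce a full-range sample."""
    return ((sample - 128) >> 1) + 128


def prepare_reference(ref_plane, ref_reduced, cur_reduced):
    """Bring a reference plane to the range of the current frame."""
    if cur_reduced and not ref_reduced:
        return [scale_down(s) for s in ref_plane]
    if not cur_reduced and ref_reduced:
        return [scale_up(s) for s in ref_plane]
    # Both frames operate at the same range, so no scaling is needed.
    return list(ref_plane)


print(prepare_reference([0, 128, 255], ref_reduced=False, cur_reduced=True))
# [64, 128, 191]
```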
While the preceding discussion focuses on syntax and decoder-side processing, the encoder-side processing is similar. An encoder scales down samples of a frame when range reduction is used for the frame, per equations (4) to (6). When a reference frame is used for motion compensation, the encoder scales it prior to motion compensation as necessary, as described for the decoder. The encoder signals the bitstream elements described above to regulate the use of range reduction.
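A round trip through encoder-side scaling down and decoder-side scaling up illustrates what the range reduction costs in precision. This is a sketch over isolated sample values only; it ignores the transform, quantization, and entropy coding that occur between the two steps, and the function names are hypothetical.

```python
def clip(n):
    """CLIP(n): clamp n to the range 0..255."""
    return max(0, min(255, n))


def scale_down(sample):
    """Encoder side, equations (4) to (6)."""
    return ((sample - 128) >> 1) + 128


def scale_up(sample):
    """Decoder side, equations (1) to (3)."""
    return clip(((sample - 128) << 1) + 128)


reduced = [scale_down(s) for s in range(256)]
restored = [scale_up(r) for r in reduced]
max_error = max(abs(s - r) for s, r in zip(range(256), restored))

print(min(reduced), max(reduced))  # 64 191
print(max_error)                   # 1
```

Every 8-bit input maps into the range of 64 to 191, and the restored value differs from the original by at most one, because the right shift discards the least significant bit.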
While range reduction in WMV9 is effective in many cases, there is room for improvement for certain scenarios and types of content. The range reduction only allows scaling by a factor of 2 relative to the original sample depth; scaling by other factors or a choice of factors is not supported. This limits the granularity with which bit rate and quality can be regulated with the range reduction. Moreover, the range reduction is either (a) used for both luma and chroma samples or (b) not used at all. Performing range reduction for just luma (but not chroma) or just chroma (but not luma) is not supported, which again limits the usefulness of the range reduction in many scenarios. The range reduction uses per-frame signaling, which can be inefficient in terms of bit rate. Finally, the range reduction at times involves scaling operations within the motion compensation loop, requiring additional encoder-side and decoder-side processing of reference frames.
Given the critical importance of compression and decompression to digital video, it is not surprising that video compression and decompression are richly developed fields. Whatever the benefits of previous video compression and decompression techniques, however, they do not have the advantages of the following techniques and tools.