Digital video is pervasive in cable broadcasts, satellite television, Internet videos, and the like. Typically, video data is transmitted in frames, and each frame is represented as an array of pixel values. Generally, each pixel value is divided into three eight-bit values representing three color planes (usually red, green, and blue) that make up the components of the pixel displayed at that point. With the advent of high-definition video, typical frame resolutions are often as high as 1920×1080 pixels (and will likely become even greater in the future). If three eight-bit values are used to represent each pixel, a total of 49,766,400 bits of data (1920 × 1080 × 24) are required to represent each frame in a high-definition video. Thus, due to the large volume of data associated with each video, transmitting uncompressed video data over conventional transmission media is highly impractical. Accordingly, systems and methods for compressing and decompressing video data are paramount to successful transmission and broadcasting of videos.
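By way of illustration only, the per-frame figure can be reproduced with a few lines of arithmetic. The sketch below uses the resolution and bit depth stated above; the frame rate of 30 frames per second is an assumption introduced solely to indicate the resulting uncompressed data rate and does not come from the text.

```python
# Uncompressed size of one frame, using the figures stated in the text.
WIDTH, HEIGHT = 1920, 1080           # high-definition frame resolution
CHANNELS, BITS_PER_CHANNEL = 3, 8    # three eight-bit color values per pixel


def bits_per_frame(width, height, channels=CHANNELS, bits=BITS_PER_CHANNEL):
    """Return the number of bits needed to store one uncompressed frame."""
    return width * height * channels * bits


frame_bits = bits_per_frame(WIDTH, HEIGHT)

ASSUMED_FPS = 30  # hypothetical frame rate, not taken from the text
gigabits_per_second = frame_bits * ASSUMED_FPS / 1e9

print(frame_bits)                     # 49766400
print(round(gigabits_per_second, 2))  # 1.49
```

At the assumed frame rate, the uncompressed stream is on the order of 1.5 gigabits per second, which indicates why compression is treated as a prerequisite for transmission.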
Generally, a “codec” is a device, processor, or computer program capable of encoding and/or decoding (i.e., compressing and/or decompressing) digital video data, files, streams, or signals. Video compression systems within traditional codecs transform the red, green, and blue (RGB) pixel values in each frame to a new color space, such as the luminance/chrominance color space (expressed in Y, U, and V components or Y, I, and Q components), in which the luminance (i.e., black and white) information is stored in the Y component, and the color information is stored in the remaining components. This form of transformation (or “decorrelation”) provides some reduction in the overall number of bits for each frame (typically on the order of about half), but it adds computation at the decompressor or receiving end of a transmission, as the pixels in the decorrelated color space must be transformed back to their native RGB color space. Further, as resolutions increase (such as with high-definition video), the complexity and cost of decompression hardware increase, and thus become a significant issue for content distributors.
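The color-space transformation and the inverse step that the decompressor must perform can be sketched as follows. The weights shown are the commonly used BT.601-style luminance weights and analog U/V scale factors; they are an assumption chosen for illustration, as the text does not name a particular matrix, and the function names are hypothetical.

```python
# Assumed BT.601-style weights; the text does not specify a matrix.
KR, KG, KB = 0.299, 0.587, 0.114
U_SCALE, V_SCALE = 0.492, 0.877


def rgb_to_yuv(r, g, b):
    """Compressor side: split a pixel into luminance (Y) and color (U, V)."""
    y = KR * r + KG * g + KB * b
    u = U_SCALE * (b - y)
    v = V_SCALE * (r - y)
    return y, u, v


def yuv_to_rgb(y, u, v):
    """Decompressor side: the extra per-pixel work needed to return to RGB."""
    r = y + v / V_SCALE
    b = y + u / U_SCALE
    g = (y - KR * r - KB * b) / KG
    return r, g, b


# A gray pixel carries essentially no color information: U and V are near zero.
gray = rgb_to_yuv(128, 128, 128)

# A round trip recovers the original pixel up to floating-point error.
restored = yuv_to_rgb(*rgb_to_yuv(200, 30, 90))
print(gray, restored)
```

Every decoded pixel passes through `yuv_to_rgb`, so the inverse transformation cost grows in direct proportion to the frame resolution.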
Also contributing to the overall cost and complexity of decompression is the “store and forward” nature of most video codecs, in which a video is compressed once, but decompressed numerous times. Examples of such systems are “video on demand” systems, in which digital videos are rented or purchased by end users for viewing. In these applications, almost unlimited computational power can be implemented on the compression side, as the cost of compression can be spread across many decompressions. As frame resolution increases, however, the decompression hardware, which may be a set-top box, general purpose computer, or even a handheld device, is often incapable of decompressing the content at speeds or reliabilities satisfactory to most end users.
To combat these issues, conventional video compressors, such as MPEG-2 and H.264, utilize motion vectors that describe how to rearrange data elements of a previous frame to generate a new frame representation. This is generally accomplished by dividing each frame into squares or “blocks” of a fixed size and determining the motion vectors that define how the blocks of the previously-generated frame should be moved to best represent the current frame. Accordingly, the motion vectors represent the motion of blocks from one frame to the next. This method reduces the overall computational complexity of a system because the representation for each new frame is generated based on the representation of the previous frame, and thus a new data representation does not have to be generated from scratch for each new frame. Also, the utilization of motion vectors is highly “asymmetrical”—the compressor must decide amongst many alternatives (vectors), whereas the decompressor only has to apply the chosen vector. Thus, complexity on the decompression end is reduced.
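The asymmetry described above can be illustrated with a minimal block-matching sketch. It assumes an exhaustive search over a small window and a sum-of-absolute-differences cost; neither choice is stated in the text, and the function names, block size, and search radius are hypothetical. Frames are represented as lists of rows of single-channel pixel values.

```python
def sad(prev, cur, bx, by, dx, dy, n):
    """Sum of absolute differences between the current block at (bx, by)
    and the previous-frame block displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - prev[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def best_vector(prev, cur, bx, by, n, radius):
    """Compressor side: try every candidate vector within the search radius."""
    h, w = len(prev), len(prev[0])
    best, best_cost = (0, 0), sad(prev, cur, bx, by, 0, 0, n)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidates that fall outside the previous frame.
            if not (0 <= bx + dx <= w - n and 0 <= by + dy <= h - n):
                continue
            cost = sad(prev, cur, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def apply_vector(prev, bx, by, vec, n):
    """Decompressor side: copy the block that the chosen vector points to."""
    dx, dy = vec
    return [row[bx + dx:bx + dx + n] for row in prev[by + dy:by + dy + n]]


# Previous frame: an 8x8 pattern; current frame: the same pattern moved
# one pixel to the right (column 0 is simply repeated).
prev = [[x * 7 + y * 13 for x in range(8)] for y in range(8)]
cur = [[row[0]] + row[:-1] for row in prev]

vec, cost = best_vector(prev, cur, bx=4, by=4, n=4, radius=2)
block = apply_vector(prev, 4, 4, vec, 4)
print(vec, cost)  # (-1, 0) 0
```

The compressor evaluates up to (2 × radius + 1)² candidates per block, whereas the decompressor performs a single block copy.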
Use of motion vectors generally functions best for video sequence data in which the content of a scene is relatively stable. Motion vectors are typically expressed in terms of the difference between the chosen motion vector and a prediction derived from the surrounding vectors. This difference is encoded using a predefined table of codes. Because the predictions are based on relatively minor movements of blocks between frames (i.e., blocks are chosen in isolation based on a locally-optimal choice), this conventional coding scheme does not work well for video sequences with high levels of content movement, as the system is unable to determine a good match in the subsequent, local block area for a given block. In addition, when motion vectors are chosen based on a locally-optimal choice, scenes with uniform motion (such as a steady camera pan) will often produce distortion in the subsequent frames, as false correlation will occur between blocks.
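The differential expression of motion vectors can be sketched as follows. The predictor used here, a component-wise median of three neighboring vectors, is an assumption borrowed from common practice and is not stated in the text; the function names are hypothetical, and the predefined code table itself is omitted.

```python
from statistics import median


def predict(neighbours):
    """Assumed predictor: component-wise median of the surrounding vectors."""
    return (
        median(v[0] for v in neighbours),
        median(v[1] for v in neighbours),
    )


def encode(chosen, neighbours):
    """Compressor side: keep only the difference from the prediction."""
    px, py = predict(neighbours)
    return chosen[0] - px, chosen[1] - py


def decode(residual, neighbours):
    """Decompressor side: add the difference back onto the same prediction."""
    px, py = predict(neighbours)
    return residual[0] + px, residual[1] + py


neighbours = [(3, 0), (3, 0), (2, 1)]

stable = encode((3, 0), neighbours)     # vector agrees with its surroundings
erratic = encode((-5, 4), neighbours)   # vector departs from its surroundings
print(stable, erratic)                  # (0, 0) (-8, 4)
```

When a vector agrees with its neighbors the difference is zero, which a code table can represent compactly; a vector that departs from its neighbors yields a large difference, which is the case that such a scheme handles poorly.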
Regardless of problems with motion vectors, once a motion-approximated screen has been defined, most conventional codecs encode the determined pixel differences via a discrete cosine transform (DCT). Generally, a DCT expresses a sequence of finitely many data points in terms of a sum of cosine functions oscillating at different frequencies. The DCT for most codecs is “symmetric” in the sense that the number of computations necessary to carry out the transform is basically equivalent to the number of computations necessary to compute the inverse. Thus, as screen resolutions increase (for high-definition and beyond), the computational burden of the DCT becomes significant.
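The symmetry of the transform can be seen in a direct, one-dimensional sketch. The orthonormal scaling and the function names below are assumptions made for illustration; practical codecs apply a two-dimensional transform to each block and use fast algorithms rather than the direct sums shown here.

```python
import math


def _scale(k, n):
    """Orthonormal scale factor for coefficient k of an n-point transform."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct(samples):
    """Forward transform: n sums of n cosine terms."""
    n = len(samples)
    return [
        _scale(k, n)
        * sum(samples[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse transform: also n sums of n cosine terms."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n)
        )
        for i in range(n)
    ]


flat = dct([10.0] * 8)                # a constant signal has only a DC term
samples = [52, 55, 61, 66, 70, 61, 64, 73]
restored = idct(dct(samples))         # round trip recovers the samples
print([round(v) for v in restored])   # [52, 55, 61, 66, 70, 61, 64, 73]
```

Both functions perform the same number of multiplications and additions, so the decompressor bears essentially the same transform cost as the compressor.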
Many content providers and distributors provide both high-resolution and low-resolution content, and thus the associated codecs need to be equipped to handle both types of video. Conventional codecs, however, are ill-equipped to handle such content variations, as traditional compressors are tied to particular resolutions because the underlying transform is designed to transform blocks of a fixed size (usually 8×8 pixels, although other sizes are possible). The dependence on blocks of a fixed size limits the codec such that video compressed at a particular resolution cannot be decompressed at a different resolution without post-processing. Additionally, fixed block size compressors have difficulty compressing and encoding detail at a smaller scale. These compressors are also unable to take full advantage of potential efficiencies associated with large, slow-varying images in video frames.
Some systems have been developed in an attempt to overcome some of the above-referenced problems with conventional codecs. An example of such a system is described in detail in Hurd et al., U.S. Pat. No. 5,982,441, issued Nov. 9, 1999, and entitled “System and Method for Representing a Video Sequence,” which is incorporated herein by reference as if set forth herein in its entirety. However, these systems may not perform well for high-resolution videos, and also do not take advantage of efficiencies associated with current, improved computational processors.
Therefore, there is a continuing need for a codec that improves both the quality of the produced output and the speed of playback of a compressed/decompressed digital video. The needed codec preferably employs a compressor capable of: utilizing blocks of any size, varying motion vectors to prevent artifacts (i.e., visual distortions) in regions of uniform motion, and adapting motion vectors to the given frame. There is also a need for a compressor that makes decisions based on efficiency, allows description of a video bitstream in a resolution-independent fashion, and is capable of supporting a low-complexity decompressor. There is an additional need for a codec that employs a decompressor that is capable of operating solely in the RGB color space.