1. Field of the Invention
The present invention relates to the compression of video data. More specifically, the invention provides a system and method of encoding transformed video data to provide efficient compression.
2. Description of the Related Art
There is an increasing call for digital media representations of video and audio data to be presented on personal computers or other forms of end user terminals. Frequently, due to the size of the representations, digital media representations are stored at a remote location and are accessed remotely by personal computers over a computer network such as the Internet. In addition, much of the media is stored in files so that it can be later transmitted in a streaming format. In order to reduce storage required for large media representations, these media files are typically compressed at the storage location before transmission and then decompressed by the local personal computer so that the file may be presented. By compressing the representation, less storage space is required and the representation is more easily transmitted across networks of limited bandwidth. This efficiency of transmission also allows for smoother and more detailed presentation of real-time video and audio to better satisfy viewers' expectations.
To reduce storage and transmission requirements for digital audio and video, a number of encoding standards have been developed, which are well known in the art. Existing digital video compression encoding standards use a number of common encoding techniques, including transform encoding (one example of which is the discrete cosine transform, or DCT), quantization, and entropy encoding (e.g., Huffman coding, run length coding, and arithmetic coding) among others.
Well-known references discussing these techniques include: K. R. Rao and J. J. Hwang, Techniques & Standards for Image, Video, & Audio Coding, Prentice Hall 1996; K. R. Rao and R. Yip, Discrete Cosine Transform: Algorithms, Advantages, Applications, Academic Press, Inc. 1990; A. Gersho, R. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers 1992; T. Bell, et al., Text Compression, Prentice Hall 1990.
In encoding digital video, there are limitations on the amount of data that can be efficiently transmitted. Frequently, to transmit as many images as possible within a fixed bandwidth, it is necessary to cut back on the amount of data transmitted for each image. Currently, in many video compression systems, the data is pared down after the images are transformed, but before the final encoding is performed.
In particular, many video encoding systems embody a “lossy” design in which data is discarded after the video data has been transformed using the DCT. One technique involves taking the matrix of coefficients that results from the DCT and creating an estimate of the values in that matrix. This estimate is then transmitted in place of the matrix. The data loss occurs because the estimate necessarily omits some information. One version of this estimating procedure involves selectively dropping values that fall below a certain threshold. Another technique involves reading the values of the matrix in a “zig-zag” order that starts at the upper-left corner of the matrix and spreads out from there. This is done to take advantage of the fact that the expected magnitude of the coefficients typically drops exponentially as they are visited by the zig-zag scan. Because it is assumed that the values along the further reaches of the scan will be very close to zero, the scan may be stopped before it visits every coefficient, and only the scanned values are transmitted. To further reduce the size of the image data, the values along the scan may be estimated through the use of an exponential function, which requires transmission of only the parameters of the function itself. Each of these reductions discards some video information, and the resulting loss of detail in the presented video is sometimes noticeable even to the casual viewer.
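By way of illustration only, the zig-zag scan, the threshold-based dropping of values, and the early termination of the scan described above may be sketched as follows. The function names zigzag_order and lossy_scan, the threshold parameter, and the keep parameter are hypothetical and are introduced solely for this example. The traversal direction follows the common JPEG convention, which is an assumption and is not drawn from any particular prior-art system.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zig-zag order.

    Assumption: the JPEG-style traversal, starting at the upper-left corner.
    """
    order = []
    for s in range(2 * n - 1):
        # All positions on the anti-diagonal where row + col == s.
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even-numbered diagonals are walked from bottom-left to top-right.
        if s % 2 == 0:
            diagonal.reverse()
        order.extend(diagonal)
    return order


def lossy_scan(block, threshold, keep):
    """Scan a square block in zig-zag order and discard information.

    Values whose magnitude is below `threshold` are set to zero, and the
    scan is stopped after the first `keep` positions (both hypothetical
    parameters chosen for illustration).
    """
    scanned = [block[r][c] for r, c in zigzag_order(len(block))]
    thresholded = [v if abs(v) >= threshold else 0 for v in scanned]
    return thresholded[:keep]
```

In this sketch, a larger threshold or a smaller keep value transmits fewer coefficients at the cost of more lost detail, which is the trade-off discussed above.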
Another technique used by some video coding systems is the use of probability information to determine the encoding scheme for the coefficients that are transmitted. A number of entropy encoding systems (including Huffman coding, arithmetic coding, and Shannon-Fano coding) utilize knowledge of the probability of each coefficient value to create an efficient coding. In essence, the more probable a particular coefficient value is, the shorter its encoded form, so that the set of coefficients may be transmitted using as few bits as possible. Some techniques encode single coefficients, while others follow the zig-zag scan, encoding each non-zero coefficient along with the number of zero-value coefficients that immediately follow it. It is useful to consider multiple coefficients when computing probabilities, because the DCT produces matrices that sometimes exhibit correlation between coefficient values. If dependencies between coefficients are considered when encoding, certain combinations can be found to be more probable, enabling more efficient entropy coding. While current techniques do encode a set of coefficients somewhat more efficiently, they take very limited advantage of the above-mentioned correlations: they consider only coefficients that are adjacent on the zig-zag scan, and even then are typically limited to dependencies between the values of non-zero coefficients and the numbers of zero coefficients that immediately follow. This does little to exploit coefficient dependencies.
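By way of illustration only, the pairing of each non-zero coefficient with the number of zero-value coefficients that immediately follow it, and the assignment of shorter codes to more probable symbols, may be sketched as follows. The function names run_value_pairs and huffman_code_lengths are hypothetical. The sketch assumes that the first scanned value always begins a pair even when it is zero, and it uses Huffman coding merely as one representative entropy coder; neither choice is drawn from any particular prior-art system.

```python
import heapq
from collections import Counter


def run_value_pairs(scan):
    """Group a zig-zag scan into (value, zeros_that_follow) pairs.

    Assumption: the first scanned value starts a pair even if it is zero.
    """
    pairs = []
    for i, v in enumerate(scan):
        if v != 0 or i == 0:
            pairs.append([v, 0])   # start a new pair at this value
        else:
            pairs[-1][1] += 1      # extend the run of zeros after the last value
    return [tuple(p) for p in pairs]


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code built on `symbols`."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A lone symbol still needs one bit to be transmitted.
        return {next(iter(counts)): 1}
    # Heap entries are (weight, unique tie-breaker, {symbol: depth in tree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]
```

Passing the output of run_value_pairs to huffman_code_lengths gives the more frequent pairs shorter codes, but each pair captures only a dependency between a value and the zeros adjacent to it on the scan, which is the limitation noted above.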