1. Field of the Invention
The present invention relates to a system and method of video compression coding and decoding and particularly to a system and method of using pattern vectors for video and image coding and decoding that eliminates two-dimensional coding of transform coefficients and the requisite zigzag scan order or alternate scan order.
2. Discussion of Related Art
Transform coding is the heart of several industry standards for image and video compression. Transform coding compresses image data by representing samples of an original signal with an equal number of transform coefficients. A sample of the original signal may be the signal itself, or it may be the difference between the signal and a predicted value of the signal, the prediction being done by any of a number of widely-known methods. Transform coding exploits the fact that for typical images a large amount of signal energy is concentrated in a small number of coefficients. Consequently, only the coefficients with significant energy need to be coded. The discrete cosine transform (DCT) is adopted in standards such as the Joint Photographic Experts Group (JPEG) image coding standard, the Moving Picture Experts Group (MPEG) video coding standards, ITU-T recommendations H.261 and H.263 for visual telephony, and many other commercially available compression systems based on some variations of these standard transform coding schemes.
Transform coding is a block-based image compression technique in which the input image is partitioned into small, fixed-size blocks and each block of pixels is coded independently. FIG. 1, FIG. 2, and FIG. 3 illustrate a standard method of block-based image compression. As shown in FIG. 1, in a typical transform encoder, an input image 101 is partitioned into blocks 102. The blocks are usually square but may be of any rectangular shape, or in fact may be of any shape at all. FIG. 2 illustrates that the data 202 in a block is transformed by a sequence of operations in the encoder into a set of quantized transform coefficients. A predictor 204 may predict the sample values in the block to yield a predicted block 206. Many such predictors are known in the art. A difference operator 208 computes a difference block 210 representing a difference between the image data 202 and the predicted block 206. A transform operator 212, typically a discrete cosine transform (DCT), transforms the difference block 210 into a set of transform coefficients 214.
If the input block is rectangular, the set of transform coefficients forms a rectangular array. The transform coefficients y_k, 1 ≤ k ≤ K, are quantized independently by distinct quantizers 216 to generate a set of indices, referred to as the quantized transform coefficients 218.
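The operations of FIG. 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular standard: the function names dct_2d and encode_block, the orthonormal DCT-II scaling, and the use of a uniform rounding quantizer with one step size per coefficient are all hypothetical choices made for the example.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (list of lists)."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for i in range(n):
                for j in range(n):
                    s += (block[i][j]
                          * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * j + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * s
    return out

def encode_block(data, prediction, steps):
    """Difference, transform, then quantize each coefficient independently.

    steps[i][j] is the (assumed) quantizer step size for coefficient (i, j).
    """
    n = len(data)
    # Difference operator: image data minus predicted block.
    diff = [[data[i][j] - prediction[i][j] for j in range(n)] for i in range(n)]
    # Transform operator: difference block to transform coefficients.
    coeffs = dct_2d(diff)
    # Quantizers: one index per coefficient.
    return [[round(coeffs[i][j] / steps[i][j]) for j in range(n)]
            for i in range(n)]
```

For a flat 4x4 block whose samples all exceed the prediction by 8, only the lowest-frequency coefficient is nonzero after quantization, which is the energy-compaction behavior the text describes.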
FIG. 3 shows that the indices are converted by a predetermined scan order 302, typically one that zigzags through the quantized transform coefficients in increasing frequency order, to produce a list of transform coefficients 304. The list of transform coefficients is rewritten as a set of (run, level) pairs 306. The “run” component of each pair is the count of the number of zero coefficients before the next nonzero coefficient; the “level” component is the value of the next nonzero coefficient. The (run, level) pairs are mapped by a codeword mapper 308 into a sequence of bits 310 that are output to the channel to be transmitted to the decoder.
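The scan and the (run, level) rewriting of FIG. 3 may be sketched as follows. The function names zigzag_order, scan, and run_level_pairs are hypothetical, and the particular zigzag shown (alternating direction along anti-diagonals, starting at the lowest-frequency corner) is one common choice assumed for the example.

```python
def zigzag_order(n):
    """Return (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):
        # All positions on the anti-diagonal row + col == s.
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diag.reverse()  # alternate the direction on every other diagonal
        order.extend(diag)
    return order

def scan(block):
    """Convert a square block of quantized coefficients into a list."""
    return [block[i][j] for i, j in zigzag_order(len(block))]

def run_level_pairs(coeffs):
    """Rewrite a coefficient list as (run, level) pairs.

    run   = number of zeros preceding the next nonzero coefficient
    level = value of that nonzero coefficient
    Trailing zeros produce no pair; they are implied by end-of-block.
    """
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs
```

As a usage example, the block [[4, 0, 0, 0], [-2, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]] scans to a list beginning 4, 0, -2, 1 followed by zeros, and is rewritten as the pairs (0, 4), (1, -2), (0, 1).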
FIG. 4 shows part of an example mapping between (run, level) pairs 402 and codewords 404. One codeword 406 is reserved to indicate that there are no more nonzero coefficients in the block, i.e., to indicate the end-of-block condition 408.
As shown in FIGS. 2 and 3, the basic process for transform coding includes the following steps: converting a block of image data into an array of transform coefficients (214); quantizing the transform coefficients such that all, some, or none of the coefficients become zero, the zero coefficients typically being the high-frequency coefficients (218); ordering the coefficients in a list according to a fixed order, typically a zigzag scan ranging over the coefficients from low to high frequency in both the horizontal and vertical dimensions, so that the zero (high-frequency) coefficients tend to be clustered at the end of the list (302); coding the list of coefficients as a sequence of (run, level) pairs (306); assigning a codeword to each pair according to a code such as a Huffman code (308); and using a single reserved codeword to signify the “end of block” condition, that is, the condition that all nonzero coefficients in the block have already been coded (406, 408).
The run component of each pair is the length of a run of zero coefficients in the coefficient ordering, and the level is the actual value of the next nonzero coefficient. Each possible (run, level) pair is mapped by a fixed, previously determined mapping to a codeword based on a variable length prefix-free code (e.g., a Huffman code). One codeword 406 of the code is reserved for the “end-of-block” indicator 408, meaning that there are no more nonzero coefficients in the block.
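The codeword mapping and the reserved end-of-block codeword can be illustrated with a small prefix-free table. The table below is invented for this sketch and is not the mapping of FIG. 4 or of any standard; the names CODE, EOB, encode_pairs, and decode_bits are likewise hypothetical.

```python
EOB = "EOB"  # marker for the end-of-block condition

# Hypothetical prefix-free code: no codeword is a prefix of another.
CODE = {
    EOB: "10",
    (0, 1): "110",
    (0, -1): "111",
    (1, 1): "010",
    (1, -1): "011",
    (0, 2): "0010",
    (0, -2): "0011",
}

def encode_pairs(pairs):
    """Map each (run, level) pair to its codeword, then append end-of-block."""
    return "".join(CODE[p] for p in pairs) + CODE[EOB]

def decode_bits(bits):
    """Recover the (run, level) pairs, stopping at the end-of-block codeword."""
    inverse = {codeword: symbol for symbol, codeword in CODE.items()}
    pairs = []
    current = ""
    for b in bits:
        current += b
        if current in inverse:  # prefix-free: the first match is the codeword
            symbol = inverse[current]
            if symbol == EOB:
                return pairs
            pairs.append(symbol)
            current = ""
    raise ValueError("bit sequence ended without an end-of-block codeword")
```

Because the code is prefix-free, the decoder needs no separators between codewords. Each codeword nevertheless occupies a whole number of bits, which is the source of the inefficiency noted in the discussion of prefix-free codes.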
There are deficiencies in transform coding. The method requires careful tuning of the coder. The following entities need to be carefully designed and matched to each other: (1) the coefficient ordering; (2) the variable length code; and (3) the matching of (run, level) pairs and the end-of-block condition to codewords. In addition, related coding schemes fail to take advantage of correlations between coefficients other than those implied by the fixed coefficient ordering. Further, the use of prefix-free codes means that some compression inefficiency is inevitable.
Next, this disclosure discusses arithmetic coding with reference to FIG. 5. Arithmetic coding is a method of coding according to which a sequence of events, each with its own probability distribution, is coded, each event using the smallest number of bits theoretically possible given the probability of the event. This number of bits is not restricted to being an integer. An arithmetic coder retains state information between events, and makes use of this state information to allow coding multiple events with a single bit, and to allow the coding for a single event to extend over one or more full or partial bits.
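The statement that the number of bits need not be an integer can be made concrete with the standard information-theoretic cost of an event of probability p, namely -log2(p) bits. The following sketch, with the hypothetical helper names ideal_bits and sequence_bits, simply evaluates that quantity.

```python
import math

def ideal_bits(probability):
    """Theoretical minimum cost, in bits, of an event with this probability."""
    return -math.log2(probability)

def sequence_bits(probabilities):
    """Total theoretical cost of a sequence of independent events."""
    return sum(ideal_bits(p) for p in probabilities)
```

For example, ten consecutive events that each have probability 0.9 cost about 1.52 bits in total, whereas a prefix-free code must spend at least one bit on each event, or ten bits in all.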
FIG. 5 illustrates an example arithmetic encoder. The encoder contains probability distributions 501, 502, 503, . . . , 504 for all possible events that can occur in different contexts C1, C2, C3, . . . , CN. An event 510 is input to the coder, along with its associated context identifier 520. A selector 530 selects one of the stored probability distributions 532 based on the context identifier. The arithmetic entropy coder 540 transforms the event, the selected probability distribution, and the internal state of the arithmetic coder 550 into a sequence of bits 560 to be output to the channel for transmission to the decoder. The internal state 550 and the selected probability distribution are updated.
A theoretical arithmetic coder uses unlimited precision arithmetic, and is not practical. In the related art there are a number of “approximate arithmetic coders.” These are approximate in the sense that the number of output bits is nearly theoretically optimal, but not exactly so. The result of coding and decoding is a complete and exact reconstruction of the original sequence of events; it is not “approximate” in any sense. The term “arithmetic coding” invariably refers to use of an approximate arithmetic coder.
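A theoretical coder of the kind just described, combined with the context selection of FIG. 5, can be sketched using exact rational arithmetic. This is an illustrative assumption-laden example rather than a practical design: the names ContextModel, encode, and decode are hypothetical, the probability distributions are simple adaptive symbol counts, and the decoder is handed the context sequence directly, whereas a real decoder would derive each context from previously decoded data.

```python
import math
from fractions import Fraction

class ContextModel:
    """One adaptive probability distribution per context."""

    def __init__(self, num_contexts, num_symbols):
        # Every distribution starts with a count of 1 for each symbol.
        self.counts = [[1] * num_symbols for _ in range(num_contexts)]

    def interval(self, context, symbol):
        """Return (start, width) of the symbol's share of the unit interval."""
        c = self.counts[context]
        total = sum(c)
        return Fraction(sum(c[:symbol]), total), Fraction(c[symbol], total)

    def update(self, context, symbol):
        self.counts[context][symbol] += 1

def encode(events, num_contexts, num_symbols):
    """events is a list of (context, symbol) pairs; returns a bit string."""
    model = ContextModel(num_contexts, num_symbols)
    low, width = Fraction(0), Fraction(1)  # internal state of the coder
    for context, symbol in events:
        s_low, s_width = model.interval(context, symbol)  # selected distribution
        low += width * s_low
        width *= s_width
        model.update(context, symbol)
    # Emit the shortest binary fraction lying inside [low, low + width).
    nbits = 0
    while True:
        nbits += 1
        value = math.ceil(low * 2 ** nbits)
        if Fraction(value, 2 ** nbits) < low + width:
            return format(value, "0{}b".format(nbits))

def decode(bits, contexts, num_contexts, num_symbols):
    """Invert encode(), given the context of each event."""
    model = ContextModel(num_contexts, num_symbols)
    target = Fraction(int(bits, 2), 2 ** len(bits))
    low, width = Fraction(0), Fraction(1)
    symbols = []
    for context in contexts:
        for symbol in range(num_symbols):
            s_low, s_width = model.interval(context, symbol)
            new_low, new_width = low + width * s_low, width * s_width
            if new_low <= target < new_low + new_width:
                break  # exactly one sub-interval contains the target
        symbols.append(symbol)
        low, width = new_low, new_width
        model.update(context, symbol)
    return symbols
```

The interval (low, width) plays the role of the internal state 550, and the counts play the role of the stored distributions. Because the fractions grow without bound as more events are coded, the sketch also shows why unlimited-precision coding is impractical and why approximate coders are used instead.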
Many approximate arithmetic coders are designed to code binary events, that is, events that can have one of only two possible values. It is a trivial and obvious use of a binary arithmetic coder to code non-binary events by decomposing the non-binary events into a sequence of binary decisions, each coded as a binary event by a binary arithmetic coder.
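One simple way to perform such a decomposition is a truncated unary binarization, sketched below. The choice of unary binarization, the use of the decision position as the context identifier, and the names binarize and debinarize are assumptions made for illustration; many other decompositions are possible.

```python
def binarize(value, max_value):
    """Decompose a value in 0..max_value into (context, bit) decisions.

    Decision i asks "is the value greater than i?". The sequence stops at
    the first 0 bit, or after max_value decisions if every answer is 1.
    """
    decisions = []
    for i in range(max_value):
        bit = 1 if value > i else 0
        decisions.append((i, bit))  # the position i serves as the context
        if bit == 0:
            break
    return decisions

def debinarize(bits):
    """Recover the value by counting the leading 1 bits."""
    value = 0
    for bit in bits:
        if bit == 0:
            break
        value += 1
    return value
```

Each (context, bit) decision would then be passed to a binary arithmetic coder as a separate binary event. For instance, with max_value equal to 3, the value 2 becomes the decisions (0, 1), (1, 1), (2, 0).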
What is needed in the art is an improvement in image coding and decoding.