The present invention is generally related to the field of video coding and compression and, more particularly, to a method and system for context-based adaptive variable length coding.
A typical video encoder partitions each frame of the original video sequence into contiguous rectangular regions called "blocks". These blocks are encoded in "intra mode" (I-mode) or in "inter mode" (P-mode). For P-mode, the encoder first searches for a block similar to the one being encoded in a previously transmitted "reference frame", denoted by Fref. Searches are generally restricted to being no more than a certain spatial displacement from the block to be encoded. When the best match, or "prediction", has been identified, it is expressed in the form of a two-dimensional (2D) motion vector (Δx, Δy), where Δx is the horizontal and Δy is the vertical displacement. The motion vectors together with the reference frame are used to construct a predicted block Fpred:
Fpred(x, y) = Fref(x + Δx, y + Δy)
The location of a pixel within the frame is denoted by (x, y).
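The prediction formula above can be illustrated with a minimal sketch. The function name, the list-of-rows frame layout, and the toy frame are assumptions made for illustration only, and the sketch assumes the displaced block lies entirely inside the reference frame.

```python
def predict_block(ref, x0, y0, size, dx, dy):
    """Return the size x size predicted block whose top-left pixel is (x0, y0).

    ref is the reference frame stored as a list of rows, so ref[y][x] is
    the pixel at location (x, y). Every predicted pixel is copied from the
    reference frame at the position displaced by the motion vector (dx, dy).
    """
    return [[ref[y + dy][x + dx] for x in range(x0, x0 + size)]
            for y in range(y0, y0 + size)]


# Toy 6x6 reference frame in which pixel (x, y) holds the value 10*y + x.
ref_frame = [[10 * y + x for x in range(6)] for y in range(6)]

# Predict the 2x2 block at (2, 2) using the motion vector (dx, dy) = (1, -1).
pred = predict_block(ref_frame, x0=2, y0=2, size=2, dx=1, dy=-1)
print(pred)  # [[13, 14], [23, 24]]
```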
For blocks encoded in I-mode, the predicted block is formed using spatial prediction from previously encoded neighboring blocks within the same frame. For both I-mode and P-mode, the prediction error, i.e. the difference between the block being encoded and the predicted block, is represented as a set of weighted basis functions of some discrete transform. Transforms are typically performed on an 8×8 or 4×4 block basis. The weights (the transform coefficients) are subsequently quantized. Quantization introduces loss of information, thus quantized coefficients have lower precision than the original ones.
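The loss of precision caused by quantization can be seen in a small sketch. A uniform scalar quantizer with a fixed step size is assumed here purely for illustration; the text does not specify the quantizer, and real coders derive the step size from the quantization parameter.

```python
def quantize(coeffs, step):
    """Map each coefficient to an integer level by rounding |c| / step."""
    levels = []
    for c in coeffs:
        magnitude = int(abs(c) / step + 0.5)  # round half up on the magnitude
        levels.append(magnitude if c >= 0 else -magnitude)
    return levels


def dequantize(levels, step):
    """Reconstruct approximate coefficients from the integer levels."""
    return [level * step for level in levels]


coeffs = [52.0, -19.0, 7.0, -3.0, 1.0]   # hypothetical transform coefficients
levels = quantize(coeffs, step=8)
print(levels)                      # [7, -2, 1, 0, 0]
print(dequantize(levels, step=8))  # [56, -16, 8, 0, 0]
```

The reconstructed values differ from the originals, and the two smallest coefficients become zero, which is the behavior the later scanning and run-length steps rely on.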
Quantized transform coefficients and motion vectors are examples of "syntax elements". These, plus some control information, form a complete coded representation of the video sequence. Prior to transmission from the encoder to the decoder, all syntax elements are entropy coded, thereby further reducing the number of bits needed for their representation. Entropy coding is a lossless operation aimed at minimizing the number of bits required to represent transmitted or stored symbols (in our case syntax elements) by utilizing properties of their distribution (some symbols occur more frequently than others).
One method of entropy coding employed by video coders is Variable Length Codes (VLC). A VLC codeword, which is a sequence of bits (0's and 1's), is assigned to each symbol. The VLC is constructed so that the codeword lengths correspond to how frequently the symbol represented by the codeword occurs, e.g. more frequently occurring symbols are represented by shorter VLC codewords. Moreover, the VLC must be constructed so that the codewords are uniquely decodable, i.e., if the decoder receives a valid sequence of bits of a finite length, there must be only one possible sequence of input symbols that, when encoded, would have produced the received sequence of bits.
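A minimal sketch of a uniquely decodable VLC follows. The four-symbol table is hypothetical and is not taken from any coding standard; it is a prefix code (no codeword is the beginning of another), which is one common way to guarantee unique decodability.

```python
# Hypothetical table: the most frequent symbol "a" gets the shortest codeword.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(symbols):
    """Concatenate the codeword of every symbol into one bit string."""
    return "".join(CODE[s] for s in symbols)


def decode(bits):
    """Read bits one at a time and emit a symbol whenever a codeword matches."""
    inverse = {codeword: symbol for symbol, codeword in CODE.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a codeword")
    return symbols


bits = encode("aabac")
print(bits)          # 00100110
print(decode(bits))  # ['a', 'a', 'b', 'a', 'c']
```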
To correctly decode the bitstream, both encoder and decoder have to use the same set of VLC codewords and the same assignment of symbols to them. As discussed earlier, to maximize the compression, the most frequently occurring symbols should be assigned the shortest VLC codewords. However, the frequency (probability) of different symbols is dependent upon the actual frame being encoded. In the case where a single set of VLC codewords and a constant assignment of symbols to those codewords are used, it is likely that the probability distribution of symbols within a given frame will differ from the probabilities assumed by the VLC, even though the average symbol probability across the entire sequence may not. Consequently, using a single set of VLC codewords and a single assignment of symbols to those codewords reduces coding efficiency.
To rectify this problem, different methods of adaptation are used. One approach, which offers reasonable computational complexity and a good compression versus efficiency trade-off, and which is currently used in state-of-the-art video coders, is now described. For a set of symbols, a number of tables specifying VLC codewords (VLCs) are provided for the encoder and the decoder to use. The table selected to encode a particular symbol then depends on information known both to the encoder and the decoder, such as the type of the coded block (I- or P-type block), the component (luma or chroma) being coded, or the quantization parameter (QP) value. The performance depends on how well the parameters used to switch between the VLCs characterize the symbol statistics.
In the decoder, the block in the current frame is obtained by first constructing its prediction in the same manner as in the encoder, and by adding to the prediction the compressed prediction error. The compressed prediction error is found by weighting the transform basis functions using the quantized coefficients. The difference between the reconstructed frame and the original frame is called reconstruction error.
The compression ratio, i.e. the ratio of the number of bits used to represent the original sequence to the number used to represent the compressed one, may be controlled by adjusting the value of the quantization parameter (QP) used when quantizing transform coefficients. The compression ratio also depends on the method of entropy coding employed.
Coefficients in a given block are ordered (scanned) using zigzag scanning, resulting in a one-dimensional ordered coefficient vector. An exemplary zigzag scan for a 4×4 block is shown in FIG. 1.
Zigzag scanning presumes that, after applying a two-dimensional (2D) transform, the transform coefficients having the most energy (i.e. higher value coefficients) correspond to low frequency transform functions and are located toward the top-left of the block, as depicted in FIG. 1. Thus, in a coefficient vector produced through zigzag scanning, the higher magnitude coefficients are most likely to appear toward the start of the vector. After quantization most of the low energy coefficients become equal to 0.
The vector of coefficients can be further processed so that each nonzero coefficient is represented by 2 values: a run (the number of consecutive zero coefficients preceding a nonzero value in the vector), and a level (the coefficient's value).
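The run and level representation can be sketched as follows. The function name and the sample vector are assumptions for illustration; zeros that follow the last nonzero coefficient produce no pair.

```python
def run_level_pairs(vector):
    """Return one (run, level) pair per nonzero coefficient.

    run is the number of consecutive zeros immediately before the
    coefficient, and level is the coefficient's value.
    """
    pairs, run = [], 0
    for coefficient in vector:
        if coefficient == 0:
            run += 1
        else:
            pairs.append((run, coefficient))
            run = 0
    return pairs


# Hypothetical zigzag-scanned vector of 16 quantized coefficients.
vector = [7, 0, -2, 1, 0, 0, 1] + [0] * 9
print(run_level_pairs(vector))  # [(0, 7), (1, -2), (0, 1), (2, 1)]
```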
CAVLC (Context-based Adaptive VLC) is the method of coding transform coefficients used in the JVT coder "Joint Final Committee Draft (JFCD) of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC)". In summary, encoding a single 4×4 block using CAVLC involves five steps:
1. Encoding the total number of nonzero coefficients in the block, combined with the number of "trailing ones".
The number of trailing ones is defined as the number of coefficients with a magnitude of one that are encountered before a coefficient with magnitude greater than one is encountered when the coefficient vector is read in reverse order (i.e. 15, 14, 13, 12, 11, . . . in FIG. 1). The VLC used to code this information is based upon a predicted number of nonzero coefficients, where the prediction is based on the number of nonzero coefficients in previously encoded neighboring blocks (upper and left blocks).
2. Encoding the sign of any trailing ones.
3. Encoding the levels (magnitudes) of nonzero coefficients other than the trailing ones.
4. Encoding the number of zero values in the coefficient vector before the last nonzero coefficient, i.e. the sum of all the "runs". The VLC used when coding this value depends upon the total number of nonzero coefficients in the block, since there is some relationship between these two values.
5. Encoding the run that occurs before each nonzero coefficient, starting from the last nonzero value in the coefficient vector.
The VLC used to encode a run value is selected based upon the sum of the runs from step (4), and the sum of the runs coded so far. For example, if a block has a "sum of runs" of 8, and the first run encoded is 6, then all remaining runs must be 0, 1, or 2. Because the possible run length becomes progressively shorter, more efficient VLC codes are selected to minimize the number of bits required to represent the run.
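The values handled by the five steps above can be derived from a coefficient vector as in the following sketch. It only collects the symbols and applies no VLC tables. The function names are hypothetical, the cap of three trailing ones comes from H.264 rather than from the description above, and listing the remaining levels in reverse scan order is an assumption.

```python
def cavlc_symbols(vector, max_trailing_ones=3):
    """Collect the values that the five CAVLC steps would encode."""
    positions = [i for i, c in enumerate(vector) if c != 0]
    total_coeffs = len(positions)

    # Steps 1-2: walk the nonzero coefficients in reverse scan order and
    # collect the signs of the initial run of +1 / -1 values.
    trailing_signs = []
    for i in reversed(positions):
        if abs(vector[i]) != 1 or len(trailing_signs) == max_trailing_ones:
            break
        trailing_signs.append(1 if vector[i] > 0 else -1)

    # Step 3: the remaining levels, in reverse scan order (assumed order).
    levels = [vector[i] for i in reversed(positions)][len(trailing_signs):]

    # Step 4: number of zeros located before the last nonzero coefficient.
    total_zeros = positions[-1] + 1 - total_coeffs if positions else 0

    # Step 5: run of zeros before each nonzero coefficient, last one first.
    runs, previous = [], -1
    for i in positions:
        runs.append(i - previous - 1)
        previous = i
    runs.reverse()

    return {
        "total_coeffs": total_coeffs,
        "trailing_signs": trailing_signs,
        "levels": levels,
        "total_zeros": total_zeros,
        "runs": runs,
    }


def largest_possible_run(total_zeros, runs_coded):
    """Upper bound on the next run, given the runs already coded."""
    return total_zeros - sum(runs_coded)


vector = [7, 0, -2, 1, 0, 0, -1] + [0] * 9   # hypothetical scanned block
symbols = cavlc_symbols(vector)
print(symbols["total_coeffs"], symbols["trailing_signs"])  # 4 [-1, 1]
print(symbols["levels"], symbols["total_zeros"])           # [-2, 7] 3
print(symbols["runs"])                                     # [2, 0, 1, 0]
print(largest_possible_run(8, [6]))                        # 2
```

The last line reproduces the example in the text: with a sum of runs of 8 and a first run of 6, no later run can exceed 2.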
A typical block-based video encoder is shown in FIG. 2. As shown in FIG. 2, the video server 100 comprises a front-end unit 10, which receives video signals 110 from a video source, and a video multiplex coder 40. Each frame of uncompressed video provided from the video source to the input 110 is received and processed macroblock-by-macroblock in raster-scan order. The front-end unit 10 comprises a coding control manager 12 to switch between the I-mode and P-mode and to perform timing coordination with the multiplex coder 40 via control signals 120, a DCT (Discrete Cosine Transform) transformation module 16 and a quantizer 14 to provide quantized DCT coefficients. The quantized DCT coefficients 122 are conveyed to the multiplex coder 40. The front-end unit 10 also comprises an inverse quantizer 18 and an inverse transformation unit 20 to perform an inverse block-based discrete cosine transform (IDCT), and a motion compensation prediction and estimation module 22 to reduce the temporal redundancy in video sequences and to provide a prediction error frame for error prediction and compensation purposes. The motion estimation module 22 also provides a motion vector 124 for each macroblock to the multiplex coder 40. The multiplex coder 40 typically comprises a scanning module 42 to perform the zigzag scan for forming an ordered vector for each block of image data, and an entropy coding module to designate non-zero quantized DCT coefficients with run and level parameters. The run and level values are further mapped to a sequence of bins, each of which is assigned to a so-called 'context' by a context assignment module 46. The contexts, along with the motion vector, are formatted into a bitstream 140. A context-based encoder is known in the art. Furthermore, it is possible that the transformation module 16 is an FFT (Fast Fourier Transform) module or a DFT (Discrete Fourier Transform) module, and that the DCT can be an approximation of a DCT.
A typical decoder is shown in FIG. 3. As shown, a client 200 comprises a video multiplex decoder 60, which receives the encoded video bitstream 140 from the encoder 40. The decoder 60 also decodes an I-mode frame on a macroblock-by-macroblock basis. Based on the VLC codewords contained in the bitstream 140, a coefficient extractor module 62 in the decoder 60 recovers the run and level values, and then reconstructs an array of quantized DCT coefficients 162 for each block of the macroblock. The encoded motion vector information associated with the macroblock is extracted from the encoded video bitstream 140. The extracted motion vector 166, along with the reconstructed quantized DCT coefficients 162, is provided to a back-end unit 80. An inverse quantizer 84 inverse quantizes the quantized DCT coefficients 162 representing the prediction error information for each block of the macroblock and provides the results to an inverse transformer 86. With the control information provided by a coding control manager 82, an array of reconstructed prediction error values for each block of the macroblock is yielded in order to produce video signals 180.
Currently, video and still images are typically coded with the help of a block-wise transformation to the frequency domain. Such a coding method is used in the H.26L (or H.264-to-be) standard by the Joint Video Team (JVT). In such a method, the image is first subdivided into blocks of 4×4 pixels in size and the blocks are transformed into a 4×4 matrix of transform coefficients. The coefficients are then arranged by scanning them along a zigzag path, wherein the low-frequency coefficients are placed first in the scan, in order to form an ordered sequence of transform coefficients, that is, a one-dimensional vector. A 4×4 transform coefficient matrix of FIG. 1 will result in a one-dimensional array or a sequence of 1, 2, 5, 9, 6, 3, 4, 7, 10, 13, 14, 11, 8, 12, 15, 16. This is advantageous because the following step is to code the quantized values of the DCT coefficients by run-length coding, whereby the more probable runs are represented by short codes (Huffman coding or arithmetic coding). Arranged in such a manner, many of the coefficients at the end of the scan usually end up being zero. Thus the coefficients are coded with high efficiency. It is known that variable-length coding means that not all symbols have the same length (in bits). Huffman coding is an example of variable-length coding. Arithmetic coding is slightly different in that it involves a series of symbols. Thus, it is in general not possible to describe the length of ONE symbol as requiring X bits. Rather, a specific series of symbols will require Y bits. For this reason "entropy coding" is perhaps a more general term than "variable-length coding".
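The zigzag ordering quoted above can be applied to a block as in the following sketch. It assumes that the numbers 1 to 16 in the sequence label the matrix positions in row-by-row order, which matches the 4×4 zigzag scan of H.264; FIG. 1 itself is not reproduced here, and the sample block is hypothetical.

```python
# Scan order from the text, read as 1-based row-by-row matrix positions.
ZIGZAG_4X4 = [1, 2, 5, 9, 6, 3, 4, 7, 10, 13, 14, 11, 8, 12, 15, 16]


def zigzag_scan(block):
    """Flatten a 4x4 block (list of four rows) into a zigzag-ordered vector."""
    flat = [value for row in block for value in row]
    return [flat[position - 1] for position in ZIGZAG_4X4]


# Hypothetical quantized block with its energy in the top-left corner.
block = [
    [9, 4, 1, 0],
    [3, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(zigzag_scan(block))
# [9, 4, 3, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

All nonzero values end up at the start of the vector and the zeros collect at the end, which is the property the run-length step exploits.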
The above-described coding scheme is used for producing a block transform of 4×4 pixels. However, Context-based Adaptive VLC (CAVLC) may involve partitioning the transform coefficients into blocks that are larger than 4×4. For example, the JVT coder contains a feature called "Adaptive Block Transforms" (ABT) which performs transforms on 4×8, 8×4, and 8×8 blocks. Thus, the coding scheme designed for 4×4 blocks can no longer be applied. A solution to the problem is to split the larger block into sub-blocks of size 4×4.
An existing solution has been proposed, wherein the ABT block of coefficients is divided into 4×4 blocks in the spatial domain. As an example, an 8×8 block is shown in FIG. 4 with one of the scan orders used for this block in the JVT coder. The same block partitioned into four 4×4 blocks is shown in FIGS. 5a to 5c. Subsequently each 4×4 block is zigzag scanned using the 4×4 scan, yielding a plurality of vectors of length 16. These length 16 vectors are then passed to the standard 4×4 CAVLC algorithm. When the 4×4 scan shown in FIG. 1 is used for the 4×4 blocks in FIGS. 5a to 5c, the resulting vectors are as given in FIGS. 6a to 6c.
This existing CAVLC algorithm makes certain assumptions about the content of a coefficient vector. When these assumptions are violated, the coding tables (i.e. the tables specifying which codeword is used to describe which symbol) used by CAVLC are "mismatched". This means that the length of codewords in the table no longer accurately reflects the probability of a symbol, and consequently CAVLC is less efficient.
As a result of this existing approach, each of the 4×4 blocks created after partitioning of the ABT block has coefficients corresponding to different frequencies in the ABT transform. For example, the 4×4 block of FIG. 5a contains low frequency information (both horizontally and vertically) and therefore most of the high amplitude coefficients. Likewise, the 4×4 block of FIG. 5d contains high frequency information and low amplitude coefficients. The CAVLC algorithm assumes that higher magnitudes generally occur toward the start of the vector, and critically, it assumes that longer runs of zeros will generally occur toward the end of a vector. The 4×4 block of FIG. 5d is statistically unlikely to contain as many nonzero values as the 4×4 block of FIG. 5a, and the "outlying" values are likely to have long runs of zeros associated with them. Although the 4×4 block of FIG. 5d may contain one or two nonzero coefficients, the locations of those coefficients are mismatched with what CAVLC expects, and consequently coding of that block requires a disproportionately large number of bits.
The CAVLC method also assumes that neighboring blocks have a similar number of nonzero coefficients. For blocks whose coefficients correspond to different frequencies of the transform functions, the number of nonzero coefficients varies drastically. That can lead to the wrong choice of the VLC table used to code the number of nonzero coefficients of a given block, since this choice is based on the number of nonzero coefficients of its neighbors.
Thus, the existing block partitioning scheme is not an optimal solution in terms of coding efficiency and quantization accuracy.
It is advantageous and desirable to provide a more efficient method and system for video and image coding, which can be applied to ABT blocks having a general size of (4n)×(4m), where n and m are positive integers equal to or greater than 1.
It is a primary objective of the present invention to reduce the number of bits required to represent the quantized coefficients that result after application of a block transform larger than 4×4. More precisely, it is aimed at reducing the number of bits required to represent coefficients resulting from a 4×8, 8×4, or 8×8 transform. Moreover, in order to simplify the design of the JVT encoder as well as to minimize the memory required by the code implementing JVT, it is desirable that the CAVLC method developed for the 4×4 block is used to code 4×8, 8×4, or 8×8 blocks unchanged or with minimal modifications.
The objective can be achieved by partitioning a block larger than 4×4 into a plurality of sub-blocks of size 4×4 using the original vector in an interleaved fashion.
Thus, according to the first aspect of the present invention, there is provided a method of image coding characterized by
forming at least a block of transform coefficients from the image data, by
scanning the block of transform coefficients for providing a sequence of transform coefficients, by
sub-sampling the transform coefficients in the sequence in an interleaved manner for providing a plurality of sub-sampled sequences of transform coefficients, and by
coding the sub-sampled sequences of transform coefficients using an entropy encoder.
Advantageously, said sub-sampling is carried out prior to or after said coding.
Preferably, the sequence of the transform coefficients has a length of 16n×m, where n and m are positive integers equal to or greater than 1, and each of said sub-sampled sequences of the transform coefficients has a length of 16.
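The interleaved sub-sampling can be sketched as follows. The summary above does not spell out the exact indexing, so this sketch assumes one natural reading: a scanned sequence of length 16k is split into k sub-sequences of length 16, with sub-sequence i taking coefficients i, i+k, i+2k, and so on. The function names are hypothetical.

```python
def interleave_split(sequence, sub_length=16):
    """Split a scanned sequence into interleaved sub-sequences.

    Assumed indexing: with k = len(sequence) // sub_length, sub-sequence i
    holds the coefficients at positions i, i + k, i + 2k, ...
    """
    if len(sequence) % sub_length != 0:
        raise ValueError("sequence length must be a multiple of sub_length")
    k = len(sequence) // sub_length
    return [sequence[i::k] for i in range(k)]


def interleave_merge(sub_sequences):
    """Inverse of interleave_split, as a decoder would need."""
    k = len(sub_sequences)
    merged = [None] * (k * len(sub_sequences[0]))
    for i, sub in enumerate(sub_sequences):
        merged[i::k] = sub
    return merged


# Positions 0..63 stand in for the scanned coefficients of an 8x8 block.
scanned = list(range(64))
subs = interleave_split(scanned)
print(len(subs), len(subs[0]))  # 4 16
print(subs[0][:4])              # [0, 4, 8, 12]
print(subs[3][:4])              # [3, 7, 11, 15]
```

Under this reading every sub-sequence samples the whole frequency range of the original scan, so each one keeps its larger values near the start and its zeros near the end, which is what a coder designed for 4×4 blocks expects.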
According to the second aspect of the present invention, there is provided a computer program to be used in image coding, wherein the coding process comprises the steps of:
forming at least a block of transform coefficients from the image data, and
scanning the block of transform coefficients for providing a sequence of transform coefficients. The computer program is characterized by
an algorithm for sub-sampling the transform coefficients in the sequence in an interleaved manner for providing a plurality of sub-sampled sequences of transform coefficients.
Advantageously, the coding process further comprises the step of coding the sub-sampled sequences of transform coefficients using an entropy encoder.
Alternatively, the coding process further comprises the step of coding the sequence of transform coefficients using an entropy encoder prior to said sub-sampling.
According to the third aspect of the present invention, there is provided an image encoder for receiving image data and providing a bitstream indicative of the image data. The image encoder is characterized by:
means for forming at least a block of transform coefficients from the image data, by
means for scanning the block of transform coefficients for forming an ordered sequence of transform coefficients from the block, by
a software program for sub-sampling the ordered sequence of transform coefficients in order to form a plurality of sub-sampled sequences of transform coefficients, by
means for entropy coding the sub-sampled sequences of transform coefficients for providing signals indicative of the encoded transform coefficients, and by
means for providing the bitstream based on the signals.
According to the fourth aspect of the present invention, there is provided an image coding system comprising a server for providing a bitstream indicative of image data and a client for reconstructing the image data based on the bitstream, wherein the server is characterized by
a receiver for receiving signals indicative of the image data, by
means for forming at least a block of transform coefficients from the signals, by
means for scanning the block of transform coefficients for forming an ordered sequence of transform coefficients from the block, by
a software program for sub-sampling the ordered sequence of transform coefficients in order to form a plurality of sub-sampled sequences of transform coefficients, by
means for entropy coding the sub-sampled sequences of transform coefficients for providing further signals indicative of the encoded transform coefficients, and by
means for providing the bitstream based on the further signals.