The present invention relates to an effective H.264/AVC CAVLC decoding method, and in particular, to an effective H.264/AVC CAVLC decoding method wherein VLC codes are classified into groups according to a correlation thereof, an arithmetic equation is defined for each of the groups, and decoding is carried out according to the arithmetic equation in order to minimize memory access caused by table look-up and to reduce decoding time and power consumption.
An H.264/MPEG-4 AVC decoder employs highly efficient entropy coding schemes including CAVLC (Context-based Adaptive Variable Length Coding) and CABAC (Context-based Adaptive Binary Arithmetic Coding). The CAVLC refers to a scheme wherein a VLC table is adaptively selected using information of adjacent blocks. Only the CAVLC decoding method, which is directly related to the present invention, will be described in detail. In accordance with the CAVLC scheme, the information required for decoding is encoded, such as the zigzag-scanned quantized DCT (Discrete Cosine Transform) coefficients in a one-dimensional arrangement, the coefficient levels, and the lengths of the runs of zeros. The CAVLC scheme is designed to exploit several statistical characteristics of a 4×4 block. The characteristics are as follows.
1. After quantization, a block generally consists mostly of zeros. The CAVLC scheme employs run-level coding in order to express the runs of zeros compactly.
2. After a zigzag scan, the first, second and third non-zero coefficients counted from the end of the one-dimensional arrangement are likely to be ±1, and the CAVLC scheme signals the number of these ±1 coefficients (TrailingOnes) in a simple manner.
3. The number of non-zero coefficients of the adjacent blocks is correlated with the number of coefficients of the current block. The number of the coefficients is encoded using a look-up table, wherein the look-up table is selected according to the number of the non-zero coefficients of the adjacent blocks.
4. The level of the non-zero coefficients tends to be larger at the starting point (around the DC coefficient) of the zigzag-scanned one-dimensional arrangement, and tends to get smaller near the end (i.e. closer to the high frequencies). The CAVLC scheme utilizes this characteristic to select the VLC look-up table for a level parameter according to the magnitude of the previously coded level.
The CAVLC encoding of a block progresses through a plurality of steps. The steps include encoding the number of the coefficients (TotalCoeff) and the number of the ±1 coefficients (TrailingOnes), and encoding the signs of the ±1 coefficients (TrailingOnes). Thereafter, the levels of the non-zero coefficients are encoded, and the number of zeros prior to the last coefficient is then encoded. Finally, the lengths of the runs of zeros (run) are encoded. In accordance with the H.264/MPEG-4 AVC, the "run" and the "level" are separately encoded in a reverse order of the zigzag scan using the VLC tables, unlike MPEG-2 wherein a combination of the "run" and the "level" is encoded.
The decoding process in detail is as follows.
1) Coeff_Token Decoding
A Coeff_Token comprises a combination of the TotalCoeff and the TrailingOnes. The TotalCoeff has a value ranging from 0 to 16, and the TrailingOnes has a value ranging from 0 to 3. When more than three trailing ±1 coefficients exist, only the last three are counted as the TrailingOnes, and the rest of the ±1 coefficients are decoded in the same manner as a normal coefficient. Four VLC tables for the Coeff_Token exist and the table is selected adaptively. The VLC table is selected by a mean value nC (=Round[(nA+nB)/2]), where nA is the number of DCT coefficients of the left 4×4 block and nB is the number of DCT coefficients of the upper 4×4 block. However, when only the upper block is valid, nC=nB; when only the left block is valid, nC=nA; and when neither the upper block nor the left block is valid, nC=0. Table 1 shows the criteria for selecting the VLC table of the Coeff_Token.
TABLE 1
nC            table      meaning
0, 1          Num-VLC0   the number of the coefficients is small
2, 3          Num-VLC1   the number of the coefficients is normal
4, 5, 6, 7    Num-VLC2   the number of the coefficients is large
8 or above    Num-FLC    6-bit FLC (Fixed Length Coding)
Generally, of the four VLC tables shown in FIG. 1, VLC0, VLC1 and VLC2 are decoded by a table look-up. This is because the Num-VLC tables are not structured to be expressed as an arithmetic equation. On the other hand, the Num-FLC table does not require the table look-up because the Num-FLC table uses the fixed length coding and is expressed as an arithmetic equation.
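As an illustration only, the nC rule and the selection criteria of Table 1 may be sketched in Python as follows. The function name and the use of None to mark an invalid neighbouring block are hypothetical conventions introduced for this sketch, and the rounding of halves upward is an assumption about Round[ ].

```python
def select_coeff_token_table(n_a=None, n_b=None):
    """Return (nC, table name) for the current 4x4 block.

    n_a / n_b: coefficient counts of the left / upper 4x4 block, or None
    when that block is not valid (hypothetical convention for this sketch).
    """
    if n_a is not None and n_b is not None:
        n_c = (n_a + n_b + 1) >> 1   # Round[(nA+nB)/2]; halves assumed to round up
    elif n_b is not None:
        n_c = n_b                    # only the upper block is valid
    elif n_a is not None:
        n_c = n_a                    # only the left block is valid
    else:
        n_c = 0                      # neither block is valid

    # Selection criteria of Table 1.
    if n_c <= 1:
        table = "Num-VLC0"
    elif n_c <= 3:
        table = "Num-VLC1"
    elif n_c <= 7:
        table = "Num-VLC2"
    else:
        table = "Num-FLC"
    return n_c, table
```

For example, a left block with 1 coefficient and an upper block with 2 coefficients give nC=2 under this rounding assumption, so that Num-VLC1 is selected.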
2) Sign Decoding of the TrailingOnes
The signs of up to three ±1 coefficients in the one-dimensional arrangement of the zigzag-scanned DCT coefficients are coded. + is encoded as a bit 0 and − as a bit 1. That is, the number of sign bits is equal to the number of the TrailingOnes obtained in 1) above.
The decoding is simply completed by reading a maximum of three bits.
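The sign decoding described above may be sketched as follows. The function name and the representation of the bitstream as a sequence of 0/1 integers are assumptions made for illustration.

```python
def decode_trailing_ones_signs(bits, trailing_ones):
    """Read one sign bit per trailing one: bit 0 means +1, bit 1 means -1.

    bits: iterable of 0/1 integers (hypothetical bitstream representation).
    trailing_ones: the TrailingOnes value obtained from the Coeff_Token (0..3).
    """
    if not 0 <= trailing_ones <= 3:
        raise ValueError("TrailingOnes must be between 0 and 3")
    stream = iter(bits)
    return [1 if next(stream) == 0 else -1 for _ in range(trailing_ones)]
```

Only as many bits as there are TrailingOnes are consumed, so at most three bits are read.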
3) Level Decoding of the Non-Zero Coefficients
The levels of the non-zero coefficients are decoded in a reverse order, and the ±1 coefficients decoded in 2) above are excluded. One of the seven VLC tables is adaptively selected according to the level decoded immediately before.
The level VLC tables are structured such that the level is decoded by a simple arithmetic equation rather than by the table look-up.
4) Decoding of a Total_Zeros Which is a Number of Zero Coefficients Prior to a Last Non-Zero Coefficient
One of the fifteen VLC tables is selected according to the TotalCoeff value obtained in 1).
Since the selected VLC table is not structured, the decoding by the table look-up is generally carried out.
5) Decoding of a Run_Before Which is a Number of Zeros Prior to Each of the Non-Zero Coefficients
The run_before is decoded in the reverse order, and one of the seven VLC tables is selected according to a ZerosLeft value, which is the number of zeros remaining out of the total number of zeros given by the total_zeros.
Since the selected VLC table is not structured, the decoding by the table look-up is generally carried out.
The above-described decoding by the basic sequential table look-up scheme may be referred to as a TLSS (Table Look-up by Sequential Search).
Alternatively, the table look-up by a binary search rather than a sequential search may be employed, which may be referred to as a TLBS (Table Look-up by Binary Search). While the TLBS requires 55% fewer memory searches than the TLSS, the TLBS accesses the memory in a random pattern. Therefore, the TLBS does not provide any large improvement in the CAVLC decoding operation speed.
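The difference between the two look-up schemes may be illustrated with the following sketch. The table is a toy prefix code invented for this example and is not one of the H.264 tables. The bitstream is represented as a string of '0' and '1' characters that is assumed to start with at least one complete code, and all names are hypothetical.

```python
from bisect import bisect_right

# Toy prefix-code table (NOT an H.264 table): (code bits, decoded symbol).
TOY_TABLE = [("1", 0), ("01", 1), ("001", 2), ("000", 3)]
MAX_LEN = 3  # length of the longest code in the toy table

def lookup_sequential(bitstring):
    """TLSS-style look-up: compare the entries one by one, in table order.

    Returns (symbol, code length, number of table entries accessed).
    """
    for accesses, (code, symbol) in enumerate(TOY_TABLE, start=1):
        if bitstring.startswith(code):
            return symbol, len(code), accesses
    raise ValueError("no matching code")

# For the binary search, each code is left-aligned to MAX_LEN bits and the
# entries are sorted by that value.
_SORTED = sorted((int(code.ljust(MAX_LEN, "0"), 2), len(code), symbol)
                 for code, symbol in TOY_TABLE)
_KEYS = [key for key, _, _ in _SORTED]

def lookup_binary(bitstring):
    """TLBS-style look-up: binary search on the next MAX_LEN bits.

    Returns (symbol, code length). The accessed entries are not adjacent,
    which corresponds to the random memory access noted above.
    """
    window = int(bitstring[:MAX_LEN].ljust(MAX_LEN, "0"), 2)
    index = bisect_right(_KEYS, window) - 1  # last entry with key <= window
    _, length, symbol = _SORTED[index]
    return symbol, length
```

In this toy table the code "001" is found by the sequential search only after three entries have been accessed, whereas the binary search locates it by halving the sorted table.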
In order to solve a problem of a large amount of the memory access by the table look-up, Moon's method has been proposed.
Moon's method aims at low power decoding in a mobile device by largely reducing the amount of the memory access. In order to reduce the amount of the memory access, a portion of the VLC codes having a high frequency of use is decoded by an arithmetic equation, and the rest of the VLC codes having a low frequency of use is decoded by the TLSS.
FIG. 1 is a flow diagram illustrating a structure of a run_before decoding algorithm in accordance with a conventional art.
Moon's method only handles coeff_token and run_before VLC decoding. The decoding algorithm is described below.
1) Decoding of coeff_token-VLC0 table
1.1) m is obtained from an inputted bitstream. m denotes the number of 0 bits until a bit 1 appears in the bitstream.
1.2) If m is greater than or equal to 4, the TLSS is used. Otherwise, the first two bits I[1:0] are read.
1.3) TotalCoeff and T1s are obtained by the equations below.
T1s={m+(d−1)*((m+1)/4)*(d/2)} % 4, where d=3−I[1:0]
TotalCoeff={m+d*((m+1)/4)*(d/2)} % 4
Here, % and / denote a remainder operation and an integer division operation with a rounding, respectively.
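The equations of step 1.3) may be sketched as follows. The function name is hypothetical, and the integer division is assumed to round toward zero, which coincides with Python's floor division here because the divided operands are never negative. The extraction of m and I[1:0] from the bitstream, and the number of bits actually consumed, are outside the scope of this sketch.

```python
def decode_step1_arith(m, i10):
    """Arithmetic decoding per step 1.3).

    m:   number of 0 bits before the first 1 bit (must be below 4).
    i10: value of the two bits I[1:0], in the range 0..3.
    Returns (TotalCoeff, T1s).
    """
    if m >= 4:
        raise ValueError("m >= 4 is handled by the TLSS table look-up")
    d = 3 - i10
    # '/' in the text is taken as integer division, '%' as remainder.
    t1s = (m + (d - 1) * ((m + 1) // 4) * (d // 2)) % 4
    total_coeff = (m + d * ((m + 1) // 4) * (d // 2)) % 4
    return total_coeff, t1s
```

For m smaller than 3 the term (m+1)/4 is zero, so that both results equal m regardless of I[1:0]; the two bits only affect the result when m is 3.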
2) Decoding of coeff_token-VLC1 table
2.1) Four bits I[3:0] are read from the bitstream.
2.2) If I[3:2] is zero, the TLSS is used. Otherwise 2.3) is carried out.
2.3) T1s=D+(1−w)*(d/2), where D=3−I[3:2], w=I[3:2]/2, and d=3−I[1:0]
TotalCoeff=T1s+(1−w)*(d+1)/4
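Under the same assumption that / denotes integer division, the equations of step 2.3) may be sketched as follows. The function name and the packing of the four bits I[3:0] into a single integer are choices made only for this illustration.

```python
def decode_step2_arith(i30):
    """Arithmetic decoding per step 2.3).

    i30: value of the four bits I[3:0], in the range 0..15.
    Returns (TotalCoeff, T1s).
    """
    hi = (i30 >> 2) & 0b11   # I[3:2]
    lo = i30 & 0b11          # I[1:0]
    if hi == 0:
        raise ValueError("I[3:2] == 0 is handled by the TLSS table look-up")
    big_d = 3 - hi           # D in the text
    w = hi // 2
    d = 3 - lo
    t1s = big_d + (1 - w) * (d // 2)
    # Evaluated left to right: ((1 - w) * (d + 1)) // 4
    total_coeff = t1s + (1 - w) * (d + 1) // 4
    return total_coeff, t1s
```

For example, the four bits 0100 give D=2, w=0 and d=3, and therefore T1s=3 and TotalCoeff=4.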
3) Decoding of coeff_token-VLC2 table
3.1) The four bits I[3:0] are read from the bitstream.
3.2) If I[3] is zero, the TLSS is used. Otherwise 3.3) is carried out.
3.3) T1s=3+(d−3)*w, where d=3−I[1:0] and w=I[3:2] % 2
TotalCoeff=d+4*(1−w)
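The equations of step 3.3) may likewise be sketched as follows, with a hypothetical function name and with the four bits I[3:0] packed into a single integer for convenience.

```python
def decode_step3_arith(i30):
    """Arithmetic decoding per step 3.3).

    i30: value of the four bits I[3:0], in the range 0..15.
    Returns (TotalCoeff, T1s).
    """
    if (i30 >> 3) & 1 == 0:
        raise ValueError("I[3] == 0 is handled by the TLSS table look-up")
    d = 3 - (i30 & 0b11)          # d = 3 - I[1:0]
    w = ((i30 >> 2) & 0b11) % 2   # w = I[3:2] % 2
    t1s = 3 + (d - 3) * w
    total_coeff = d + 4 * (1 - w)
    return total_coeff, t1s
```

With these equations the eight four-bit inputs from 1111 down to 1000 yield TotalCoeff values of 0 through 7 in order, with T1s equal to TotalCoeff up to 3 and fixed at 3 thereafter.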
The purpose of 1), 2) and 3) is that the VLC codes arranged in an upper portion of each VLC table, which have the high frequency of use, are decoded using the arithmetic equations as described above to reduce the amount of the memory access.
4) run_before decoding
The zero_left value is initialized to the total_zeros value. Thereafter, the zero_left value is substituted with a zero_left-run_before value for every decoding of the run_before value. That is, the zero_left value is the number of remaining zeros.
1. When zero_left≥7 is satisfied, the inputted three bits are stored in I[2:0].
If I[2:0] is larger than 0, run_before=7−I[2:0], otherwise run_before=4+m.
2. When zero_left=6 is satisfied, the inputted three bits are stored in I[2:0].
When I[2:0] is smaller than 2, run_before=I[2:0]+1; when I[2:0] is equal to or larger than 6, run_before=zero_left−I[2:0]/2; when I[2:0] is 2 or 4, run_before=I[2:0]+2; otherwise, run_before=I[2:0].
3. When zero_left is no less than 3 and no more than 5, the inputted three bits are stored in I[2:0].
If zero_left≦3+I[2:−]/2 is satisfied, run_before=3−I[2:0], otherwise run_before=zero_left−I[2:0].
4. When zero_left is 1 or 2, inputted two bits are stored in I[1:0].
If zero_left is 2, run_before=(2−I[1:0])*(1−I[1]), otherwise run_before=1−I[1].
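The zero_left bookkeeping described at the start of 4) may be sketched as follows. The per-case equations are deliberately left behind a hypothetical callback, read_run_before, which is assumed to decode one run_before value for a given zero_left. Skipping the decoding once zero_left reaches 0 and assigning the remaining zeros to the last coefficient follow the general CAVLC scheme and are assumptions with respect to the text above.

```python
def decode_runs(total_coeff, total_zeros, read_run_before):
    """Return one run value per non-zero coefficient, in reverse scan order.

    read_run_before(zero_left): hypothetical callback decoding one
    run_before value from the bitstream for the given zero_left.
    """
    zero_left = total_zeros          # initialized to the total_zeros value
    runs = []
    for _ in range(total_coeff - 1):
        # Nothing is read once no zeros remain (assumption, see lead-in).
        run = read_run_before(zero_left) if zero_left > 0 else 0
        runs.append(run)
        zero_left -= run             # zero_left = zero_left - run_before
    if total_coeff > 0:
        runs.append(zero_left)       # last coefficient takes the remaining zeros
    return runs
```

For example, with four coefficients, total_zeros=5 and decoded run_before values of 1, 0 and 2, the zero_left value passes through 5, 4, 4 and 2, and the last coefficient is preceded by the remaining 2 zeros.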
While the TLSS scheme is easy to implement, an unconditional sequential memory access is required. Therefore, a large amount of power may be consumed in a mobile environment. Moreover, the CAVLC decoding operation time may be extremely long due to the low speed memory that is mainly used in the mobile environment.
While the TLBS scheme is also easy to implement and has 55% less memory access than the TLSS, the memory access is still relatively large, and the CAVLC decoding operation time is hardly reduced due to the random memory access characteristic.
While Moon's method reduces the amount of memory access by 65% compared to the TLSS, Moon's method fundamentally has a limitation. That is, the amount of the memory access shows an irregular result according to various sequences and picture quality (due to various quantization parameters (QP)), since only the VLC codes of the coeff_token that statistically have the high frequency of use are decoded by the arithmetic equations. Moreover, the advantage of converting to the arithmetic equation is diminished by the excessive conditional statements of the run_before decoding. In addition, similar to the TLBS, Moon's method does not provide a large reduction of the CAVLC decoding operation time even though the CAVLC decoding operation time is reduced.