High Efficiency Video Coding (HEVC) is a new video coding standard currently being developed in the Joint Collaborative Team on Video Coding (JCT-VC). JCT-VC is a collaborative project between MPEG and ITU-T. Currently, a Draft International Standard (DIS) is defined that includes a number of new tools which make HEVC considerably more efficient than H.264/AVC.
Residual Coding in HEVC
HEVC is a hybrid codec that uses previously transmitted pictures as reference pictures. In inter coding, pixels from these previously transmitted pictures are used to predict the content in the current picture. In intra coding, already transmitted pixels in the same image are used to predict the pixels in the current block. We will describe intra coding in more detail. The left diagram in FIG. 1 shows the 4×4 pixels that are to be coded. Since the blocks are sent in a left-to-right, top-to-bottom order, the pixels above and to the left of the block have already been encoded and transmitted. The surrounding pixels are shown to the right in FIG. 1, where the 4×4 pixels to be coded are illustrated in white.
A prediction direction is also transmitted. As shown in the left diagram in FIG. 2, the chosen prediction direction is diagonal in this particular example. In the right diagram in FIG. 2, the result of the diagonal prediction may be seen. The predicted block to the right in FIG. 2 is not exactly the same as the original block illustrated to the left in FIG. 1, but it is quite close.
FIG. 3 illustrates the prediction error, i.e. the difference between the original block and the prediction. This may also be denoted the residual or a residual block. In this particular example, the prediction was very accurate near the predicting pixels, and the error there is zero. Note that the error is larger in the bottom right corner. This is typical in intra prediction, since those pixels are furthest from the pixels used for prediction, and prediction gets harder the further away from the known pixels it gets. In inter prediction, the prediction error typically does not have any specific part of the residual block with a larger error than the other parts of the residual block.
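The residual computation described above can be illustrated with a minimal sketch. The sample values and the function name residual_block are hypothetical and are not taken from the figures; they are chosen only so that the error grows toward the bottom right corner, as described for intra prediction.

```python
# Hypothetical 4x4 sample values; these are NOT the values shown in the figures.
original = [
    [52, 55, 61, 66],
    [54, 56, 62, 70],
    [58, 60, 65, 75],
    [60, 63, 70, 82],
]
predicted = [
    [52, 55, 61, 66],
    [54, 56, 62, 68],
    [58, 60, 64, 72],
    [60, 63, 68, 77],
]


def residual_block(orig, pred):
    """Element-wise difference: residual = original - prediction."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(orig, pred)]


residual = residual_block(original, predicted)
for row in residual:
    print(row)
```

In this illustrative data, the rows and columns closest to the predicting pixels have zero error, while the largest error is in the bottom right position.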
The next step is to transform this residual block. The transform is done with a DCT-like transform, resulting in a block of transform coefficients. An example of a transformed block is shown in the left diagram in FIG. 4. Note that the illustrated values are not mathematically correct, but are used for illustrative purposes. The diagram to the right in FIG. 4 illustrates a diagonal scanning order applied on the transform block.
After the transform, the largest values are typically in the top left corner, which may be seen in FIG. 4. This is typically the case for both inter and intra prediction. The coefficients in the upper left corner represent low frequencies. These are typically larger because most images, and most residual images, have low frequency behavior. The transform coefficients are quantized, and the block is scanned. The goal of the scanning is to get as many zeros as possible in consecutive order. As is shown in the right diagram in FIG. 4, selecting a scanning order that scans the coefficients diagonally starting with the bottom right position will almost sort the coefficients in increasing order and thus create a long run of zeros. Other scan directions used in intra block coefficient coding are horizontal scan and vertical scan, which are illustrated in FIG. 5.
In HEVC, the scan direction is mode dependent for intra predicted transform blocks of size 4×4 and 8×8. Blocks using DC or planar prediction and blocks that are predicted in a diagonal direction use diagonal scan, blocks predicted in a direction closer to vertical use horizontal scan, and blocks predicted in a direction closer to horizontal use vertical scan. Larger transform blocks of intra predicted blocks and all inter predicted transform blocks always use diagonal scan.
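The effect of the diagonal scan may be illustrated with the following sketch. The function names, the coefficient values and the exact traversal direction within each diagonal are assumptions made for illustration; the sketch is not the normative scan definition.

```python
def diagonal_scan(size):
    """Return (x, y) positions of a diagonal scan, lowest frequency first.

    Assumption: within each anti-diagonal the traversal goes from the
    bottom-left position toward the upper-right position.
    """
    order = []
    for d in range(2 * size - 1):      # d = x + y selects one anti-diagonal
        for x in range(size):
            y = d - x
            if 0 <= y < size:
                order.append((x, y))
    return order


def scan_for_coding(block):
    """Read a square block in reverse diagonal order (bottom right first)."""
    positions = diagonal_scan(len(block))[::-1]
    return [block[y][x] for x, y in positions]


# Hypothetical quantized coefficients: large values cluster in the top left.
coeffs = [
    [6, -3, 1, 0],
    [3,  1, 0, 0],
    [2,  0, 0, 0],
    [1,  0, 0, 0],
]
print(scan_for_coding(coeffs))
```

For this hypothetical block, the scanned sequence starts with a run of nine zeros, followed by the nonzero coefficients in roughly increasing magnitude.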
The scanned coefficients are then transmitted. To exploit the fact that there are typically many zeros at the beginning of the scan, the encoder transmits the position of the last nonzero coefficient, with the origin at the top left corner. The origin is at the DC coefficient position in HEVC and other DCT-like transform approaches, since the position can then typically be expressed with smaller values than if the origin were at the highest frequency position.
The last nonzero coefficient is the first nonzero coefficient encountered in the scan order. The coefficients preceding the indicated position are assumed, by the decoder, to be zero. Next, a bitmask comprising flags for the remaining coefficients is transmitted, which shows whether the coefficients are nonzero or not. This is shown in FIG. 6. The last nonzero coefficient is not part of the bitmask, since it is already indicated that this coefficient is nonzero.
Each bit or flag in the transmitted bitmask is sent using a context, here denoted nonZeroContext, that is determined by the position in the original block. For instance, the last bit in the bitmask (corresponding to the top left position in the block) uses context 0. Since this is the position of the lowest frequencies, it is quite likely that the coefficient in this position will be nonzero. Therefore, context 0 has a quite high probability of being nonzero. The bits in the beginning of the bitmask have a much lower probability of being nonzero. As an example, the first transmitted bit in the bitmask will use context 7, which has a relatively low probability of being nonzero. The encoder and the decoder keep track of the probabilities of whether a coefficient at the respective position of the block is nonzero, in order to get efficient compression by the context adaptive binary arithmetic coding (CABAC) engine.
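The signaling of the last nonzero coefficient and the subsequent bitmask can be sketched as follows. As a simplifying assumption, the position is expressed here as an index into the scanned sequence rather than as coordinates relative to the top left corner, and the function name and input values are hypothetical.

```python
def last_position_and_bitmask(scanned):
    """Split a scanned sequence into a signaled position and a bitmask.

    scanned: coefficients in coding order (highest frequency first).
    Returns (index of the first nonzero coefficient in coding order,
             nonzero flags for the coefficients after that index).
    """
    for idx, value in enumerate(scanned):
        if value != 0:
            flags = [1 if v != 0 else 0 for v in scanned[idx + 1:]]
            return idx, flags
    return None, []   # all-zero block: nothing to signal in this sketch


# Hypothetical scanned sequence with a leading run of zeros.
scanned = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 2, -3, 3, 6]
position, bitmask = last_position_and_bitmask(scanned)
print(position, bitmask)
```

The coefficient at the signaled position has no flag of its own in the bitmask, which matches the description above that it is already known to be nonzero.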
After this, the nonzero coefficients are further described, by stating, in another bitmask comprising flags, whether their absolute value is equal to one or if they are larger than one. The reason for this is that a coefficient of magnitude one (+1 or −1) is by far the most common coefficient, so it is efficient to describe them this way. These bits are also transmitted using a context. Since low frequency coefficients typically have the largest values, and since they are in the end of the scan, it is likely that if a value with a magnitude larger than one has been encountered, the next one may also be large. The context model, denoted here greaterThan1Context, tries to reflect this fact, as is shown in FIG. 7.
The context model, greaterThan1Context, is entered with a start position in context 1. If the absolute value of the first coefficient is one, the position is changed to context 2, which has a larger probability that the absolute value of the next coefficient will be of magnitude one. If another absolute value of magnitude one comes, the position is changed to context 3, which has an even larger probability that the absolute value of the next coefficient will be of magnitude one. However, as soon as the absolute value of a subsequent coefficient is larger than magnitude one, the position is changed to context 0, where this probability is much lower. As illustrated in FIG. 7, the procedure then remains in context 0, since there is no state shift leading out of context 0.
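The state transitions of greaterThan1Context described above can be sketched as a small state machine. The function name is hypothetical, and two points are assumptions of this sketch: the context saturates at 3 when further coefficients of magnitude one arrive, and the limit on how many coefficients receive such a flag is ignored.

```python
def greater_than_1_contexts(abs_levels):
    """Return the context used for each nonzero coefficient, in coding order.

    abs_levels: absolute values of the nonzero coefficients.
    """
    ctx = 1                      # start position is context 1
    used = []
    for level in abs_levels:
        used.append(ctx)         # context used to code this coefficient's flag
        if level > 1:
            ctx = 0              # a large value was seen: move to context 0
        elif ctx != 0:
            ctx = min(ctx + 1, 3)  # another magnitude-one value: 1 -> 2 -> 3
        # once in context 0, no transition leads out of it
    return used


print(greater_than_1_contexts([1, 1, 1, 2, 3, 3, 6]))
```

With the hypothetical input above, the first coefficients are coded in contexts 1, 2 and 3, and every coefficient after the first one larger than one is coded in context 0.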
Note that this works well when coefficients are transmitted in increasing order. In the beginning, the absolute value of the nonzero coefficients will mostly be of magnitude one, and the method/procedure will be in context 2 or context 3 where the probability of transmitting/receiving a coefficient with an absolute value of magnitude one is high. When the absolute value of a coefficient is higher than one, this indicates a high probability for being near the end of the block, where the large coefficients cluster. Then, it makes sense to move to context 0, which has a much higher probability for coefficients with an absolute value larger than magnitude one.
A flag for indicating if the absolute value of a coefficient among the first eight (up to eight coefficients in this example) is larger than 2 is also transmitted for the coefficients up to the one that is larger than 2.
Lastly, the first coefficients (four in the example: 2, 3, −3 and 6) that have absolute values larger than one are predicted to be 2 or 3, and the remaining coefficients are predicted to be 1. A remaining coefficient value is transmitted to assign the correct magnitude values to them, and the signs of the coefficients are also transmitted. Thus, all the information needed to reconstruct the block has now been transmitted/received.
In the decoder, the operations are performed in inverse order: first decoding of the coefficients, then inverse quantization, then inverse transform (unless the block is a transform skipped block), and finally addition of the residual to the intra or inter prediction.
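How a coefficient value may be assembled from these pieces can be sketched as follows, mirroring the expressions for baseLevel and TransCoeffLevel in the residual coding syntax of Table 1. The function name and the example flag combinations are hypothetical.

```python
def reconstruct_level(greater1_flag, greater2_flag, remaining, sign_flag):
    """Rebuild a signed coefficient level from its signaled parts.

    base level = 1 + greater1_flag + greater2_flag   (i.e. 1, 2 or 3)
    value      = (remaining + base level) * (1 - 2 * sign_flag)
    """
    base_level = 1 + greater1_flag + greater2_flag
    return (remaining + base_level) * (1 - 2 * sign_flag)


# Hypothetical flag combinations, one per coefficient.
examples = [
    (0, 0, 0, 0),   # magnitude one, positive
    (1, 1, 0, 0),   # base level 3, nothing remaining
    (1, 0, 1, 1),   # base level 2, remaining 1, negative
    (1, 0, 4, 0),   # base level 2, remaining 4
]
print([reconstruct_level(*e) for e in examples])
```

A sign flag of one yields a negative value, since the factor (1 − 2 · sign) then equals −1.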
Transform Skip
The coefficient encoding described above works well for the typical case. However, in HEVC it is also possible to skip the transform altogether. The reason for this is that for some types of content, such as high-contrast computer graphics, e.g. subtitles, the transform does more harm than good. In such cases it is therefore possible to signal that the transform should be skipped. This is done using a flag called transform_skip_flag. When this flag is set to one, no transform is performed. In this document, a block for which the transform_skip_flag is one is denoted a transform skipped block.
However, when the transform is skipped, the coefficient encoding described above does not work so well anymore. As can be seen in FIG. 8, scanning the residual from FIG. 3 without applying the transform will mean that most of the big values will end up in the beginning of the scanned sequence instead of in the end. This means that the coefficient encoding will not work so well.
For example, the last non-zero coefficient, which is indicated by the encoder, will now very often be the first coefficient in the scan (bottom right corner in the block). Thereby, the tail of zeros will not be as easy to describe as for the transformed blocks.
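The difference between a transformed block and a transform skipped block under the same scan can be illustrated with the following sketch. All block values and function names are hypothetical, and the diagonal traversal within each diagonal is an assumption of the sketch.

```python
def reverse_diagonal_scan(block):
    """Read a square block diagonally, starting at the bottom right position."""
    size = len(block)
    positions = sorted(
        ((x, y) for y in range(size) for x in range(size)),
        key=lambda p: (p[0] + p[1], p[0]),   # by anti-diagonal, then by x
    )
    return [block[y][x] for x, y in reversed(positions)]


def first_nonzero_index(scanned):
    """Index of the first nonzero value in coding order, or None."""
    return next((i for i, v in enumerate(scanned) if v != 0), None)


# Hypothetical transformed block: energy concentrated in the top left corner.
transformed = [[6, -3, 1, 0], [3, 1, 0, 0], [2, 0, 0, 0], [1, 0, 0, 0]]
# Hypothetical untransformed intra residual: error grows toward bottom right.
skipped = [[0, 0, 0, 0], [0, 0, 0, 2], [0, 0, 1, 3], [0, 0, 2, 5]]

for name, block in (("transformed", transformed), ("transform skipped", skipped)):
    scanned = reverse_diagonal_scan(block)
    print(name, scanned, first_nonzero_index(scanned))
```

For the hypothetical transformed block the scan begins with a run of zeros, whereas for the transform skipped block the first scanned value is already nonzero and the zeros end up at the end of the sequence.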
Furthermore, as can be seen in FIG. 9, when sending the bit mask, the contexts will assume the wrong thing. Context 0, which normally has a high probability of being nonzero, will be used for the last value in the scan. But since this last value is the pixel closest to the pixels we predict from (see the right diagram of FIG. 1), this is the pixel that has the largest probability of being zero (having an error or residual of zero).
This is a problem in two ways. First, the coefficient encoding of transform skipped blocks will be inefficient, since the context probabilities will not match the actual statistics. Second, since the contexts are adaptive, the contexts will be corrupted, with respect to transformed blocks, every time a transform skipped block is encoded. The next block, which may be an ordinary (i.e. transformed) block, will then be coded with inefficient settings of the probabilities in the different contexts.
Table 1 below comprises residual coding syntax from an HEVC draft text and shows the syntax elements of residual coding in HEVC. Those elements are part of the bitstream that is supposed to be understood by an HEVC decoder.
TABLE 1
residual_coding( x0, y0, log2TrafoSize, cIdx ) {    Descriptor
  if( transform_skip_enabled_flag && !cu_transquant_bypass_flag && ( log2TrafoSize = = 2 ) )
    transform_skip_flag[ x0 ][ y0 ][ cIdx ]    ae(v)
  last_significant_coeff_x_prefix    ae(v)
  last_significant_coeff_y_prefix    ae(v)
  if( last_significant_coeff_x_prefix > 3 )
    last_significant_coeff_x_suffix    ae(v)
  if( last_significant_coeff_y_prefix > 3 )
    last_significant_coeff_y_suffix    ae(v)
  lastScanPos = 16
  lastSubBlock = ( 1 << ( log2TrafoSize − 2 ) ) * ( 1 << ( log2TrafoSize − 2 ) ) − 1
  do {
    if( lastScanPos = = 0 ) {
      lastScanPos = 16
      lastSubBlock− −
    }
    lastScanPos− −
    xS = ScanOrder[ log2TrafoSize − 2 ][ scanIdx ][ lastSubBlock ][ 0 ]
    yS = ScanOrder[ log2TrafoSize − 2 ][ scanIdx ][ lastSubBlock ][ 1 ]
    xC = ( xS << 2 ) + ScanOrder[ 2 ][ scanIdx ][ lastScanPos ][ 0 ]
    yC = ( yS << 2 ) + ScanOrder[ 2 ][ scanIdx ][ lastScanPos ][ 1 ]
  } while( ( xC != LastSignificantCoeffX ) || ( yC != LastSignificantCoeffY ) )
  for( i = lastSubBlock; i >= 0; i− − ) {
    xS = ScanOrder[ log2TrafoSize − 2 ][ scanIdx ][ i ][ 0 ]
    yS = ScanOrder[ log2TrafoSize − 2 ][ scanIdx ][ i ][ 1 ]
    inferSigCoeffFlag = 0
    if( ( i < lastSubBlock ) && ( i > 0 ) ) {
      coded_sub_block_flag[ xS ][ yS ]    ae(v)
      inferSigCoeffFlag = 1
    }
    for( n = ( i = = lastSubBlock ) ? lastScanPos − 1 : 15; n >= 0; n− − ) {
      xC = ( xS << 2 ) + ScanOrder[ 2 ][ scanIdx ][ n ][ 0 ]
      yC = ( yS << 2 ) + ScanOrder[ 2 ][ scanIdx ][ n ][ 1 ]
      if( coded_sub_block_flag[ xS ][ yS ] && ( n > 0 || !inferSigCoeffFlag ) ) {
        significant_coeff_flag[ xC ][ yC ]    ae(v)
        if( significant_coeff_flag[ xC ][ yC ] )
          inferSigCoeffFlag = 0
      }
    }
    firstSigScanPos = 16
    lastSigScanPos = −1
    numGreater1Flag = 0
    firstGreater1ScanPos = −1
    for( n = 15; n >= 0; n− − ) {
      xC = ( xS << 2 ) + ScanOrder[ 2 ][ scanIdx ][ n ][ 0 ]
      yC = ( yS << 2 ) + ScanOrder[ 2 ][ scanIdx ][ n ][ 1 ]
      if( significant_coeff_flag[ xC ][ yC ] ) {
        if( numGreater1Flag < 8 ) {
          coeff_abs_level_greater1_flag[ n ]    ae(v)
          numGreater1Flag++
          if( coeff_abs_level_greater1_flag[ n ] && firstGreater1ScanPos = = −1 )
            firstGreater1ScanPos = n
        }
        if( lastSigScanPos = = −1 )
          lastSigScanPos = n
        firstSigScanPos = n
      }
    }
    signHidden = ( lastSigScanPos − firstSigScanPos > 3 && !cu_transquant_bypass_flag )
    if( firstGreater1ScanPos != −1 )
      coeff_abs_level_greater2_flag[ firstGreater1ScanPos ]    ae(v)
    for( n = 15; n >= 0; n− − ) {
      xC = ( xS << 2 ) + ScanOrder[ 2 ][ scanIdx ][ n ][ 0 ]
      yC = ( yS << 2 ) + ScanOrder[ 2 ][ scanIdx ][ n ][ 1 ]
      if( significant_coeff_flag[ xC ][ yC ] &&
        ( !sign_data_hiding_flag || !signHidden || n != firstSigScanPos ) )
        coeff_sign_flag[ n ]    ae(v)
    }
    numSigCoeff = 0
    sumAbsLevel = 0
    for( n = 15; n >= 0; n− − ) {
      xC = ( xS << 2 ) + ScanOrder[ 2 ][ scanIdx ][ n ][ 0 ]
      yC = ( yS << 2 ) + ScanOrder[ 2 ][ scanIdx ][ n ][ 1 ]
      if( significant_coeff_flag[ xC ][ yC ] ) {
        baseLevel = 1 + coeff_abs_level_greater1_flag[ n ] + coeff_abs_level_greater2_flag[ n ]
        if( baseLevel = = ( ( numSigCoeff < 8 ) ? ( ( n = = firstGreater1ScanPos ) ? 3 : 2 ) : 1 ) )
          coeff_abs_level_remaining[ n ]    ae(v)
        TransCoeffLevel[ x0 ][ y0 ][ cIdx ][ xC ][ yC ] =
          ( coeff_abs_level_remaining[ n ] + baseLevel ) * ( 1 − 2 * coeff_sign_flag[ n ] )
        if( sign_data_hiding_flag && signHidden ) {
          sumAbsLevel += ( coeff_abs_level_remaining[ n ] + baseLevel )
          if( n = = firstSigScanPos && ( ( sumAbsLevel % 2 ) = = 1 ) )
            TransCoeffLevel[ x0 ][ y0 ][ cIdx ][ xC ][ yC ] = −TransCoeffLevel[ x0 ][ y0 ][ cIdx ][ xC ][ yC ]
        }
        numSigCoeff++
      }
    }
  }
}
A solution to the problem described above has been proposed in JCT-VC (Joint Collaborative Team on Video Coding). The proposed solution was to reverse the scan order when the block is a transform skipped block.
However, applying this suggested solution would mean that new, reversed scan orders must be implemented. Since implementation of new scan orders can be expensive in hardware, the JCT-VC group decided not to recommend adoption of that proposal.