Subject matter related to the present application can be found in co-pending U.S. patent application Ser. No. 13/529,159, filed Jun. 21, 2012 and entitled “Scalable Video Coding Techniques”, which is incorporated herein in its entirety.
Video coding, as discussed herein, refers to techniques where a series of uncompressed pictures is converted into a compressed video bitstream. Video decoding refers to the inverse process. Many image and video coding standards such as ITU-T Rec. H.264, “Advanced video coding for generic audiovisual services”, March 2010, available from the International Telecommunication Union, Place des Nations, CH-1211 Geneva 20, Switzerland (ITU), and at http://www.itu.int/rec/T-REC-H.264, and incorporated herein by reference in its entirety, or ITU-T Rec. H.265, “High Efficiency Video Coding” (HEVC), April 2013, available from the ITU, and at http://www.itu.int/rec/T-REC-H.265, can specify the bitstream as a series of coded pictures. In such standards, each coded picture can be described as a series of blocks, such as macroblocks in H.264, and largest coding units in HEVC. The standards can further specify the decoder operation on the bitstream.
In video decoding according to H.264, for example, two coding modes can be identified, namely, “inter mode” and “intra mode.” Inter mode can refer to the coding of samples, blocks, or pictures relative to previously coded or decoded pictures or parts thereof, using techniques commonly referred to as “inter picture prediction.” In contrast, intra mode can refer to the coding of samples, blocks, or pictures without inter picture prediction.
While, by definition, there is no inter picture prediction in intra mode, there are mechanisms that can predict between information related to blocks within the same picture or parts thereof, for example, a slice. One of these prediction mechanisms is commonly referred to as “Intra Prediction.” Intra prediction refers to the prediction of sample values, for example, a block of samples currently being decoded, based on sample values belonging to (neighboring) blocks that previously have been decoded for the subject picture. The actual sample values of a block under reconstruction can be created by adding, to the predictor, a residual that is coded in the bitstream.
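By way of a non-limiting illustration, the reconstruction step described above (predictor plus coded residual) can be sketched as follows. The sketch assumes DC prediction as the predictor, 8-bit samples, and clipping of the sum to the valid sample range; the function names dc_predictor and reconstruct_block are hypothetical and do not appear in any standard.

```python
def dc_predictor(above, left, bit_depth=8):
    """Rounded mean of the available neighboring samples.

    Assumption: when no neighbor is available, the mid-range value is used.
    """
    neighbors = list(above) + list(left)
    if not neighbors:
        return 1 << (bit_depth - 1)
    return (sum(neighbors) + len(neighbors) // 2) // len(neighbors)


def reconstruct_block(predictor, residual, bit_depth=8):
    """Add the residual to the predictor and clip to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predictor, residual)
    ]


# Previously decoded samples above and to the left of a 4x4 block.
above = [100, 102, 104, 106]
left = [98, 99, 101, 103]

dc = dc_predictor(above, left)            # 102
predictor = [[dc] * 4 for _ in range(4)]  # flat 4x4 predictor block

# Illustrative residual, as would be decoded from the bitstream.
residual = [
    [3, -2, 0, 1],
    [0, 0, 0, 0],
    [-200, 200, 0, 0],
    [1, 1, 1, 1],
]
recon = reconstruct_block(predictor, residual)
print(recon[0])  # [105, 100, 102, 103]
print(recon[2])  # [0, 255, 102, 102] (clipped to the 8-bit range)
```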
In H.264, intra prediction can use samples from neighboring blocks following a spatial direction coded in the bitstream. FIG. 1 illustrates eight prediction modes that are used for directional spatial prediction in H.264. A ninth prediction mode (not shown in FIG. 1) is DC prediction. Referring to FIG. 1, the eight directional spatial modes (101) can be identified by arrows pointing in the direction from which the prediction samples are taken, and by numerals indicating the symbol that is being coded in the bitstream to refer to the directional spatial mode (102). To code a block in intra mode, a most probable mode (MPM) is derived based on the prediction modes and availability of previously coded or decoded neighboring blocks. If the MPM is chosen to predict the current block, then this is signaled by prev_intra4x4_pred_mode_flag (in the case of a 4×4 block) or prev_intra8x8_pred_mode_flag (in the case of an 8×8 block). Otherwise, one of the eight remaining modes is coded using a syntax element consisting of three bits (rem_intra4x4_pred_mode for a 4×4 block or rem_intra8x8_pred_mode for an 8×8 block).
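By way of a non-limiting illustration, the H.264 mode signaling described above can be sketched as follows. The sketch is simplified: the MPM is taken as the smaller of the two neighboring mode numbers, with DC prediction (mode 2) substituted when either neighbor is unavailable, and the entropy coding of the flag and the three-bit field is omitted. The function names are hypothetical; a caller is assumed to pass mode 2 for an available neighbor that is not coded with a 4×4 or 8×8 intra mode.

```python
DC_MODE = 2  # H.264 mode number for DC prediction


def derive_mpm(mode_a, mode_b):
    """Most probable mode from the left (A) and above (B) neighbors.

    None marks an unavailable neighbor; in that case the MPM is DC.
    """
    if mode_a is None or mode_b is None:
        return DC_MODE
    return min(mode_a, mode_b)


def encode_mode(mode, mpm):
    """Return (prev_flag, rem); rem is None when the MPM is chosen."""
    if mode == mpm:
        return (1, None)
    # Skip over the MPM so the eight remaining modes map onto 0..7.
    rem = mode if mode < mpm else mode - 1
    return (0, rem)


def decode_mode(prev_flag, rem, mpm):
    """Inverse of encode_mode."""
    if prev_flag:
        return mpm
    return rem if rem < mpm else rem + 1


mpm = derive_mpm(0, 1)         # 0
print(encode_mode(0, mpm))     # (1, None): the MPM is chosen
print(encode_mode(5, mpm))     # (0, 4): three-bit remainder
print(decode_mode(0, 4, mpm))  # 5
```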
For spatial intra prediction in HEVC, thirty-five intra prediction modes are specified, of which two are used for DC and planar prediction, and the remaining thirty-three are used for directional spatial prediction. FIG. 2 illustrates thirty-three directional spatial modes (201) as arrows indicating the spatial direction from which the prediction sample is copied, and numerals indicating the symbol used to represent the spatial mode (202). Note that mode 0 is used for planar prediction and mode 1 for DC prediction (not illustrated in FIG. 2). To code each prediction unit (PU) in intra mode, three MPMs are derived based on the prediction modes and availability of previously coded or decoded neighboring blocks. If one of the three MPMs is chosen to predict the current PU (this is indicated by prev_intra_luma_pred_flag), the selected MPM is coded by a syntax element representing an index to one of the three MPMs (mpm_idx=0, 1, or 2), which indicates that the intra prediction mode is equal to the selected MPM. Otherwise, the intra prediction mode is one of the thirty-two remaining modes, and its value is coded by a syntax element that is a fixed-length, five-bit field (rem_intra_luma_pred_mode). With the five bits, values from 0 through 31, for a total of 32 values, can be represented.
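By way of a non-limiting illustration, the mapping between a luma intra prediction mode and the syntax elements described above can be sketched as follows. The sketch assumes that the three MPM candidates have already been derived and are distinct; it covers only the signaling, not the candidate derivation or the entropy coding. The function names and the dictionary representation are hypothetical.

```python
def encode_luma_mode(mode, cand_mode_list):
    """Map a mode (0..34) to the syntax element values that signal it."""
    if mode in cand_mode_list:
        return {"prev_intra_luma_pred_flag": 1,
                "mpm_idx": cand_mode_list.index(mode)}
    # Remove the MPMs from the numbering so 32 modes fit in five bits.
    rem = mode - sum(1 for cand in cand_mode_list if cand < mode)
    return {"prev_intra_luma_pred_flag": 0,
            "rem_intra_luma_pred_mode": rem}


def decode_luma_mode(syntax, cand_mode_list):
    """Inverse of encode_luma_mode."""
    if syntax["prev_intra_luma_pred_flag"]:
        return cand_mode_list[syntax["mpm_idx"]]
    mode = syntax["rem_intra_luma_pred_mode"]
    # Step past each MPM, visiting the candidates in ascending order.
    for cand in sorted(cand_mode_list):
        if mode >= cand:
            mode += 1
    return mode


# Example candidates: planar (0), DC (1), and directional mode 26.
mpms = [0, 1, 26]
print(encode_luma_mode(26, mpms))
# {'prev_intra_luma_pred_flag': 1, 'mpm_idx': 2}
print(encode_luma_mode(30, mpms))
# {'prev_intra_luma_pred_flag': 0, 'rem_intra_luma_pred_mode': 27}
print(decode_luma_mode(encode_luma_mode(30, mpms), mpms))  # 30
```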
Excerpts of the HEVC coding_unit syntax and semantics are shown below, which illustrate the described intra prediction direction coding mechanism in the language used by the HEVC standard.
SYNTAX TABLE
coding_unit( x0, y0, log2CbSize ) {                              Descriptor
 . . .
 pbOffset = ( PartMode = = PART_NxN ) ? ( nCbS / 2 ) : nCbS
 for( j = 0; j < nCbS; j = j + pbOffset )
  for( i = 0; i < nCbS; i = i + pbOffset )
   prev_intra_luma_pred_flag[ x0 + i ][ y0 + j ]                 ae(v)
 for( j = 0; j < nCbS; j = j + pbOffset )
  for( i = 0; i < nCbS; i = i + pbOffset )
   if( prev_intra_luma_pred_flag[ x0 + i ][ y0 + j ] )
    mpm_idx[ x0 + i ][ y0 + j ]                                  ae(v)
   else
    rem_intra_luma_pred_mode[ x0 + i ][ y0 + j ]                 ae(v)
 . . .
Semantics Specification: The syntax elements prev_intra_luma_pred_flag[x0+i][y0+j], mpm_idx[x0+i][y0+j] and rem_intra_luma_pred_mode[x0+i][y0+j] specify the intra prediction mode for luma samples. The array indices x0+i, y0+j specify the location (x0+i, y0+j) of the top-left luma sample of the considered prediction block relative to the top-left luma sample of the picture. When prev_intra_luma_pred_flag[x0+i][y0+j] is equal to 1, the intra prediction mode is inferred from a neighboring intra-predicted prediction unit according to subclause 8.4.2.
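By way of a non-limiting illustration, the parsing order implied by the coding_unit excerpt can be sketched as follows: all prev_intra_luma_pred_flag values of a coding unit are read first, followed by one mpm_idx or rem_intra_luma_pred_mode per prediction block. The sketch assumes a hypothetical callback read(name, x, y) that stands in for the ae(v) entropy decoder, which is not modeled here; the function name and the scripted values are likewise illustrative.

```python
def parse_intra_luma_modes(read, x0, y0, n_cb_s, part_mode_nxn):
    """Read the luma intra mode syntax elements of one coding unit."""
    pb_offset = n_cb_s // 2 if part_mode_nxn else n_cb_s
    positions = [(x0 + i, y0 + j)
                 for j in range(0, n_cb_s, pb_offset)
                 for i in range(0, n_cb_s, pb_offset)]
    # First loop of the syntax table: one flag per prediction block.
    flags = {pos: read("prev_intra_luma_pred_flag", *pos)
             for pos in positions}
    # Second loop: the flag selects which element follows.
    modes = {}
    for pos in positions:
        name = "mpm_idx" if flags[pos] else "rem_intra_luma_pred_mode"
        modes[pos] = (name, read(name, *pos))
    return flags, modes


# Scripted stand-in for the entropy decoder: returns preset values
# and records the order in which the syntax elements are requested.
scripted = [1, 0, 0, 1, 2, 17, 5, 0]
log = []


def scripted_read(name, x, y):
    log.append((name, x, y))
    return scripted[len(log) - 1]


# An 8x8 coding unit at (16, 8) split into four 4x4 prediction blocks.
flags, modes = parse_intra_luma_modes(scripted_read, 16, 8, 8, True)
for entry in log:
    print(entry)
```

In this example the four flags are requested before any mpm_idx or rem_intra_luma_pred_mode, matching the two separate loops of the syntax table.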
Spatial and SNR scalable coding can refer to techniques where a coded picture at a highest resolution/quality is represented by at least two pictures, one coded in a base layer bitstream and the others in at least one enhancement layer bitstream. In SNR scalability, the spatial resolutions of the pictures of the various layers are the same, whereas in spatial scalability, the enhancement layer resolution can be higher than the base layer resolution, requiring, for example, upsample filters or similar techniques to enable the reconstruction of an enhancement layer from an already reconstructed base layer. Except for this difference, spatial and SNR scalability can utilize similar techniques, including intra prediction mechanisms.
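By way of a non-limiting illustration, the upsampling of a reconstructed base layer picture to the enhancement layer resolution can be sketched as follows. The sketch assumes an integer scaling factor and nearest-neighbor sample replication, chosen only for brevity; practical scalable codecs use specified interpolation filters instead. The function name upsample_nearest is hypothetical.

```python
def upsample_nearest(picture, factor):
    """Replicate each sample 'factor' times horizontally and vertically."""
    return [
        [sample for sample in row for _ in range(factor)]
        for row in picture
        for _ in range(factor)
    ]


# A 2x2 reconstructed base layer picture and its 4x4 upsampled version,
# which could serve as a reference for the enhancement layer.
base = [[10, 20],
        [30, 40]]
upsampled = upsample_nearest(base, 2)
for row in upsampled:
    print(row)
# [10, 10, 20, 20]
# [10, 10, 20, 20]
# [30, 30, 40, 40]
# [30, 30, 40, 40]
```

With a factor of 1 the picture is returned unchanged in value, which corresponds to the SNR scalability case in which both layers have the same resolution.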
H.264 specifies techniques for spatial and SNR scalable coding in its Annex G, also known as Scalable Video Coding or SVC. Annex G specifies many different cross-layer prediction techniques that utilize similarities between the coding decisions of a base layer and an enhancement layer encoder (which can result from the coded material being the same for both encoders). However, SVC does not specify prediction of spatial intra coding modes between layers. A second version of H.265, referred to as Scalable High Efficiency Video Coding (SHVC), is under development. A working draft of SHVC can be found at http://phenix.int-evry.fr/jct/doc_end_user/documents/13_Incheon/wg11/JCTVC-M1008-v1.zip.