The H.264/AVC standard provides excellent coding efficiency, but it does not address scalable video coding (SVC). SVC may provide several layers, usually a base layer (BL) and an enhancement layer (EL). To give the video codec more functionality, the Moving Picture Experts Group (MPEG) considered providing a standard for SVC. Various techniques were proposed, and the Joint Video Team (JVT) finally started a standard called JSVC, with a corresponding reference software description called JSVM. SVC provides temporal, SNR and spatial scalability for applications. The base layer of JSVM is compatible with H.264, and most components of H.264 are used in JSVM as specified, so that only a few components need to be adjusted according to the subband structure. Among all the scalabilities, spatial scalability is the most challenging and interesting topic, since it is hard to exploit the redundancy between the two spatially scalable layers.
SVC provides several techniques for spatial scalability, such as IntraBL mode, residual prediction and BLSkip (base layer skip) mode. These modes can be selected at the macroblock (MB) level.
IntraBL mode uses the upsampled reconstructed BL picture to predict an MB in the EL, and encodes only the residual. Residual prediction tries to reduce the energy of the motion compensation (MC) residual of the EL by subtracting the upsampled MC residual of the BL.
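The residual prediction step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, residual blocks are plain lists of integer rows, and pixel repetition stands in for the actual upsampling filter, which is not specified here.

```python
def upsample_2x_nearest(block):
    """Upsample a 2-D block by two in each direction using pixel repetition.

    Assumption: pixel repetition is only a stand-in for the real upsampling filter.
    """
    out = []
    for row in block:
        wide = [p for p in row for _ in range(2)]  # repeat each sample horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def residual_prediction(el_residual, bl_residual):
    """Subtract the upsampled BL MC residual from the EL MC residual."""
    up = upsample_2x_nearest(bl_residual)
    return [[e - u for e, u in zip(el_row, up_row)]
            for el_row, up_row in zip(el_residual, up)]


def energy(block):
    """Sum of squared samples, used here as a simple energy measure."""
    return sum(p * p for row in block for p in row)


bl_res = [[1, 2],
          [3, 4]]
el_res = [[1, 1, 2, 2],
          [1, 2, 2, 3],
          [3, 3, 4, 4],
          [3, 3, 4, 5]]

remaining = residual_prediction(el_res, bl_res)
print(energy(el_res), energy(remaining))
```

When the EL residual resembles the upsampled BL residual, as in this toy input, the energy left to encode drops sharply.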
BLSkip mode uses the upsampled motion vector (MV) of the BL for an MB in the EL, and requires only the residual to be written into the bit stream if an MB selects this mode. Thus, the BLSkip mode makes use of the redundancy between the MVs of a BL and its EL in the spatial scalability case.
In the JSVM of SVC, BLSkip modes are used for MBs of inter coded predicted (P) frames and inter coded bi-predicted (B) frames. A BL MV, which is usually stored for each 4×4 block, is upsampled by multiplying it by two. The upsampled MV then corresponds to an 8×8 block of the higher resolution EL. That is, if the QCIF (176×144) BL frame has 11×9 MBs and each MB has sixteen 4×4 blocks, there are 11×9×16 MVs in the BL (if there is no intra MB). When a selected MV is (h,v) and its corresponding 4×4 block has the start coordinates (x,y), then the upsampled MV is (2h,2v) and the corresponding 8×8 block in the high resolution frame (CIF: 352×288) has the start coordinates (2x,2y). Thus, the four 4×4 blocks with start coordinates (2x,2y), (2x+4,2y), (2x,2y+4) and (2x+4,2y+4) are assigned the same MV of (2h,2v).
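The mapping above can be written out as a short sketch. The function names are hypothetical and are not taken from the JSVM software; the code only restates the arithmetic of this paragraph for the factor-of-two case (QCIF to CIF).

```python
def upsample_bl_mv(h, v, x, y):
    """Map one BL MV (h, v) of the 4x4 block at (x, y) to the EL.

    Returns the upsampled MV and the start coordinates of the four
    4x4 EL blocks (together one 8x8 block) that receive it.
    """
    el_mv = (2 * h, 2 * v)
    bx, by = 2 * x, 2 * y
    el_blocks = [(bx, by), (bx + 4, by), (bx, by + 4), (bx + 4, by + 4)]
    return el_mv, el_blocks


def count_bl_mvs(width, height, mb_size=16, blocks_per_mb=16):
    """Number of stored MVs when every MB is inter coded (one MV per 4x4 block)."""
    return (width // mb_size) * (height // mb_size) * blocks_per_mb


el_mv, el_blocks = upsample_bl_mv(h=3, v=-1, x=4, y=8)
print(el_mv)                    # (6, -2)
print(el_blocks)                # [(8, 16), (12, 16), (8, 20), (12, 20)]
print(count_bl_mvs(176, 144))   # 1584, i.e. 11 * 9 * 16 for QCIF
```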
Then, during the mode decision process, when BLSkip is the current candidate, an MB with the start coordinates (2x,2y) actually takes its MVs from four upsampled MVs. One MV is set for each of the four 8×8 subblocks of the current MB, which have the start coordinates (2x,2y), (2x+8,2y), (2x,2y+8) and (2x+8,2y+8).
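As a rough illustration of this step, the sketch below gathers the four MVs of one EL MB from a BL MV field. The dictionary layout and the name `blskip_mvs` are assumptions made for this example, and all four co-located BL 4×4 blocks are assumed to be inter coded.

```python
def blskip_mvs(bl_mv_field, x, y):
    """Return the MVs of the four 8x8 subblocks of the EL MB starting at (2x, 2y).

    bl_mv_field: dict mapping BL 4x4 block start coordinates to (h, v).
    (x, y): start of the co-located 8x8 region in the BL.
    """
    mvs = {}
    for dy in (0, 4):
        for dx in (0, 4):
            h, v = bl_mv_field[(x + dx, y + dy)]        # MV of one BL 4x4 block
            sub_start = (2 * (x + dx), 2 * (y + dy))    # start of the EL 8x8 subblock
            mvs[sub_start] = (2 * h, 2 * v)             # upsampled MV
    return mvs


# Toy BL MV field for the 8x8 BL region starting at (8, 0).
bl_mv_field = {
    (8, 0): (1, 2), (12, 0): (3, 4),
    (8, 4): (-1, 0), (12, 4): (0, -5),
}
print(blskip_mvs(bl_mv_field, 8, 0))
```

The returned keys are (2x,2y), (2x+8,2y), (2x,2y+8) and (2x+8,2y+8), matching the subblock start coordinates listed above.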