This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
A number of video coding standards have been specified for different applications and technologies. These include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 Advanced Video Coding (AVC) or, in short, H.264/AVC). In addition, efforts are currently underway to develop new video coding standards. One such standard under development is the scalable video coding (SVC) standard, which will become the scalable extension to the H.264/AVC standard. The latest draft of SVC is Annex F (now Annex G) of the H.264/AVC standard. In particular, Annex F includes a feature known as extended spatial scalability (ESS), which provides for the encoding and decoding of signals in situations where the edge alignment of a base layer macroblock (MB) and an enhancement layer macroblock is not maintained. Spatial scaling with a ratio of 1 or 2, in which macroblock edges remain aligned across layers, is considered a special case of spatial scalability.
For example, when utilizing dyadic resolution scaling (i.e., scaling resolution by a power of 2), the edge alignment of macroblocks can be maintained. This is illustrated in FIG. 1, where a half-resolution frame on the left (the base layer frame 1000) is upsampled to give a full-resolution version of the frame on the right (an enhancement layer frame 1100). Considering the macroblock MB0 in the base layer frame 1000, the boundary of this macroblock after upsampling is shown as the outer boundary in the enhancement layer frame 1100. In this situation, the upsampled macroblock encompasses exactly four full-resolution macroblocks—MB1, MB2, MB3 and MB4—at the enhancement layer. The edges of the four enhancement layer macroblocks MB1, MB2, MB3 and MB4 exactly correspond to the upsampled boundary of the macroblock MB0. Importantly, the identified base layer macroblock is the only base layer macroblock covering each of the enhancement layer macroblocks MB1, MB2, MB3 and MB4. In other words, no other base layer macroblock is needed to form a prediction for MB1, MB2, MB3 and MB4.
In the case of non-dyadic scalability, on the other hand, the situation is quite different. This is illustrated in FIG. 2 for a scaling factor of 1.5. In this case, the base layer macroblocks MB10 and MB20 in the base layer frame 2000 are upsampled from 16×16 to 24×24 in the higher resolution enhancement layer frame 2100. However, considering the enhancement layer macroblock MB30, it is clearly observable that this macroblock is covered by two different upsampled macroblocks—MB10 and MB20. Thus, two base layer macroblocks, MB10 and MB20, are required to form a prediction for the enhancement layer macroblock MB30. In fact, depending upon the scaling factor that is used, a single enhancement layer macroblock may be covered by up to four base layer macroblocks.
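The coverage relationship described above can be sketched as follows. This is an illustrative Python fragment, not part of the standard; the function name and the assumption of a uniform 16×16 macroblock grid with a single horizontal-and-vertical scaling ratio are the author's simplifications for exposition.

```python
import math

def covering_base_macroblocks(mb_x, mb_y, scale, mb_size=16):
    """Return the set of base layer macroblock indices whose upsampled
    footprint overlaps the enhancement layer macroblock at macroblock
    indices (mb_x, mb_y). `scale` is the spatial ratio
    (enhancement resolution / base resolution)."""
    # Pixel extent of the enhancement macroblock (inclusive-exclusive).
    x0, x1 = mb_x * mb_size, (mb_x + 1) * mb_size
    y0, y1 = mb_y * mb_size, (mb_y + 1) * mb_size
    # Map that extent down to base layer pixel coordinates.
    bx0, bx1 = x0 / scale, x1 / scale
    by0, by1 = y0 / scale, y1 / scale
    # Collect every base layer macroblock the mapped region touches.
    blocks = set()
    for by in range(math.floor(by0 / mb_size), math.ceil(by1 / mb_size)):
        for bx in range(math.floor(bx0 / mb_size), math.ceil(bx1 / mb_size)):
            blocks.add((bx, by))
    return blocks
```

With a dyadic ratio of 2 every enhancement macroblock maps into exactly one base layer macroblock, whereas with a ratio of 1.5 an enhancement macroblock can straddle two (as MB30 in FIG. 2) or, in the worst case, four base layer macroblocks.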
In the current draft of Annex F of the H.264/AVC standard, it is possible for an enhancement layer macroblock to be coded relative to an associated base layer frame, even though several base layer macroblocks may be needed to form the prediction. Because coding efficiency is closely related to prediction accuracy, it is desirable to form an accurate prediction of the enhancement layer macroblock to improve coding efficiency.
According to the current draft of Annex F of the H.264/AVC standard, a number of aspects of a current enhancement layer macroblock can be predicted from its corresponding base layer macroblocks. For example, intra-coded macroblocks (also referred to as intra-macroblocks or intra-MBs) from the base layer are fully decoded and reconstructed so that they may be upsampled and used to directly predict the luminance and chrominance pixel values at a corresponding enhancement layer. Inter-coded macroblocks (also referred to as inter-macroblocks or inter-MBs) from the base layer, on the other hand, are not fully reconstructed. Instead, only a prediction residual of each base layer inter-MB is decoded and may be used to predict an enhancement layer prediction residual, but no motion compensation is performed on the base layer inter-MB. This is referred to as "residual prediction". Furthermore, for inter-MBs, base layer motion vectors are also upsampled and used to predict enhancement layer motion vectors.
In addition to the above, in Annex F of the H.264/AVC standard, a flag named base_mode_flag is defined for each enhancement layer macroblock. When this flag is equal to 1, the type, mode and motion vectors of the enhancement layer macroblock are fully predicted (or inferred) from its base layer MB(s). Because the same method for deriving the macroblock type, mode and motion vectors of an enhancement layer macroblock from base layer MB(s) is known to both the encoder and the decoder, it is unnecessary in this case to further code the macroblock type and mode, as well as its motion vector information, into the bitstream. If the base_mode_flag is equal to 0, then the macroblock type and mode information of an enhancement layer macroblock is not inferred.
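The decoder-side rule described above can be sketched as follows. This is an illustrative fragment only; the function name, the dictionary representation of macroblock syntax, and the field names are hypothetical and do not reflect the actual Annex F syntax structures.

```python
def enhancement_mb_header(base_mode_flag, virtual_base_mb, parsed_syntax):
    """Sketch of the base_mode_flag rule: when the flag is 1, the type,
    mode and motion vectors are inferred from the (virtual) base layer
    macroblock and no corresponding syntax is read from the bitstream;
    when it is 0, they come from explicitly coded syntax elements."""
    if base_mode_flag == 1:
        # Fully inferred: type/mode/MV syntax is absent from the bitstream.
        return dict(virtual_base_mb)
    # Otherwise the macroblock type, mode and motion vectors are coded.
    return {
        'type': parsed_syntax['mb_type'],
        'mode': parsed_syntax['mb_mode'],
        'mv': parsed_syntax['mv'],
    }
```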
As discussed above, the macroblock type and mode information of an enhancement layer macroblock can be fully predicted from its base layer MB(s) in certain situations. According to the current draft of Annex F of the H.264/AVC standard, when enhancement layer macroblocks are not edge-aligned with base layer macroblocks, a virtual base layer macroblock is derived for each enhancement layer macroblock, based on the base layer macroblocks that cover that enhancement layer macroblock. The type, mode and motion vectors of the virtual base layer macroblock are all determined based on those base layer MB(s). The virtual base layer macroblock is then considered as the only macroblock from the base layer that exactly covers this enhancement layer macroblock. If the base_mode_flag is equal to 1 for the current enhancement layer macroblock, then its type, mode and motion vectors are set to be the same as those of the virtual base layer macroblock.
The method defined in the current draft of Annex F of the H.264/AVC standard for determining the type, mode and motion vectors of the virtual base layer macroblock is a bottom-up process. First, for each 4×4 block of the virtual base layer macroblock, one pixel located in the second row and second column of the block is used as a representative point for the block, as shown in FIG. 3. In FIG. 3, the macroblock is represented at 300. The 4×4 blocks inside the macroblock are represented at 310, and the representative pixel within each 4×4 block is represented at 320. The use of one pixel in each 4×4 block of the virtual base layer macroblock has the advantage of simplicity when the current 4×4 block in the virtual base layer macroblock is covered by only one 4×4 block from the base layer. But when it is covered by multiple 4×4 blocks from the base layer, such a method may not be accurate.
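The choice of representative point can be expressed compactly. The fragment below is a sketch assuming pixel coordinates local to a 16×16 virtual base layer macroblock and zero-based block indices; the function name is the author's.

```python
def representative_pixel(block_x, block_y):
    """Coordinates, within the 16x16 virtual base layer macroblock, of the
    representative point of the 4x4 block with zero-based block indices
    (block_x, block_y): the pixel in the second row and second column of
    that block (cf. the pixels marked 320 in FIG. 3)."""
    return (4 * block_x + 1, 4 * block_y + 1)
```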
FIGS. 4(a) and 4(b) show the relationship between the virtual base layer macroblock 300 and corresponding base layer macroblock(s). The area in the base layer that, after upsampling, would exactly cover the current enhancement layer macroblock is represented at 410 in FIG. 4(b). This is also the area that corresponds to the virtual base layer macroblock 300. A representative pixel in a 4×4 block in the virtual base layer macroblock 300 is labeled as pe. Its corresponding pixel at the base layer is pb. According to the current draft of Annex F of the H.264/AVC standard, the macroblock partition information of the 4×4 block at the base layer in which pb is located, denoted as 420 in FIG. 4(b), is used as the partition information for the 4×4 block at the enhancement layer in which pe is located. In other words, the partition information of the base layer 4×4 block that covers the pixel pb is used as the partition information for the 4×4 block in which pe is located. In this way, each 4×4 block in the virtual base layer macroblock 300 can have partition information. Motion vectors associated with the partition information are also used as predictors for enhancement layer motion vectors.
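The mapping from pe to pb and the partition lookup can be sketched as follows. This is an illustrative simplification: the function name is hypothetical, the scaling is modeled as a single ratio with simple truncation, and `base_partition_map` (a mapping from base layer 4×4 block indices to partition identifiers) stands in for the actual base layer partition data.

```python
def base_layer_partition(pe_x, pe_y, scale, base_partition_map):
    """Map the enhancement layer representative pixel pe to its
    corresponding base layer pixel pb, then return the partition
    identifier of the base layer 4x4 block containing pb. That
    identifier becomes the partition information of the enhancement
    layer 4x4 block in which pe is located."""
    # Corresponding base layer pixel (truncating division as a sketch
    # of the standard's coordinate mapping).
    pb_x = int(pe_x / scale)
    pb_y = int(pe_y / scale)
    # Partition of the base layer 4x4 block covering pb.
    return base_partition_map[(pb_x // 4, pb_y // 4)]
```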
Within each of the four 8×8 blocks in the virtual base layer macroblock, a block merging process is activated at the 4×4 block level. As shown in FIG. 5, if blocks 1, 2, 3 and 4 all derive their partition from the same single partition from the base layer, then the mode of the 8×8 block is set as 8×8. Otherwise, if block 1 and block 2 derive their partition from one and the same partition from the base layer, and block 3 and block 4 likewise derive their partition from another single partition from the base layer, then the mode of the 8×8 block is determined as 8×4. Similarly, if block 1 and block 3 have the same partition, and block 2 and block 4 also have the same partition from the base layer, then the mode of the 8×8 block is determined as 4×8. Otherwise, the mode of the 8×8 block is determined as 4×4. This process is repeated separately inside each of the other three 8×8 blocks.
If all four 8×8 blocks are in 8×8 mode, a block merging process is also performed at the 8×8 block level, as shown in FIG. 6. In FIG. 6, blocks 1, 2, 3 and 4 each represent an 8×8 block. If blocks 1, 2, 3 and 4 all derive their partition from the same single partition from the base layer, then the mode of the virtual base layer macroblock is determined to be 16×16. If block 1 and block 2 have the same partition, and block 3 and block 4 also have the same partition from the base layer, then the mode of the virtual base layer macroblock is determined as 16×8. If block 1 and block 3 have the same partition, and block 2 and block 4 also have the same partition, then the mode of the virtual base layer macroblock is set as 8×16. Otherwise, the mode of the virtual base layer macroblock is set as 8×8.
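Because the same four-way merge rule applies at both the 4×4 level (yielding an 8×8 sub-mode) and the 8×8 level (yielding the macroblock mode), it can be sketched as a single function. This is an illustrative simplification, not the Annex F specification text; the function name and the (width, height) tuple representation of a mode are the author's, and blocks are numbered as in FIGS. 5 and 6 (1 top-left, 2 top-right, 3 bottom-left, 4 bottom-right).

```python
def merged_mode(p1, p2, p3, p4, size):
    """Four-way merge rule applied to the quadrant blocks of a square
    block of side `size` (8 or 16), given the base layer partition
    identifiers p1..p4 from which the quadrants derive their partition.
    Returns the resulting mode as a (width, height) pair."""
    half = size // 2
    if p1 == p2 == p3 == p4:
        return (size, size)   # all four merge: 8x8, or 16x16 at MB level
    if p1 == p2 and p3 == p4:
        return (size, half)   # horizontal pairs merge: 8x4, or 16x8
    if p1 == p3 and p2 == p4:
        return (half, size)   # vertical pairs merge: 4x8, or 8x16
    return (half, half)       # no merge: 4x4, or 8x8 at MB level
```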
According to the current draft of Annex F of the H.264/AVC standard, the prediction of macroblock mode is based solely on the partition information from the base layer: blocks can be merged only when they share the same partition information from the base layer. However, in the case of extended spatial scalability, it is quite common for different partitions from the base layer to have the same reference frame index and motion vectors. For example, two neighboring macroblocks from the base layer can have the same reference frame index and motion vectors. Additionally, in the case of extended spatial scalability, it is very common for an enhancement layer macroblock to be covered by multiple macroblocks from the base layer. Therefore, using only partition information to determine whether two blocks should be merged often creates unnecessarily small partitions inside a macroblock. Such small partitions increase computational complexity during the sample interpolation processes of motion compensation.
In light of the above, it would be desirable to provide a system for improved inter-layer prediction of macroblock mode, as well as of motion vectors, for the case of extended spatial scalability.