A Multi-view Video Coding (MVC) sequence is a set of two or more video sequences that capture the same scene from different viewpoints. One possible approach to encoding a multi-view video sequence is to encode each view independently. In this case, any existing video coding standard can be used, such as the International Telecommunication Union, Telecommunication Sector (ITU-T) H.263 recommendation (hereinafter the “H.263 Recommendation”) or the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 recommendation (hereinafter the “MPEG-4 AVC standard”). This approach has low compression efficiency, since it exploits only the temporal redundancy between pictures of the same video sequence and ignores the redundancy between views.
The Reduced-Resolution Update (RRU) mode was introduced in the H.263 Recommendation to allow an increase in the coding picture rate while maintaining sufficient subjective quality. Although the syntax of a bitstream encoded in this mode was essentially identical to that of a bitstream coded in full resolution, the main difference was in how all modes within the bitstream were interpreted, and how the residual information was considered and added after motion compensation. More specifically, an image in this mode had ¼ the number of macroblocks of a full resolution coded picture, while motion vector data was associated with block sizes of 32×32 and 16×16 of the full resolution picture instead of 16×16 and 8×8, respectively. On the other hand, discrete cosine transform (DCT) and texture data were associated with 8×8 blocks of a reduced resolution image, and an upsampling process was required in order to generate the final full image representation.
Although this process could result in a reduction in objective quality, this is more than compensated for by the reduction in the number of bits that need to be encoded, due to the reduced number (by a factor of 4) of modes, motion data, and residuals. This is especially important at very low bit rates, where modes and motion data can require considerably more bits than the residual. Subjective quality was also far less impaired than objective quality. Also, this process can be seen as somewhat similar to the application of a low pass filter on the residual data prior to encoding, which, however, requires the transmission of all modes, motion data, and filtered residuals, and is thus less efficient.
Some notable differences of the RRU mode compared to normal encoding are the consideration of larger block sizes and the subsampling of the residual prior to encoding. The first difference allows for a significant overhead reduction within the bitstream (critical for lower bit rates), while the second difference can be seen as a “spatial” quantization process.
More specifically, to support RRU within the syntax of the MPEG-4 AVC standard, a new slice parameter (reduced_resolution_update) was introduced, according to which the current slice is subdivided into macroblocks of size (RRUwidth*16)×(RRUheight*16). Unlike in the H.263 Recommendation, it is not necessary for RRUwidth to be equal to RRUheight. Additional slice parameters can be included, more specifically rru_width_scale=RRUwidth and rru_height_scale=RRUheight, which allow the resolution to be reduced horizontally or vertically at any desired ratio. Possible options include, for example, scaling by 1 horizontally and 2 vertically (macroblocks of size 16×32), by 2 horizontally and 1 vertically (macroblocks of size 32×16), or, in general, having macroblocks of size (rru_width_scale*16)×(rru_height_scale*16).
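As a minimal sketch of the relationship above, the following Python snippet derives the RRU macroblock dimensions from the two slice parameters. The function name rru_macroblock_size is hypothetical and is used only for illustration; it is not part of any standard syntax, and the check for positive integer scale factors is an assumption.

```python
def rru_macroblock_size(rru_width_scale, rru_height_scale):
    """Return (width, height), in luma samples, of a macroblock in an RRU slice."""
    # Assumption: both scale factors are positive integers.
    if rru_width_scale < 1 or rru_height_scale < 1:
        raise ValueError("scale factors are assumed to be positive integers")
    return 16 * rru_width_scale, 16 * rru_height_scale


print(rru_macroblock_size(1, 2))  # (16, 32): resolution reduced vertically only
print(rru_macroblock_size(2, 1))  # (32, 16): resolution reduced horizontally only
print(rru_macroblock_size(2, 2))  # (32, 32): equal scaling in both directions
```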
In a special case, for example, RRUwidth=RRUheight=2, and the RRU slice macroblocks will then be of size 32×32. In this case, all macroblock partitions and sub-partitions have to be scaled by 2 horizontally and 2 vertically. Turning to FIG. 1, a diagram for exemplary macroblock partitions 100 and sub-macroblock partitions 150 in a Reduced Resolution Update (RRU) mode is indicated generally by the reference numeral 100. Unlike in the H.263 Recommendation, where motion vector data had to be divided by 2 to conform to the standard's specifics, this is not necessary in the MPEG-4 AVC standard, and motion vector data can be coded in full resolution/subpel accuracy. Skipped macroblocks in P slices in this mode are considered as having a 32×32 size, while the process for computing their associated motion data remains unchanged, except that 32×32 neighbors are now considered instead of 16×16 neighbors.
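To illustrate the scaling of partitions, the sketch below multiplies the full resolution partition sizes of the MPEG-4 AVC standard by the RRU scale factors. The table names and the function scale_partitions are hypothetical names chosen for this example.

```python
# Partition sizes (width, height) of the MPEG-4 AVC standard at full resolution.
MB_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_MB_PARTITIONS = [(8, 8), (8, 4), (4, 8), (4, 4)]


def scale_partitions(partitions, rru_width_scale=2, rru_height_scale=2):
    """Scale every (width, height) partition by the RRU scale factors."""
    return [(w * rru_width_scale, h * rru_height_scale) for w, h in partitions]


print(scale_partitions(MB_PARTITIONS))      # [(32, 32), (32, 16), (16, 32), (16, 16)]
print(scale_partitions(SUB_MB_PARTITIONS))  # [(16, 16), (16, 8), (8, 16), (8, 8)]
```

With RRUwidth=RRUheight=2, the smallest sub-macroblock partition becomes 8×8, which is consistent with prediction being applied to block sizes of 8×8 and above in this mode.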
Another key difference of this extension, although optional, is that in the MPEG-4 AVC standard, texture data does not have to represent information from a lower resolution image. Since intra coding in the MPEG-4 AVC standard is performed through the consideration of spatial prediction methods using either 4×4 or 16×16 block sizes, this can be extended, similarly to inter prediction modes, to 8×8 and 32×32 intra prediction block sizes. Prediction modes nevertheless remain more or less the same, although now more samples are used to generate the prediction signal.
The residual data is then downsampled and coded using the same transform and quantization process already available in the MPEG-4 AVC standard. The same process is applied to both luma and chroma samples. During decoding, the residual data needs to be upsampled. The downsampling process is performed only in the encoder and, hence, does not need to be standardized. The upsampling process must be matched in the encoder and the decoder, and so must be standardized. Possible upsampling methods include zero order hold, first order hold, or a strategy similar to that of the H.263 Recommendation.
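The residual path can be sketched as follows. This is an illustration under stated assumptions, not the method of any standard: the 2×2 rounded average used for downsampling is just one possible encoder-side choice (the text leaves the downsampling filter open), and the upsampler implements only the zero order hold option, that is, plain sample replication. Both function names are hypothetical.

```python
def downsample_2x2(block):
    """Encoder side: replace each 2x2 neighbourhood by its rounded average.

    Assumption: block dimensions are even. Any other filter could be used,
    since the downsampling step does not need to be standardized.
    """
    h, w = len(block), len(block[0])
    return [[(block[y][x] + block[y][x + 1]
              + block[y + 1][x] + block[y + 1][x + 1] + 2) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def upsample_zero_order_hold(block, sx=2, sy=2):
    """Decoder side: repeat every sample sx times horizontally and sy times vertically."""
    out = []
    for row in block:
        wide = [v for v in row for _ in range(sx)]
        out.extend(list(wide) for _ in range(sy))
    return out


residual = [[4, 4, 8, 8],
            [4, 4, 8, 8],
            [0, 0, -4, -4],
            [0, 0, -4, -4]]
low = downsample_2x2(residual)
print(low)                            # [[4, 8], [0, -4]]
print(upsample_zero_order_hold(low))  # reproduces `residual`, which is piecewise constant
```

For a residual that is not constant over each 2×2 neighbourhood, the round trip is lossy, which is the “spatial” quantization effect described earlier.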
The MPEG-4 AVC standard also considers an in-loop deblocking filter, applied to 4×4 block edges. Since currently the prediction process is applied to block sizes of 8×8 and above, this process is modified to consider 8×8 block edges instead.
Different slices in the same picture may have different values of reduced_resolution_update, rru_width_scale and rru_height_scale. Since the in-loop deblocking filter is applied across slice boundaries, blocks on either side of the slice boundary may have been coded at different resolutions. In this case, the deblocking filter parameters are computed using the largest quantization parameter (QP) value among the two neighboring 4×4 normal blocks on a given 8×8 edge, while the strength of the deblocking is based on the total number of non-zero coefficients of the two blocks.
To support Flexible Macroblock Ordering (FMO), as indicated by num_slice_groups_minus1 greater than 0 in the picture parameter sets, with the Reduced Resolution Update mode, an additional parameter referred to as reduced_resolution_update_enable is transmitted in the picture parameter set. It is not allowed to encode a slice using the Reduced Resolution Mode if FMO is present and this parameter is not set. Furthermore, if this parameter is set, the parameters rru_max_width_scale and rru_max_height_scale should also be transmitted. These parameters ensure that the map provided can always support all possible Reduced Resolution Update macroblock sizes. This means that the following parameters should conform to the following conditions:
max_width_scale % rru_width_scale = 0,
max_height_scale % rru_height_scale = 0, and
max_width_scale > 0, max_height_scale > 0.
The FMO slice group map that is transmitted corresponds to the lowest allowed reduced resolution, corresponding to rru_max_width_scale and rru_max_height_scale. Note that if multiple macroblock resolutions are used then rru_max_width_scale and rru_max_height_scale need to be multiples of the least common multiple of all possible resolutions within the same picture.
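The constraints on the scale parameters can be checked with a few lines of Python. This is a sketch only: the function names are hypothetical, max_width_scale and max_height_scale are taken here to be the transmitted rru_max_width_scale and rru_max_height_scale (an assumption), and the least common multiple is computed over whatever scale factors the slices of one picture happen to use.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def scales_fit_fmo_map(rru_width_scale, rru_height_scale,
                       max_width_scale, max_height_scale):
    """Check the divisibility and positivity conditions stated in the text."""
    values = (rru_width_scale, rru_height_scale, max_width_scale, max_height_scale)
    if min(values) <= 0:
        return False
    return (max_width_scale % rru_width_scale == 0
            and max_height_scale % rru_height_scale == 0)


def smallest_common_scale(scales_in_picture):
    """Least common multiple of all scale factors used within one picture."""
    return reduce(lcm, scales_in_picture, 1)


print(scales_fit_fmo_map(2, 1, 2, 2))    # True
print(scales_fit_fmo_map(2, 2, 3, 2))    # False: 3 is not a multiple of 2
print(smallest_common_scale([1, 2, 3]))  # 6
```

Any maximum scale that is a multiple of the value returned by smallest_common_scale satisfies the divisibility condition for every slice in the picture.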
Direct modes in the MPEG-4 AVC standard are affected depending on whether the current slice is in reduced resolution mode, or the list1 reference is in reduced resolution mode and the current slice is not. For the direct mode case in which the current picture is in reduced resolution and the reference picture is in full resolution, a method is borrowed that is similar to the one currently employed within the MPEG-4 AVC standard when direct_8x8_inference_flag is enabled. According to this method, co-located partitions are assigned by considering only the corresponding corner 4×4 blocks (where the corner is based on block indices) of an 8×8 partition. In the present case, if a direct mode block belongs to a reduced resolution slice, motion vectors and references for the co-located partitions are derived as if direct_8x8_inference_flag were set to 1. This can also be seen as a downsampling of the motion field of the co-located reference. Although not necessary, if direct_8x8_inference_flag was already set within the bitstream, this process could be applied twice. For the case in which the current slice is not in reduced resolution mode but its first list1 reference is, all motion data of this reduced resolution reference is first upsampled. Motion data can be upsampled using zero order hold, which is the method with the least complexity. Other filtering methods, for example first order hold or a process similar to the one used for the upsampling of the residual data, could also be used.
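The two motion field operations described above can be sketched as follows. The code is illustrative only and rests on assumptions: the motion field is modeled as a square grid holding one motion vector per 4×4 block, the "corner" of each quadrant is read as the block at the outer corner of the grid, and the vector values themselves are left unscaled during upsampling. Both function names are hypothetical.

```python
def corner_downsample(mv_grid):
    """Keep only the outer corner vector of each quadrant of a square grid.

    For a 4x4 grid of per-4x4-block vectors (one 16x16 macroblock), this
    yields one vector per 8x8 partition.
    """
    last = len(mv_grid) - 1
    return [[mv_grid[0][0], mv_grid[0][last]],
            [mv_grid[last][0], mv_grid[last][last]]]


def upsample_motion_zero_order_hold(mv_grid, factor=2):
    """Replicate each motion vector factor x factor times (values unchanged)."""
    out = []
    for row in mv_grid:
        wide = [mv for mv in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# Label each block's vector with its (row, column) index to show which ones survive.
grid = [[(r, c) for c in range(4)] for r in range(4)]
print(corner_downsample(grid))  # [[(0, 0), (0, 3)], [(3, 0), (3, 3)]]
print(upsample_motion_zero_order_hold([[(1, -2), (0, 0)]]))
# [[(1, -2), (1, -2), (0, 0), (0, 0)], [(1, -2), (1, -2), (0, 0), (0, 0)]]
```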
Some other tools of the MPEG-4 AVC standard are also affected by the consideration of this mode. More specifically, macroblock adaptive field frame mode (MB-AFF) now needs to be considered using a 32×64 super-macroblock structure. The upsampling process is performed on individual coded block residuals. If an entire picture is coded in field mode, then the corresponding block residuals are coded in field mode and, hence, the upsampling is also done in fields. Similarly, when MB-AFF is used, individual blocks are coded either in field or frame mode, and their corresponding residuals are upsampled in field or frame mode, respectively.
To allow the reduced resolution mode to work for all possible resolutions, a picture is always extended vertically and horizontally so that its dimensions are divisible by 16*rru_height_scale and 16*rru_width_scale, respectively. For the example where rru_height_scale=rru_width_scale=2 and the original resolution of an image is HR×VR, the image is padded to a resolution equal to HC×VC, where (with / denoting integer division):
HC = ((HR + 31) / 32) * 32
VC = ((VR + 31) / 32) * 32
The process for extending the image resolution is similar to what is currently done for the MPEG-4 AVC standard to extend the picture size to be divisible by 16.
A similar approach is used for extending chroma samples, but to half of the size.
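The padding rule can be written out for arbitrary scale factors as a short sketch. The function names are hypothetical, and the chroma helper assumes 4:2:0 sampling, so that each chroma plane is half the luma size in both directions.

```python
def padded_size(hr, vr, rru_width_scale=2, rru_height_scale=2):
    """Round the luma picture size up to a whole number of RRU macroblocks."""
    mb_w = 16 * rru_width_scale
    mb_h = 16 * rru_height_scale
    hc = ((hr + mb_w - 1) // mb_w) * mb_w  # integer division, as in the formula for HC
    vc = ((vr + mb_h - 1) // mb_h) * mb_h  # integer division, as in the formula for VC
    return hc, vc


def padded_chroma_size(hr, vr, rru_width_scale=2, rru_height_scale=2):
    """Chroma plane size after padding, assuming 4:2:0 sampling."""
    hc, vc = padded_size(hr, vr, rru_width_scale, rru_height_scale)
    return hc // 2, vc // 2


print(padded_size(176, 144))         # (192, 160)
print(padded_chroma_size(176, 144))  # (96, 80)
print(padded_size(352, 288))         # (352, 288): already divisible by 32
```

With both scale factors equal to 2, the expressions reduce to ((HR+31)/32)*32 and ((VR+31)/32)*32.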