In current video coding schemes, such as H.264/AVC (Advanced Video Coding) and HEVC (High Efficiency Video Coding), motion information in inter-predicted pictures is partitioned into rectangular blocks of configurable size. In H.264/AVC, the motion is partitioned into symmetric blocks with a maximum size of 16×16 pixels, which are called macroblocks and can be further subdivided down to a minimum of 4×4 pixels. HEVC replaces the macroblock with the coding tree unit (CTU), which has a maximum size of 64×64 pixels. The CTU is not just a larger macroblock, since it can be partitioned using a quad-tree decomposition scheme into smaller coding units (CUs), which can be subdivided down to a minimum size of 8×8 pixels. Furthermore, unlike H.264/AVC, HEVC supports asymmetric block partitioning (AMP) of coding units into prediction units (PUs).
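The quad-tree decomposition of a CTU into CUs described above can be illustrated by the following minimal sketch. The function name `split_ctu` and the callback `should_split` are hypothetical and chosen for illustration only; in a real encoder the split decision is an encoder choice, which is merely stubbed out here.

```python
def split_ctu(x, y, size, should_split, min_size=8):
    """Recursively split a square block into four quadrants.

    Returns the resulting coding units as a list of (x, y, size) tuples.
    `should_split` is a hypothetical callback standing in for the
    encoder's split decision; it is never asked about blocks that have
    already reached `min_size`.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus = []
        # Visit the four quadrants in raster order (top row first).
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(split_ctu(x + dx, y + dy, half, should_split, min_size))
        return cus
    return [(x, y, size)]


# Example: keep splitting only the quadrant that touches the top-left corner.
cus = split_ctu(0, 0, 64, lambda x, y, size: x == 0 and y == 0)
```

In this example the 64×64 CTU is covered by three 32×32, three 16×16 and four 8×8 coding units, so fine partitioning is spent only where the callback requests it.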
The block partitioning of HEVC is based purely on rectangular blocks. For arbitrarily shaped moving objects, which can typically be expected in natural video sequences, this can lead to a very fine block partitioning along the object boundary. As the motion vectors of the small blocks lying on the same side of the boundary can be similar in direction and magnitude, a coding overhead is introduced. That is, additional side-information needs to be transmitted in order to describe the fine block partitioning and the redundant motion vectors.
This problem can be circumvented by applying a different block partitioning strategy. In video coding, the following methods of block partitioning can typically be distinguished: rectangular block partitioning, geometric block partitioning, and object-based block partitioning.
Examples of these different partitioning methods are illustrated in FIG. 9, where a simple scenario of a moving foreground object and a moving background is visualized. The quad-tree-PU partitioning of HEVC and the related quad-tree-binary-tree partitioning method are representatives of rectangular block partitioning. Geometric partitioning is achieved by splitting the block with a straight line into two segments, also called wedges in this context. Object-based partitioning is the most flexible way of block partitioning, as a block can be partitioned into arbitrarily shaped segments.
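The geometric partitioning described above, in which a straight line splits a block into two wedges, can be sketched as follows. The parameterization of the line by a normal angle and a signed offset from the block center is an assumption made for this illustration, as is the function name `wedge_mask`; other parameterizations are equally possible.

```python
import math


def wedge_mask(size, angle_deg, offset):
    """Return a size x size mask assigning each sample to wedge 0 or 1.

    The splitting line is described (as an assumption of this sketch) by
    the direction of its normal, `angle_deg`, and by its signed distance
    `offset` from the block center, measured in samples.
    """
    nx = math.cos(math.radians(angle_deg))
    ny = math.sin(math.radians(angle_deg))
    center = (size - 1) / 2.0
    mask = []
    for y in range(size):
        row = []
        for x in range(size):
            # Signed distance of the sample from the splitting line.
            d = (x - center) * nx + (y - center) * ny - offset
            row.append(1 if d > 0 else 0)
        mask.append(row)
    return mask


# A vertical line through the block center: left half is wedge 0, right half wedge 1.
mask = wedge_mask(8, 0, 0)
```

Only the two line parameters would have to be signaled for such a partitioning, whereas an object-based partitioning would require a description of an arbitrary segment shape.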
More flexible block partitioning, however, leads to the following challenges: more side-information may be needed to signal the partitioning structure than with rectangular block partitioning. Additionally, determining the partitioning at the encoder often comes at the cost of a significant increase in complexity.
In the prior art, such as in HEVC, the determination of an optimal partitioning is an encoder task. Typically, a rate-distortion optimization is used to determine the partitioning in an exhaustive search. Further, the rate-distortion optimization is highly specific to a multitude of internal and external conditions, such as encoder implementation, target bitrate, quality, application scenario, etc.
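An exhaustive rate-distortion search of this kind can be sketched as follows. The sketch assumes the commonly used Lagrangian cost J = D + λ·R, which the text above does not spell out; the function names and all distortion and rate figures are hypothetical values chosen for illustration.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits


def best_partitioning(candidates, lam):
    """Exhaustively evaluate all candidates and return the cheapest one.

    `candidates` is an iterable of (name, distortion, rate_bits) tuples.
    """
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))


# Hypothetical measurements for three partitionings of the same block.
candidates = [
    ("2Nx2N", 1000.0, 20),
    ("2NxN", 700.0, 45),
    ("NxN", 600.0, 90),
]
low_rate_choice = best_partitioning(candidates, lam=20.0)
mid_rate_choice = best_partitioning(candidates, lam=4.0)
high_rate_choice = best_partitioning(candidates, lam=1.0)
```

With these figures the selected partitioning changes from the unsplit block to progressively finer partitionings as λ decreases, which illustrates why the result of the optimization depends on conditions such as the target bitrate.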
The block partitioning in HEVC is also limited to rectangular partitioning of coding blocks. In detail, this means that a square-shaped coding block can be split into two rectangular prediction blocks, wherein each prediction block is associated with up to two motion vectors. As in AVC, a horizontal and a vertical split into two equally sized rectangular blocks are specified. In addition, four asymmetric partitionings are specified for further flexibility. Together with the unsplit coding block and the split into four equally sized square blocks, eight partitioning modes are specified in HEVC in total.
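The eight partitioning modes can be enumerated as in the following sketch, which lists the prediction block sizes for a square coding block of a given side length. The mode names follow the conventional HEVC notation (2N×2N, 2N×N, and so on); the function name `pu_partitions` is hypothetical, and the restrictions that the standard places on the availability of individual modes are not modeled.

```python
def pu_partitions(cu_size):
    """Map each partitioning mode to its prediction block sizes.

    Sizes are (width, height) tuples listed top-to-bottom or
    left-to-right for a square coding block of side `cu_size` (= 2N).
    """
    full = cu_size          # 2N
    half = cu_size // 2     # N
    quarter = cu_size // 4  # N/2, used by the asymmetric modes
    return {
        "2Nx2N": [(full, full)],
        "2NxN": [(full, half), (full, half)],
        "Nx2N": [(half, full), (half, full)],
        "NxN": [(half, half)] * 4,
        "2NxnU": [(full, quarter), (full, full - quarter)],
        "2NxnD": [(full, full - quarter), (full, quarter)],
        "nLx2N": [(quarter, full), (full - quarter, full)],
        "nRx2N": [(full - quarter, full), (quarter, full)],
    }


modes = pu_partitions(32)
```

For a 32×32 coding block, for example, the mode 2N×nU yields a 32×8 prediction block above a 32×24 prediction block.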
A simplified method of temporal projection of motion is used for the coding of motion vectors. In Merge Mode, a merge candidate list is constructed from spatially and temporally neighboring motion vectors. For the spatial motion vectors, the motion vector field of the current picture is used, the motion vector field containing the motion vectors associated with the blocks of the current picture. Motion vectors sampled at specific positions around the current prediction block are added to the merge candidate list. For the temporal motion vectors, the motion vector field of a reference picture is used. Here, the motion vector field is sampled at two collocated positions, denoted C0 and C1, as shown in FIG. 10.
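The construction of the merge candidate list can be sketched in simplified form as follows. The representation of a motion vector field as a dictionary from sample position to motion vector, the function name `build_merge_list`, the concrete positions, the rule of taking the first available of C0 and C1, and the limit of five candidates are all assumptions of this sketch; pruning rules, motion vector scaling and additional candidate types are omitted.

```python
def build_merge_list(cur_mvf, ref_mvf, spatial_positions, c0, c1, max_candidates=5):
    """Collect merge candidates from spatial and temporal neighbors.

    `cur_mvf` and `ref_mvf` map (x, y) positions to motion vectors for
    the current picture and the reference picture, respectively.
    Positions without an entry are treated as unavailable.
    """
    candidates = []
    # Spatial candidates: sample the current picture's motion vector field.
    for pos in spatial_positions:
        mv = cur_mvf.get(pos)
        if mv is not None and mv not in candidates:
            candidates.append(mv)
    # Temporal candidate: first available collocated position (assumed rule).
    for pos in (c0, c1):
        mv = ref_mvf.get(pos)
        if mv is not None:
            if mv not in candidates:
                candidates.append(mv)
            break
    return candidates[:max_candidates]


# Hypothetical 8x8 prediction block at the origin with made-up motion vectors.
cur_mvf = {(-1, 7): (4, 0), (7, -1): (4, 0), (8, -1): (2, -1)}
ref_mvf = {(4, 4): (3, 0)}
spatial = [(-1, 7), (7, -1), (8, -1), (-1, 8), (-1, -1)]
merge_list = build_merge_list(cur_mvf, ref_mvf, spatial, c0=(8, 8), c1=(4, 4))
```

In this example the duplicate spatial motion vector is added only once, and the temporal candidate is taken from C1 because no motion vector is available at C0.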
Under the assumption that the motion vector fields of the current picture and reference picture are highly correlated, and therefore do not change significantly, it can be expected that a motion predictor can be found at the positions C0 or C1 in the reference picture motion vector field.