In current video coding schemes, such as H.264/AVC (“Advanced Video Coding”) or HEVC (“High Efficiency Video Coding”), the motion information in inter-predicted pictures (also referred to as frames) is partitioned into rectangular video coding blocks of configurable size. In H.264/AVC, the motion is partitioned into symmetric video coding blocks with a maximum size of 16×16 pixels, so-called macroblocks, which can be further subdivided down to a minimum of 4×4 pixels. HEVC replaces the macroblock with a coding tree unit (CTU) with a maximum size of 64×64 pixels. The CTU is not just a larger macroblock, as it can be partitioned in a quadtree (QT) decomposition scheme into smaller coding units (CUs), which, in turn, can be subdivided down to a minimum size of 8×8 pixels. Furthermore, in contrast to H.264/AVC, HEVC additionally supports asymmetric motion partitioning (AMP) of coding units (CUs) into prediction units (PUs).
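The quadtree decomposition of a CTU can be sketched as a recursive split. This is a minimal illustration only: the function `quadtree_partition` and the `should_split` predicate are hypothetical, and in an actual encoder the split decision follows from a rate-distortion criterion rather than from a geometric rule.

```python
from typing import Callable, List, Tuple

# (x, y, size) of a square block, in pixels; hypothetical representation
Block = Tuple[int, int, int]


def quadtree_partition(x: int, y: int, size: int,
                       should_split: Callable[[int, int, int], bool],
                       min_size: int = 8) -> List[Block]:
    """Recursively split a square block into four quadrants while should_split says so.

    Blocks at min_size (8x8 for HEVC CUs) are never split further.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(quadtree_partition(x + dx, y + dy, half,
                                                 should_split, min_size))
        return leaves
    return [(x, y, size)]


def straddles_vertical(x0: int) -> Callable[[int, int, int], bool]:
    """Hypothetical rule: split if the block contains columns on both sides of x = x0."""
    return lambda x, y, s: x < x0 <= x + s - 1


def straddles_diagonal(x: int, y: int, s: int) -> bool:
    """Hypothetical rule: split if the block contains pixels with x >= y and with x < y."""
    return (x + s - 1 - y) >= 0 and (x - (y + s - 1)) < 0


if __name__ == "__main__":
    print(len(quadtree_partition(0, 0, 64, straddles_vertical(32))))  # vertical boundary
    print(len(quadtree_partition(0, 0, 64, straddles_diagonal)))      # diagonal boundary
```

Splitting every block that a boundary crosses yields 4 CUs for a vertical boundary at x = 32 in a 64×64 CTU, but 22 CUs for the diagonal x = y, which illustrates why boundaries that are not horizontal or vertical lead to fine decompositions.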
The decomposition and partitioning of each CTU are determined during the encoding process based on a rate-distortion optimization criterion. Although AMP already improves coding efficiency, problems may still arise along the boundaries of moving objects in a video sequence. Object boundaries that are not strictly vertical or horizontal may result in a fine quadtree decomposition and block partitioning along the object boundary. Because the blocks along the boundary are expected to contain similar motion information, signaling this information separately for each block introduces redundancy, which decreases the coding efficiency.
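A common way to express a rate-distortion optimization criterion is the Lagrangian cost J = D + λ·R, with distortion D, rate R in bits and Lagrange multiplier λ. The sketch below compares coding a block as one CU against coding it as four sub-CUs; the function names and all numbers are hypothetical and do not reflect any particular encoder's cost model.

```python
from typing import List, Tuple

# (distortion, rate_in_bits) of one coding option; hypothetical representation
Cost = Tuple[float, float]


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def choose_split(whole: Cost, children: List[Cost], lam: float) -> Tuple[bool, float, float]:
    """Return (split?, J_whole, J_split) for one block versus its sub-blocks."""
    j_whole = rd_cost(whole[0], whole[1], lam)
    j_split = sum(rd_cost(d, r, lam) for d, r in children)
    return j_split < j_whole, j_whole, j_split


if __name__ == "__main__":
    whole = (1000.0, 10.0)        # one CU: high distortion, few bits
    children = [(100.0, 10.0)] * 4  # four CUs: low distortion, more side information
    print(choose_split(whole, children, lam=10.0))
    print(choose_split(whole, children, lam=30.0))
```

With these illustrative numbers, the split wins at λ = 10 (J = 800 versus 1100) but loses at λ = 30 (J = 1600 versus 1300): the extra motion side information of the finer partitioning only pays off when it buys enough distortion reduction.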
One approach to this problem is geometric motion partitioning (GMP), which is based on the idea of partitioning a rectangular video coding block into two segments by a straight line that can have practically any orientation. This approach provides more flexibility in motion partitioning and therefore a closer approximation of the actual motion. However, finding the optimal GMP of a video coding block requires an exhaustive search, which greatly increases the computational complexity. Moreover, an efficient predictive coding scheme for the additional GMP information has to be provided.
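The cost of an exhaustive GMP search can be illustrated with a small sketch. It assumes a line parameterized by an angle θ and an offset ρ relative to the block center and, as a stand-in for a real motion-compensated prediction error, scores each candidate by the sum of squared deviations of each segment's samples from that segment's mean. The parameterization, the cost function and all names are assumptions, not a specific GMP proposal.

```python
import math
from typing import List, Tuple


def line_mask(n: int, theta: float, rho: float) -> List[List[bool]]:
    """True for pixels with (x - c)cos(theta) + (y - c)sin(theta) < rho, c = block center."""
    c = (n - 1) / 2
    ct, st = math.cos(theta), math.sin(theta)
    return [[(x - c) * ct + (y - c) * st < rho for x in range(n)] for y in range(n)]


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty segment)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def gmp_exhaustive_search(block: List[List[float]], num_angles: int = 16,
                          rho_step: float = 1.0) -> Tuple[Tuple[float, float, float], int]:
    """Try every (theta, rho) on a grid; return ((cost, theta, rho), number_of_candidates)."""
    n = len(block)
    k_max = int((n / 2) / rho_step)
    rhos = [k * rho_step for k in range(-k_max, k_max + 1)]
    best = (math.inf, 0.0, 0.0)
    candidates = 0
    for a in range(num_angles):
        theta = a * math.pi / num_angles
        for rho in rhos:
            mask = line_mask(n, theta, rho)
            seg0 = [block[y][x] for y in range(n) for x in range(n) if mask[y][x]]
            seg1 = [block[y][x] for y in range(n) for x in range(n) if not mask[y][x]]
            cost = sse(seg0) + sse(seg1)
            candidates += 1
            if cost < best[0]:  # keep the first candidate with minimal cost
                best = (cost, theta, rho)
    return best, candidates


if __name__ == "__main__":
    blk = [[0] * 4 + [10] * 4 for _ in range(8)]
    print(gmp_exhaustive_search(blk))
```

For an 8×8 block whose left half is 0 and right half is 10, the search finds the vertical line θ = 0, ρ = 0 with zero cost, but only after evaluating 16 × 9 = 144 candidate lines, each over all 64 pixels; finer angle and offset grids and larger blocks multiply this effort.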
In a more general and advanced partitioning approach, the video coding block containing an object boundary is partitioned into two (or more) segments along the actual object boundary, where the two or more segments carry coherent, yet different motion information. Due to the possible complexity of the shape of the boundary, coding the boundary and transmitting it as side information to the decoder is generally not an efficient option in terms of the data rate. This problem can be solved by determining the object boundary at the decoder (and encoder) side using information that is already available there, e.g. reference pictures. Finding the correct object boundary is a typical problem in the field of image segmentation. Segmentation can be performed according to numerous image features, such as pixel luminance, chrominance, texture or a combination thereof.
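As a deliberately simple example of segmentation by pixel luminance, the sketch below splits a block of luma samples (e.g. taken from a reference picture) into two segments with an iterative two-class threshold (the isodata or Ridler-Calvard scheme). This technique is swapped in for illustration only; the function names are hypothetical, and practical decoder-side segmentation would typically be more elaborate.

```python
from typing import List


def isodata_threshold(samples: List[float], max_iter: int = 50) -> float:
    """Iterate t = (mean of samples <= t + mean of samples > t) / 2 until it settles."""
    t = sum(samples) / len(samples)
    for _ in range(max_iter):
        low = [v for v in samples if v <= t]
        high = [v for v in samples if v > t]
        if not low or not high:  # uniform block: only one class present
            return t
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < 0.5:
            return new_t
        t = new_t
    return t


def segment_by_luminance(block: List[List[float]]) -> List[List[bool]]:
    """Boolean segmentation mask: True marks the brighter segment."""
    t = isodata_threshold([v for row in block for v in row])
    return [[v > t for v in row] for row in block]


if __name__ == "__main__":
    print(segment_by_luminance([[10, 10, 200], [10, 200, 200]]))
```

Because the decoder can run the same computation on the same reference samples, such a mask need not be transmitted; only the motion of each segment would have to be signaled.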
In general, the block-based inter prediction methods used in today's common video codecs, such as AVC and HEVC, do not provide a consistent solution to the problem of occlusions, which arises when overlapping objects within a video sequence move in different directions. For instance, in a video sequence where a foreground object moves to the lower right at high velocity while a background object moves to the right at low velocity, new content will be uncovered at the left side of the boundary between foreground and background. Conversely, in other regions of the video sequence, existing background will be covered by the moving foreground object.
Regions of the video sequence where motion vectors of neighboring objects converge typically result in a fine block partitioning along the object boundary. All blocks on one side of the boundary exhibit the same motion, so signaling this motion to the decoder separately for each block introduces unnecessary overhead. Furthermore, in regions where motion vectors diverge, new content is uncovered, which cannot easily be predicted by either inter- or intra-prediction methods.
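Whether motion vectors converge or diverge at a boundary can be read off, to first order, from the relative motion of the two sides projected onto the boundary normal. The sketch assumes a normal pointing from segment A into segment B, image coordinates with y pointing down, and hypothetical names; it only classifies the boundary and does not detect it.

```python
from typing import Tuple

Vec = Tuple[float, float]  # (dx, dy) in pixels per frame, y pointing down


def classify_boundary(v_a: Vec, v_b: Vec, normal: Vec, tol: float = 0.0) -> str:
    """First-order classification of the boundary between segments A and B.

    normal points from A into B. If B moves away from A along the normal, the
    segments diverge and content is uncovered; if B moves towards A, they
    converge and content is covered.
    """
    rel = (v_b[0] - v_a[0]) * normal[0] + (v_b[1] - v_a[1]) * normal[1]
    if rel > tol:
        return "disocclusion"
    if rel < -tol:
        return "occlusion"
    return "consistent"


if __name__ == "__main__":
    background = (1.0, 0.0)  # slowly to the right
    foreground = (4.0, 4.0)  # quickly to the lower right
    # Left edge of the foreground: background (A) on the left, foreground (B) on the right
    print(classify_boundary(background, foreground, (1.0, 0.0)))
    # Right edge of the foreground: foreground (A) on the left, background (B) on the right
    print(classify_boundary(foreground, background, (1.0, 0.0)))
```

With the illustrative velocities of the example above, the left edge of the foreground object is classified as a disocclusion (relative normal velocity +3 pixels per frame, i.e. roughly a 3-pixel-wide strip of new content per frame), and the right edge as an occlusion (−3 pixels per frame).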
U.S. Pat. No. 7,142,600 discloses a method of detecting occlusions and disocclusions based on k-means clustering, where an average motion vector of a suspect region is compared to a centroid motion vector and the difference between these two vectors is used in a subsequent threshold based decision process to detect the occlusion.
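A rough sketch of the kind of decision summarized above follows. It is not the patented method: it assumes plain k-means (Lloyd's algorithm) over motion vectors with a naive deterministic initialization, compares the suspect region's average motion vector with the nearest centroid by Euclidean distance, and uses a hypothetical fixed threshold. All names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def mean_vec(vs: Sequence[Vec]) -> Vec:
    n = len(vs)
    return (sum(v[0] for v in vs) / n, sum(v[1] for v in vs) / n)


def kmeans(points: Sequence[Vec], k: int, iters: int = 20) -> List[Vec]:
    """Lloyd's k-means; initial centroids are simply the first k points (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Vec]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty
        new = [mean_vec(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


def is_suspect_occlusion(region_mvs: Sequence[Vec], centroids: Sequence[Vec],
                         threshold: float) -> Tuple[bool, float]:
    """Flag a region whose average motion vector is far from every centroid."""
    avg = mean_vec(region_mvs)
    d = min(math.dist(avg, c) for c in centroids)
    return d > threshold, d


if __name__ == "__main__":
    mvs = [(0, 0), (5, 0), (0.2, 0.1), (5.1, -0.1), (-0.2, -0.1), (4.9, 0.1)]
    cents = kmeans(mvs, k=2)
    print(is_suspect_occlusion([(2.4, 0.0), (2.6, 0.0)], cents, threshold=1.0))
```

A region whose average motion lies between the two motion clusters (here about 2.5 pixels from either centroid) exceeds the threshold and is flagged, whereas a region moving with one of the objects is not.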
Although the approach described in U.S. Pat. No. 7,142,600 provides some improvements in the handling of occlusions and disocclusions compared to other approaches, there is still a need for video coding devices and methods that are based on segmentation-based partitioning for inter prediction of a video coding block and that provide an improved handling of occlusions and disocclusions.