Recently, the Video Coding Experts Group (VCEG) of the ITU Telecommunication Standardization Sector (ITU-T), a sector of the International Telecommunication Union (ITU), and the ISO/IEC MPEG (JTC 1/SC 29/WG 11), a standardization subcommittee of the Joint Technical Committee ISO/IEC JTC 1 of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), published the H.265/High Efficiency Video Coding (HEVC) standard in 2013 (version 1). This standard was updated in 2014 to version 2, in 2015 to version 3, and in 2016 to version 4.
Since then, these groups have studied the need for standardization of future video coding technology with a compression capability that significantly exceeds that of the HEVC standard and its updates. These groups have been working together in this effort in a joint collaboration known as the Joint Video Exploration Team (JVET) to evaluate compression technology designs proposed by experts in this field. A Joint Exploration Model (JEM) has been developed by JVET to explore video coding technologies beyond the capability of H.265 HEVC. At the time of writing, the latest version of JEM is JEM-7.0.
In H.265 HEVC, a coding tree unit (CTU), which is the basic processing unit (logical unit) of the standard, is split into coding units (CUs) (also known as coding blocks) by way of a quad-tree structure denoted as a coding tree so as to adapt to various local characteristics.
The decision whether to code a picture area of a video using either inter-picture (temporal) or intra-picture (spatial) prediction is made at the CU level. Each CU can be further split into one, two, or four prediction units (PUs) according to a PU splitting type. Inside one PU, the same prediction process is applied, and the relevant information is transmitted to a decoder on a PU basis.
After obtaining a residual block by applying the prediction process based on the PU splitting type, a CU can be partitioned into transform units (TUs) according to another quadtree structure similar to the coding tree for the CU.
One feature of the H.265 HEVC standard is that it includes multiple partition concepts, including the CU, the PU, and the TU. In the H.265 HEVC standard, however, a CU and a TU can only be square shaped, while a PU may be square or rectangular shaped, at least for an inter-predicted block.
Recently, it has been proposed to allow rectangular shaped PUs for intra-prediction and transform. However, this proposal was not adopted in the H.265 HEVC standard; rather, it was extended only to JEM.
In the H.265 HEVC standard, implicit quad-tree splits are imposed at the picture boundary, so that a block keeps being quad-tree split until its size fits within the picture boundary.
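The implicit boundary split can be sketched as a short recursion. This is a simplified illustration, not the normative HEVC process: the function name, the `(x, y, size)` block tuples, and the default `min_size` of 8 (standing in for the minimum CU size) are assumptions made for the example.

```python
from typing import List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square block, in luma samples


def implicit_boundary_split(x: int, y: int, size: int,
                            pic_w: int, pic_h: int, min_size: int = 8) -> List[Block]:
    """Quad-split a block crossing the picture boundary until every part fits.

    Returns only the blocks that lie entirely inside the picture.
    """
    if x >= pic_w or y >= pic_h:
        return []  # completely outside the picture: nothing to code
    if x + size <= pic_w and y + size <= pic_h:
        return [(x, y, size)]  # fits inside the picture: no forced split
    if size <= min_size:
        # assumption: picture dimensions are multiples of the minimum block size
        raise ValueError("picture size is not a multiple of the minimum block size")
    half = size // 2
    blocks: List[Block] = []
    for dy in (0, half):       # top row of quadrants first,
        for dx in (0, half):   # left quadrant before right
            blocks += implicit_boundary_split(x + dx, y + dy, half, pic_w, pic_h, min_size)
    return blocks


# A 64x64 CTU at the right edge of a 48-sample-wide, 64-sample-high picture
print(implicit_boundary_split(0, 0, 64, pic_w=48, pic_h=64))
# [(0, 0, 32), (32, 0, 16), (32, 16, 16), (0, 32, 32), (32, 32, 16), (32, 48, 16)]
```

The forced splits stop as soon as each quadrant fits, so the blocks exactly tile the part of the CTU that lies inside the picture.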
In JEM, however, a different splitting structure, the quad-tree-binary-tree (QTBT), was developed, which unifies the concepts of the CU, PU, and TU. The QTBT supports more flexible CU partition shapes: according to a QTBT block structure, a CU can have either a square or a rectangular shape. As shown in FIGS. 1A and 1B, a CTU is first partitioned by a quadtree structure, and the quadtree leaf nodes are then further partitioned by a binary tree structure.
In binary tree splitting, there are two splitting types: symmetric horizontal splitting and symmetric vertical splitting. The binary tree leaf nodes are called CUs, and this segmentation is used for prediction and transform processing without any further partitioning. Thus, the CU, PU, and TU have the same block size in the QTBT coding block structure.
In JEM, a CU may consist of coding blocks (CBs) of different color components. For example, in P and B slices of the 4:2:0 chroma format, one CU contains one luma CB and two chroma CBs; in I slices, a CU may consist of a single component, for example, only one luma CB or just two chroma CBs.
Parameters of the afore-described QTBT partitioning scheme may be defined as follows:
CTU size: the root node size of a QTBT (the same concept as in H.265 HEVC);
MaxQTDepth: the maximum allowed quad-tree depth;
MinQTSize: the minimum allowed quadtree leaf node size;
MaxBTSize: the maximum allowed binary tree root node size;
MaxBTDepth: the maximum allowed binary tree depth;
MinBTSize: the minimum allowed binary tree leaf node size.
In one example of the QTBT partitioning structure, the CTU size is set to 128×128 luma samples with two corresponding 64×64 blocks of chroma samples; MinQTSize is set to 16×16; MaxBTSize is set to 64×64; MinBTSize (for both width and height) is set to 4; and MaxBTDepth is set to 4. The quadtree partitioning is applied to the CTU first to generate quadtree leaf nodes. The quadtree leaf nodes may have a size from 16×16 (i.e., MinQTSize) to 128×128 (i.e., the CTU size). If the quadtree leaf node is 128×128, it is not further split by the binary tree, since its size exceeds MaxBTSize (i.e., 64×64). Otherwise, the quadtree leaf node may be further partitioned by the binary tree. In that case, the quadtree leaf node is also the root node of the binary tree, with a binary tree depth of 0. When the binary tree depth reaches MaxBTDepth (i.e., 4), no further splitting is considered. When the binary tree node has width equal to MinBTSize (i.e., 4), no further horizontal splitting is considered. Similarly, when the binary tree node has height equal to MinBTSize, no further vertical splitting is considered. The leaf nodes of the binary tree are further processed by prediction and transform processing without any further partitioning. In JEM, the maximum CTU size is 256×256 luma samples.
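The rules in the preceding example can be collected into one function that lists the splits still permitted for a node. This is a minimal sketch using the example parameter values. `QTBTParams` and `allowed_splits` are hypothetical names, and MaxQTDepth is not modeled. To sidestep the horizontal/vertical naming convention, the sketch names each binary split by the dimension it halves.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QTBTParams:
    """Example QTBT parameter values from the text (sizes in luma samples)."""
    ctu_size: int = 128
    min_qt_size: int = 16
    max_bt_size: int = 64
    max_bt_depth: int = 4
    min_bt_size: int = 4


def allowed_splits(w: int, h: int, bt_depth: int, in_bt: bool, p: QTBTParams) -> List[str]:
    """Splits permitted for a w x h node.

    'QT' = quad split; 'BT_HALVE_W' / 'BT_HALVE_H' = binary split halving width / height.
    """
    splits: List[str] = []
    # Quad splits happen only before the binary tree is entered, down to MinQTSize
    if not in_bt and w == h and w > p.min_qt_size:
        splits.append("QT")
    # Binary splits need a node no larger than MaxBTSize and depth below MaxBTDepth
    if max(w, h) <= p.max_bt_size and bt_depth < p.max_bt_depth:
        if w > p.min_bt_size:   # halving must not go below MinBTSize
            splits.append("BT_HALVE_W")
        if h > p.min_bt_size:
            splits.append("BT_HALVE_H")
    return splits


p = QTBTParams()
print(allowed_splits(128, 128, 0, False, p))  # ['QT'] (128 exceeds MaxBTSize)
print(allowed_splits(4, 16, 2, True, p))      # ['BT_HALVE_H'] (width already at MinBTSize)
```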
FIG. 1A illustrates an example of block partitioning using QTBT, and FIG. 1B illustrates the corresponding tree representation. In these Figures, the solid lines indicate quadtree splitting and the dotted lines indicate binary tree splitting. In each splitting node (i.e., non-leaf) of the binary tree, one flag is signaled to indicate which splitting type (i.e., horizontal or vertical) is used, where 0 indicates horizontal splitting and 1 indicates vertical splitting. For the quadtree splitting, there is no need to indicate the splitting type since quadtree splitting always splits a block both horizontally and vertically to produce 4 sub-blocks with an equal size.
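The split-type signaling can be illustrated by walking a partition tree and collecting one flag per non-leaf binary-tree node, while quadtree nodes contribute none. The nested-tuple tree format and the function name below are hypothetical. The sketch also deliberately omits the split/no-split decisions and the entropy coding of the flags.

```python
from typing import List, Union

# Hypothetical tree encoding for illustration:
#   "CU"                      -> leaf coding unit
#   ("QT", [c0, c1, c2, c3])  -> quadtree split: no split-type flag needed
#   ("BT", flag, [c0, c1])    -> binary split: flag 0 = horizontal, 1 = vertical
Node = Union[str, tuple]


def bt_type_flags(node: Node) -> List[int]:
    """Collect the binary-tree split-type flags in depth-first order."""
    if node == "CU":
        return []
    if node[0] == "QT":
        flags: List[int] = []
        for child in node[1]:
            flags += bt_type_flags(child)
        return flags
    if node[0] == "BT":
        _, flag, children = node
        flags = [flag]  # exactly one flag per non-leaf BT node
        for child in children:
            flags += bt_type_flags(child)
        return flags
    raise ValueError(f"unknown node: {node!r}")


tree = ("QT", ["CU",
               ("BT", 1, ["CU", ("BT", 0, ["CU", "CU"])]),
               "CU",
               ("QT", ["CU", "CU", "CU", "CU"])])
print(bt_type_flags(tree))  # [1, 0]
```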
In addition, the QTBT scheme allows luma and chroma to have separate QTBT structures. Currently, for P and B slices, in the QTBT scheme, the luma and chroma CTBs in one CTU share the same QTBT structure. However, for I slices, the luma CTB is partitioned into CUs by a QTBT structure, and the chroma CTBs are partitioned into chroma CUs by another QTBT structure. This means that a CU in an I slice consists of a coding block of the luma component or coding blocks of two chroma components, and a CU in a P or B slice consists of coding blocks of all three color components.
In H.265 HEVC, inter-prediction for small blocks is restricted to reduce the memory access of motion compensation, such that bi-prediction is not supported for 4×8 and 8×4 blocks, and inter-prediction is not supported for 4×4 blocks. In the QTBT of JEM, these restrictions are removed.
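The small-block restriction amounts to a lookup on the block dimensions. The sketch below is illustrative only. The function name and the `jem_qtbt` switch are hypothetical, and the mode names "uni" and "bi" are shorthand.

```python
def inter_modes(width: int, height: int, jem_qtbt: bool = False) -> set:
    """Inter-prediction modes allowed for a luma block of width x height."""
    if jem_qtbt:
        return {"uni", "bi"}  # QTBT in JEM removes the small-block restrictions
    if (width, height) == (4, 4):
        return set()          # HEVC: no inter-prediction for 4x4 blocks
    if (width, height) in ((4, 8), (8, 4)):
        return {"uni"}        # HEVC: uni-prediction only, no bi-prediction
    return {"uni", "bi"}


print(sorted(inter_modes(8, 4)))  # ['uni']
```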
Next, a multi-type-tree (MTT) structure will be described. An MTT is a more flexible tree structure than QTBT. In MTT, tree types other than quad-trees and binary-trees are supported: horizontal and vertical center-side triple-trees are introduced, as shown in FIGS. 2D and 2E.
In an MTT scheme, there are two levels of trees: region trees (e.g., quad-trees) and prediction trees (e.g., binary-trees or triple-trees). A CTU is first partitioned by a region tree (RT). An RT leaf may then be further split with a prediction tree (PT), and a PT node may itself be further split with a PT until a maximum PT depth is reached. After a PT is entered, an RT (e.g., quad-tree) can no longer be used. A PT leaf is the basic coding unit, but will still be referred to as a CU for convenience. This CU cannot be further split. Prediction and transform are both applied to the CU in the same way as in JEM-3 or QTBT.
Some benefits of triple-tree partitioning are: (1) it complements quad-tree and binary-tree partitioning, since triple-tree partitioning can capture objects located in the center of a block, whereas quad-trees and binary-trees always split along the block center; and (2) the widths and heights of the triple-tree partitions are always powers of 2, so no additional transform sizes are needed.
The design of a two-level tree may be motivated by complexity reduction. The complexity of traversing a tree is T^D, where T denotes the number of split types and D is the depth of the tree. By using a two-level tree and restricting the first level to quad-trees only (reducing T at certain levels), the complexity is reduced considerably while reasonable performance is maintained.
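The power-of-2 property can be checked directly. This sketch assumes the 1:2:1 center-side split of the MTT proposal, in which the center partition is half of the split dimension and each side partition is a quarter. The function names are hypothetical. Here `vertical=True` means the width is divided.

```python
from typing import List, Tuple


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def center_side_triple_split(width: int, height: int, vertical: bool) -> List[Tuple[int, int]]:
    """Split a block into three partitions in a 1:2:1 ratio.

    vertical=True places two vertical split lines (the width is divided);
    vertical=False places two horizontal split lines (the height is divided).
    """
    dim = width if vertical else height
    if dim % 4 != 0:
        raise ValueError("split dimension must be divisible by 4")
    sizes = (dim // 4, dim // 2, dim // 4)  # side, center, side
    if vertical:
        return [(s, height) for s in sizes]
    return [(width, s) for s in sizes]


parts = center_side_triple_split(32, 16, vertical=True)
print(parts)  # [(8, 16), (16, 16), (8, 16)]
print(all(is_power_of_two(w) and is_power_of_two(h) for w, h in parts))  # True
```

For any power-of-2 block of at least 4 samples in the split dimension, the quarter and half sizes are also powers of 2, which is why existing transform sizes suffice.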
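The T^D model can be worked through with small numbers. The split-type counts and depths below are illustrative assumptions, not measurements. The example contrasts a one-level tree offering five split types (quad, two binary, two triple) at every level with a two-level tree whose first levels allow only quad-tree splits.

```python
def traversal_complexity(num_split_types: int, depth: int) -> int:
    """Complexity model from the text: T ** D."""
    return num_split_types ** depth


# Illustrative numbers only (not measurements):
# one-level tree, 5 split types at each of 4 levels
single_level = traversal_complexity(5, 4)
# two-level tree: 2 region-tree levels with quad-tree only (T = 1),
# then 2 prediction-tree levels with 4 split types
two_level = traversal_complexity(1, 2) * traversal_complexity(4, 2)
print(single_level, two_level)  # 625 16
```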
Next, asymmetric coding units (ACUs) in QTBT will be described. To further improve coding efficiency on top of QTBT, an asymmetric binary tree has been proposed. As shown in FIG. 3, a CU with size S is divided into 2 sub-CUs with sizes S/4 and 3S/4, either in the horizontal or in the vertical direction. For example, this yields CU sizes of 12 and 24 (from S = 16 and S = 32); other sizes, such as 6 and 48, may also result.
One issue with ACUs in a QTBT scheme is that efficiency may be reduced when the width or height of a block is not a power of 2. For example, transforms with sizes such as 12 and 24 need to be supported, and special handling may also be needed when splitting a block whose width or height is not a power of 2.
A flexible tree structure will now be described. U.S. Provisional Application No. 62/639,989 proposed a flexible tree structure in which a “split-to-square” scheme replaces quad-tree splits so as to handle more general cases, such as non-square CTUs.
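The arithmetic behind the asymmetric split, and the resulting non-power-of-2 sizes, can be shown in a few lines. The function names and the `quarter_first` option (which selects the order of the two parts) are hypothetical.

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of 2."""
    return n > 0 and (n & (n - 1)) == 0


def asymmetric_split(size: int, quarter_first: bool = True) -> tuple:
    """Divide a CU dimension S into parts of S/4 and 3S/4."""
    if size % 4 != 0:
        raise ValueError("size must be divisible by 4")
    small, large = size // 4, 3 * size // 4
    return (small, large) if quarter_first else (large, small)


for s in (8, 16, 32, 64):
    parts = asymmetric_split(s)
    print(s, parts, [is_power_of_two(x) for x in parts])
# 8 (2, 6) [True, False]
# 16 (4, 12) [True, False]
# 32 (8, 24) [True, False]
# 64 (16, 48) [True, False]
```

The 3S/4 part is never a power of 2, which is the source of the extra transform sizes and special handling mentioned above.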
Motion field information in H.265 HEVC will now be described. In H.265 HEVC, temporal motion vector prediction (TMVP) may be employed to improve the efficiency of motion vector prediction. To support this, the motion field of a reference frame may be stored in the decoded picture buffer (DPB) in addition to the reconstructed pixels of the reference frame.
An interpolated motion field will now be described. With a coding mode based on frame-rate up-conversion, an interpolated (or extrapolated) motion field may be derived for the current frame before the current frame is coded. Such derived motion information may be used for block-level motion vector prediction or derivation.
Despite the advances described above, problems remain in the current state of the art. For example, advanced block partitioning methods, such as the multi-type tree, do not consider motion information during a tree split. This is inefficient, because block partitioning typically correlates with the motion field, and pixels within one partition usually share the same motion.
Summary
According to an aspect of the disclosure, a method for encoding a video sequence comprises partitioning the video sequence into coding tree units, each coding tree unit including at least one coding tree block and each coding tree block including at least one coding block; determining the motion field of the at least one coding block; determining whether the motion field of the at least one coding block is homogeneous or heterogeneous; and determining whether to signal a further partition of the at least one coding block based on the determination of whether the motion field of the at least one coding block is homogeneous or heterogeneous.
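The disclosure does not fix a homogeneity measure or thresholds, so the following is only a hypothetical sketch of one way to make the decision. It measures how far any motion vector strays from the block's mean motion vector (L1 distance). It then infers the partition decision, without signaling, when the field is clearly homogeneous or clearly heterogeneous, and falls back to signaling a split flag in between. The function names, the two-threshold policy, and the threshold values are all assumptions.

```python
from typing import List, Tuple

MV = Tuple[int, int]  # (mv_x, mv_y); units (e.g. quarter samples) are an assumption


def mv_spread(motion_field: List[List[MV]]) -> float:
    """Largest L1 distance of any motion vector from the block's mean motion vector."""
    mvs = [mv for row in motion_field for mv in row]
    mean_x = sum(x for x, _ in mvs) / len(mvs)
    mean_y = sum(y for _, y in mvs) / len(mvs)
    return max(abs(x - mean_x) + abs(y - mean_y) for x, y in mvs)


def split_decision(motion_field: List[List[MV]],
                   t_homog: float = 1.0, t_heterog: float = 8.0) -> str:
    """Hypothetical policy: infer the decision at the extremes, signal otherwise."""
    spread = mv_spread(motion_field)
    if spread <= t_homog:
        return "no_split_inferred"   # homogeneous: left unpartitioned, nothing signaled
    if spread > t_heterog:
        return "split_inferred"      # heterogeneous: partitioned, nothing signaled
    return "signal_split_flag"       # in between: explicit split flag


uniform = [[(4, 0), (4, 0)], [(4, 0), (4, 0)]]
two_objects = [[(0, 0), (0, 0)], [(32, 0), (32, 0)]]
print(split_decision(uniform), split_decision(two_objects))
# no_split_inferred split_inferred
```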
According to an aspect of the disclosure, the method may also comprise partitioning the at least one coding block when it is determined that the motion field is heterogeneous.
According to an aspect of the disclosure, the method may also comprise not signaling the partitioning of the at least one coding block in the bitstream.
According to an aspect of the disclosure, the method may also comprise leaving as non-partitioned the at least one coding block when it is determined that the motion field is homogeneous.
According to an aspect of the disclosure, the method may also comprise not signaling the non-partitioning of the at least one coding block in the bitstream.
According to an aspect of the disclosure, the method may also comprise partitioning the at least one coding block using a tree type split such that the sub-blocks obtained via the partitioning have the most homogeneous motion field among all available splitting types, wherein information on how to split the current block is derived (i.e., not signaled).
According to an aspect of the disclosure, the method may also comprise determining split types for partitioning the at least one coding block which result in relatively more heterogeneous motion fields than other available splitting types, and not partitioning the at least one coding block with the split types that would result in relatively more heterogeneous motion fields, so as to reduce signaling costs.
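Deriving the split type from the motion field can be sketched as picking the candidate whose sub-blocks have the lowest total motion-vector variation. This is a hypothetical illustration: the function names, the sum-of-squared-deviations cost, and the motion-field grid are assumptions. Candidates are limited to the two symmetric binary splits so that both produce the same number of parts, because a finer split (such as a quad split) would otherwise always score at least as well on this measure.

```python
from typing import Dict, List, Tuple

MV = Tuple[int, int]
Field = List[List[MV]]  # rows of motion vectors, one per minimum motion unit (assumed)


def ssd(mvs: List[MV]) -> float:
    """Sum of squared deviations of the MVs from their mean (0 for uniform motion)."""
    mx = sum(x for x, _ in mvs) / len(mvs)
    my = sum(y for _, y in mvs) / len(mvs)
    return sum((x - mx) ** 2 + (y - my) ** 2 for x, y in mvs)


def sub_blocks(field: Field, split: str) -> List[List[MV]]:
    """Flatten the two halves produced by a binary split of the motion field."""
    rows, cols = len(field), len(field[0])
    if split == "BT_HALVE_H":    # top / bottom halves
        parts = [field[: rows // 2], field[rows // 2:]]
    elif split == "BT_HALVE_W":  # left / right halves
        parts = [[r[: cols // 2] for r in field], [r[cols // 2:] for r in field]]
    else:
        raise ValueError(split)
    return [[mv for row in part for mv in row] for part in parts]


def derive_split(field: Field) -> str:
    """Pick the binary split whose halves have the most homogeneous motion."""
    costs: Dict[str, float] = {s: sum(ssd(p) for p in sub_blocks(field, s))
                               for s in ("BT_HALVE_H", "BT_HALVE_W")}
    return min(costs, key=costs.get)


# Left half static, right half moving: halving the width separates the motions
field = [[(0, 0), (0, 0), (16, 0), (16, 0)],
         [(0, 0), (0, 0), (16, 0), (16, 0)]]
print(derive_split(field))  # BT_HALVE_W
```

Because encoder and decoder can both compute the same derived motion field, a choice made this way would not need to be signaled, which is the point of the aspect above.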
According to an aspect of the disclosure, the method may also comprise not checking split types for partitioning the at least one coding block which would lead to sub-blocks with more heterogeneous determined/derived motion fields than other split types for partitioning the at least one coding block.
According to an aspect of the disclosure, the method may also comprise not checking split types for partitioning the at least one coding block when it is determined that the determined/derived motion field is homogeneous.
According to an aspect of the disclosure, the method may also comprise determining whether to signal the further partition of the at least one coding block based on the condition that the at least one coding block is larger than a predetermined threshold.
According to an aspect of the disclosure, the method may also comprise determining/deriving the size of the coding tree units based on the motion field.
According to an aspect of the disclosure, the method may also comprise determining/deriving the maximum depth of a further partition of the at least one coding block based on a ranking of how homogeneous/heterogeneous the motion field is for the coding block.
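A ranking-to-depth mapping can be as simple as counting how many thresholds a heterogeneity score exceeds. The score (for example, a motion-vector spread), the threshold values, and the depth table below are hypothetical choices for illustration only.

```python
from typing import Sequence


def max_partition_depth(score: float,
                        thresholds: Sequence[float] = (1.0, 4.0, 16.0),
                        depths: Sequence[int] = (0, 1, 2, 4)) -> int:
    """Map a motion-heterogeneity score to a maximum further-partition depth.

    Rank 0 (most homogeneous) gets the shallowest depth; higher ranks allow deeper trees.
    """
    if len(depths) != len(thresholds) + 1:
        raise ValueError("need one depth per rank")
    rank = sum(score > t for t in thresholds)  # number of thresholds exceeded
    return depths[rank]


print([max_partition_depth(s) for s in (0.5, 2.0, 10.0, 100.0)])  # [0, 1, 2, 4]
```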
According to an aspect of the disclosure, the method may also comprise, based on a ranking of how homogeneous/heterogeneous the determined/derived motion field is for a block region (e.g., 256×256, or 512×512), applying different CTU sizes within the block region.
According to an aspect of the disclosure, the afore-described methods may be applied differently for different block sizes.
According to an aspect of the disclosure, the method may also comprise determining a context-adaptive binary arithmetic coding (CABAC) context for entropy coding flags signaled for indicating the further partition of the at least one coding block.
According to an aspect of the disclosure, a method for encoding a video sequence comprises partitioning the video sequence into coding tree units, each coding tree unit including at least one coding tree block and each coding tree block including at least one coding block; determining/deriving the motion field of the at least one coding block; and using information on the determined/derived motion field as a context, or as an additional context, in an entropy coding process when signaling a coding block split.
According to an aspect of the disclosure, the method may further comprise using an additional context-adaptive binary arithmetic coding (CABAC) context to signal coding block split information if the information on the determined/derived motion field is homogeneous.
According to an aspect of the disclosure, the method may also comprise using an additional CABAC context to signal coding block split information if the information on the determined/derived motion field is heterogeneous.
According to an aspect of the disclosure, the method may also comprise using additional contexts to signal coding block split information based on a ranking of how homogeneous/heterogeneous the determined/derived motion field is for the coding block.
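One way to realize motion-dependent contexts is to offset a conventional context index by the motion-field rank. In this sketch, the neighbor-depth part (0 to 2) mirrors the HEVC `split_cu_flag` context selection, which counts how many of the left and above neighbors are split deeper than the current block. The motion-rank offset, the function name, and its parameters are hypothetical extensions for illustration.

```python
def split_flag_ctx(left_deeper: bool, above_deeper: bool,
                   motion_rank: int, num_ranks: int = 2) -> int:
    """Context index for a coding block split flag.

    left_deeper / above_deeper: whether an available neighbor is split deeper.
    motion_rank: 0 = most homogeneous motion field, num_ranks - 1 = most heterogeneous.
    """
    if not 0 <= motion_rank < num_ranks:
        raise ValueError("motion_rank out of range")
    neighbor_ctx = int(left_deeper) + int(above_deeper)  # 0, 1, or 2
    return 3 * motion_rank + neighbor_ctx  # separate set of 3 contexts per rank


print(split_flag_ctx(True, False, motion_rank=0))  # 1
print(split_flag_ctx(True, False, motion_rank=1))  # 4
```

Keeping a separate set of contexts per motion rank lets the arithmetic coder learn, for example, that split flags are rarely 1 when the motion field is homogeneous, which is the kind of correlation the aspects above aim to exploit.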
According to an aspect of the disclosure, a device for encoding a video sequence comprises: at least one memory configured to store program code; and at least one processor configured to read the program code and operate as instructed by the program code, the program code including: first partitioning code configured to cause the at least one processor to partition the video sequence into coding tree units, each coding tree unit including at least one coding tree block and each coding tree block including at least one coding block; first determining code configured to cause the at least one processor to determine the motion field of the at least one coding block; second determining code configured to cause the at least one processor to determine whether the motion field of the at least one coding block is homogeneous or heterogeneous; and third determining code configured to cause the at least one processor to determine whether to signal a further partition of the at least one coding block based on the determination of whether the motion field of the at least one coding block is homogeneous or heterogeneous.
According to an aspect of the disclosure, the device may also comprise second partitioning code configured to cause the at least one processor to partition the at least one coding block when it is determined that the motion field is heterogeneous.
According to an aspect of the disclosure, the second partitioning code of the device may be configured not to signal the partitioning of the at least one coding block in the bitstream.
According to an aspect of the disclosure, the third determining code of the device may be configured to signal a non-partitioning of the at least one coding block when it is determined that the motion field is homogeneous.
According to an aspect of the disclosure, the device may also comprise second partitioning code configured to cause the at least one processor to partition the at least one coding block using a tree type split such that the sub-blocks obtained via the second partitioning code have the most homogeneous motion field among all available splitting types, wherein information on how to split the current block is derived.
According to an aspect of the disclosure, the device may also comprise fourth determining code configured to cause the at least one processor to determine split types for partitioning the at least one coding block which result in relatively more heterogeneous motion fields than other available splitting types, and second partitioning code configured to cause the at least one processor to not partition the at least one coding block with the split types that would result in relatively more heterogeneous motion fields, so as to reduce signaling costs.
According to an aspect of the disclosure, the device may also comprise fourth determining code configured to cause the at least one processor to determine whether to signal the further partition of the at least one coding block when the at least one coding block is larger than a predetermined threshold.
According to an aspect of the disclosure, the device may also comprise fourth determining code configured to cause the at least one processor to determine the size of the coding tree units based on the motion field.
According to an aspect of the disclosure, the device may also comprise first deriving code configured to cause the at least one processor to derive the maximum depth of a further partition of the at least one coding block based on a ranking of how homogeneous/heterogeneous the motion field is for the coding block.
According to an aspect of the disclosure, a non-transitory computer-readable medium stores instructions, the instructions comprising: one or more instructions that, when executed by one or more processors of a device, cause the one or more processors to: partition a video sequence into coding tree units, each coding tree unit including at least one coding tree block and each coding tree block including at least one coding block; determine the motion field of the at least one coding block; determine whether the motion field of the at least one coding block is homogeneous or heterogeneous; and determine whether to signal a further partition of the at least one coding block based on the determination of whether the motion field of the at least one coding block is homogeneous or heterogeneous.
While the methods, devices, and non-transitory computer-readable media described above have been described individually, these descriptions are not intended to suggest any limitation as to the scope of use or functionality thereof. Indeed, these methods, devices, and non-transitory computer-readable media may be combined in other aspects of the disclosure.