In recent years, demand for higher video resolution has been steadily intensifying. In response to this demand, coding methods such as MPEG-2, H.264/MPEG-4 AVC (hereinafter, H.264), and H.265/HEVC (hereinafter, H.265) have been developed, and each later coding method achieves a higher compression efficiency than its predecessors.
These coding methods compress information by use of motion compensation prediction between frames to achieve a high coding efficiency. Motion compensation prediction is a technology for compressing video information in two steps. First, an image in a reference frame that has already been coded is compensated by use of motion information between the coding target frame and the reference frame. Then, only the difference information between the compensated image (predicted image) and the current image to be coded, together with the information representing the motion between the frames, is coded.
A motion between frames is represented by a motion vector that expresses a magnitude of displacement (see, for example, PTL 1). Processing of calculating motion information between the reference frame and the coding target frame is referred to as motion estimation. In the motion estimation, it is important to calculate motion information so as to, while suppressing the amount of noise generated in a decoded image, minimize the amount of information to be coded. Therefore, a method referred to as RD optimization (Rate-Distortion Optimization) has been popularly used in recent video coding devices.
In general, in the RD optimization, many motion vector candidates are assumed with respect to a block on which attention is focused, and a Rate-Distortion cost (RD cost), expressed by J=D+λR, is calculated for each of the motion vector candidates. Subsequently, the motion vector candidate that minimizes the RD cost is chosen as the motion vector of the block on which attention is focused. Hereinafter, the processing of calculating an RD cost for each motion vector candidate to generate a motion vector of a block on which attention is focused in the manner described above is referred to as motion vector search. In the above equation, D, R, and λ denote the amount of distortion generated in a difference image, the amount of code produced in coding of motion information, and a weighting factor that depends on the complexity of an image and the like, respectively. Motion information includes motion vector information. In the motion vector search, since an RD cost is calculated for each of many motion vector candidates, the amount of computation increases substantially.
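The selection rule described above can be illustrated with a minimal Python sketch. The `Candidate` type, the function names, and the numeric values are hypothetical and chosen only for illustration; in an actual encoder, D and R would be measured from the difference image and from the coded motion information.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Candidate:
    """One motion vector candidate (hypothetical container for illustration)."""
    mv: Tuple[int, int]   # candidate motion vector (dx, dy)
    distortion: float     # D: distortion generated in the difference image
    rate: float           # R: amount of code produced by coding motion information


def rd_cost(distortion: float, rate: float, lam: float) -> float:
    """RD cost J = D + lambda * R."""
    return distortion + lam * rate


def motion_vector_search(candidates: Iterable[Candidate], lam: float) -> Candidate:
    """Return the candidate that minimizes the RD cost."""
    return min(candidates, key=lambda c: rd_cost(c.distortion, c.rate, lam))


# Illustrative values only (not taken from any measurement).
candidates = [
    Candidate(mv=(0, 0), distortion=100.0, rate=2.0),
    Candidate(mv=(1, 0), distortion=80.0, rate=10.0),
]
best_low_lambda = motion_vector_search(candidates, lam=1.0)
best_high_lambda = motion_vector_search(candidates, lam=5.0)
```

With these illustrative values, a small λ favors the candidate with lower distortion, whereas a large λ favors the candidate that requires less code.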
In NPL 1, details of processing based on the H.265 standard are described. Processing in accordance with the H.265 video coding standard is performed in units of blocks of at most 64×64 pixels, each of which is referred to as a CU (Coding Unit). In H.265, the use of a variable CU size has made it possible to reduce the amount of code efficiently. An optimal CU size is selected from among the CUs of 32×32, 16×16, and 8×8 pixel sizes into which, as illustrated in FIG. 7, a 64×64 block, which has the largest size, is partitioned hierarchically by means of quad-tree segmentation.
In the H.265 standard, motion estimation is performed by partitioning a CU into one or more PUs (Prediction Units). A PU is a unit for retaining motion information. As illustrated in FIG. 8, when the size of a CU is assumed to be 2N×2N, the CU is partitioned into one or more PUs that have an optimal block size out of 2N×2N, 2N×N, N×2N, and N×N. Hereinafter, the above four types of partitioning are collectively referred to as PU partitioning. Making the sizes of the CU and the PU variable as described above and coding a flat portion in a large block enable the amount of code of motion information to be reduced. On the other hand, coding a portion including small variations in a small block enables the amount of distortion in a difference image to be reduced. With this configuration, it is possible to reduce the RD cost J=D+λR, which is defined by the amount R of code produced by coding of motion information and the amount D of distortion generated in a difference image.
A simple method for selecting optimal CU partitioning and PU partitioning is as follows.
(A1) A 64×64 size block is assumed as a CU. Hereinafter, the following processing of steps (A2) to (A5) is referred to as processing at a CU layer depth of 1 (depth=1).
(A2) With respect to the block assumed to be a CU, four types of PU partitioning illustrated in FIG. 8 are supposed.
(A3) In each of the four types of PU partitioning supposed in (A2), a motion vector search is performed for each PU. The size of the CU is assumed to be 2N×2N. In this case, a motion vector is obtained for the PU that results when the CU is used as a PU as it is, that is, for a 2N×2N size PU. Motion vectors of the respective upper and lower PUs into which the CU is assumed to be halved vertically, that is, two 2N×N PUs, are also obtained. Motion vectors of the respective right and left PUs into which the CU is assumed to be halved horizontally, that is, two N×2N PUs, are also obtained. Motion vectors of the respective four N×N PUs into which the CU is assumed to be partitioned by means of quad-tree segmentation are also obtained.
(A4) Based on the motion vectors of the respective PUs obtained in (A3), the total of RD costs for each type of PU partitioning is obtained. Since a single RD cost is obtained for the 2N×2N size PU, that RD cost becomes the total as it is. This RD cost is denoted by J(2N×2N). Since two RD costs are obtained for the 2N×N size PUs, both RD costs are totaled. The total is denoted by J(2N×N). The same applies to the N×2N size PUs, and the total of the two RD costs therefor is denoted by J(N×2N). Since four RD costs are obtained for the N×N size PUs, the four RD costs are totaled. The total is denoted by J(N×N).
(A5) The type of PU partitioning that minimizes the total of RD costs obtained in (A4) is obtained. That is, the type of PU partitioning that corresponds to the smallest total RD cost among J(2N×2N), J(2N×N), J(N×2N), and J(N×N) is obtained.
(A6) Four 32×32 size blocks into which the 64×64 size block is partitioned by means of quad-tree segmentation are considered, and each of the blocks is assumed to be a CU. For each of the four 32×32 size blocks, the processing of steps (A2) to (A5) is performed. Hereinafter, this processing is referred to as processing at a CU layer depth of 2 (depth=2).
(A7) The four 32×32 size blocks into which the 64×64 size block is partitioned by means of quad-tree segmentation are considered, and the 32×32 size blocks are further partitioned by means of quad-tree segmentation. That is, each of, in total, 16 blocks having 16×16 size that are obtained by partitioning the 64×64 size block by means of two stages of quad-tree segmentation is assumed to be a CU. For each of the 16 blocks having 16×16 size, the processing of steps (A2) to (A5) is performed. Hereinafter, this processing is referred to as processing at a CU layer depth of 3 (depth=3).
(A8) Each of, in total, 64 blocks having 8×8 size that are obtained by partitioning the 64×64 size block by means of three stages of quad-tree segmentation is assumed to be a CU, and the processing of steps (A2) to (A5) is performed. Hereinafter, this processing is referred to as processing at a CU layer depth of 4 (depth=4).
(A9) Based on the results from the processing of steps (A1) to (A8), the CU partitioning that minimizes the total of RD costs over the whole of the 64×64 block is obtained.
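The exhaustive procedure of steps (A1) to (A9) can be sketched in Python as follows. This is only an illustration under stated assumptions: `pu_cost` is a hypothetical callback standing in for a motion vector search that returns the RD cost of one PU, and step (A9) is assumed to be realized by comparing, at each CU, the best PU partitioning cost against the total cost of its four child CUs.

```python
def pu_layouts(x, y, size):
    """Four types of PU partitioning of a 2Nx2N CU, as (x, y, width, height)."""
    n = size // 2
    return {
        "2Nx2N": [(x, y, size, size)],
        "2NxN": [(x, y, size, n), (x, y + n, size, n)],    # upper and lower PUs
        "Nx2N": [(x, y, n, size), (x + n, y, n, size)],    # left and right PUs
        "NxN": [(x + dx, y + dy, n, n) for dy in (0, n) for dx in (0, n)],
    }


def best_pu(x, y, size, pu_cost):
    """Steps (A2) to (A5): total the RD costs per partitioning type, pick the minimum."""
    totals = {
        name: sum(pu_cost(*pu) for pu in pus)
        for name, pus in pu_layouts(x, y, size).items()
    }
    name = min(totals, key=totals.get)
    return name, totals[name]


def best_cu(x, y, size, pu_cost, min_size=8):
    """Steps (A1) and (A6) to (A9): evaluate every CU size down to min_size.

    Returns (total RD cost, partitioning tree). The tree is either
    (size, PU partitioning type) or a list of four child trees.
    """
    part, cost = best_pu(x, y, size, pu_cost)
    if size > min_size:
        n = size // 2
        children = [
            best_cu(x + dx, y + dy, n, pu_cost, min_size)
            for dy in (0, n)
            for dx in (0, n)
        ]
        split_cost = sum(c for c, _ in children)
        if split_cost < cost:
            return split_cost, [tree for _, tree in children]
    return cost, (size, part)
```

Under these assumptions, each CU requires 1+2+2+4=9 PU cost evaluations, and a 64×64 block contains 1+4+16+64=85 candidate CUs, so the sketch invokes `pu_cost` 765 times per 64×64 block.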
Selection of a CU size and a PU size substantially influences image quality. On the other hand, performing motion vector search processing for each of the many CU sizes and PU sizes one by one causes the amount of computation to increase substantially. In particular, in implementing an encoder for high-definition video such as 4K, it is not practical to perform motion vector searches with all the CU sizes and PU sizes. Since, in H.265, the number of selectable types of block sizes has increased to 13 from the seven types of block sizes in H.264, the amount of computation has further increased.
A technology for suppressing such an increase in the amount of computation is described in NPL 2. In MVM (Motion Vector Merging) described in NPL 2, pieces of motion information are first obtained for small size blocks by means of motion vector search. Next, pieces of motion information are compared between adjacent blocks, and, when the compared pieces of motion information coincide with each other, the adjacent blocks are merged and the merged block is considered as a large block. This operation enables the amount of computation to be suppressed.
In more detail, motion vector searches with respect to N×N size PUs are first performed. As illustrated in FIG. 9, when three motion vectors out of the motion vectors of four N×N size blocks are the same, the four N×N blocks are merged into a 2N×2N size block. When pieces of motion information of two blocks adjacent to each other laterally are the same, the four N×N blocks are merged into two 2N×N blocks. When pieces of motion information of two blocks adjacent to each other longitudinally are the same, the four N×N blocks are merged into two N×2N blocks.
As described above, when pieces of motion information are the same between adjacent blocks, merging of the adjacent blocks is determined and the motion information is regarded as the motion information of the block into which the adjacent blocks are merged. Thus, motion vector searches with respect to 2N×2N, 2N×N, and N×2N size blocks in step (A3) of the above-described method do not need to be performed, and a motion vector search is performed only with respect to N×N size blocks.
Since merging of blocks is determined depending on a comparison of motion information between adjacent blocks, it is not necessary to perform steps (A4) and (A5) in the above-described method. That is because a proper type of PU partitioning can be selected based only on the comparison of motion information, without comparing the totals of RD costs for the respective types of PU partitioning.
Referring to a flowchart in FIG. 11, block size determination processing disclosed in NPL 2 will be described. “block” denotes a variable that indicates a block for which coding is to be performed. In addition, “depth” denotes a natural number not less than 1 and not greater than 4 that indicates a CU layer depth, and an initial value thereof is 1. Moreover, “2N” denotes a block size and is equal to any of 64, 32, 16, and 8, and an initial value thereof is 64. As will be described later, the block size determination processing is processing that includes a recursive call.
In step S1101, whether “depth” is maximum, that is, whether “depth” indicates a layer of smallest size blocks, is determined. If depth=4 is given, the result of the determination is true, and, if depth=1, 2, or 3 is given, the result of the determination is false.
When the determination result is true, that is, “depth” indicates the layer of smallest size blocks, the process proceeds to step S1103, which will be described later.
On the other hand, if the result of determination in step S1101 is false, that is, “depth” does not indicate the layer of smallest size blocks, block size determination processing is performed on each of four sub-blocks, “subblock[1]”, “subblock[2]”, “subblock[3]”, and “subblock[4]”, that are created by partitioning a block on which attention is focused by means of quad-tree segmentation (step S1102). The block size determination processing is performed as a recursive call that calls the block size determination processing itself. In other words, when the CU layer depth is 1, 2, or 3 (depth=1, 2, 3), the recursive call that calls the block size determination processing itself is performed. Since a sub-block “subblock[i]” has a CU layer depth deeper by one than that of the block on which attention is currently focused, 1 is added to the argument “depth” in the block size determination processing that is called recursively. While, as described above, the argument indicating the size of the block on which attention is currently focused is “2N”, the argument indicating the size of the sub-block “subblock[i]” is “N”.
In step S1103, a motion vector search is performed on each of the sub-blocks “subblock[1]”, “subblock[2]”, “subblock[3]”, and “subblock[4]”, into which the block “block” is partitioned by means of quad-tree segmentation. When the determination in step S1101 results in true and the process has proceeded directly from step S1101, that is, when depth=4 is given and the block size is 8×8, a motion vector search is performed on each of the 4×4 blocks that are sub-blocks thereof.
In step S1104, based on the motion information of the respective sub-blocks that are obtained in step S1103, processing of block merging is performed and a best block size “bestPart” is determined. Although details will be described later, the block merging processing returns any of 2N×2N, 2N×N, N×2N, and N×N as the best block size “bestPart”.
In step S1105, the RD cost “bestPartCost” at the best block size “bestPart” is compared in magnitude with the best cost obtained so far, “minCost”. When the RD cost “bestPartCost” is smaller than the best cost “minCost” (y in step S1105), the best cost “minCost” is updated with the value of the RD cost “bestPartCost” and a best depth “bestDepth” is updated with the value of the CU layer depth at which the block size determination processing is currently being performed (step S1106).
Next, an operation of the block merging processing in step S1104 will be described with reference to FIG. 12. In the block merging processing, four blocks “block” are taken as an argument. The blocks indicated by the argument “block” here are, however, equivalent to “subblock[i]” illustrated in the flowchart in FIG. 11.
In step S1002, based on motion information having been obtained in advance with respect to the four blocks specified as the argument, whether the blocks can be merged into a 2N×2N size block is determined. When it is determined that the blocks can be merged (y in step S1002), 2N×2N is returned as the best block size “bestPart”. In NPL 2, the condition for merging the four blocks into a 2N×2N block is defined such that the motion vectors of three blocks out of the four blocks are the same.
In step S1003, based on the motion information having been obtained in advance with respect to the four blocks specified as the argument, whether the blocks can be merged into 2N×N size blocks is determined. When it is determined that the blocks can be merged (y in step S1003), 2N×N is returned as the best block size “bestPart”. In NPL 2, the condition for merging the four blocks into 2N×N blocks is defined such that, when two blocks laterally adjacent to each other are considered as a pair, the motion vectors of the blocks composing the pair are the same.
In step S1004, based on the motion information having been obtained in advance with respect to the four blocks specified as the argument, whether the blocks can be merged into N×2N size blocks is determined. When it is determined that the blocks can be merged (y in step S1004), N×2N is returned as the best block size “bestPart”. In NPL 2, the condition for merging the four blocks into N×2N blocks is defined such that, when two blocks adjacent to each other longitudinally are considered as a pair, the motion vectors of the blocks composing the pair are the same.
When it is determined that the blocks satisfy none of the merging conditions in steps S1002, S1003, and S1004, N×N is returned as the best block size “bestPart”.
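The decision order of steps S1002 to S1004 can be sketched in Python as follows. The sketch rests on assumptions not stated in the text above: the four motion vectors are given in raster order (upper left, upper right, lower left, lower right), “three blocks out of the four” is read as at least three, and the pair conditions are read as requiring both pairs to match. The function name `merge_blocks` is hypothetical.

```python
from collections import Counter


def merge_blocks(mvs):
    """Return the best block size from the motion vectors of four NxN blocks.

    mvs: four motion vectors in the assumed order
         (upper left, upper right, lower left, lower right).
    """
    upper_left, upper_right, lower_left, lower_right = mvs

    # S1002: at least three of the four motion vectors are the same.
    if max(Counter(mvs).values()) >= 3:
        return "2Nx2N"

    # S1003: laterally adjacent blocks share a motion vector (assumed: both rows).
    if upper_left == upper_right and lower_left == lower_right:
        return "2NxN"

    # S1004: longitudinally adjacent blocks share a motion vector (assumed: both columns).
    if upper_left == lower_left and upper_right == lower_right:
        return "Nx2N"

    # None of the merging conditions is satisfied.
    return "NxN"
```

For example, `merge_blocks([(1, 0), (1, 0), (2, 0), (2, 0)])` falls through step S1002 and satisfies the condition of step S1003, so the upper pair and the lower pair are each merged into a 2N×N block.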
In the block size determination processing, a motion vector search is performed at the following timings.
First, the block size determination processing at a depth of 4 (depth=4) is performed. In this case, in step S1103, motion vector searches targeting an 8×8 size block 1 (FIG. 14), an 8×8 size block 2 (FIG. 15), an 8×8 size block 3 (FIG. 16), and an 8×8 size block 4 (FIG. 17) are performed. Sub-blocks targeted by the motion vector searches at a depth of 4 (depth=4) are 4×4 size blocks.
Next, targeting a 16×16 size block made up of the 8×8 size blocks 1 to 4 (FIG. 18), the block size determination processing at a depth of 3 (depth=3) is performed. In this case, in step S1103, motion vector searches are performed considering the 8×8 size blocks as sub-blocks.
As with the block size determination processing at a depth of 4 (depth=4) targeting the 8×8 size blocks 1 to 4, the block size determination processing at a depth of 4 (depth=4) targeting each of 8×8 size blocks 5 to 8 is performed. Further, block size determination processing at a depth of 3 (depth=3) targeting a 16×16 size block made up of the 8×8 size blocks 5 to 8 (FIG. 19) is performed. The block size determination processing at a depth of 4 (depth=4) is also performed with respect to each of 8×8 size blocks 9 to 12, and the block size determination processing at a depth of 3 (depth=3) targeting a 16×16 size block made up of the 8×8 size blocks 9 to 12 (FIG. 20) is performed. Furthermore, the block size determination processing at a depth of 4 (depth=4) targeting each of 8×8 size blocks 13 to 16 is performed, and the block size determination processing at a depth of 3 (depth=3) targeting a 16×16 size block made up of the 8×8 size blocks 13 to 16 (FIG. 21) is performed.
Next, the block size determination processing at a depth of 2 (depth=2) targeting a 32×32 size block made up of the 8×8 size blocks 1 to 16 (FIG. 22) is performed. Sub-blocks with respect to which the motion vector search at a depth of 2 (depth=2) in step S1103 is performed are 16×16 size blocks.
The above-described block size determination processing at depths of 4, 3, and 2 (depth=4, 3, 2) targeting the 8×8 size blocks 1 to 16 is also performed targeting each of 8×8 size blocks 17 to 32, 8×8 size blocks 33 to 48, and 8×8 size blocks 49 to 64.
Last, the block size determination processing at a depth of 1 (depth=1) targeting a 64×64 size block made up of all the blocks, that is, the 8×8 size blocks 1 to 64, is performed. Sub-blocks with respect to which the motion vector search at a depth of 1 (depth=1) is performed are 32×32 size blocks.
As described above, the block size determination processing disclosed in NPL 2 includes a step (step S1103) in which a motion vector search is performed with respect to each of the sub-blocks at a CU layer depth “depth” on which attention is currently focused. Motion vector searches are performed targeting 4×4 size blocks, 8×8 size blocks, 16×16 size blocks, and 32×32 size blocks at depths of 4, 3, 2, and 1 (depth=4, 3, 2, and 1), respectively.
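The timing of the motion vector searches described above can be traced with a small Python sketch that reproduces only the recursion structure of steps S1101 to S1103. The merging and cost comparison of steps S1104 to S1106 are omitted, and the function and variable names are hypothetical.

```python
from collections import Counter

MAX_DEPTH = 4  # depth=4 corresponds to the layer of smallest (8x8) blocks


def determine_block_size(depth, size, trace):
    """Record (depth, sub-block size) for every motion vector search in S1103."""
    if depth < MAX_DEPTH:                  # S1101: result of determination is false
        for _ in range(4):                 # S1102: recursive call per sub-block
            determine_block_size(depth + 1, size // 2, trace)
    for _ in range(4):                     # S1103: search on each of four sub-blocks
        trace.append((depth, size // 2))


trace = []
determine_block_size(depth=1, size=64, trace=trace)

# Number of motion vector searches per sub-block size.
searches_per_size = Counter(sub_size for _, sub_size in trace)
```

In this trace, the first searches occur at depth=4 on 4×4 sub-blocks and the last at depth=1 on 32×32 sub-blocks, matching the order described above. Counting the entries gives 256 searches on 4×4 blocks, 64 on 8×8 blocks, 16 on 16×16 blocks, and 4 on 32×32 blocks for one 64×64 block.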