Demand for higher-resolution motion pictures is increasing, and encoding techniques are advancing along with it. Examples of encoding techniques, in order of development, are MPEG-2, H.264/MPEG-4 AVC (hereinafter referred to as H.264), and H.265/HEVC (hereinafter referred to as H.265). Each newly developed encoding technique achieves higher compression efficiency than its predecessors. The encoding methods used in these techniques compress information at high encoding efficiency by using motion compensation prediction between frames.
NPL 1 describes a content of processing based on the standards of H.265. In processing of motion picture encoding based on the standards of H.265, an image (picture) is divided into block units referred to as coding tree units (CTUs), and the CTUs generated by division are processed in raster-scan order (i.e. arrangement order from an upper left to a lower right in a screen). A maximum size of the CTUs is 64×64 pixels.
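The division into CTUs and the raster-scan processing order can be sketched as follows. This is a minimal illustration, not the procedure of NPL 1: the function name `ctu_raster_order` and the clipping of CTUs at the right and bottom picture edges are assumptions made for the example, while the 64×64 maximum size is taken from the text above.

```python
def ctu_raster_order(width, height, ctu_size=64):
    """Return (x, y, w, h) for each CTU of a picture, in raster-scan order.

    Hypothetical helper: CTUs at the right/bottom edge are clipped to the
    picture size, which is an assumption of this sketch.
    """
    ctus = []
    for y in range(0, height, ctu_size):      # rows, top to bottom
        for x in range(0, width, ctu_size):   # columns, left to right
            w = min(ctu_size, width - x)
            h = min(ctu_size, height - y)
            ctus.append((x, y, w, h))
    return ctus
```

For a 128×128 picture this yields four CTUs, visited upper left, upper right, lower left, lower right.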
The CTUs are further quad-tree-divided into block units referred to as coding units (CUs). In motion picture encoding processing conforming to the standards of H.265, encoding processing is executed for each CU.
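Quad-tree division of a CTU into CUs can be illustrated with a short recursive sketch. The split decision is passed in as a hypothetical callback `should_split`, and the minimum CU size of 8 is an assumption of the example; a real encoder would typically base the decision on cost comparisons.

```python
def quadtree_split(x, y, size, should_split, min_size=8):
    """Recursively divide a square block into CUs.

    should_split(x, y, size) is a hypothetical decision callback.
    Returns a list of (x, y, size) tuples, one per resulting CU.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):          # upper half, then lower half
            for dx in (0, half):      # left quadrant, then right quadrant
                cus.extend(quadtree_split(x + dx, y + dy, half,
                                          should_split, min_size))
        return cus
    return [(x, y, size)]


# Example decision: split the 64x64 CTU, then only its upper-left 32x32 quadrant.
example_cus = quadtree_split(
    0, 0, 64, lambda x, y, s: s == 64 or (s == 32 and x == 0 and y == 0))
```

In this example the CTU ends up as four 16×16 CUs and three 32×32 CUs, which together cover the full 64×64 area.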
A CTU includes coding tree blocks (CTBs) of a brightness signal and a color-difference signal. A CU includes coding blocks (CBs) of a brightness signal and a color-difference signal.
Motion compensation prediction is a technique that compresses motion picture information by exploiting inter-frame information, as follows. An image of an encoded reference frame is corrected by using motion information between the encoding target frame and the reference frame. Then, only the difference information between the corrected image (i.e. a prediction image) and the current image of the encoding target frame, together with information representing the motion between the frames, is encoded. The motion between the frames is represented by a motion vector.
In motion picture encoding, it is important to determine prediction encoding information such as motion information in such a way as to minimize the amount of information to be encoded while suppressing the amount of noise generated in a decoded image. Common motion picture encoding methods widely use a technique referred to as RD optimization, which determines prediction encoding information so as to satisfy this condition.
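As a rough illustration of the idea, the sketch below forms a prediction block by displacing a block of the reference frame by a motion vector and then computes the difference to be encoded. Frames are plain lists of rows, the function names are hypothetical, and integer-pixel motion with in-bounds vectors is assumed.

```python
def predict_block(reference, x, y, size, mv):
    """Copy the size x size block of `reference` displaced by mv = (dx, dy)."""
    dx, dy = mv
    return [[reference[y + dy + j][x + dx + i] for i in range(size)]
            for j in range(size)]


def residual_block(current, reference, x, y, size, mv):
    """Difference between the current block and its motion-compensated prediction."""
    prediction = predict_block(reference, x, y, size, mv)
    return [[current[y + j][x + i] - prediction[j][i] for i in range(size)]
            for j in range(size)]


# Toy frames: `current` is `reference` moved one pixel to the right.
reference = [[10 * r + c for c in range(4)] for r in range(4)]
current = [[reference[r][max(c - 1, 0)] for c in range(4)] for r in range(4)]
```

With the matching vector (-1, 0) the difference block is all zeros, so only the vector itself carries information; with the zero vector a non-zero difference remains.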
In the RD optimization, a rate distortion cost represented by J=D+λR is calculated for each of a large number of motion vector candidates. D is a distortion amount generated in a difference image, R is an encode amount generated in encoding of motion information, and λ is a weight coefficient depending on complexity of an image and the like. A motion vector candidate having a rate distortion cost of a minimum value among the calculated rate distortion costs is employed as a motion vector.
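The selection rule J=D+λR can be sketched in a few lines. The candidate tuples and the numeric values below are made up for illustration; how D, R, and λ are actually measured is outside the scope of this sketch.

```python
def rd_cost(distortion, rate, lam):
    """Rate distortion cost J = D + lambda * R."""
    return distortion + lam * rate


def select_motion_vector(candidates, lam):
    """Pick the candidate with the minimum rate distortion cost.

    candidates: list of (motion_vector, distortion, rate) tuples.
    """
    best = min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))
    return best[0]


# Illustrative candidates: lower distortion costs more bits here.
candidates = [((0, 0), 100, 2), ((1, 0), 80, 10), ((3, -1), 60, 40)]
```

A larger λ favors cheap vectors and a smaller λ favors low distortion, so the selected vector changes with λ even though the candidates stay the same.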
In a motion picture, pieces of motion information of blocks located in a spatial neighborhood or in a temporal neighborhood (i.e. neighborhood blocks) are more highly correlated than pieces of motion information of arbitrary blocks. H.265 uses adaptive motion vector prediction (AMVP), in which a prediction value (e.g. a prediction vector) of a motion vector is adaptively selected from motion vectors of neighborhood blocks, and a merge mode, in which motion information is copied from motion information of a neighborhood block. Usage of AMVP and the merge mode reduces the encode amount R generated in encoding of motion information.
Spatial neighborhood blocks, that is, the neighborhood blocks located in a spatial neighborhood of an encoding target block, are illustrated in FIG. 16. FIG. 16 is an illustrative diagram illustrating spatial neighborhood blocks of an encoding target block.
The spatial neighborhood blocks are specifically blocks A, B, C, D, and E illustrated in FIG. 16. The neighborhood block also includes a temporal neighborhood block that is a block belonging to a temporal neighborhood frame of a frame to which an encoding target block belongs and being located at the same position as the encoding target block.
In the merge mode, only an index into a list of motion information of neighborhood blocks is encoded, and the encoded index is transferred. An encoder can freely select a block used in the merge mode or AMVP from the list of the neighborhood blocks. Motion information of the neighborhood blocks obtained in the encoder is the same as the information obtained in a decoder. In other words, in the merge mode, transfer of an encoded index is equivalent to transfer of motion information. In the merge mode, the encode amount generated in encoding of motion information is therefore further reduced.
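The equivalence between transferring an index and transferring motion information can be sketched as follows. The list construction here (drop unavailable neighbors, remove duplicates, keep order) is a simplified assumption and not the candidate derivation defined in H.265; all function names are hypothetical.

```python
def build_merge_candidates(neighbor_mvs):
    """Build the candidate list from neighborhood motion vectors.

    Simplified assumption: unavailable neighbors are None and are skipped,
    and duplicate vectors are kept only once.
    """
    candidates = []
    for mv in neighbor_mvs:
        if mv is not None and mv not in candidates:
            candidates.append(mv)
    return candidates


def encode_merge(neighbor_mvs, chosen_mv):
    """Encoder side: signal only the index of the chosen vector."""
    return build_merge_candidates(neighbor_mvs).index(chosen_mv)


def decode_merge(neighbor_mvs, merge_index):
    """Decoder side: rebuild the same list and look the vector up."""
    return build_merge_candidates(neighbor_mvs)[merge_index]


neighbors = [(1, 0), None, (1, 0), (0, 2), (-3, 1)]
```

Because both sides derive the list from the same already-decoded neighbors, the decoder recovers the full vector from the index alone.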
As described in NPL 1, in order to reduce an encode amount generated in encoding of motion information, an index in a merge mode is set not according to information fixed for each position of a block but according to an encoding mode of a neighborhood block. The merge mode is a technique that produces a large effect of reducing an encode amount. In order to achieve higher encoding efficiency, it is important to use a merge mode appropriately.
FIG. 17 is a block diagram illustrating a configuration example of a common motion picture encoding device. A motion picture encoding device 1000 illustrated in FIG. 17 includes a transformation/quantization unit 1100, a subtraction unit 1200, an encoding unit 1300, an inverse-transformation/inverse-quantization unit 1400, an addition unit 1210, a loop filter 1500, a frame buffer 1600, a prediction encoding information determining unit 1700, an in-screen predicting unit 1800, and a motion compensation predicting unit 1900.
The subtraction unit 1200 subtracts a prediction image that is input from the in-screen predicting unit 1800 or the motion compensation predicting unit 1900 from an image signal that is input from an outside. The subtraction unit 1200 sets an image obtained by subtraction of the prediction image as a difference image, and inputs the difference image to the transformation/quantization unit 1100.
The transformation/quantization unit 1100 performs orthogonal transformation on the input difference image, and quantizes generated transformation coefficients. The transformation/quantization unit 1100 inputs the quantized transformation coefficients to the encoding unit 1300.
The encoding unit 1300 executes lossless encoding such as variable-length encoding or arithmetic encoding for the input quantized transformation coefficients, and generates a bit stream. The generated bit stream is output from the motion picture encoding device 1000.
The quantized transformation coefficients are further input to the inverse-transformation/inverse-quantization unit 1400. The inverse-transformation/inverse-quantization unit 1400 executes inverse quantization of the input quantized transformation coefficients, and executes inverse orthogonal transformation on the resulting transformation coefficients. The inverse-transformation/inverse-quantization unit 1400 inputs information obtained by the inverse orthogonal transformation to the addition unit 1210.
The addition unit 1210 adds the information obtained by the inverse orthogonal transformation and the prediction image and generates a reconstruction image. The addition unit 1210 inputs the generated reconstruction image to the loop filter 1500.
The loop filter 1500 eliminates block distortion of the input reconstruction image. The loop filter 1500 accumulates the reconstruction image from which the block distortion is eliminated in the frame buffer 1600. The reconstruction image accumulated in the frame buffer 1600 is used as a reference image of another frame.
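The data flow from the subtraction unit through quantization and back to the addition unit can be sketched on a one-dimensional block of samples. This is a deliberately reduced illustration: the orthogonal transformation is omitted (treated as the identity), the loop filter is left out, and a uniform scalar quantizer with step size `step` is an assumption of the example.

```python
def quantize(values, step):
    """Uniform scalar quantization (assumed quantizer for this sketch)."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    """Inverse quantization matching quantize()."""
    return [level * step for level in levels]


def encode_block(source, prediction, step):
    """Return (quantized levels, reconstruction) for one block of samples."""
    # Subtraction unit: difference image.
    difference = [s - p for s, p in zip(source, prediction)]
    # Transformation/quantization unit (transform omitted in this sketch).
    levels = quantize(difference, step)
    # Inverse-transformation/inverse-quantization unit.
    decoded_difference = dequantize(levels, step)
    # Addition unit: reconstruction image, identical to what a decoder obtains.
    reconstruction = [p + d for p, d in zip(prediction, decoded_difference)]
    return levels, reconstruction


levels, reconstruction = encode_block([100, 104, 97, 90], [98, 100, 100, 95], 3)
```

The reconstruction differs slightly from the source because quantization is lossy, which is why the encoder predicts later blocks from the reconstruction rather than from the original input.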
The prediction encoding information determining unit 1700 determines, by using the input image, the reconstruction image, and the reference image of another frame, which of the in-screen prediction mode and the motion compensation prediction mode is to be used for predicting the input image. Further, the prediction encoding information determining unit 1700 determines prediction encoding information for the determined prediction mode.
A reconstruction image from which block distortion is not eliminated is input to the in-screen predicting unit 1800. The in-screen predicting unit 1800 executes in-screen prediction processing for the input reconstruction image and outputs an in-screen prediction image generated in the in-screen prediction processing.
The motion compensation predicting unit 1900 detects a position change of a corresponding image block of the input image with respect to an image block of the reconstruction image accumulated on the frame buffer 1600. The motion compensation predicting unit 1900 then obtains a motion vector equivalent to the detected position change. The motion compensation predicting unit 1900 executes motion compensation prediction processing by using the obtained motion vector and outputs a motion compensation prediction image generated in the motion compensation prediction processing.
The in-screen predicting unit 1800 and the motion compensation predicting unit 1900 each generate prediction images according to the determined content of the prediction encoding information determining unit 1700.
Next, a configuration example of the prediction encoding information determining unit 1700 that determines prediction encoding information including motion information is illustrated in FIG. 18. FIG. 18 is a block diagram illustrating a configuration example of a common prediction encoding information determining unit. The prediction encoding information determining unit 1700 illustrated in FIG. 18 includes a motion vector search unit 1710, a merge vector/merge index determining unit 1720, an in-screen prediction mode determining unit 1730, a prediction encoding mode determining unit 1740, and a prediction information buffer 1750.
The motion vector search unit 1710 includes a function of determining a tentative motion vector having a minimum cost from among a large number of motion vector candidates in a search range by executing block matching or the like. Block matching is a method of searching a reference image for a motion vector having a minimum cost by using a cost function such as the sum of absolute differences (SAD).
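Block matching with SAD as the cost function can be sketched as an exhaustive search over a small window. The full-search strategy, the integer-pixel vectors, and the function names are assumptions of this example; practical encoders usually use faster search patterns and add a rate term to the cost.

```python
def sad(current, reference, x, y, size, mv):
    """Sum of absolute differences between a block and its displaced match."""
    dx, dy = mv
    return sum(abs(current[y + j][x + i] - reference[y + dy + j][x + dx + i])
               for j in range(size) for i in range(size))


def full_search(current, reference, x, y, size, search_range):
    """Try every in-bounds vector in the window and keep the minimum-SAD one."""
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that would read outside the reference frame.
            if not (0 <= x + dx and x + dx + size <= width
                    and 0 <= y + dy and y + dy + size <= height):
                continue
            cost = sad(current, reference, x, y, size, (dx, dy))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Toy frames: `current` is `reference` moved one pixel right and one pixel down.
reference = [[10 * r * r + c * c for c in range(6)] for r in range(6)]
current = [[reference[r - 1][c - 1] if r >= 1 and c >= 1 else 0
            for c in range(6)] for r in range(6)]
```

For the 2×2 block at (2, 2), the search recovers the displacement (-1, -1) with a SAD of zero.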
The merge vector/merge index determining unit 1720 includes a function of determining a merge vector having a minimum cost from among a plurality of merge vector candidates derived from motion vectors of neighborhood blocks by executing block matching or the like. The merge vector/merge index determining unit 1720 includes a function of determining a merge index corresponding to the determined merge vector having a minimum cost.
The in-screen prediction mode determining unit 1730 includes a function of determining a mode having a minimum cost among a plurality of in-screen prediction modes.
The prediction encoding mode determining unit 1740 receives a tentative motion vector and a cost for the tentative motion vector output from the motion vector search unit 1710. The prediction encoding mode determining unit 1740 further receives merge information and a cost for the merge information output from the merge vector/merge index determining unit 1720. The merge information includes a merge vector and a merge index.
The prediction encoding mode determining unit 1740 also receives in-screen mode information and a cost for the in-screen mode information output from the in-screen prediction mode determining unit 1730. On the basis of the input information, the prediction encoding mode determining unit 1740 determines which of the following is used as the prediction encoding mode for an encoding target block: a prediction vector mode, in which the tentative motion vector is used; the merge mode; or the in-screen prediction mode. The prediction vector mode and the merge mode are included in the motion compensation prediction mode.
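The decision among the three modes can be sketched as a comparison of the costs reported by the three upstream units. Choosing the mode with the smallest cost is an assumption of this sketch, since the text only states that the decision is made on the basis of the input information; the mode labels and the result layout are hypothetical.

```python
def decide_prediction_mode(mv_result, merge_result, in_screen_result):
    """Choose a prediction encoding mode from three (info, cost) pairs.

    Assumption of this sketch: the mode with the minimum cost wins.
    """
    options = [
        ("prediction_vector", *mv_result),   # from the motion vector search
        ("merge", *merge_result),            # from the merge vector/index unit
        ("in_screen", *in_screen_result),    # from the in-screen mode unit
    ]
    mode, info, cost = min(options, key=lambda option: option[2])
    return {"mode": mode, "info": info, "cost": cost}


decision = decide_prediction_mode(
    ((2, -1), 120.0),                        # tentative motion vector and cost
    ({"vector": (2, 0), "index": 1}, 95.0),  # merge information and cost
    (26, 140.0),                             # in-screen mode number and cost
)
```

The returned record corresponds to the prediction encoding information that would then be stored in the prediction information buffer.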
The prediction information buffer 1750 includes a function of storing prediction encoding information such as determined motion information and the like. The prediction information buffer 1750 receives the determined prediction encoding information from the prediction encoding mode determining unit 1740.
A configuration example of the merge vector/merge index determining unit 1720 is illustrated in FIG. 19. FIG. 19 is a block diagram illustrating a configuration example of a common merge vector/merge index determining unit. The merge vector/merge index determining unit 1720 illustrated in FIG. 19 includes a merge vector candidate list generating unit 1721 and a merge vector/index selecting unit 1722.
The merge vector candidate list generating unit 1721 includes a function of generating a list of motion vectors that are candidates of a merge vector from motion vectors of neighborhood blocks stored in a motion information buffer (not illustrated).
The merge vector/index selecting unit 1722 calculates, on the basis of a current image and a reference image, an evaluation cost with respect to each merge vector candidate included in the list generated by the merge vector candidate list generating unit 1721. The merge vector/index selecting unit 1722 selects the merge vector having the minimum calculated evaluation cost, and selects the index of the selected merge vector. The merge vector/index selecting unit 1722 outputs merge information including the selected merge vector and merge index, and the evaluation cost for the selected merge vector.
Motion estimation processing, in which costs for a large number of vectors are compared, requires a large amount of computation. It is therefore necessary to accelerate motion estimation processing. Acceleration is likely to be achieved, for example, by causing a many-core processor such as a graphics processing unit (GPU), which includes a large number of processor cores, to execute motion estimation processing in parallel.
As one common parallel processing technique in a motion picture encoding technique, there is wavefront parallel processing (WPP). A specific example of WPP is described in NPL 2.
FIG. 20 is an illustrative diagram illustrating an example of WPP. As illustrated in FIG. 20, when a motion picture encoding device encodes rows of image blocks in parallel by using WPP, the encoding target block of a given row is shifted two blocks to the left of the encoding target block of the row immediately above. Therefore, when the encoding target block of the given row is encoded, the motion picture encoding device can refer to the processing results of the left block and the upper-right block.
PTL 1 describes an encoding processing device that divides an image into a plurality of regions, encodes a block in contact with a border of a region by using only information of blocks in the region to which the block in contact with the border belongs, and thereby executes encoding processing in parallel for each of the regions.
The standards of H.265 described in NPL 1 adopt a function referred to as a parallel merge that improves parallelism of encoding processing. When the parallel merge is used, a CTU is divided into a plurality of square regions referred to as motion estimation regions (MERs).
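The two-block shift between rows can be expressed as a simple schedule. The sketch assumes that every block takes one time step and that each row has its own worker, so the block at row r and column c starts at step c + 2r; the function names are hypothetical.

```python
def start_step(row, col):
    """Earliest step of block (row, col) under a two-block lag per row."""
    return col + 2 * row


def wpp_schedule(rows, cols):
    """Group blocks by the step at which they are processed in parallel."""
    schedule = {}
    for row in range(rows):
        for col in range(cols):
            schedule.setdefault(start_step(row, col), []).append((row, col))
    return [schedule[step] for step in sorted(schedule)]


def dependencies_ready(row, col, cols):
    """Check that the left and upper-right blocks finish before (row, col) starts."""
    step = start_step(row, col)
    left_ok = col == 0 or start_step(row, col - 1) < step
    upper_right_ok = (row == 0 or col + 1 >= cols
                      or start_step(row - 1, col + 1) < step)
    return left_ok and upper_right_ok
```

For a 3×4 grid of blocks the schedule has eight steps instead of the twelve needed for strictly sequential processing, and at most two blocks run at the same step.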
In the parallel merge, when a block belonging to an MER is encoded, motion information of other blocks belonging to the same MER is excluded from reference. Consequently, there is no dependence relation between the blocks, and the blocks can therefore be processed in parallel. For example, when an MER is specified as a 16×16 size, an encoding device can encode in parallel four 8×8 size blocks or sixteen 4×4 size blocks belonging to the same MER.
PTL 2 and PTL 3 describe techniques in which, when blocks belonging to a CU are encoded, reference to another block belonging to the same CU is prohibited, and thereby blocks belonging to the same CU are processed in parallel.
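The MER restriction can be sketched as a position test: two positions fall in the same MER when their coordinates, divided by the MER size, give the same region indices. The function names are hypothetical, positions are pixel coordinates of a block and of its candidate neighbors, and picture-boundary availability checks are omitted for brevity.

```python
def same_mer(block_xy, neighbor_xy, mer_size):
    """True if both positions lie in the same motion estimation region."""
    bx, by = block_xy
    nx, ny = neighbor_xy
    return (bx // mer_size == nx // mer_size
            and by // mer_size == ny // mer_size)


def usable_merge_neighbors(block_xy, neighbor_positions, mer_size):
    """Keep only neighbors outside the block's own MER."""
    return [n for n in neighbor_positions
            if not same_mer(block_xy, n, mer_size)]


def blocks_in_mer(mer_size, block_size):
    """Top-left positions of the blocks that tile one MER at the origin."""
    return [(x, y) for y in range(0, mer_size, block_size)
            for x in range(0, mer_size, block_size)]
```

With a 16×16 MER, the block at (8, 8) loses both its left and upper neighbors, whereas the block at (16, 8) keeps the neighbors to its left because they belong to the adjacent MER.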