In recent years, demand for higher video resolution has been increasing. To meet this demand, coding techniques that improve compression efficiency, such as MPEG (Moving Picture Experts Group)-2, H.264/MPEG-4 AVC (hereinafter abbreviated as H.264), and H.265/HEVC (High Efficiency Video Coding) (hereinafter abbreviated as H.265), have been developed. These coding systems achieve high coding efficiency by compressing information using inter-frame motion compensation prediction. FIG. 19 shows an example of a video coding apparatus based on the H.265 specification.
As shown in FIG. 19, the video coding apparatus based on the H.265 specification generally includes a motion compensation predictor 001, an orthogonal transformer 002, a quantizer 003, an encoder (an entropy coder) 004, an inverse quantizer 005, an inverse orthogonal transformer 006, an intra frame predictor 007, a motion estimator 008, a loop filter 009, and a frame buffer 010. ITU-T (International Telecommunication Union Telecommunication Standardization Sector) Recommendation H.265 “High efficiency video coding” in Non Patent Literature 1 discloses the details of processing contents based on the H.265 specification. Accordingly, the detailed description of the constituent elements thereof will be omitted.
Video coding processing based on the H.265 specification is performed on each block of up to 64×64 pixels, which is called a CTB (Coding Tree Block). The motion compensation prediction defined in the H.265 specification is the technique described below. An image of a reference frame is corrected in the motion compensation predictor 001 by using motion information between a coding target frame, which is input as an input image, and a coded reference frame stored in the frame buffer 010. Video information is compressed by coding only the information representing the motion between the frames and the difference information between the corrected image (predicted image) and the current image to be coded.
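The relationship between the reference frame, the motion vector, the predicted image, and the difference information can be illustrated with a minimal sketch. The function names `predict_block` and `residual`, the toy 4×4 frame, and the restriction to integer-pixel motion without boundary handling are assumptions made for illustration only and are not taken from the H.265 specification.

```python
def predict_block(ref, x, y, mv, size):
    """Build the predicted block for the target block at (x, y).

    The block is copied from the reference frame `ref` at the position
    displaced by the motion vector mv = (dx, dy). Integer-pixel motion only;
    the caller must keep the displaced block inside the frame.
    """
    dx, dy = mv
    return [[ref[y + dy + r][x + dx + c] for c in range(size)]
            for r in range(size)]


def residual(cur_block, pred_block):
    """Difference information: current block minus predicted block."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(cur_block, pred_block)]


# Toy reference frame (hypothetical data): pixel value = 10 * row + column.
ref = [[10 * r + c for c in range(4)] for r in range(4)]

# Hypothetical 2x2 target block located at (x=1, y=0) in the current frame.
# Its content equals the reference content one pixel to the left.
cur_block = [[0, 1], [10, 11]]

pred = predict_block(ref, x=1, y=0, mv=(-1, 0), size=2)
res = residual(cur_block, pred)              # all zeros: perfect prediction
res_no_motion = residual(cur_block, predict_block(ref, 1, 0, (0, 0), 2))
```

With the correct motion vector the difference block is all zeros, so only the motion vector needs to be coded; with a zero motion vector every difference sample is nonzero and must be coded as well.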
Motion between frames is represented by a motion vector indicating the amount of movement. The processing for calculating the motion information between the reference frame and the coding target frame in the motion estimator 008 is referred to as motion estimation. In the motion estimation processing, it is important to calculate motion information that minimizes the amount of information to be coded while preventing an increase in the amount of noise generated in the decoded image. Accordingly, in recent video coding apparatuses, a technique called RD (Rate-Distortion) optimization has been widely used.
In the RD optimization technique, a rate-distortion cost expressed as J=D+λR is calculated for a large number of motion vector candidates, and the candidate having the minimum rate-distortion cost is adopted as the motion vector. In this case, D represents the amount of distortion generated in a difference image; R represents the amount of code generated in the coding of motion information; and λ represents a weighting factor dependent on, for example, the complexity of an image. The motion information includes prediction vector information, which is described later, the difference between the motion vector and the prediction vector, and merge information.
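The selection rule J=D+λR can be sketched as follows. The candidate list, the numeric values of D and R, and the function names `rd_cost` and `select_motion_vector` are hypothetical and serve only to show how the weighting factor λ shifts the choice between low distortion and low code amount.

```python
def rd_cost(distortion, rate, lam):
    """Rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate


def select_motion_vector(candidates, lam):
    """Return the motion vector whose rate-distortion cost is minimal.

    `candidates` is a list of (motion_vector, D, R) tuples, where D and R
    are assumed to have been measured beforehand for each candidate.
    """
    best = min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))
    return best[0]


# Hypothetical candidates: (motion vector, distortion D, code amount R).
candidates = [
    ((0, 0), 120, 2),    # cheap to code, high distortion
    ((1, 0), 100, 6),
    ((3, -2), 90, 14),   # low distortion, expensive to code
]

mv_small_lambda = select_motion_vector(candidates, lam=0)    # distortion only
mv_mid_lambda = select_motion_vector(candidates, lam=4)
mv_large_lambda = select_motion_vector(candidates, lam=10)   # rate dominates
```

In this example, λ=0 selects (3, -2), λ=4 selects (1, 0) with J=124, and λ=10 selects (0, 0), which shows that a larger λ favors vectors that are cheaper to code.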
The motion information of spatially or temporally neighboring blocks is highly correlated. Accordingly, in the H.265 specification, the code amount R necessary for motion information can be reduced by using AMVP (Advanced Motion Vector Prediction), which adaptively selects a predicted value (prediction vector) for a motion vector from among neighboring motion vectors, or a merge mode in which motion information is copied from neighboring blocks. Specific examples of “neighboring blocks” include spatially neighboring blocks (A0, A1, B0, B1, and B2) of a coding target block as shown in FIG. 20, and corresponding blocks in temporally neighboring frames. Blocks used for the merge mode or AMVP can be arbitrarily selected by the encoder from the list of neighboring blocks. FIG. 20 is a schematic diagram for explaining spatially neighboring blocks of the coding target block.
Since the CTB containing the neighboring block A0, located in the lower left of FIG. 20, is processed after the coding target CTB, the motion information on the neighboring block A0, i.e., its processing result, can be referred to only during the coding of a sub-block in the CTB. To simplify the following description, the left neighboring block A1, the upper left neighboring block B2, the upper neighboring block B1, and the upper right neighboring block B0 are referenced as blocks to be subjected to coding processing.
In this case, the motion information on the neighboring blocks is the same as the information obtained in the decoder. Accordingly, the motion information can be transmitted by coding only the index of the neighboring block list, and thus the code amount R can be reduced. In the case of using the merge mode, the motion information includes a flag indicating the merge mode, and the index of a reference block. In cases other than the merge mode, the motion information includes the index of a block to be referenced by a prediction vector, and information on a difference between the prediction vector and a motion vector.
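The two forms of motion information described above can be sketched as follows. This is an illustration of the idea only, not the H.265 bitstream syntax: the dictionary field names, the helper functions, and the use of a signed Exp-Golomb code length as a proxy for the code amount of the vector difference are all assumptions.

```python
def se_golomb_bits(v):
    """Bit length of a signed Exp-Golomb code, used here as a rate proxy."""
    k = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (k + 1).bit_length() - 1


def signal_merge(merge_idx):
    """Merge mode: only a flag and the index of the reference block."""
    return {"merge_flag": 1, "merge_idx": merge_idx}


def signal_amvp(mv, neighbor_mvs):
    """Non-merge mode: predictor index plus the vector difference.

    The predictor is chosen so that the difference costs the fewest bits.
    """
    def diff_bits(i):
        px, py = neighbor_mvs[i]
        return se_golomb_bits(mv[0] - px) + se_golomb_bits(mv[1] - py)

    idx = min(range(len(neighbor_mvs)), key=diff_bits)
    px, py = neighbor_mvs[idx]
    return {"merge_flag": 0, "mvp_idx": idx, "mvd": (mv[0] - px, mv[1] - py)}


def decode_mv(info, neighbor_mvs):
    """Decoder side: rebuild the motion vector from the same neighbor list."""
    if info["merge_flag"]:
        return neighbor_mvs[info["merge_idx"]]
    px, py = neighbor_mvs[info["mvp_idx"]]
    return (px + info["mvd"][0], py + info["mvd"][1])


# Hypothetical motion vectors of already coded neighboring blocks.
neighbor_mvs = [(4, 0), (1, 1), (0, 0)]

amvp_info = signal_amvp((2, 1), neighbor_mvs)
merge_info = signal_merge(0)
```

Because the decoder holds the same neighbor list, `decode_mv(amvp_info, neighbor_mvs)` recovers (2, 1) from the index and the small difference (1, 0), and `decode_mv(merge_info, neighbor_mvs)` recovers (4, 0) from the index alone.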
FIG. 21 shows a configuration example of the motion estimator 008 of the video coding apparatus shown in FIG. 19. The motion estimator 008 shown in FIG. 21 includes a motion vector search unit 020, an AMVP selection unit 021, a merge cost calculation unit 022, and a motion information determination unit 023. The motion vector search unit 020 performs block matching for a large number of motion vector candidates, and determines a provisional motion vector having a minimum cost. The AMVP selection unit 021 selects a prediction vector from the motion vectors of neighboring blocks so that the motion vector code amount R is minimized. The merge cost calculation unit 022 calculates a rate-distortion cost J of the merge mode using the motion information on the neighboring blocks. The motion information determination unit 023 determines the final motion information by deciding whether or not to use the merge mode.
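The flow through the four units can be sketched in a simplified form. The sketch assumes SAD (sum of absolute differences) as the block matching cost, a full search over a small window, and hypothetical constants `LAMBDA`, `MERGE_RATE`, and `AMVP_OVERHEAD` as rate proxies; none of these choices is stated in the description above, and the neighbor vectors are assumed to point inside the frame.

```python
LAMBDA = 4          # hypothetical weighting factor
MERGE_RATE = 2      # hypothetical bits for merge flag and merge index
AMVP_OVERHEAD = 3   # hypothetical bits for flag, predictor index, zero difference


def sad(cur, ref, bx, by, mv, size):
    """Sum of absolute differences between the target block and the
    reference block displaced by mv = (dx, dy)."""
    dx, dy = mv
    return sum(abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
               for r in range(size) for c in range(size))


def search_motion_vector(cur, ref, bx, by, size, search_range):
    """Motion vector search: full-search block matching in a square window."""
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + size > len(ref[0]) or y0 + size > len(ref):
                continue  # skip candidates that leave the frame
            cost = sad(cur, ref, bx, by, (dx, dy), size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def estimate_motion(cur, ref, bx, by, size, neighbor_mvs, search_range=2):
    # Provisional motion vector from block matching.
    mv, dist = search_motion_vector(cur, ref, bx, by, size, search_range)
    # AMVP selection: predictor that minimizes the vector difference.
    amvp_rate = AMVP_OVERHEAD + min(abs(mv[0] - px) + abs(mv[1] - py)
                                    for px, py in neighbor_mvs)
    j_amvp = dist + LAMBDA * amvp_rate
    # Merge cost: cost J of copying each neighbor's vector unchanged.
    j_merge, merge_mv = min((sad(cur, ref, bx, by, n, size) + LAMBDA * MERGE_RATE, n)
                            for n in neighbor_mvs)
    # Final decision between merge mode and non-merge mode.
    if j_merge <= j_amvp:
        return {"mode": "merge", "mv": merge_mv, "cost": j_merge}
    return {"mode": "amvp", "mv": mv, "cost": j_amvp}


# Hypothetical 8x8 frames: the current frame is the reference frame
# displaced so that the true motion vector is (dx, dy) = (1, -1).
ref = [[10 * r + c for c in range(8)] for r in range(8)]
cur = [[10 * (r - 1) + (c + 1) for c in range(8)] for r in range(8)]

no_match = estimate_motion(cur, ref, 2, 2, 2, neighbor_mvs=[(0, 0)])
with_match = estimate_motion(cur, ref, 2, 2, 2, neighbor_mvs=[(0, 0), (1, -1)])
```

When no neighbor carries the true motion, the searched vector is kept and coded as a difference; when a neighbor already has the vector (1, -1), the merge mode has the lower cost and is selected.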
In the case of using the merge mode, a merge vector obtained by copying the motion vectors of the neighboring blocks is used as the motion vector of the coding target block. In cases other than the merge mode, a provisional motion vector obtained as a result of motion search is used as the motion vector. The motion information determined in a certain block to be subjected to coding processing is used for AMVP selection and merge cost calculation for other blocks. The AMVP and the merge mode have an effect of greatly reducing the code amount. In order to obtain a high coding efficiency, it is important to appropriately use the AMVP and the merge mode.
The motion estimation processing for comparing the costs of a large number of vectors requires an extremely large amount of computation, and thus needs to be performed at a high speed. To achieve high-speed processing, parallel processing using a many-core processor, such as a GPU (Graphics Processing Unit), which includes a large number of processor cores, is especially promising.
An example of related parallel processing techniques is WPP (Wavefront Parallel Processing). A specific example of the WPP parallel processing is disclosed in “Video coding on multicore graphics processors” by Cheung et al. in Non Patent Literature 2. In the WPP parallel processing, as shown in FIG. 22, the blocks in the respective lines of the coding target frame that are each located at a position shifted leftward by two blocks from the block in the upper line are processed in parallel as coding target blocks. Thus, the processing results of the left neighboring block, the upper neighboring block, and the upper right neighboring block can be referenced. FIG. 22 shows a schematic diagram for explaining a specific example of the WPP parallel processing disclosed in Non Patent Literature 2.
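The two-block shift between lines can be sketched as a schedule in which the block at column x and line y is processed at step x + 2y. The function name `wavefront_schedule` and the 4×3 block grid are assumptions for illustration; the sketch only lists which blocks could run together and does not perform any coding.

```python
def wavefront_schedule(cols, rows):
    """Group blocks (x, y) into steps such that all blocks in one step
    can be processed in parallel.

    Each line starts two blocks behind the line above it, so the left,
    upper, and upper-right neighbors of a block always belong to an
    earlier step.
    """
    steps = {}
    for y in range(rows):
        for x in range(cols):
            steps.setdefault(x + 2 * y, []).append((x, y))
    return [steps[t] for t in sorted(steps)]


# Hypothetical frame of 4 block columns and 3 block lines.
schedule = wavefront_schedule(cols=4, rows=3)
for t, blocks in enumerate(schedule):
    print(t, blocks)
```

For this grid the 12 blocks are covered in 8 steps, and at most two blocks run at the same time, for example (2, 0) and (0, 1) in the third step. The available parallelism therefore grows with the number of block lines in the frame.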
Japanese Unexamined Patent Application Publication No. 2012-175424 “Coding Processing Apparatus and Coding Processing Method” in Patent Literature 1 discloses a technique in which an image is divided into a plurality of regions, and the regions in blocks adjacent to the boundary between the divided regions are processed in parallel using only the information on the blocks within the regions to which the blocks belong.