In recent years, demand for larger video image sizes has risen year by year, and coding technology has progressed accordingly to improve compression efficiency, as seen in MPEG-2, MPEG-4, and H.264/MPEG-4 AVC (hereinafter, H.264). Here, MPEG is an abbreviation for Moving Picture Experts Group, and AVC is an abbreviation for Advanced Video Coding. These coding methods achieve high coding efficiency by compressing information using inter-frame motion compensation. For example, the processing defined by the H.264 standard is disclosed in non-patent document 1, and the details of a video image encoding device based on the H.264 standard are disclosed in non-patent document 2.
Here, motion compensation is a technology that compresses video image information as follows. First, an estimated image is generated by applying motion compensation to an image of a reference frame, using motion information between the coding object frame and the reference frame. Then, only the difference between the coding object frame and the estimated image, together with the motion information, called a motion vector, is coded.
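The relationship between the estimated image, the coded difference, and the reconstructed block can be sketched as follows. This is a minimal illustration only: frames are plain lists of pixel rows, the motion vector is an integer pixel offset, and the function names `predict_block`, `residual`, and `reconstruct` are hypothetical and do not come from any of the cited documents.

```python
def predict_block(ref, bx, by, mv, n):
    """Estimated block: the n x n block of the reference frame at (bx, by),
    displaced by the motion vector mv = (dx, dy)."""
    dx, dy = mv
    return [[ref[by + dy + y][bx + dx + x] for x in range(n)] for y in range(n)]


def residual(cur_block, pred_block):
    """Difference between the coding object block and the estimated block.
    Only this difference and the motion vector would be coded."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur_block, pred_block)]


def reconstruct(pred_block, res_block):
    """Decoder side: estimated block plus the decoded difference."""
    return [[p + r for p, r in zip(pr, rr)] for pr, rr in zip(pred_block, res_block)]
```

When the estimated block resembles the coding object block, the difference consists mostly of small values, which is what makes it cheaper to code than the block itself.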
For example, video image coding in H.264, including motion compensation, is performed in units of 16×16 pixel macro blocks. The process of calculating motion information is called motion estimation. For every block of 16×16 pixels or 8×8 pixels in a macro block, motion estimation searches the reference frame for a block with high similarity to the coding object block. The motion vector represents the positional difference between the block with the highest similarity in the reference frame and the coding object block.
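The block search described above can be illustrated with an exhaustive search that uses the sum of absolute differences as the similarity measure. This is a sketch under stated assumptions, not the search used by any particular encoder: the exhaustive strategy, the integer-pixel displacements, and the names `sad`, `full_search`, and `search_range` are chosen here for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Return ((dx, dy), sad) for the most similar in-bounds reference block."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block falls outside the frame.
            if not (0 <= bx + dx and bx + dx + n <= width):
                continue
            if not (0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best
```

The returned displacement plays the role of the motion vector: the offset from the coding object block to the most similar block in the reference frame.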
Adjacent motion vectors are highly correlated with each other. Accordingly, the code amount of the motion vector can also be reduced by calculating a predicted motion vector from the motion vectors of adjacent blocks that have already been processed, and coding only the difference between the predicted motion vector and the motion vector. A rate-distortion optimization method for searching for a motion vector with good coding efficiency is disclosed in non-patent document 3.
FIG. 30 is a block diagram showing a configuration of a video image encoding device 5000 described in non-patent document 1. FIG. 31 is a flow chart showing operation of the video image encoding device 5000.
First, a motion estimating unit 50110 of the video image encoding device 5000 calculates a predicted motion vector PMV of the coding object block from the motion vectors of adjacent blocks (Step S102). Next, a rate-distortion-optimized motion vector search is performed using the PMV (Step S103). A motion compensation unit 5020 then generates an estimated image using the motion vector.
Because recent coding methods such as H.264 involve a large amount of computation, they are often accelerated by parallel processing. One method of parallelizing motion estimation is parallelization in units of blocks. Motion estimation is mostly independent for each block and is therefore easy to parallelize. However, because the calculation of a predicted motion vector uses the processing results of adjacent blocks, the processing order is restricted.
As shown in non-patent document 2, in motion estimation in H.264, the median of the motion vectors of blocks A, B and C is employed as the predicted motion vector of a block X shown in FIG. 32. In motion vector search, the cost is defined as the sum of two terms: the code amount of the difference between the predicted motion vector and each candidate vector (the vector cost), and an evaluation value of the degree of similarity, such as the sum of absolute differences between the image block obtained by motion compensation with each candidate vector and the coding object block. The motion vector search looks for the vector that minimizes this cost. Because only the difference between the predicted motion vector and the motion vector, and the difference between the estimated image and the original image, are coded at the time of coding, vector prediction improves coding efficiency.
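The median prediction and the search cost described above can be sketched as follows. Several details are assumptions made for illustration: the vector cost is estimated with the bit length of a signed Exp-Golomb code word, the two terms are balanced by a weight `lam`, and the function names are hypothetical. The cited documents may define the code amount and the weighting differently.

```python
def median_predictor(mv_a, mv_b, mv_c):
    """Component-wise median of the motion vectors of blocks A, B and C."""
    return tuple(sorted(c)[1] for c in zip(mv_a, mv_b, mv_c))


def se_golomb_bits(v):
    """Bit length of a signed Exp-Golomb code word for the integer v
    (assumed here as the estimate of the code amount)."""
    code_num = 2 * v - 1 if v > 0 else -2 * v
    return 2 * (code_num + 1).bit_length() - 1


def vector_cost(mv, pmv):
    """Estimated code amount of the difference between mv and pmv."""
    return sum(se_golomb_bits(m - p) for m, p in zip(mv, pmv))


def total_cost(sad_value, mv, pmv, lam):
    """Search cost: similarity term plus weighted vector cost.
    lam is a hypothetical weighting parameter."""
    return sad_value + lam * vector_cost(mv, pmv)
```

For example, with neighboring vectors (1, 5), (3, -2) and (2, 0), the predicted motion vector is (2, 0), and a candidate vector (3, 0) has a vector cost of 4 bits under this estimate. A candidate close to the prediction is therefore favored when its similarity term is comparable to that of other candidates.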
As mentioned above, when the predicted motion vector is used, an accurate vector cost can be obtained during motion estimation only after the processing of blocks A, B and C has been completed and their motion vectors have been determined. Until then, the motion estimation of the coding object block X cannot be started. Non-patent document 4 discloses an example of parallel processing that satisfies this restriction.
FIG. 34 is a block diagram showing a configuration of a parallel motion estimation device 700 described in non-patent document 4.
A motion vector determined by a motion vector searching unit 112 is stored in a motion vector buffer 120, and a predicted motion vector calculation unit 711 calculates a predicted motion vector using the motion vectors of other blocks stored in the motion vector buffer 120. FIG. 33 shows the parallel processing order in the parallel motion estimation device 700 described in non-patent document 4. In FIG. 33, blocks given the same number can be processed in parallel.
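One ordering that satisfies this kind of dependency can be sketched as follows, assuming that blocks A, B and C are the left, upper and upper-right neighbors of block X. Giving the block at column x and row y the step number x + 2y guarantees that all three neighbors have a smaller step number. Whether FIG. 33 uses exactly this numbering is not confirmed here, and the function names are hypothetical.

```python
def wavefront_step(x, y):
    """Step number of the block at column x, row y. The left (x-1, y),
    upper (x, y-1) and upper-right (x+1, y-1) neighbors all get smaller
    step numbers."""
    return x + 2 * y


def wavefront_schedule(width, height):
    """Group block coordinates by step; blocks within one group have no
    mutual dependency and can be processed in parallel."""
    steps = {}
    for y in range(height):
        for x in range(width):
            steps.setdefault(wavefront_step(x, y), []).append((x, y))
    return [steps[s] for s in sorted(steps)]
```

For a frame of 4 × 3 blocks this yields 8 steps instead of 12 sequential block operations. The degree of parallelism is limited by the frame dimensions, and only a few blocks are available at the first and last steps.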
On the other hand, an example of a parallel motion estimation device that does not perform vector prediction is disclosed in non-patent document 5. FIG. 35 is a block diagram showing a configuration of a parallel motion estimation device 500 described in non-patent document 5. This parallel motion estimation device 500 differs from the parallel motion estimation device 700 shown in FIG. 34 in that a motion estimating unit 510 is not provided with the predicted motion vector calculation unit 711. When motion vector search is performed without vector prediction, as in non-patent document 5, there is no dependence relationship between blocks, and all blocks can be processed in parallel.
Patent document 1 discloses a parallel video image encoding device that performs processing using the processing results of neighboring blocks when the processing results of blocks A, B and C of FIG. 32 are not determined. FIG. 36 is a block diagram showing a configuration of a parallel motion estimation device 600 described in patent document 1. The parallel motion estimation device 600 has a motion estimating unit 610.
The motion estimating unit 610 includes a predicted motion vector calculation unit 61, a motion vector searching unit 62, a pseudo predicted motion vector calculating unit 63, a direct mode and skip mode cost calculation unit 64 and a mode judgment unit 65.
When the motion vectors of blocks A, B and C used for calculating a predicted motion vector are not determined, the motion estimating unit 610 operates as follows. First, a pseudo predicted motion vector is calculated using neighboring blocks. Next, the cost of the direct mode or skip mode is calculated using this pseudo predicted motion vector. The motion vector searching unit 62 searches for a motion vector without using a predicted motion vector.
The mode judgment unit 65 compares the cost in each mode and outputs the result of the judgment. The processes in the motion estimating unit 610 operate in parallel by pipelining. FIG. 37 is a flow chart illustrating the operation of this parallel motion estimation device 600. The motion vector searching unit 62 searches for a motion vector without using a predicted motion vector (Step S301). The pseudo predicted motion vector calculating unit 63 calculates a pseudo predicted motion vector PMVx from the neighboring blocks of a specified block (Step S302). The direct mode and skip mode cost calculation unit 64 evaluates the cost of the direct mode and the skip mode using the pseudo predicted motion vector PMVx (Step S303). The mode judgment unit 65 waits until the motion vectors of the specified blocks A, B and C are determined (Step S304) and calculates a predicted motion vector PMV from the motion vectors of the specified blocks (Step S305). When PMVx and PMV are not equal, the result calculated in Step S303 is discarded (Step S308). When PMVx and PMV are equal, a mode is determined (Step S309) using the result calculated in Step S303 (Step S307).
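The final judgment in this flow can be sketched as follows. This is an illustrative reading, not the implementation of patent document 1: the function name `judge_mode`, the mode labels, the tie-breaking rule, and in particular the fallback to the searched motion vector after the Step S303 result is discarded are assumptions, since the description above does not state what is output in that case.

```python
def judge_mode(search_result, pmv_x, skip_cost, pmv):
    """Decide the mode once the true predicted motion vector is known.

    search_result: (motion_vector, cost) found without a predictor (S301)
    pmv_x:         pseudo predicted motion vector (S302)
    skip_cost:     direct/skip mode cost evaluated with pmv_x (S303)
    pmv:           predicted motion vector from blocks A, B and C (S305)
    """
    mv, search_cost = search_result
    if pmv_x != pmv:
        # S308: the speculative cost was based on a wrong predictor.
        skip_cost = None
    if skip_cost is not None and skip_cost <= search_cost:
        # S307, S309: the speculative result is valid and is the cheaper mode.
        return ("skip_or_direct", pmv, skip_cost)
    # Assumed fallback: use the result of the predictor-free search.
    return ("inter", mv, search_cost)
```

The speculative cost is useful only when the pseudo predicted motion vector happens to match the true one. Otherwise the work of Step S303 is wasted.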
The video image coding of H.264 is premised on sequential processing, as described in non-patent document 1, and macro blocks are processed in raster scan order from the upper left. Therefore, many parts of the sequential processing use information on the already processed macro blocks above or to the left. As described in non-patent document 2, high coding efficiency is achieved because the motion estimating unit uses information on the macro blocks to the left, above and to the upper right, and the intra predicting unit and the deblocking filter use information on the macro blocks to the left and above.
In recent years, the performance of the GPU (Graphics Processing Unit), a processor for 3D graphics processing, has improved remarkably as a parallel processing arithmetic unit. The GPU is a many-core processor in which tens to hundreds of cores are integrated, and drawing out its performance requires sufficient parallelism in the application being processed.