1. Field of the Invention
The present invention relates to an image processing device and an image processing method for detecting a motion (motion vector) between two different screens. In the present specification, the term “screen” refers to an image made up of a field or frame of image data which appears on the display as a single image.
2. Description of the Related Art
A variety of image processing operations can be performed by detecting a motion (motion vector) between two different screens and using the detected motion vector.
For example, it is widely known that moving image data can be compressed with high efficiency during handling of such data if a motion vector between two screens in the data is used. It is also known that a detected motion vector is used to detect panning and tilting of the television camera and track the subject.
Further, recent years have witnessed rapidly growing use of motion vectors in diverse applications, such as sensorless handshake correction and noise removal during filming under low illumination by superposing images using a detected motion vector.
The block matching method and gradient method are examples of known motion vector calculation methods. The block matching method is commonly used.
The block matching method calculates a motion vector between two screens, a reference screen of interest and a template screen which is the source of motion in the reference screen, by calculating the degree of correlation between the reference and template screens on a block-by-block basis, with one block being a rectangular area of a predetermined size.
That is, the block matching method finds, for each of a plurality of reference blocks set in the search area of the reference screen, the degree of correlation with the template block set in the template screen. The degree of correlation represents how well the two blocks are correlated with each other. The motion vector relative to the template block is calculated based on the reference block having the highest of all the degrees of correlation found.
It should be noted that there are two cases, one in which the template screen is antecedent to the reference screen (e.g., motion detection in MPEG), and another in which the reference screen is antecedent to the template screen (e.g., noise reduction by superposing image frames).
As described earlier, the term “screen” refers to an image made up of a field or frame of image in the present specification. For convenience of the following description in the present specification, we assume that a screen is made up of a frame. As a result, a screen is referred to as a frame. Therefore, a reference screen is referred to as a reference frame, and a template screen as a template frame.
An existing common block matching method will be outlined below with reference to FIGS. 27A to 30.
The block matching method described here divides, for example, a template frame 100 into a plurality of rectangular areas (referred to as blocks), each of which is of a predetermined size and made up of a plurality of horizontal pixels and a plurality of vertical lines, as illustrated in FIG. 27A. Each of the plurality of blocks 102 in the template frame 100 is referred to as a template block.
The block matching searches for a block highly correlated to the template block 102 from among those in a reference frame 101. The blocks, set in the reference frame 101, which are of equal size to the template block 102 are referred to as reference blocks.
A reference block 103 (refer to FIG. 27B) detected to have the highest degree of correlation in the reference frame 101 as a result of the search is referred to as a motion compensation block. On the other hand, the displacement (including magnitude and direction of displacement) between the template block 102 and motion compensation block 103 is referred to as a motion vector (refer to reference numeral 104 in FIG. 27B).
When a projection image block 109 of each of the template blocks 102 is assumed to be at the same position in the reference frame 101 as that template block 102 of the template frame 100, the motion vector 104 corresponds to the displacement between the position (e.g., center) of the projection image block 109 of this template block and the position (e.g., center) of the motion compensation block 103. The motion vector 104 also contains magnitude and direction components of displacement.
The block matching process will be outlined below. As illustrated by a dashed line in FIG. 28, the projection image block 109 of each of the template blocks 102 is assumed to be at the same position in the reference frame 101 as that template block 102 of the template frame 100. The coordinates of the center of the projection image block 109 of this template block are set as a motion detection origin 105. Assuming that the motion vector 104 exists within a given area from the motion detection origin 105, a predetermined area centered at the motion detection origin 105 is set as a search area 106 (refer to a long-dashed short dashed line in FIG. 27B).
Next, a block (reference block) 108 of equal size to the template block 102 is set in the reference frame. Then, the reference block 108 is moved, for example, horizontally or vertically, by one or a plurality of pixels at a time in the search area 106. Therefore, the plurality of reference blocks 108 are set in the search area 106.
Here, moving the reference block 108 in the search area 106 means moving the center of the reference block 108 in the search area 106, because the motion detection origin 105 is the center of the projection image block 109 of the template block. Therefore, some of the pixels making up the reference block 108 may lie outside the search area 106.
Next, a vector (referred to as a reference vector) 107 (refer to FIG. 27B) is set in the search area for each of the reference blocks 108. The reference vector 107 represents the magnitude and direction of displacement between the reference block 108 of interest and the template block 102. Then, the correlation is evaluated between the image content of the reference block 108 at the position indicated by one of the reference vectors 107 and that of the template block 102.
The reference vector 107 can be expressed as a vector (Vx, Vy), as illustrated in FIG. 29, when the horizontal displacement (in the X direction) of the reference block 108 is Vx and the vertical displacement (in the Y direction) thereof is Vy.
If, for example, the reference block 108 is displaced by one pixel in the X direction from the template block 102, the reference vector 107 is a vector (1,0). On the other hand, if the reference block 108 is displaced by three pixels in the X direction and two pixels in the Y direction from the template block 102 as illustrated in FIG. 30, the reference vector 107 is a vector (3,2).
That is, the reference vector 107 represents the displacement between the center of the associated reference block 108 and that of the template block 102 when the positions of the reference block 108 and template block 102 are assumed to be the centers of the respective blocks as illustrated in the example of FIG. 30.
The reference block 108 moves in the search area 106. In this case, the center of the reference block 108 moves in the search area 106. As described earlier, the reference block 108 is made up of a plurality of horizontal and vertical pixels. Therefore, the largest area in which the reference block 108 to be matched against the template block 102 moves is a matching area 110 which is larger than the search area 106 as illustrated in FIG. 30.
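The relationship between the search area and the matching area can be illustrated with a short calculation. The following is a minimal sketch, assuming that the search area extends a fixed number of pixels on either side of the motion detection origin and that the block has an odd width and height so that its center falls on a pixel. The function name `matching_area` and all numeric values are hypothetical and are not taken from the figures.

```python
def matching_area(origin, search_range, block_size):
    """Return (search_area, matching_area) as inclusive pixel bounds.

    Each area is a tuple (left, top, right, bottom).
    origin       -- (x, y) of the motion detection origin (assumed values)
    search_range -- (rx, ry): the block center may move +/-rx, +/-ry pixels
    block_size   -- (width, height) of the block, assumed odd
    """
    ox, oy = origin
    rx, ry = search_range
    half_w, half_h = block_size[0] // 2, block_size[1] // 2

    # The search area bounds the positions of the reference block center.
    search = (ox - rx, oy - ry, ox + rx, oy + ry)

    # The matching area bounds every pixel any reference block can cover,
    # so it grows by half a block on each side of the search area.
    matching = (search[0] - half_w, search[1] - half_h,
                search[2] + half_w, search[3] + half_h)
    return search, matching


search, matching = matching_area((50, 40), (8, 8), (9, 9))
print(search)    # (42, 32, 58, 48)
print(matching)  # (38, 28, 62, 52)
```

In this hypothetical case the search area is 17 by 17 pixels, while the matching area is 25 by 25 pixels.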
The position of the reference block 108 found to have the highest degree of correlation in image content with the template block 102 is taken as the position of the template block 102 (after the motion) in the reference frame 101. This reference block is determined to be the motion compensation block 103. The displacement between the position of the motion compensation block 103 and that of the template block 102 is then detected as the motion vector 104, a quantity having both magnitude and direction (refer to FIG. 27B).
Here, the correlation level representing how well the template block 102 is correlated with each of the plurality of reference blocks 108 set in the search area 106 is basically calculated using the corresponding pixel values of the template block 102 and the reference block 108. A variety of methods have been proposed to obtain the correlation level, including one using the mean square of the pixel differences and another that calculates the sum of the absolute differences between corresponding pixels in the blocks.
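The two kinds of correlation measure mentioned above can be sketched as follows. This is an illustrative example only: the sum of pixel differences is taken here as the sum of absolute differences (commonly abbreviated SAD), blocks are assumed to be equally sized lists of rows of pixel values, and the function names `sad` and `mse` are hypothetical. For both measures, a smaller value indicates a higher degree of correlation.

```python
def sad(template_block, reference_block):
    """Sum of absolute differences between corresponding pixels."""
    return sum(
        abs(t - r)
        for t_row, r_row in zip(template_block, reference_block)
        for t, r in zip(t_row, r_row)
    )


def mse(template_block, reference_block):
    """Mean of the squared differences between corresponding pixels."""
    pixel_count = sum(len(row) for row in template_block)
    squared = sum(
        (t - r) ** 2
        for t_row, r_row in zip(template_block, reference_block)
        for t, r in zip(t_row, r_row)
    )
    return squared / pixel_count


# Hypothetical 2x2 blocks of pixel values.
template_block = [[10, 20], [30, 40]]
reference_block = [[12, 18], [30, 44]]

print(sad(template_block, reference_block))  # 8
print(mse(template_block, reference_block))  # 6.0
```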
As described above, the block matching method calculates the degree of correlation between the reference and template blocks as an evaluation level indicating whether the blocks are analogous to each other. Next, the reference block with the highest calculated degree of correlation (highest evaluation level) is detected in the search area. Then, the magnitude and direction of displacement between that reference block and the template block are detected as the motion vector relative to the template block.
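The overall procedure can be sketched as an exhaustive search. The example below is a minimal illustration under several assumptions: frames are lists of rows of pixel values, the sum of absolute differences is used as the correlation measure, the reference block moves one pixel at a time, and candidates that would extend past the frame boundary are skipped. Block positions are given by their top-left pixel rather than their center, which yields the same displacement. The function names are hypothetical.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, width, height):
    """Sum of absolute differences between two equally sized blocks.

    (ax, ay) and (bx, by) are the top-left pixels of the two blocks.
    """
    total = 0
    for j in range(height):
        row_a = frame_a[ay + j]
        row_b = frame_b[by + j]
        for i in range(width):
            total += abs(row_a[ax + i] - row_b[bx + i])
    return total


def full_search(template, reference, bx, by, width, height, search_range):
    """Return ((vx, vy), sad_value) for the best-matching reference block."""
    frame_h = len(reference)
    frame_w = len(reference[0])
    best_vector, best_sad = None, None
    for vy in range(-search_range, search_range + 1):
        for vx in range(-search_range, search_range + 1):
            rx, ry = bx + vx, by + vy
            # Skip candidates whose pixels fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + width > frame_w or ry + height > frame_h:
                continue
            value = sad(template, bx, by, reference, rx, ry, width, height)
            # A lower SAD means a higher degree of correlation.
            if best_sad is None or value < best_sad:
                best_vector, best_sad = (vx, vy), value
    return best_vector, best_sad


# Synthetic 16x16 template frame; the reference frame is the same content
# moved 3 pixels in the X direction and 2 pixels in the Y direction.
template = [[7 * x + 13 * y + x * y for x in range(16)] for y in range(16)]
reference = [
    [template[y - 2][x - 3] if x >= 3 and y >= 2 else 0 for x in range(16)]
    for y in range(16)
]

# 4x4 template block with top-left pixel (4, 4), search range of +/-4 pixels.
print(full_search(template, reference, 4, 4, 4, 4, 4))  # ((3, 2), 0)
```

The detected vector (3, 2) corresponds to the displacement applied when building the synthetic reference frame.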
The above block matching method calculates the degree of correlation between the plurality of reference blocks set in the search area and the template block for each of the reference blocks, thus resulting in a large number of image processing operations (image processing cycles) and leading to high power consumption.
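The scale of this workload can be estimated with simple arithmetic. The following sketch counts pixel-difference operations for an exhaustive search, assuming that the reference block is moved one pixel at a time, that every candidate in the search area is evaluated, and that the frame is divided into non-overlapping template blocks. The frame size, block size, and search range are hypothetical values chosen only for illustration.

```python
def full_search_cost(frame_w, frame_h, block_w, block_h, range_x, range_y):
    """Return (candidates per block, operations per block, operations per frame).

    One "operation" here is one pixel-difference calculation.
    """
    # Number of reference blocks set in the search area for one template block.
    candidates = (2 * range_x + 1) * (2 * range_y + 1)
    # Each candidate requires one difference per pixel of the block.
    per_block = candidates * block_w * block_h
    # Number of whole template blocks that fit in the frame.
    blocks = (frame_w // block_w) * (frame_h // block_h)
    return candidates, per_block, per_block * blocks


# Hypothetical case: 640x480 frame, 16x16 blocks, search range of +/-8 pixels.
print(full_search_cost(640, 480, 16, 16, 8, 8))  # (289, 73984, 88780800)
```

Under these assumptions, a single frame pair requires roughly 89 million pixel-difference calculations.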
To address this problem, a method has been proposed that organizes the template and reference frames into layers. Each of the layers includes a high resolution image or a low resolution image obtained by reducing the high resolution image. A motion vector is calculated for each layer (refer to Japanese Patent Laid-Open No. 2009-55410, which is hereinafter referred to as Patent Document 1).
This layered motion vector calculation method finds a motion vector relative to the template block for the low resolution image to determine the search area of the reference block relative to the template block in the high resolution image based on the found motion vector. This makes it possible to reduce the search area of the reference block in the reference frame for the high resolution image.
From the above, the layered method provides a reduced number of pixels for calculation of a motion vector in the low resolution image, thus ensuring reduced arithmetic operations for calculation of the degree of correlation between the template and reference blocks. Thanks to the motion vector obtained based on the low resolution image, the search area of the reference block relative to the template block can be reduced in the high resolution image. This makes it possible to calculate a motion vector relative to the template block for the target resolution image with a small number of image processing operations (image processing cycles).
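A two-layer version of this idea can be sketched as follows. This is not the implementation of Patent Document 1; it is a minimal illustration assuming a reduction ratio of 1/2 obtained by averaging 2x2 pixel groups, the sum of absolute differences as the correlation measure, and a refinement of one pixel in each direction around the vector predicted from the low resolution layer. All function names are hypothetical.

```python
def downscale_by_2(frame):
    """Halve the resolution by averaging each 2x2 group of pixels."""
    return [
        [(frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) / 4
         for x in range(0, len(frame[0]) - 1, 2)]
        for y in range(0, len(frame) - 1, 2)
    ]


def block_sad(tmpl, ref, bx, by, rx, ry, size):
    """SAD between square blocks with top-left pixels (bx, by) and (rx, ry)."""
    return sum(
        abs(tmpl[by + j][bx + i] - ref[ry + j][rx + i])
        for j in range(size) for i in range(size)
    )


def search(tmpl, ref, bx, by, size, center, radius):
    """Best (vx, vy) among candidates within `radius` pixels of `center`."""
    cx, cy = center
    best, best_cost = None, None
    for vy in range(cy - radius, cy + radius + 1):
        for vx in range(cx - radius, cx + radius + 1):
            rx, ry = bx + vx, by + vy
            # Skip candidates that extend past the frame boundary.
            if rx < 0 or ry < 0 or rx + size > len(ref[0]) or ry + size > len(ref):
                continue
            cost = block_sad(tmpl, ref, bx, by, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (vx, vy), cost
    return best


def layered_search(tmpl, ref, bx, by, size, coarse_radius):
    """Coarse search on the reduced images, then a +/-1 pixel refinement."""
    low_tmpl, low_ref = downscale_by_2(tmpl), downscale_by_2(ref)
    coarse = search(low_tmpl, low_ref, bx // 2, by // 2, size // 2,
                    (0, 0), coarse_radius)
    # Scale the low resolution vector back up to high resolution coordinates.
    predicted = (coarse[0] * 2, coarse[1] * 2)
    return search(tmpl, ref, bx, by, size, predicted, 1)


# Synthetic 32x32 frames; the reference content is displaced by (4, 2).
template = [[7 * x + 13 * y + x * y for x in range(32)] for y in range(32)]
reference = [
    [template[y - 2][x - 4] if x >= 4 and y >= 2 else 0 for x in range(32)]
    for y in range(32)
]

# 8x8 template block at (8, 8); coarse radius 4 covers +/-8 at full resolution.
print(layered_search(template, reference, 8, 8, 8, 4))  # (4, 2)
```

In this example the layered search evaluates 81 candidates of 16 pixels each in the reduced images plus 9 candidates of 64 pixels each at full resolution, 1,872 pixel differences in total, whereas an exhaustive search over ±8 pixels at full resolution would evaluate 289 candidates of 64 pixels each, or 18,496 pixel differences.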
However, even the layered motion vector calculation method cannot find an accurate motion vector if the image is reduced at a high reduction ratio. Therefore, the layered motion vector calculation alone has its limitations in reducing the number of image processing operations and power consumption.
In contrast, Japanese Patent Laid-Open No. 2004-56305, which is hereinafter referred to as Patent Document 2, for example, discloses a technique that organizes low and high frequency images into layers by separating images into frequency bands. This technique skips the motion vector calculation for a high frequency image when the sum of its high frequency components is below a threshold.