The present disclosure relates to an image processing device, an image processing method, an image pickup device, and an image pickup method that can correct a so-called hand movement component included in image information obtained by image pickup in an image pickup device such as a digital still camera or a video camera, for example, and thereby obtain an image free from the hand movement component.
Generally, when photographing is performed with an image pickup device such as a digital still camera, a video camera or the like held by hand, vibration of the image pickup device due to hand movement at the time of the photographing appears as a vibration in a picture unit of a picked-up image.
As methods for correcting the vibration of the picked-up image due to such hand movement, optical hand movement correction systems using a gyro (angular velocity) sensor have been dominant in the recent market with a reduction in cost, an improvement in performance, and a reduction in size of the gyro sensor.
Recently, a new problem has arisen as a result of the rapid spread of digital still cameras and the concurrent sharp increase in the number of pixels. The problem is that although hand movement correction is strongly desired also for still images at times of low illuminance (long exposure time), the only available solution uses a sensor such as a gyro sensor, so that weak points and other problems of the gyro sensor, such as the low detection precision of the gyro sensor itself, are being revealed.
Hand movement corrections used for still images in devices for consumer use currently available on the market all measure a hand movement vector using a gyro sensor or an acceleration sensor, and feed back the hand movement vector to a mechanism system to perform high-speed control so as to prevent blurring of an image projected on an image sensor such as a CCD (Charge Coupled Device) imager, a CMOS (Complementary Metal Oxide Semiconductor) imager or the like.
Proposed as the mechanism system referred to here is a lens, a prism, or an imager (or a module integral with the imager), and control of the lens, the prism, or the imager is referred to as a lens shift, a prism shift, or an imager shift, respectively.
As long as hand movement correction is made by such a method, it is simply not possible to make correction with a pixel precision because of an accumulation of not only precision errors of the gyro sensor itself as mentioned above but also a delay in feedback to the mechanism system or errors in prediction for avoiding a feedback delay and control errors of the mechanism system.
Although, as mentioned above, hand movement correction using sensors in the present situation has the serious problem of being unable to increase precision in principle, such correction is valued highly in the market because, while it falls short of fully correcting hand movement, it is able to reduce hand movement.
However, as the number of pixels is expected to increase further in the future and as pixel size is reduced, it is only a matter of time before the market realizes that the gap between the correction limit and pixel precision will inevitably widen.
On the other hand, as another method for correcting the vibration of a picked-up image due to hand movement, a sensorless hand movement correction method is known which calculates a motion vector of a picture unit of the picked-up image, and shifts reading positions of picked-up image data stored in an image memory on the basis of the motion vector, thereby making hand movement correction.
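The shifting of reading positions described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any cited document: the function name read_stabilized, the margin of spare pixels around the output window, and the sign convention (the window follows the detected motion) are all hypothetical.

```python
def read_stabilized(frame, mx, my, margin):
    """Read an output window from a stored frame, shifted by the motion vector (mx, my).

    Assumption: the stored frame is larger than the output window by `margin`
    pixels on every side, so the window can move without leaving the frame.
    """
    if abs(mx) > margin or abs(my) > margin:
        raise ValueError("motion vector exceeds the available margin")
    height = len(frame) - 2 * margin
    width = len(frame[0]) - 2 * margin
    top = margin + my    # shifted vertical reading position
    left = margin + mx   # shifted horizontal reading position
    return [row[left:left + width] for row in frame[top:top + height]]


def pattern(x, y):
    """Synthetic luminance pattern used only for this demonstration."""
    return (3 * x * x + 5 * y * y + x * y) % 251


# The scene content of `current` is displaced by (1, 2) relative to `previous`.
previous = [[pattern(x, y) for x in range(10)] for y in range(10)]
current = [[pattern(x - 1, y - 2) for x in range(10)] for y in range(10)]

stable_prev = read_stabilized(previous, 0, 0, margin=2)
stable_curr = read_stabilized(current, 1, 2, margin=2)
```

In this synthetic case the two windows contain identical image data, which is the effect the reading-position shift is meant to achieve.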
As a method for detecting a motion vector of a picture unit of the picked-up image from picked-up image information itself, block matching is known which determines correlation between picked-up images of two pictures. A sensorless hand movement correction method using this block matching also has advantages of being able to detect a hand movement vector with a pixel precision including a rotation component in a roll-axis direction in principle and making it possible to reduce size and weight of an image pickup device because a need for mechanical parts such as a gyro sensor and the like is eliminated.
FIG. 71 and FIG. 72 show a schematic representation of an outline of block matching. FIG. 73 is a typical example of a flowchart of a block matching process.
Block matching is a method that calculates a motion vector in a unit of one picture between a reference picture of picked-up images from an image pickup device unit as a picture of interest and an original picture as a picked-up image picture preceding the reference picture by one picture, for example, by calculating correlation between the reference picture and the original picture in blocks as rectangular regions of predetermined size.
Incidentally, while a picture in this case refers to an image formed by image data of one frame or one field, suppose in the present specification that for convenience of description, a picture is formed by one frame, and a picture is referred to as a frame. Thus, a reference picture is referred to as a reference frame, and an original picture will be referred to as an original frame (target frame).
For example, the image data of the reference frame is the image data of a present frame from the image pickup device unit, or image data obtained by storing the image data of a present frame in a frame memory and delaying the image data by one frame. The image data of the original frame is image data obtained by further storing the image data of the reference frame in a frame memory and delaying the image data by one frame.
In block matching, as shown in FIG. 71, a target block 103 formed by a rectangular region of a predetermined size including a plurality of pixels in a horizontal direction and a plurality of lines in a vertical direction is set at an arbitrary predetermined position in the original frame 101.
On the other hand, in the reference frame 102, a projected image block 104 (see a dotted line in FIG. 71) of the target block is assumed at the same position as the position of the target block 103 in the original frame, a search range 105 (see alternate long and short dash lines in FIG. 71) is set with the projected image block 104 of the target block as a center, and a reference block 106 having the same size as the target block 103 is considered.
Then, the reference block 106 is moved to positions within the search range 105 in the reference frame 102. Correlation between image contents included in the reference block 106 at each of the positions and image contents of the target block 103 is determined. The position of the reference block 106 at which position the correlation is strongest is detected as a position to which the target block 103 in the original frame is moved in the reference frame 102. Then, an amount of positional displacement between the detected position of the reference block 106 and the position of the target block is detected as a motion vector as a quantity including a direction component.
In this case, the reference block 106 is moved in the search range 105 by a unit of one pixel or a plurality of pixels in the horizontal direction and the vertical direction, for example. Hence, a plurality of reference blocks are set within the search range 105.
The correlation between the target block 103 and the reference block 106 moved within the search range 105 is detected by obtaining a sum total of absolute values of differences between the luminance values of all pixels within the target block 103 and the luminance values of corresponding pixels within the reference block 106 (the sum total of the absolute values of the differences will be referred to as a difference absolute value sum, and the difference absolute value sum will hereinafter be described as a SAD (Sum of Absolute Difference) value). That is, the reference block 106 at a position of a minimum SAD value is detected as a reference block having the strongest correlation, and an amount of positional displacement of the detected reference block 106 with respect to the position of the target block 103 is detected as a motion vector.
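The SAD calculation can be illustrated with a short sketch. The function name sad, the representation of a block as a list of rows of luminance values, and the sample values are assumptions made only for this example.

```python
def sad(target, reference):
    """Sum of absolute differences between two equal-size blocks of luminance values."""
    if len(target) != len(reference) or any(
        len(t_row) != len(r_row) for t_row, r_row in zip(target, reference)
    ):
        raise ValueError("blocks must have the same size")
    return sum(
        abs(t - r)
        for t_row, r_row in zip(target, reference)
        for t, r in zip(t_row, r_row)
    )


target_block = [[10, 20], [30, 40]]

# Hypothetical reference blocks, keyed by their displacement from the target position.
candidates = {
    (0, 0): [[12, 19], [33, 40]],  # SAD = 2 + 1 + 3 + 0 = 6
    (1, 0): [[10, 20], [30, 41]],  # SAD = 0 + 0 + 0 + 1 = 1
}

# The candidate with the minimum SAD value has the strongest correlation.
best = min(candidates, key=lambda v: sad(target_block, candidates[v]))
```

Here the displacement (1, 0) gives the smaller SAD value and would therefore be detected as the motion vector.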
In block matching, an amount of positional displacement of each of a plurality of reference blocks 106 set within the search range 105 with respect to the position of the target block 103 is represented by a reference vector 107 (see FIG. 71) as a quantity including a direction component. The reference vector 107 of each reference block 106 has a value corresponding to the position of the reference block 106 in the reference frame 102. In the existing block matching, the reference vector of the reference block 106 from which a minimum SAD value is obtained is detected as a motion vector corresponding to the target block 103.
Generally, in the block matching, as shown in FIG. 72, SAD values between a plurality of respective reference blocks 106 set within the search range 105 and the target block 103 (the SAD values will hereinafter be referred to as SAD values of the reference blocks for simplicity of description) are stored in a memory in correspondence with respective reference vectors 107 corresponding to the positions of the respective reference blocks 106 within the search range 105. A reference block 106 having a minimum SAD value among the SAD values of all the reference blocks 106 which SAD values are stored in the memory is detected. Thereby the motion vector 110 corresponding to the target block 103 is detected.
A table in which the SAD values of the respective reference blocks 106 are stored in correspondence with the respective reference vectors 107 corresponding to the positions of the plurality of reference blocks 106 set within the search range 105 is referred to as a difference absolute value sum table (hereinafter referred to as a SAD table). A SAD table 108 in FIG. 72 illustrates this table. The SAD values of the respective reference blocks 106 in the SAD table 108 are referred to as SAD table elements 109.
Incidentally, in the above description, the positions of the target block 103 and the reference blocks 106 refer to arbitrary specific positions, for example central positions of the blocks. A reference vector 107 indicates an amount of displacement (including a direction) between the position of the projected image block 104 of the target block 103 and the position of the reference block 106 in the reference frame 102. In the example of FIG. 71 and FIG. 72, the target block 103 is situated at the central position of the frame.
The reference vectors 107 corresponding to the respective reference blocks 106 represent displacements of the positions of the respective reference blocks 106 with respect to the position corresponding to the target block 103 in the reference frame 102. Therefore, when the position of a reference block 106 is specified, the value of the reference vector corresponding to the position is also specified. Hence, when the address of the SAD table element of a reference block in the memory of the SAD table 108 is specified, the corresponding reference vector is specified.
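The correspondence between a SAD table address and a reference vector can be sketched as below. The row-major layout, the function names, and the parameters rx and ry (the assumed half-widths of the search range in pixels, with a one-pixel step) are illustrative assumptions; the description above does not fix a particular memory layout.

```python
def vector_to_address(vx, vy, rx, ry):
    """Map a reference vector (vx, vy) to a SAD table address (row-major layout assumed)."""
    if not (-rx <= vx <= rx and -ry <= vy <= ry):
        raise ValueError("reference vector outside the search range")
    return (vy + ry) * (2 * rx + 1) + (vx + rx)


def address_to_vector(address, rx, ry):
    """Recover the reference vector (vx, vy) from a SAD table address."""
    width = 2 * rx + 1
    if not 0 <= address < width * (2 * ry + 1):
        raise ValueError("address outside the SAD table")
    row, col = divmod(address, width)
    return (col - rx, row - ry)


center_address = vector_to_address(0, 0, rx=2, ry=1)
center_vector = address_to_vector(center_address, rx=2, ry=1)
```

With this layout, specifying an address is sufficient to specify the reference vector, and vice versa.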
The process of the existing block matching described above is described below with reference to the flowchart of FIG. 73.
First, one reference block Ii within the search range 105 is specified. This is equivalent to specifying the reference vector corresponding to the reference block Ii (step S1). In FIG. 73, (vx, vy) denotes a position indicated by the specified reference vector when the position of the target block in the frame is set as a reference position (0, 0). vx is a component of an amount of displacement by the specified reference vector from the reference position in the horizontal direction. vy is a component of an amount of displacement by the specified reference vector from the reference position in the vertical direction.
In this case, the amounts of displacement vx and vy are values in units of pixels. For example, vx=+1 indicates a position shifted by one pixel in the right direction of the horizontal direction with respect to the reference position (0, 0). vx=−1 indicates a position shifted by one pixel in the left direction of the horizontal direction with respect to the reference position (0, 0). For example, vy=+1 indicates a position shifted by one pixel in the downward direction of the vertical direction with respect to the reference position (0, 0). vy=−1 indicates a position shifted by one pixel in the upward direction of the vertical direction with respect to the reference position (0, 0).
As described above, (vx, vy) denotes the position indicated by a reference vector with respect to the reference position (hereinafter referred to as the position indicated by the reference vector for simplicity), and corresponds to each reference vector. That is, supposing that vx and vy are integers, (vx, vy) represents each reference vector. Hence, in the following description, a reference vector indicating the position (vx, vy) may be described as a reference vector (vx, vy).
With the central position of the search range set as the position of the target block, that is, the reference position (0, 0), when the search range is defined by ±Rx in the horizontal direction, and the search range is defined by ±Ry in the vertical direction, the search range is expressed as
−Rx ≤ vx ≤ +Rx, −Ry ≤ vy ≤ +Ry
Next, coordinates (x, y) of one pixel within the target block Io are specified (step S2). Next, the absolute value α of a difference between a pixel value Io(x, y) at the specified coordinates (x, y) within the target block Io and a pixel value Ii(x+vx, y+vy) at a corresponding pixel position within the reference block Ii is calculated (step S3). That is, the difference absolute value α is calculated as
α = |Io(x, y) − Ii(x+vx, y+vy)|  (Equation 1)
Then, the calculated difference absolute value α is added to a previous SAD value at an address (table element) indicated by the reference vector (vx, vy) of the reference block Ii, and a SAD value as a result of the addition is written back to the address (step S4). That is, when the SAD value corresponding to the reference vector (vx, vy) is expressed as SAD(vx, vy), the SAD value is calculated as
SAD(vx, vy) = Σα = Σ|Io(x, y) − Ii(x+vx, y+vy)|  (Equation 2)
The SAD value is then written to the address indicated by the reference vector (vx, vy).
Next, whether the above-described operation has been performed for pixels at all coordinates (x, y) within the target block Io is determined (step S5). When it is determined that the operation has not yet been completed for the pixels at all the coordinates (x, y) within the target block Io, the process returns to step S2 to specify a pixel position at next coordinates (x, y) within the target block Io and repeat the process from step S2 on down.
When it is determined in step S5 that the above-described operation has been performed for the pixels at all the coordinates (x, y) within the target block Io, it is determined that the calculation of the SAD value for the reference block in question is completed. Then, whether the above-described operation process has been completed for all reference blocks, that is, all reference vectors (vx, vy) within the search range is determined (step S6).
When it is determined in step S6 that there is a reference vector (vx, vy) for which the above-described operation process has not yet been completed, the process returns to step S1 to set the next reference vector (vx, vy) for which the above-described operation process has not been completed, and the process repeats from step S1 on down.
Then, when it is determined in step S6 that there is no reference vector (vx, vy) for which the above-described operation process has not been completed within the search range, it is determined that a SAD table is completed. A minimum SAD value is detected in the completed SAD table (step S7). Then, a reference vector corresponding to an address of the minimum SAD value is detected as a motion vector corresponding to the target block Io (step S8). When the minimum SAD value is written as SAD (mx, my), the intended motion vector is calculated as a vector (mx, my) indicating a position (mx, my).
Thus the process of detecting the motion vector corresponding to one target block by block matching is ended.
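The flow of steps S1 to S8 can be sketched as follows. This is a minimal illustration under stated assumptions: frames are lists of rows of luminance values, the function and parameter names are hypothetical, the reference block is moved in steps of one pixel, and the search range is assumed to stay inside the reference frame. For brevity the sketch accumulates each SAD value in a local variable and writes it to the table once, rather than writing back to the table after every pixel as in step S4.

```python
def block_match(original, reference, left, top, width, height, rx, ry):
    """Return (motion_vector, sad_table) for one target block.

    (left, top, width, height) locate the target block in the original frame;
    rx and ry are the half-widths of the search range in pixels.
    """
    sad_table = {}
    for vy in range(-ry, ry + 1):              # step S1: specify a reference vector
        for vx in range(-rx, rx + 1):
            total = 0
            for y in range(top, top + height):         # step S2: specify a pixel
                for x in range(left, left + width):
                    # step S3: difference absolute value (Equation 1)
                    alpha = abs(original[y][x] - reference[y + vy][x + vx])
                    total += alpha                     # step S4: accumulate (Equation 2)
            sad_table[(vx, vy)] = total        # step S5: all pixels of the block done
    # step S6: all reference vectors done, so the SAD table is complete.
    # steps S7 and S8: the vector with the minimum SAD value is the motion vector.
    motion_vector = min(sad_table, key=sad_table.get)
    return motion_vector, sad_table


def pattern(x, y):
    """Synthetic luminance pattern used only for this demonstration."""
    return (3 * x * x + 5 * y * y + x * y) % 251


# The content of the reference frame is displaced by (2, 1) from the original frame.
original_frame = [[pattern(x, y) for x in range(16)] for y in range(16)]
reference_frame = [[pattern(x - 2, y - 1) for x in range(16)] for y in range(16)]

mv, table = block_match(original_frame, reference_frame,
                        left=6, top=6, width=4, height=4, rx=3, ry=3)
```

In this synthetic example the SAD table has (2·3+1)×(2·3+1) = 49 elements, and the minimum SAD value of zero lies at the reference vector equal to the applied displacement.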
In practice, it is difficult to obtain a high-precision hand movement vector of the reference frame with respect to the original frame from the motion vector corresponding to one target block. Therefore, in the original frame, a plurality of target blocks are set so as to cover the entire range of the original frame. On the other hand, in the reference frame, as shown in FIG. 74, search ranges 105, 105, . . . are set for projected images 104, 104, . . . of the plurality of target blocks, respectively, and motion vectors 110, 110, . . . corresponding to the target blocks are detected in the respective search ranges.
Then, the hand movement vector (global motion vector) of the reference frame with respect to the original frame is detected from the plurality of detected motion vectors 110, 110, . . . .
As a main method for detecting the hand movement vector (global motion vector) from the plurality of motion vectors 110, a method has been proposed which makes a majority decision based on the plurality of motion vectors, that is, which sets, as the global motion vector, the motion vector shared by the largest number of motion vectors that are equal to each other in direction and magnitude among the plurality of motion vectors 110. In addition, a method combining the method of majority decision with reliability evaluation based on an amount of change (frequency) of the motion vector in a direction of a time axis has been proposed.
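The majority decision can be sketched as below. The function name and the sample vectors are assumptions for illustration only, and the reliability evaluation along the time axis is not covered.

```python
from collections import Counter


def global_motion_vector(motion_vectors):
    """Pick the motion vector that occurs most often among the per-block vectors.

    Returns the winning vector and the number of blocks that voted for it.
    """
    if not motion_vectors:
        raise ValueError("at least one motion vector is required")
    counts = Counter(motion_vectors)
    vector, votes = counts.most_common(1)[0]
    return vector, votes


# Hypothetical per-block motion vectors detected for six target blocks.
per_block_vectors = [(2, 1), (2, 1), (3, 1), (2, 1), (0, 0), (2, 2)]
gmv, votes = global_motion_vector(per_block_vectors)
```

Vectors must match exactly in both components to be counted together, which corresponds to being equal in direction and magnitude.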
Most existing sensorless hand movement correction techniques, as typified by Patent Document 1 (Japanese Patent Laid-Open No. 2003-78807), are targeted for moving images. As methods for implementing sensorless hand movement correction for still images, a few methods have been proposed, including Patent Document 2 (Japanese Patent Laid-Open No. Hei 7-283999). Patent Document 2 describes an algorithm of consecutively taking still images in short exposure times such that a hand movement component is not produced, obtaining hand movement vectors between the still images, adding together the plurality of still images taken consecutively while moving the still images according to the hand movement vectors, and finally obtaining a high picture quality (high resolution) still image free from hand movement components and low illuminance noise.
Patent Document 3 (Japanese Patent Laid-Open No. 2005-38396) can be cited as a practical proposal on a feasible level. A device disclosed in Patent Document 3 includes means for obtaining a motion vector in a size resulting from reducing conversion of an image and means for sharing an identical SAD table between a plurality of blocks. The reducing conversion of an image and the sharing of a SAD table between a plurality of blocks are very good methods for realizing reduction of SAD table size, and are used in other fields for motion vector detection and scene change detection in an MPEG (Moving Picture Experts Group) image compression system, for example.
However, the algorithm of Patent Document 3 has problems in that the reducing conversion of an image and memory (DRAM (Dynamic RAM (Random Access Memory))) access at the time of the reducing conversion consume time and memory space, and because the plurality of blocks make time-division access to the SAD table, memory access is greatly increased and this process also takes time. Real-time performance and reduction of a system delay time are both required in hand movement correction for moving images, and therefore the process time becomes a problem.
Further, the reducing conversion of an original image requires that a low-pass filter for removing aliasing (folding distortion) and low illuminance noise be implemented as preprocessing for the reduction process. However, characteristics of the low-pass filter are changed according to a reduction scaling factor, and especially when a low-pass filter in a vertical direction is a multiple-tap digital filter, many line memories and operation logics are required, thus presenting a problem of an increase in circuit scale.
In a hand movement correction system for moving images, rough real-time detection of a hand movement vector with importance attached to processing time rather than precision is desired, and even sensorless hand movement correction methods according to the existing art provide satisfactory results in most situations.
On the other hand, existing technology in hand movement correction systems for still images is often proposed only on an idea level, and often does not assume that the number of pixels is on the order of 10 million, as it is today. Therefore, consideration is not given to a rotation component of hand movement, or even if consideration is given to a rotation component of hand movement, a massive amount of calculation is required, for example. Thus there is a lack of practical consideration targeted for current mobile devices such as digital still cameras and the like.
As described above, however, it is expected that image pickup devices such as digital cameras and the like will become increasingly higher in pixel density and higher performance will be required thereof in the future. In such a situation, realization of sensorless hand movement correction at a time of photographing a still image without using a gyro (angular velocity) sensor is of great significance.
Accordingly, as described above, it is promising to calculate a hand movement motion vector on a sensorless basis using block matching and make hand movement correction using the detected motion vector. In addition, it is important to solve the above-described problems.
In view of the above, it is desirable to provide a method and a device for image processing that can solve the problems of the existing sensorless hand movement correction system described above, and provide images of high picture quality.