1. Field of the Invention
The present invention relates to image processing apparatuses and methods, recording media, and programs. Particularly, the present invention relates to an image processing apparatus and method, a recording medium, and a program that allow accurate extraction of, for example, a region including a moving object.
2. Description of the Related Art
When a user captures an image with a camera, such as a digital video camera or a digital still camera, while holding the camera by hand instead of fixing it on a tripod, the captured image (hereinafter simply referred to as the image) can appear shaky; that is, the effect of camera shake, caused by movement (shake) of the camera with respect to an object, can occur.
In order to alleviate the effect of camera shake on the image, for a plurality of continuously captured images, one image is used as a reference image, parameters representing movement of another image as a whole with respect to the reference image are calculated, and the other image is corrected using the parameters (i.e., its position is adjusted with reference to the reference image). This process is referred to, for example, as camera-shake correction.
The parameters representing movement of another image as a whole with respect to the reference image can be considered as parameters representing the positional relationship between the reference image and the other image. The reference image is referred to for the purpose of adjusting (correcting) position. The other image, whose position is corrected with respect to the reference image, will be referred to as a target image.
Image components that occur between a reference image and a target image due to the effect of camera shake can be generally classified into a component of horizontal movement, which occurs when a camera directed to an object shifts horizontally, and a component of rotation centered about an optical axis of a lens, which occurs when the camera rotates clockwise or counterclockwise. More strictly speaking, there also exist a component of rotation about an axis that is perpendicular to the optical axis of the camera lens and an enlarging or reducing component due to movement of the camera in the depth direction.
Positions in the reference image and target image including the effect of camera shake are adjusted, for example, by affine transformation. Thus, as parameters representing positional relationship between the reference image and the target image, for example, affine parameters for affine transformation can be used.
In affine transformation, the relationship between a position (x, y) in the reference image and a position (x′, y′) in the target image is expressed by expression (1) below.
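$$
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
+
\begin{pmatrix} s \\ t \end{pmatrix}
\tag{1}
$$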
In expression (1), for example, when a=K×cos θ, b=−K×sin θ, c=K×sin θ, and d=K×cos θ, the right-hand side of expression (1) represents affine transformation for rotation by an angle θ, parallel movement (translation) by (s, t), and enlarging or reducing by K with respect to the position (x, y).
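As an illustration of expression (1), the following is a minimal Python sketch that builds the affine parameters from an angle θ, a factor K, and a vector (s, t), and applies them to a position (x, y). The function names affine_params and apply_affine are hypothetical and are introduced here only for illustration.

```python
import math


def affine_params(theta, k, s, t):
    """Return (a, b, c, d, s, t) for rotation by theta (radians),
    enlarging or reducing by k, and movement by (s, t)."""
    a = k * math.cos(theta)
    b = -k * math.sin(theta)
    c = k * math.sin(theta)
    d = k * math.cos(theta)
    return (a, b, c, d, s, t)


def apply_affine(params, x, y):
    """Map a position (x, y) to (x', y') according to expression (1)."""
    a, b, c, d, s, t = params
    return (a * x + b * y + s, c * x + d * y + t)


# Example: rotate by 90 degrees, enlarge by 2, then move by (1, -1).
params = affine_params(math.pi / 2, 2.0, 1.0, -1.0)
x_prime, y_prime = apply_affine(params, 1.0, 0.0)
```

With θ=0, K=1, and (s, t)=(0, 0), the parameters reduce to (1, 0, 0, 1, 0, 0) and every position maps to itself.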
In the following description, the matrix (a, b, c, d) of affine transformation and the two-dimensional vector (s, t) in expression (1) will be collectively denoted as affine parameters (a, b, c, d, s, t).
The affine parameters are calculated, for example, by dividing the reference image into a plurality of blocks and detecting motion vectors of the blocks. That is, for each pixel of the reference image, two positions on the target image are considered: a position (x″, y″) determined by moving the position (x, y) of the pixel onto the target image based on the motion vectors of the blocks of the reference image, and a position (x′, y′) determined by transforming the position (x, y) to a position on the target image according to expression (1). The affine parameters (a, b, c, d, s, t) in expression (1) are then determined, for example, so as to minimize the sum of the squared errors between the positions (x″, y″) and the positions (x′, y′).
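The least-squares determination described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: one representative position (for example, a block center) is used per block instead of every pixel, and the normal equations are solved by Gaussian elimination. The names solve3 and fit_affine are hypothetical. Because x′ depends only on (a, b, s) and y′ depends only on (c, d, t), the minimization splits into two three-unknown linear problems that share the same coefficient matrix.

```python
def solve3(m, v):
    """Solve the 3x3 linear system m * p = v by Gauss-Jordan elimination
    with partial pivoting. The inputs are not modified."""
    aug = [list(row) + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("positions are degenerate (e.g. collinear)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * w for u, w in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def fit_affine(points, vectors):
    """Estimate (a, b, c, d, s, t) minimizing the sum of squared errors
    between (x'', y'') = (x, y) + motion vector and the (x', y') given
    by expression (1).

    points:  list of (x, y) positions in the reference image
             (assumption: one position per block)
    vectors: list of (vx, vy) motion vectors for those positions
    """
    n = [[0.0] * 3 for _ in range(3)]  # normal matrix
    rx = [0.0] * 3                     # right-hand side for (a, b, s)
    ry = [0.0] * 3                     # right-hand side for (c, d, t)
    for (x, y), (vx, vy) in zip(points, vectors):
        row = (x, y, 1.0)
        x2, y2 = x + vx, y + vy        # position (x'', y'')
        for i in range(3):
            for j in range(3):
                n[i][j] += row[i] * row[j]
            rx[i] += row[i] * x2
            ry[i] += row[i] * y2
    a, b, s = solve3(n, rx)
    c, d, t = solve3(n, ry)
    return (a, b, c, d, s, t)
```

At least three positions that do not lie on a single line are needed; otherwise the normal matrix is singular and the sketch raises ValueError.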
However, when a moving object is included in the reference image and the target image, generally, the movement (motion vector) of the moving object in the image differs from movement of the image as a whole caused by camera shake. Thus, when pixels in a region of the moving object are used to calculate the affine parameters (a, b, c, d, s, t), it is difficult to obtain accurate affine parameters (a, b, c, d, s, t) due to the effect of the movement of the moving object.
Thus, when a moving object is included in an image, a region of the moving object is extracted and removed, and affine parameters (a, b, c, d, s, t) representing movement of the image as a whole caused by camera shake are calculated using only the remaining region other than the region of the moving object.
For example, Japanese Unexamined Patent Application Publication No. 07-038800 proposes techniques for removing a region including a moving object to correct the effect of camera shake.
According to Japanese Unexamined Patent Application Publication No. 07-038800, when detecting movement of an image as a whole, the image is divided into a plurality of blocks, and only motion vectors with respect to the horizontal direction and motion vectors with respect to the vertical direction of the respective blocks are considered.
That is, according to Japanese Unexamined Patent Application Publication No. 07-038800, an entire image is divided into four regions, and each of the four regions is further divided into a plurality of blocks. Then, a motion vector V is calculated for each of the blocks of each of the four regions.
Furthermore, in each of the four regions, a value referred to as exitance is calculated as follows. For each block, the absolute value |VXi−VXAVR| of the difference between a value VXi of the motion vector V of the block with respect to the horizontal direction and an average VXAVR of the motion vectors V of all the blocks with respect to the horizontal direction is added to the absolute value |VYi−VYAVR| of the difference between a value VYi of the motion vector V of the block with respect to the vertical direction and an average VYAVR of the motion vectors V of all the blocks with respect to the vertical direction. The sum of these values for all the blocks, Σ(|VXi−VXAVR|+|VYi−VYAVR|), is the exitance.
Of the four regions, the two regions with the smaller exitance values are selected, and an average of the motion vectors V of the two selected regions is calculated as a motion vector of the image as a whole, i.e., a parameter representing movement of the image as a whole. That is, the two regions with the larger exitance values are considered as regions including a moving object and are excluded from calculation of the parameter representing movement of the image as a whole.
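The region selection described above can be sketched in Python as follows. This is an illustrative sketch, not the implementation of the cited publication. It assumes that the averages VXAVR and VYAVR are taken over the blocks of the region being evaluated, and that the final average is taken over all the blocks of the two selected regions together. The names exitance and global_motion are hypothetical.

```python
def exitance(vectors):
    """Sum over blocks of |VXi - VXAVR| + |VYi - VYAVR| for one region.

    vectors: list of (vx, vy) motion vectors, one per block.
    Assumption: the averages are computed within this region.
    """
    n = len(vectors)
    vx_avg = sum(vx for vx, _ in vectors) / n
    vy_avg = sum(vy for _, vy in vectors) / n
    return sum(abs(vx - vx_avg) + abs(vy - vy_avg) for vx, vy in vectors)


def global_motion(regions):
    """Select the two regions with the smaller exitance values and
    average their block motion vectors.

    regions: list of four lists of (vx, vy) block motion vectors.
    Returns ((vx, vy), kept), where kept holds the selected region indices.
    """
    ranked = sorted(range(len(regions)), key=lambda i: exitance(regions[i]))
    kept = ranked[:2]
    # Assumption: pool the blocks of both selected regions before averaging.
    pooled = [v for i in kept for v in regions[i]]
    n = len(pooled)
    motion = (sum(vx for vx, _ in pooled) / n,
              sum(vy for _, vy in pooled) / n)
    return motion, kept


# Example: regions 2 and 3 contain blocks that move differently from the rest.
regions = [
    [(1.0, 1.0), (1.0, 1.0)],
    [(1.0, 1.0), (1.0, 1.0)],
    [(1.0, 1.0), (5.0, -3.0)],
    [(4.0, 4.0), (-2.0, 0.0)],
]
motion, kept = global_motion(regions)
```

In this example the exitance values are 0, 0, 8, and 10, so regions 0 and 1 are kept and the motion vector of the image as a whole is (1.0, 1.0).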