1. Field of the Invention
The present invention relates to an image processing apparatus, an image processing method, and a program. More particularly, the present invention relates to an image processing apparatus, an image processing method, and a program that calculate the distance to each object included in an image captured by a camera.
2. Description of the Related Art
Data in which information about the distance to each object included in an image captured by a camera is represented in association with the original image is called a distance map. Different pixel values are set for different distance values in a distance image in which the distance map is represented as an image. For example, nearer objects are represented as pixels of higher luminance values and farther objects are represented as pixels of lower luminance values in the distance image. FIG. 1 shows an example of an image and a distance image. Referring to FIG. 1, a distance image 12 is based on an image 11. The distances to objects are represented by varying the luminance values in the distance image, as in the example in FIG. 1.
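As an illustrative sketch only, the conversion of a distance map into a distance image might be written as the following Python fragment. The linear mapping, the 8-bit luminance range, the use of None for pixels without distance information, and the names distance_to_luminance, d_near, and d_far are assumptions made for illustration and do not appear in the description above.

```python
def distance_to_luminance(distances, d_near, d_far):
    """Map each distance to an 8-bit luminance value.

    Nearer objects become brighter and farther objects become darker.
    A pixel whose distance is None (undefined) stays None.
    The linear mapping is an assumption, not a prescribed method.
    """
    image = []
    for row in distances:
        out_row = []
        for d in row:
            if d is None:
                out_row.append(None)
                continue
            d = min(max(d, d_near), d_far)        # clamp to the assumed range
            t = (d_far - d) / (d_far - d_near)    # 1.0 at d_near, 0.0 at d_far
            out_row.append(round(255 * t))
        image.append(out_row)
    return image


# Hypothetical 1x3 distance map: nearest, farthest, and undefined pixel.
distance_image = distance_to_luminance([[1.0, 5.0, None]], d_near=1.0, d_far=5.0)
```

In this sketch the nearest pixel receives the highest luminance value and the farthest pixel the lowest, matching the convention described for FIG. 1.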
The distance map and the distance image can be used in, for example, analysis of the three-dimensional shape of an object. Generation of the distance map and the distance image by image analysis or analysis of a three-dimensional shape is disclosed in, for example, Japanese Unexamined Patent Application Publication No. 2003-216932, Japanese Unexamined Patent Application Publication No. 2004-125708, and Japanese Unexamined Patent Application Publication No. 2004-257934.
The generation of the distance image from an image captured by a camera will now be described. In order to generate the distance image, it is necessary to acquire the distance between the camera and each object. Technologies for analyzing the distance to an object include a technology that varies the focus position of the lens to find the focus position at which the outline or pattern of the object is imaged most sharply.
This technology is one of the methods widely used in an automatic focus mechanism for compact cameras and is also called a contrast detection technology. A high contrast is achieved when the object is in focus. In measurement of a distance by the contrast detection technology, a variation in the contrast is detected to determine the focus position of each object and the distance corresponding to the focus position is detected as the distance to the object.
Methods of detecting the pixel positions where the luminance values of pixels are sharply varied by using various differential filters are commonly used in the contrast detection technology. The outlines and patterns of objects can be acquired by such methods. The differential filter used to detect the outlines and patterns is hereinafter referred to as an edge detection filter. The luminance values of pixels are directly acquired from the image data in gray-scale images whereas the luminance values of pixels are calculated from linear combination of multiple luminance components in color images.
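As a minimal sketch of the two operations just described, the following Python fragment computes a luminance value from color components and applies a simple differential filter. The Rec. 601 coefficients, the choice of a 4-neighbor Laplacian as the edge detection filter, and the function names luminance and laplacian_response are assumptions for illustration; the description above does not specify a particular filter or particular coefficients.

```python
def luminance(rgb):
    """Luminance as a linear combination of the color components.

    The Rec. 601 weights are one common choice and are an assumption here.
    """
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def laplacian_response(gray):
    """Absolute response of a 4-neighbor Laplacian filter.

    Large values mark pixels whose luminance varies sharply.
    Border pixels are left at 0.0 for simplicity.
    """
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = abs(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return out


# Hypothetical gray-scale image with a vertical step edge between columns 1 and 2.
step = [[0, 0, 10, 10] for _ in range(3)]
edges = laplacian_response(step)
```

A flat area produces a zero response everywhere, which is why such a filter detects nothing in an area having no pattern.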
The distance map can be created with the contrast detection technology as follows. The focus position of the lens is sequentially moved from the shortest distance to infinity, and the positions on the outline or pattern of each object detected in the image at each focus position are recorded. This yields distance information about the outlines and patterns of the objects included in the image.
However, the outline of each object is not always detected in an actual image because, for example, the difference in luminance between the pixels across the outline of the object is too small. In addition, since nothing is detected with the edge detection filter in an area having no pattern even if the focus position is varied, the resulting distance map includes many undefined regions, that is, regions in which the distances to the object are not defined. Such a region is hereinafter referred to as a distance undefined region.
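The focus sweep and the resulting distance undefined regions can be sketched as follows. The sketch assumes that an edge detection response has already been computed for each focus position; the function name sparse_distance_map, the threshold parameter, and the use of None for undefined pixels are hypothetical choices made for illustration.

```python
def sparse_distance_map(responses, focus_distances, threshold):
    """Build a distance map from a stack of edge responses.

    responses[k][y][x] is the edge detection response at pixel (x, y)
    when the lens is focused at focus_distances[k].
    A pixel keeps the distance of the focus position with the strongest
    response; if no response reaches the threshold, the pixel is left
    undefined (None), i.e. it belongs to a distance undefined region.
    """
    h, w = len(responses[0]), len(responses[0][0])
    dmap = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = max(range(len(responses)), key=lambda k: responses[k][y][x])
            if responses[best][y][x] >= threshold:
                dmap[y][x] = focus_distances[best]
    return dmap


# Hypothetical 1x2 image observed at two focus positions (1.0 and 2.0).
# The left pixel has no pattern; the right pixel is sharpest at distance 2.0.
stack = [[[0.0, 5.0]], [[0.0, 9.0]]]
dmap = sparse_distance_map(stack, focus_distances=[1.0, 2.0], threshold=1.0)
```

The left pixel never produces a response and therefore remains undefined, which is the situation that the interpolation described next is intended to address.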
Estimation of the distances in the distance undefined region to acquire the distance image of the entire image will now be described. In order to calculate the distances in the distance undefined region, an interpolation process using known information around the distance undefined region is generally used. In other words, known distance information observed at the positions on the outline or pattern of an object is used to estimate the distance information about the distance undefined region in the inner area of the object (hereinafter referred to as an object area).
Specifically, as in an example shown in FIG. 2A, an image of an object A 21 is captured by a camera 20. The positions on the outline or pattern of the object detected in the image are recorded while the focus position of the lens is being sequentially moved from a shortest distance to an infinite distance by using the contrast detection technology in the capture of the image. This results in a distance image 22 before interpolation shown in FIG. 2B.
An edge can be detected with the edge detection filter in the outline portion of the object A 21 in the distance image 22 before interpolation shown in FIG. 2B to provide the distance information by the contrast detection. Circles represented by different grayscale values in the outline portion of the object A 21 in the distance image 22 before interpolation indicate the distance information in which the luminance values of pixels are set in accordance with the distance values. Higher luminance values indicate smaller distances and lower luminance values indicate larger distances. The circles represented by different grayscale values in the outline portion of the object A 21 in the distance image 22 before interpolation shown in FIG. 2B are schematically enlarged and the actual distance information can be acquired at the level of the pixels.
In the distance image 22 before interpolation in FIG. 2B, although it is possible to acquire the distance information on the outline portion of the object A 21, it is not possible to acquire the distance information on the inner area of the object A 21 with the contrast detection technology because the inner area has no pattern. In such a case, the interpolation process is performed to estimate the distance values in the inner area by using the distance information acquired in the outline portion. The interpolation process results in a distance image 23 after interpolation shown in FIG. 2C.
In the interpolation process, the object area is divided on the basis of a certain criterion. For example, it is effective to adopt a dividing method in which parts that are close to each other in the image and that have similar colors are grouped into one area. The use of this dividing method enables, for example, the area division in the unit of objects.
The interpolation process can be performed, for example, in the following manner for the distance undefined region existing in an area set as one divided area.
It is assumed that a distance value D(p) is smoothly varied in an image area in which a pixel attribute value Y(p) is smoothly varied and the distance value D(p) is discontinuously varied in an image area in which the pixel attribute value Y(p) is sharply varied in an original image resulting from capturing of an object, where the pixel attribute value Y(p) indicates the attribute value, such as the luminance or color, of a pixel p in the original image resulting from capturing of the object.
The distance value D(p) of the pixel p is estimated on the above assumption, where the pixel p is a pixel for which no distance information is acquired in the distance image before the interpolation.
It is also assumed that a distance value D(q) of a pixel q around the pixel p has been acquired before the interpolation. In other words, the pixel q corresponds to, for example, a pixel in the outline portion of the object A 21 in the distance image 22 before interpolation shown in FIG. 2B.
The distance value D(p) estimated for the pixel p can be approximated by a weighted linear sum of the distance values D(q) of the pixels q around the pixel p, each multiplied by a weight wpq, as shown in Expression (A). “N(p)” denotes a collection of pixels around the pixel p.
\[ \tilde{D}(p) \approx \sum_{q \in N(p)} w_{pq}\, D(q) \tag{A} \]
In Expression (A), the weight wpq is a function whose value increases as the difference between the pixel attribute value Y(p) of the pixel p and a pixel attribute value Y(q) of the pixel q in the original image resulting from capturing of the object decreases. Each of the pixel attribute value Y(p) and the pixel attribute value Y(q) indicates, for example, a luminance value or a color. For example, Expression (B), in which a normal distribution is assumed, can be used to calculate the weight wpq:
\[ w_{pq} \propto e^{-\frac{\left( Y(p) - Y(q) \right)^{2}}{2\sigma_{p}^{2}}} \tag{B} \]
In Expression (B), σp² (written \sigma_{p}^{2} in the expression) denotes the variance of the values Y included in the collection N(p).
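Expressions (A) and (B) can be sketched together as follows. This is an illustration under stated assumptions rather than a definitive implementation: the weights are normalized so that they sum to one (Expression (B) fixes them only up to proportionality), a small floor eps is applied to the variance to avoid division by zero, uniform weights are used if every weight underflows to zero, and the names weights and estimate_distance are hypothetical.

```python
import math


def weights(y_p, y_neighbors, eps=1e-6):
    """Weights w_pq per Expression (B), normalized to sum to one.

    y_p          attribute value Y(p) of the pixel being estimated
    y_neighbors  attribute values Y(q) of the pixels q in N(p)
    eps          assumed floor on the variance (not in the original text)
    """
    mean = sum(y_neighbors) / len(y_neighbors)
    var = sum((y - mean) ** 2 for y in y_neighbors) / len(y_neighbors)
    var = max(var, eps)
    raw = [math.exp(-((y_p - y_q) ** 2) / (2 * var)) for y_q in y_neighbors]
    total = sum(raw)
    if total == 0.0:
        # Every weight underflowed; fall back to uniform weights (assumption).
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def estimate_distance(y_p, y_neighbors, d_neighbors):
    """Estimated distance of pixel p per Expression (A)."""
    w = weights(y_p, y_neighbors)
    return sum(w_pq * d_q for w_pq, d_q in zip(w, d_neighbors))


# Hypothetical pixel p with Y(p) = 10 and two neighbors whose distances are known.
# The first neighbor has a similar attribute value, the second does not.
d_est = estimate_distance(10.0, [10.0, 200.0], [1.0, 5.0])
```

Because the first neighbor resembles the pixel p far more closely than the second, the estimate is pulled toward the distance of the first neighbor, which reflects the assumption that the distance varies smoothly where the attribute value varies smoothly.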
A one-dimensional vector in which the distance values other than the distance values of pixels for which the distance information is acquired in the distance image before the interpolation are set to zero, that is, a one-dimensional vector in which an estimated distance value D corresponding to each pixel in the distance undefined region where the distance information is not defined is set to zero is denoted by [vector b].
A one-dimensional vector in which the estimated distance value corresponding to each pixel in the distance undefined region where the distance information is not defined is calculated according to Expression (A) to set the distance information for all the pixels is denoted by [vector x]. When the image has a size of n pixels×m pixels, the number of pixels in the entire image is equal to n×m. The [vector x] is a one-dimensional vector in which the estimated distance values D for all the pixels of the number n×m are arranged.
The relationship between the vector x and the vector b can be represented by:
\[ A x = b \tag{C} \]
where A denotes a square matrix that has a size of (n×m)×(n×m) and that has the weight wpq as an element.
The estimated distance values for the entire image can be acquired by calculating the value of the vector x according to the following expression:
\[ x = A^{-1} b \tag{D} \]
An inverse matrix of the square matrix A can be numerically acquired by using, for example, a conjugate gradient method.
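One possible way to set up and solve such a system is sketched below for a one-dimensional row of pixels. The description above does not spell out how the elements of A are arranged, so the construction here is an assumption: a row for a pixel with known distance is an identity row, and a row for an undefined pixel p has 1 on the diagonal and −wpq in column q, so that the row expresses Expression (A) with zero on the right-hand side. Because a matrix built this way is not symmetric, the sketch applies the conjugate gradient method to the normal equations AᵀAx = Aᵀb; this is a swapped-in variant chosen for illustration, and all function names are hypothetical.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def build_system(d, w):
    """Build A and b (assumed layout, see the lead-in).

    d  list of distances, None where the distance is undefined
    w  dict mapping (p, q) to the weight w_pq
    """
    n = len(d)
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for p in range(n):
        A[p][p] = 1.0
        if d[p] is not None:
            b[p] = d[p]          # known distance; undefined entries stay zero
    for (p, q), w_pq in w.items():
        if d[p] is None:
            A[p][q] = -w_pq      # x_p - sum_q w_pq x_q = 0
    return A, b


def conjugate_gradient(M, c, max_iter=200, tol=1e-20):
    """Solve M x = c for a symmetric positive-definite matrix M."""
    x = [0.0] * len(c)
    r = list(c)                  # residual c - M x for the initial x = 0
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs < tol:
            break
        Mp = matvec(M, p)
        alpha = rs / sum(pi * mi for pi, mi in zip(p, Mp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * mi for ri, mi in zip(r, Mp)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def solve(A, b):
    """Solve A x = b through the normal equations (A^T A) x = A^T b."""
    At = [list(col) for col in zip(*A)]
    AtA = [[sum(a * c for a, c in zip(row, col)) for col in zip(*A)] for row in At]
    return conjugate_gradient(AtA, matvec(At, b))


# Hypothetical row of four pixels: distances known at both ends only,
# with equal weights for the left and right neighbors of each undefined pixel.
A, b = build_system(
    [1.0, None, None, 4.0],
    {(1, 0): 0.5, (1, 2): 0.5, (2, 1): 0.5, (2, 3): 0.5},
)
x = solve(A, b)
```

With equal weights, the two undefined pixels are filled with values that vary linearly between the known distances at the ends of the row. For a full image, a sparse matrix representation would normally be preferred over the dense lists used in this sketch, since A has (n×m)×(n×m) elements.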
The distance values of the distance undefined region inside the object area can be estimated by the interpolation process in the above manner to create the distance map and the distance image for the entire image.