In general, a distance estimation method based on a sensing technology is roughly divided into the following two types.
One is an active sensing method. Examples of the active sensing method include a time-of-flight method using a laser or ultrasound, a confocal method commonly used for a microscope or the like, and an illumination stereo method using plural light sources.
The other is a passive sensing method. Examples of the passive sensing method include: a method whereby stereo images are captured and a distance is estimated from a difference between positions of a subject included in the captured images; and a focusing method using a difference in lens focal point based on only one captured image.
The focusing method is a technique used mainly for implementing an autofocus (referred to as “AF” hereafter) function of a camera. Examples of the focusing method include: a depth-from-focus (DFF) method used for, for example, a contrast AF function of a compact digital camera; a pupil-division phase-difference detection method used for an AF function of a single-lens reflex camera; and a depth-from-defocus (DFD) method.
With the DFF method, the contrast at one point included in an image is calculated while the focus is gradually changed. Then, when it is determined that the image is most closely in focus, the focus change is stopped, and a distance is thus determined.
When this DFF method is used, the focus change needs to be performed sequentially until the image comes into focus and, also, this method needs to be performed on all pixels in the image. For this reason, it takes significant time for distance measurement. In other words, this method is not suitable for distance measurement when the subject is moving.
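As an illustration of the DFF procedure described above, the following is a minimal sketch in Python (not part of the original disclosure). The focus-sweep camera is simulated by a hypothetical box-blur model, and the focus measure is a mean squared gradient; both are assumptions made only for this example.

```python
import numpy as np

def contrast(img):
    """Simple focus measure: mean squared gradient (higher = sharper)."""
    gy, gx = np.gradient(img.astype(float))
    return float(np.mean(gx**2 + gy**2))

def dff_sweep(capture, steps):
    """Sweep the focus positions in `steps`, capture an image at each,
    and return the step whose image has the highest contrast."""
    scores = [contrast(capture(s)) for s in steps]
    return steps[int(np.argmax(scores))]

# Toy camera model (an assumption for illustration): a checkerboard target
# blurred by a box filter whose width grows with the defocus |s - s0|.
def make_capture(true_focus):
    base = np.indices((32, 32)).sum(axis=0) % 2 * 255.0  # checkerboard
    def capture(s):
        w = 1 + 2 * abs(s - true_focus)  # blur width grows with defocus
        k = np.ones(w) / w
        out = np.apply_along_axis(lambda r: np.convolve(r, k, 'same'), 1, base)
        return np.apply_along_axis(lambda c: np.convolve(c, k, 'same'), 0, out)
    return capture

best = dff_sweep(make_capture(true_focus=3), steps=list(range(7)))
```

A real contrast-AF implementation would stop the sweep once the contrast passes its peak rather than scanning every step; the exhaustive scan here only keeps the sketch short, and it also makes visible why the method is slow for moving subjects.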
A distance to a specific subject can be measured according to the method using the AF function as described above. However, when distance estimation is performed on all the pixels included in the image, a great number of images having different focal positions need to be captured. More specifically, there is a disadvantage that an image in focus needs to be selected from among the captured images for each of the pixels to estimate the distance.
With the pupil-division phase-difference detection method, a distance can be estimated directly by measuring a spacing between image formations performed by phase-difference detection sensors, and it does not take time to estimate the distance. However, it is impossible to arrange the phase-difference detection sensors corresponding to all the pixels included in the image. On this account, the distance estimation can be performed only for a predetermined point in the image. Moreover, the size of the mechanism to implement this function is inevitably larger as compared with the cases of using the other methods.
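The core operation of the pupil-division phase-difference method, measuring the spacing between the two image formations, can be sketched as locating the peak of a cross-correlation between two line-sensor signals. The signals below are hypothetical; converting the recovered shift into a distance requires lens-specific calibration that is not shown.

```python
import numpy as np

def phase_shift(a, b):
    """Estimate the relative shift (in samples) between two line-sensor
    signals by locating the peak of their cross-correlation."""
    a = a - a.mean()
    b = b - b.mean()
    corr = np.correlate(a, b, mode='full')
    # In 'full' mode the zero-lag term sits at index len(b) - 1.
    return int(np.argmax(corr)) - (len(b) - 1)

# Two hypothetical line-sensor readouts of the same edge, offset by 4 samples
# because the rays pass through different halves of the pupil.
x = np.zeros(64); x[20:28] = 1.0
y = np.zeros(64); y[24:32] = 1.0
shift = phase_shift(y, x)
```

The sign of the shift indicates front- versus back-focus, which is why this method can drive the focus motor directly instead of sweeping.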
With the DFD method, two images having different focal positions are captured, and a focal distance is calculated directly from the images and a blur parameter of a lens (see Non Patent Literatures 1 and 2, for example).
Here, the blur parameter represents a value that indicates blurring included in luminance information and has a correlation with variance of a point spread function (PSF) of the lens.
The PSF represents spreading of light of when an ideal point image passes through an optical system, and is also referred to as a defocus characteristic.
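As a minimal illustration of the correlation stated above, the following sketch builds a PSF under an assumed Gaussian lens model and computes its variance, the quantity with which the blur parameter is correlated. The kernel size and the sigma values are arbitrary choices for this example.

```python
import numpy as np

def gaussian_psf(sigma, size=33):
    """2-D Gaussian point spread function, normalized to unit energy."""
    r = np.arange(size) - size // 2
    xx, yy = np.meshgrid(r, r)
    psf = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return psf / psf.sum()

def psf_variance(psf):
    """Per-axis second moment of the PSF about its center: the quantity
    the blur parameter is correlated with."""
    size = psf.shape[0]
    r = np.arange(size) - size // 2
    xx, yy = np.meshgrid(r, r)
    return float((psf * (xx**2 + yy**2)).sum() / 2)

v1 = psf_variance(gaussian_psf(1.0))   # narrow PSF: small variance
v2 = psf_variance(gaussian_psf(2.0))   # wider PSF: larger variance
```

For the Gaussian model the recovered variance is simply sigma squared, so a larger amount of blurring maps monotonically to a larger blur parameter, which is what the DFD method exploits.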
As is the case with the aforementioned pupil-division phase-difference detection method, the DFD method does not take time to estimate the distance, and only a minimum of two images needs to be captured. However, the blur parameter of the lens needs to be obtained in advance. It should be noted that a real blurred image includes blurring caused not only by the lens, but also by an aperture of an image pickup device and by film characteristics. On account of this, the blurring caused in these ways also needs to be obtained and considered in advance.
The DFD method has a problem that the two images having different focal positions need to be captured with no difference in magnification. However, it often happens that a normal optical system is not designed in this way. To be more specific, it is necessary to adopt an image-side telecentric optical system (see Non Patent Literature 3, for example).
Moreover, there is another problem that the accuracy in distance estimation can be maintained only in the case of a relatively small amount of blurring.
A first factor for this problem is as follows. Since the distance is estimated by performing matching in image processing, the distance estimation accuracy tends to be low when the power of the image signal is small, that is, when the amount of blurring is large.
A second factor for this problem is as follows. In the case of a blurring model using a real lens (a coupling lens), an amount of change in blurring is likely to be smaller when the amount of blurring is larger. Note that, however, this is not the case for an ideal lens model such as a Gaussian model or a pillbox model. That is, the distance estimation accuracy decreases in a region where the amount of blurring is large and the amount of change in blurring is small.
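The two ideal lens models named above can be written down as PSF kernels. The following is a minimal sketch, with arbitrary sizes, of a pillbox (uniform disc) PSF and a Gaussian PSF; a real coupling lens would require a measured PSF instead.

```python
import numpy as np

def pillbox_psf(radius, size=33):
    """Pillbox PSF: constant inside the given radius, zero outside."""
    r = np.arange(size) - size // 2
    xx, yy = np.meshgrid(r, r)
    psf = (xx**2 + yy**2 <= radius**2).astype(float)
    return psf / psf.sum()

def gaussian_psf(sigma, size=33):
    """Gaussian PSF, normalized to unit energy."""
    r = np.arange(size) - size // 2
    xx, yy = np.meshgrid(r, r)
    psf = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return psf / psf.sum()

p = pillbox_psf(3.0)   # ideal defocus disc
g = gaussian_psf(2.0)  # smooth ideal model
```

Both kernels integrate to one, so convolving an image with either of them spreads light without changing total brightness; the pillbox keeps a hard edge while the Gaussian does not, which is why the two models behave differently in matching.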
In regard to this problem, there is an idea that the distance estimation accuracy is low in the case of a normal optical system because a pupil (an aperture) that determines characteristics of blurring is round in shape and thus the amount of change in blurring is small. A method based on this idea has been proposed. This method employs a structured pupil mask to perform distance estimation using a model having a large amount of change in blurring (see Non Patent Literature 4, for example).
The structured pupil mask allows the changes in blurring to be more recognizable in the direction of depth than in the case of the round aperture, thereby increasing the distance estimation accuracy.
Moreover, in addition to the distance being estimated, an all-in-focus image can also be generated at the same time.
As a method obtained by further developing this idea, a coded aperture method has also been proposed that performs distance measurement with a higher degree of accuracy for each subregion of an image by further devising the pupil shape (see Non Patent Literature 5, for example).
These methods that devise the pupil shape have a problem in that, although the distance estimation accuracy increases, the amount of light decreases, and the image quality of the all-in-focus image slightly decreases as well.
It should be noted that each of the aforementioned approaches is based on the idea of increasing the distance estimation accuracy by devising the pupil shape so as to generate zero points in the frequency transfer characteristics of the entire optical system. Based on this idea, robust distance estimation can be achieved regardless of the subject. However, when the all-in-focus image is to be restored, information has been lost at the zero points (i.e., components that are zero in the frequency domain). This idea thus has a fundamental problem in that the lost information cannot be restored in a subsequent stage of signal processing, which causes the above-mentioned decrease in image quality.
Approaches to solving the problem include the following method. By using two images captured with different aperture shapes as one pair, this method achieves an increase in the distance estimation accuracy and also prevents a decrease in performance to restore the all-in-focus image. Moreover, a technique using coded aperture pairs according to this method has also been proposed (see Non Patent Literature 6, for example). With this technique, two images of a subject captured with different aperture shapes are expected to advantageously complement each other's zero points.
However, it is difficult to set a pair of aperture shapes with which the captured images always complement each other's zero points at any distance in any optical system.
Moreover, even on a precondition that a specific optical system is to be used, it is still difficult to set a pair of aperture shapes with which the captured images always complement each other's zero points at any distance.
To be more specific, it is harder to avoid a decrease in the image quality of the restored all-in-focus image than in the case of using the normal optical system.
Furthermore, another approach has been disclosed. That is, by first obtaining an all-in-focus image and then combining the obtained all-in-focus image with an image normally captured subsequently, a distance to a subject is estimated based on a difference in focus between the two images (see Non Patent Literature 7 and Patent Literature 1, for example).
A method called “Focal Stack” is one of the well-known conventional techniques. According to this method, plural images having different focal positions are captured, and a region considered to be in focus is extracted from each of the captured images to form, by image synthesis, an extended depth of field (focus) (EDOF) image, i.e., an all-in-focus image.
Distance estimation is performed using the all-in-focus image obtained as described above and one actual image focused at a given distance, such as the shortest (closest) distance.
A blur parameter in the case where the image is focused at the closest distance is obtained in advance for each subject distance by, for example, measurement. A comparison is made for each region between: each of images obtained from the all-in-focus image by simulating blurring for each subject distance using the aforementioned blur parameter; and the above-mentioned actual image focused at the closest distance. Then, a distance indicated by the most similar image is determined to be the distance of the subject.
A configuration that is necessary to implement this method is described below with reference to FIG. 8.
FIG. 8 is a block diagram showing a configuration of a distance estimation device 9 which estimates a distance using an all-in-focus image and an actual image focused at a specific distance.
The distance estimation device 9 includes an all-in-focus image generation unit 91, a specific-focal-depth image obtaining unit 9101, a blur-parameter-set obtaining unit 9102, a blurred-image-set generation unit 9103, a similar-blurring determination unit 9104, and a distance map generation unit 9105.
The all-in-focus image generation unit 91 generates an all-in-focus image (i.e., an image 91a in FIG. 8).
It should be noted that, as a specific configuration of the all-in-focus image generation unit 91, a configuration used in a method that obtains an all-in-focus image according to an extended depth of field (referred to as “EDOF” hereafter) technology is known. This configuration may be used for the all-in-focus image generation unit 91, for example.
In general, there are mainly five methods as follows.
A first method is called “Focal Stack”. According to this method, images having different focal positions are captured, and a focused region is extracted from each of the captured images to form, by image synthesis, an EDOF (extended depth of field) image, i.e., an all-in-focus image.
A second method uniformizes blurring in a direction of depth by inserting an optical element called a phase plate, and performs image restoration processing using a blurred pattern obtained in advance by measurement or simulation. As a result, an EDOF image, i.e., an all-in-focus image is obtained. This method is called “Wavefront Coding” (see Non Patent Literature 8, for example).
A third method convolutes images focused uniformly in a direction of depth (meaning that blurring is uniformized in the direction of depth) by moving a focus lens or an image pickup element during exposure, and performs image restoration processing using a blurred pattern obtained in advance by measurement or simulation. As a result, an EDOF image, i.e., an all-in-focus image is obtained. This method is called “Flexible DOF” (see Non Patent Literature 9, for example).
A fourth method is an approach close to the Focal Stack method. Instead of capturing plural images, this method performs depth estimation or image sharpness detection on one color image using an axial chromatic aberration of the lens. Then, by image processing, an entirely sharp image is obtained as an all-in-focus image (see Non Patent Literature 10, for example).
A fifth method uniformizes blurring in a direction of depth using a multi-focal lens, and performs image restoration processing using a blurred pattern obtained in advance by measurement or simulation. As a result, an all-in-focus image is obtained (see Non Patent Literature 11, for example).
Any of the above five methods can implement the all-in-focus image generation unit 91.
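As a sketch of the first (Focal Stack) method, the following fuses a focal stack into an EDOF image by selecting, per pixel, the frame with the highest local focus measure. The squared-Laplacian focus measure and the toy two-frame stack are assumptions made only for this example; practical implementations also smooth the per-pixel selection map.

```python
import numpy as np

def local_sharpness(img):
    """Per-pixel focus measure: squared 4-neighbor Laplacian."""
    lap = np.zeros_like(img, dtype=float)
    lap[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2]
                       + img[1:-1, 2:] - 4 * img[1:-1, 1:-1])
    return lap**2

def focal_stack_edof(stack):
    """Fuse a focal stack into an all-in-focus (EDOF) image by taking,
    per pixel, the value from the frame with the highest focus measure."""
    stack = np.asarray(stack, dtype=float)
    sharp = np.stack([local_sharpness(f) for f in stack])
    best = np.argmax(sharp, axis=0)  # per-pixel index of the sharpest frame
    return np.take_along_axis(stack, best[None], axis=0)[0]

# Toy stack: frame 0 is sharp around column 4 and blurred around column 12;
# frame 1 is the reverse. The fused image should keep both sharp lines.
f0 = np.zeros((16, 16)); f0[:, 11:14] = 85.0; f0[:, 4] = 255.0
f1 = np.zeros((16, 16)); f1[:, 3:6] = 85.0; f1[:, 12] = 255.0
aif = focal_stack_edof([f0, f1])
```

The same selection map that picks the sharpest frame per pixel is also a coarse depth cue, which is why the Focal Stack method pairs naturally with the distance estimation described above.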
The specific-focal-depth image obtaining unit 9101 selects an arbitrary one image from among a set of images used by the all-in-focus image generation unit 91 in generating the all-in-focus image, or separately captures a new image. By doing so, the specific-focal-depth image obtaining unit 9101 obtains an image (i.e., an image 9101a) focused at a specific depth, namely, a specific distance.
In this way, the specific-focal-depth image obtaining unit 9101 causes a camera to focus at the set specific depth, and thus obtains the image focused at this focal depth.
The blur-parameter-set obtaining unit 9102 reads out recorded blur parameters. To be more specific, a blur parameter (i.e., data 9102a) indicating blurring is numerically recorded in advance for each arbitrary depth (distance), for the case where the specific-focal-depth image obtaining unit 9101 causes the camera to focus at the set specific depth. This recording process is performed by, for example, the blur-parameter-set obtaining unit 9102. Then, the blur-parameter-set obtaining unit 9102 reads out the blur parameters recorded in this way. Alternatively, when the manner of blurring caused by the lens can be formulated, the blur parameter (the data 9102a) is calculated according to this formula.
The blurred-image-set generation unit 9103 receives the all-in-focus image from the all-in-focus image generation unit 91. The blurred-image-set generation unit 9103 also receives the blur parameter for each arbitrary depth from the blur-parameter-set obtaining unit 9102. Then, the blurred-image-set generation unit 9103 convolutes the blur parameter with the all-in-focus image for each arbitrary depth. The obtained set of images corresponding to the arbitrary depths represents a set of simulated images for the hypothetical case where all subjects are present at these depths.
The similar-blurring determination unit 9104 makes a comparison between: each of the images corresponding to the arbitrary depths obtained by the blurred-image-set generation unit 9103 (i.e., an image 9103a for each of the depths (distances)); and the actual captured image focused at the specific depth (i.e., the image 9101a) obtained by the specific-focal-depth image obtaining unit 9101. The similar-blurring determination unit 9104 makes this comparison, region by region included in these two images, and determines a degree of similarity in blurring. As a result, it is determined, region by region, at which depth (distance) the image (i.e., the image 9103a) is similar to the actual captured image (i.e., the image 9101a). To be more specific, an evaluation function is calculated according to Equation 1 as follows.
[Math. 1]

    Fc(d) = Σ_c ( A * K(d) − F1 )        (Equation 1)
Here, “d” represents a depth (distance), “A” represents a luminance value of the all-in-focus image, “F1” represents a luminance value of the actual captured image focused at the specific depth, “K(d)” represents a blur parameter corresponding to an arbitrary depth, and “Fc(d)” represents an evaluation function. Moreover, “*” in Equation 1 represents a convolution operator, and “Σ_c” represents a sum over the color elements. Of these variables, each of “A” and “F1” is a three-dimensional matrix of vertical elements by horizontal elements by color elements of the image. Moreover, “Fc(d)” is a two-dimensional matrix of vertical elements by horizontal elements of the image. Furthermore, the blur parameter K(d) is a three-dimensional matrix having: a square large enough to describe the changes in blurring; and color. The similar-blurring determination unit 9104 calculates, for each region included in the image, the “d” that minimizes the evaluation function “Fc(d)” calculated for each pixel in this way.
The distance map generation unit 9105 maps, on the image, this “d” obtained region by region included in the image as described above, and outputs the resulting map as a distance map.
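The processing performed by the blurred-image-set generation unit 9103, the similar-blurring determination unit 9104, and the distance map generation unit 9105 can be sketched as follows. The sketch assumes a grayscale image (so the sum over color in Equation 1 drops out), Gaussian blur parameters indexed by sigma, and a per-pixel squared difference as the similarity measure; all of these are simplifying assumptions, not the disclosed configuration itself.

```python
import numpy as np

def gaussian_kernel(sigma, size=9):
    r = np.arange(size) - size // 2
    k = np.exp(-r**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur standing in for convolution with K(d)."""
    k = gaussian_kernel(sigma)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, 'same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, 'same'), 0, out)

def distance_map(aif, f1, sigmas):
    """Units 9103-9105 in one function: simulate a blurred version of the
    all-in-focus image `aif` for each candidate depth (indexed here by a
    Gaussian blur parameter), compare each against the actual image `f1`
    pixel by pixel, and map the best-matching depth index per pixel."""
    costs = np.stack([(blur(aif, s) - f1)**2 for s in sigmas])
    return np.argmin(costs, axis=0)

# Hypothetical scene: the left half lies at depth index 0 (sigma 0.5)
# and the right half at depth index 2 (sigma 2.0).
rng = np.random.default_rng(0)
aif = rng.random((32, 32)) * 255
f1 = np.hstack([blur(aif, 0.5)[:, :16], blur(aif, 2.0)[:, 16:]])
dmap = distance_map(aif, f1, sigmas=[0.5, 1.0, 2.0])
```

A full implementation would aggregate the cost over a small region around each pixel before taking the minimum, as the region-by-region comparison above suggests, which suppresses noise in textureless areas.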