Cameras are everywhere. There are billions of cell-phone cameras, over five hundred million surveillance cameras, and cameras in cars and homes. In most cases, those cameras are passive devices that only record videos, so most of the acquired video is never processed. A major obstacle to automating visual scene interpretation is the lack of 3D information, which is crucial for scene understanding. It is therefore desirable to adapt a conventional video camera so that it can provide meaningful 3D information while acquiring a video.
Variable Focus Makes Cameras 3D
Most modern cameras are equipped with features such as auto-focus, variable focal length and zoom, all of which require the focal distance to change. Unfortunately, this capability is significantly under-utilized. Typically, auto-focus is used only to obtain an image in which the subject of interest is in focus.
Depth from Defocus (DFD)
Depth from defocus (DFD) has significant advantages over stereo and structure from motion for depth estimation, because DFD avoids the correspondence problem inherent in stereo analysis. Another advantage of DFD over stereo is that DFD requires only a single camera.
Several methods for solving DFD are known. Typically, those methods minimize a cost function that includes a data term and a spatial regularization term. The data term constrains how the texture blurs as a function of the known focal distances corresponding to depths. The regularization term models spatial smoothness constraints within the depth map of the scene. However, all existing methods assume that the camera and the scene are static, so none of them can apply DFD to dynamic scenes. As defined herein, a dynamic scene has scene motion, camera motion, or both.
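To make the structure of such a cost function concrete, here is a minimal sketch that assumes a single 1D scanline and a discrete set of candidate depth labels. The cost is E(d) = sum_i C[i, d_i] + lam * sum_i |d_i - d_{i+1}|, where the table C stands in for the data term and the absolute-difference penalty stands in for the regularization term. The function name `minimize_scanline_energy`, the absolute-difference penalty, and the toy cost table are illustrative assumptions, not taken from any particular DFD method. On a single scanline the minimum can be found exactly by dynamic programming; 2D depth maps would need other solvers (e.g. graph cuts).

```python
import numpy as np

def minimize_scanline_energy(data_cost, lam):
    """Exactly minimize E(d) = sum_i C[i, d_i] + lam * sum_i |d_i - d_{i+1}|
    over depth labels d for one scanline, using Viterbi-style dynamic programming.

    data_cost: (N, L) array; data_cost[i, k] is the data-term cost of depth label k at pixel i.
    lam: weight of the spatial smoothness (regularization) term.
    """
    n, num_labels = data_cost.shape
    labels = np.arange(num_labels)
    # pairwise[p, k]: smoothness penalty between neighbouring labels p and k
    pairwise = lam * np.abs(labels[:, None] - labels[None, :])
    acc = data_cost[0].astype(float)              # best energy ending at pixel 0 with each label
    back = np.zeros((n, num_labels), dtype=int)   # best previous label for each (pixel, label)
    for i in range(1, n):
        total = acc[:, None] + pairwise           # rows: previous label, columns: current label
        back[i] = np.argmin(total, axis=0)
        acc = total[back[i], labels] + data_cost[i]
    # Backtrack the optimal labelling from the last pixel
    d = np.empty(n, dtype=int)
    d[-1] = int(np.argmin(acc))
    for i in range(n - 1, 0, -1):
        d[i - 1] = back[i, d[i]]
    return d, float(acc.min())

# Toy data term: label 1 fits everywhere except a noisy pixel 2 that slightly prefers label 2
C = np.array([[1, 0, 1], [1, 0, 1], [1, 0.6, 0.5], [1, 0, 1], [1, 0, 1]], dtype=float)
print(minimize_scanline_energy(C, 0.0)[0])  # [1 1 2 1 1]
print(minimize_scanline_energy(C, 1.0)[0])  # [1 1 1 1 1]
```

With `lam=0` only the data term counts and the outlier keeps its own depth; with `lam=1` the smoothness term makes a depth jump more expensive than the outlier's small data-cost preference, so the depth map is smoothed.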
Variable Depth of Field Imaging
The depth of field (DOF) of an imaging system can be extended by reducing the aperture. However, this reduces the amount of light received by the camera sensor, leading to a low signal-to-noise ratio (SNR). Conversely, increasing the aperture improves the SNR, but at the cost of a smaller DOF.
Ideally, a large DOF is desired with low sensor noise. Several methods are known that overcome this fundamental trade-off between sensor noise and DOF. For example, a broadband mask at the aperture makes the point spread function of the blur better conditioned for inversion, which enables computational deblurring and thus an extended DOF.
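The benefit of a broadband mask can be seen in a toy 1D model. Here an open aperture is modelled as a box kernel and a coded aperture as a binary pattern with one blocked cell. This is a sketch under simplifying assumptions: real coded apertures are 2D, and their patterns are chosen by optimization, so the code `[1, 1, 0, 1]` and the helper `min_spectral_magnitude` are hypothetical. A frequency at which the blur spectrum is zero is destroyed and cannot be restored by any deconvolution; a broadband kernel keeps every frequency.

```python
import numpy as np

def min_spectral_magnitude(kernel, n=64):
    """Smallest DFT magnitude of a 1D blur kernel, normalized to unit sum and zero-padded to n."""
    k = np.asarray(kernel, dtype=float)
    k = k / k.sum()  # unit DC gain: blurring preserves mean brightness
    return float(np.abs(np.fft.fft(k, n)).min())

open_aperture = [1, 1, 1, 1]   # toy 1D open aperture: a box
coded_aperture = [1, 1, 0, 1]  # hypothetical toy binary code, not an optimized mask

print(min_spectral_magnitude(open_aperture))   # ~0: some frequencies are destroyed
print(min_spectral_magnitude(coded_aperture))  # about 0.2: every frequency survives
```

The box spectrum has exact zeros (here at every 16th frequency bin), so deblurring is ill-posed. The coded kernel's spectrum stays bounded away from zero, so it can be inverted stably at the price of blocking a quarter of the light.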
The DOF can also be increased by inserting a cubic phase plate near the lens, or by moving the sensor during the exposure time. In both methods the acquired image is blurred, but the blur kernel is largely independent of depth, so the whole image can be deblurred with a single kernel using standard deconvolution methods.
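Because the kernel does not depend on depth, one known kernel suffices for the entire image. The sketch below shows this with Wiener-style regularized inverse filtering in 1D. It assumes circular convolution and a noise-free signal, and it uses a hypothetical kernel `[0.6, 0.3, 0.1]`. The actual PSFs of a cubic phase plate or of sensor motion are 2D and would have to be measured or modelled.

```python
import numpy as np

def circular_blur(signal, kernel):
    """Blur a 1D signal by circular convolution with `kernel` (computed via the FFT)."""
    n = len(signal)
    return np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel, n)))

def wiener_deblur(blurred, kernel, eps=1e-3):
    """Deblur with ONE known kernel: X = Y * conj(H) / (|H|^2 + eps).

    eps plays the role of a noise-to-signal ratio and keeps the division stable.
    """
    n = len(blurred)
    H = np.fft.fft(kernel, n)
    X = np.fft.fft(blurred) * np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.real(np.fft.ifft(X))

rng = np.random.default_rng(0)
sharp = rng.random(128)
kernel = np.array([0.6, 0.3, 0.1])   # hypothetical depth-invariant PSF; |H| >= 0.2 everywhere
blurred = circular_blur(sharp, kernel)
restored = wiener_deblur(blurred, kernel, eps=1e-9)
print(np.max(np.abs(restored - sharp)))  # tiny, since this example has no noise
```

With real sensor noise, `eps` would be raised to trade residual blur against noise amplification. The key point is that no per-pixel depth is needed to choose the kernel.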
Basics and Limitations of DFD
A camera acquires light from a scene and projects the light on a sensor. Parts of the scene that are in focus are at depth $s_0$ given by the thin lens law

$$\frac{1}{F_l} = \frac{1}{v} + \frac{1}{s_0}, \tag{1}$$

where $F_l$ is the focal length of the lens, and $v$ is the distance between the lens and the sensor. Scene points that are at distance $s \neq s_0$ have a circle of confusion (blur) in the image plane. The distribution of light within this blur circle is referred to as the Point Spread Function (PSF). The PSF is a disc with a radius $\sigma$ depending on the depth $s$ of the scene point:
$$\sigma = \frac{D v}{2} \left( \frac{1}{F_l} - \frac{1}{v} - \frac{1}{s} \right), \tag{2}$$

where $D$ is the diameter of the lens aperture.
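Equations (1) and (2) can be evaluated directly. The sketch below assumes all lengths are in millimetres and uses a hypothetical 50 mm lens at f/2 (D = 25 mm) with the sensor at v = 52 mm. The helper names are illustrative. Note that Eq. (2) gives a signed value: its sign indicates whether the point lies in front of or behind the in-focus depth, and the physical disc radius is its magnitude.

```python
def conjugate_distance(focal_length, distance):
    """Eq. (1), the thin lens law 1/F_l = 1/v + 1/s0, solved for one distance given the other.

    The law is symmetric in v and s0, so the same formula gives s0 from v and v from s0.
    `distance` must differ from `focal_length` (otherwise the focus is at infinity).
    """
    return 1.0 / (1.0 / focal_length - 1.0 / distance)

def blur_radius(aperture, sensor_distance, focal_length, depth):
    """Eq. (2): signed blur radius sigma = (D v / 2) * (1/F_l - 1/v - 1/s).

    Negative for points nearer than the in-focus depth, positive for farther points;
    the radius of the blur disc on the sensor is abs(sigma).
    """
    D, v, F_l, s = aperture, sensor_distance, focal_length, depth
    return (D * v / 2.0) * (1.0 / F_l - 1.0 / v - 1.0 / s)

# Hypothetical 50 mm lens at f/2 (D = 25 mm) with the sensor 52 mm behind the lens
F_l, D, v = 50.0, 25.0, 52.0
s0 = conjugate_distance(F_l, v)
print(s0)                                   # approx. 1300 mm: the in-focus depth
print(blur_radius(D, v, F_l, s0))           # approx. 0: a point at s0 is sharp
print(blur_radius(D, v, F_l, 1000.0))       # approx. -0.15 mm: point nearer than s0
print(blur_radius(D / 2, v, F_l, 1000.0))   # approx. -0.075 mm: halving D halves the blur
```

Because $\sigma$ is proportional to $D$, stopping down the aperture shrinks every blur disc (extending the DOF) but admits less light, which is the trade-off between DOF and SNR discussed earlier.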
Typical DFD methods acquire a focal stack $F = \{F_1, F_2, \ldots, F_M\}$, that is, a sequence of $M$ frames $F_j$ (a video) acquired at various focus settings. In other words, the images in a conventional focal stack are inherently acquired at different focus depths, or focal planes.
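The way known focus settings tie blur to depth can be illustrated with a simplified sketch. Frame $F_j$ is taken with sensor distance $v_j$, and Eq. (2) then predicts the blur radius in every frame for any candidate depth. The sketch assumes that the per-frame blur radii of a point could be measured directly and picks the candidate depth whose predicted profile matches best. Actual DFD does not observe blur directly; it infers it from how the texture blurs across frames. The lens parameters, focus depths, and helper names below are all hypothetical.

```python
import numpy as np

F_L, D = 50.0, 25.0                                       # hypothetical lens: 50 mm at f/2 (mm)
FOCUS_DEPTHS = np.array([800.0, 1000.0, 1300.0, 2000.0])  # in-focus depth of each frame F_j
V = 1.0 / (1.0 / F_L - 1.0 / FOCUS_DEPTHS)                # sensor distances v_j from Eq. (1)

def blur_profile(depth):
    """|sigma_j| from Eq. (2) for a point at `depth`, one entry per frame of the focal stack."""
    return np.abs((D * V / 2.0) * (1.0 / F_L - 1.0 / V - 1.0 / depth))

def estimate_depth(observed, candidates):
    """Return the candidate depth whose predicted blur profile best fits `observed` (least squares)."""
    costs = [np.sum((blur_profile(s) - observed) ** 2) for s in candidates]
    return candidates[int(np.argmin(costs))]

candidates = np.arange(500.0, 3001.0, 100.0)   # candidate depths from 0.5 m to 3 m
observed = blur_profile(1000.0)                # noise-free "measurement" of a point at 1 m
print(estimate_depth(observed, candidates))    # 1000.0
```

A single frame's blur magnitude cannot tell whether a point is in front of or behind that frame's focal plane. Using several focus settings removes this ambiguity, which is why DFD relies on a stack of frames rather than one image.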
The basic assumption in a conventional DFD method is that the scene and the camera are static. Dynamic scenes lead to correspondence errors in DFD, resulting in depth and texture errors. In extended DOF (EDOF) images, the errors appear as multiple copies of the moving object, while in the depth map, spurious depth edges appear on and around the moving parts of the scene.
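The origin of the multiple copies can be seen in a toy 1D example. The sketch ignores defocus blur and uses a per-pixel maximum across frames as a hypothetical stand-in for an EDOF fusion rule. Like any per-pixel rule, it assumes that pixel x shows the same scene point in every frame of the focal stack. When the object moves between frames, that assumption fails and the fused result contains one copy of the object per frame.

```python
import numpy as np

def naive_fusion(frames):
    """Toy all-in-focus fusion: per-pixel maximum across the frames of a focal stack."""
    return np.max(np.stack(frames), axis=0)

def make_stack(num_frames, width, start, velocity):
    """1D frames showing one bright point that moves `velocity` pixels per frame (blur ignored)."""
    frames = []
    for j in range(num_frames):
        f = np.zeros(width)
        f[start + j * velocity] = 1.0
        frames.append(f)
    return frames

static = naive_fusion(make_stack(4, 32, start=10, velocity=0))
moving = naive_fusion(make_stack(4, 32, start=10, velocity=2))
print(int(static.sum()), int(moving.sum()))  # 1 4: the moving point appears four times
```

The same broken per-pixel correspondence corrupts the DFD data term, because the blur compared across frames at one pixel no longer belongs to a single scene point.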
It is desirable to overcome these problems of the prior art.