A depth map is an image or image channel that contains information about the distance from a viewpoint to the surfaces of objects in a scene. Depth maps are used in many applications, such as automotive sensing, medical imaging, and three-dimensional (3D) applications. Generally, a depth map of a scene can be obtained by one of two approaches: an active approach or a passive approach.
In the active approach, a coded signal (for example, structured light, an infrared (IR) signal, a laser, or an audio signal) is projected into the scene, and a receiver or detector captures the returned signal. The depth map is then calculated or estimated based on the difference between the projected signal and the received signal. Examples of the active approach include time-of-flight (TOF) sensors, Light Detection and Ranging (LiDAR), structured light patterns, and ultrasonic range sensors.
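As an illustration of how the difference between the projected and received signals can yield depth, the following minimal sketch assumes a pulsed time-of-flight sensor in which that difference is the round-trip delay of a light pulse. The function names `tof_depth_m` and `tof_depth_map` are hypothetical and are not taken from any cited literature.

```python
# Minimal sketch, assuming a pulsed time-of-flight sensor: the "difference"
# between projected and received signal is modeled as a round-trip time delay.
# All names here are hypothetical illustrations.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_depth_m(round_trip_time_s: float) -> float:
    """Return the distance in meters for one measured round-trip delay."""
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse travels to the surface and back, so halve the path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


def tof_depth_map(round_trip_times_s):
    """Convert a 2D grid of per-pixel round-trip delays into a depth map."""
    return [[tof_depth_m(t) for t in row] for row in round_trip_times_s]
```

Other active sensors measure a different quantity (for example, a phase shift or a deformation of a projected pattern), so the conversion to distance would differ accordingly.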
In the passive approach, the depth map is estimated from captured images alone, without projecting a signal into the scene. The passive approach can therefore be realized at low cost, using a conventional single digital camera.
Several passive depth estimation techniques have been disclosed (for example, refer to Non Patent Literature 1, Non Patent Literature 2, Non Patent Literature 3, and Non Patent Literature 4). These can be classified into two main categories: the depth from focus (DFF) method and the depth from defocus (DFD) method. Both the DFF method and the DFD method require multiple input images, each captured with a different focus, for depth estimation. In the DFF method, several images of a single scene are captured at different focus points. Then, the focus or sharpness (contrast) of each captured image is measured. The depth map of the scene is finally obtained by detecting, for each image region, the maximum sharpness across the images and the corresponding focus setting. The DFD method can use fewer multi-focus images (at least two). The depth map is estimated from the difference in blur amount between corresponding pixels in the multi-focus images.
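The DFF procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any cited literature: images are plain 2D lists of gray levels, sharpness is approximated by the sum of absolute differences to the four neighboring pixels, and the names `local_contrast` and `depth_from_focus` are hypothetical.

```python
# Minimal depth-from-focus sketch. Assumptions: grayscale images as 2D lists,
# a simple 4-neighbor contrast as the sharpness measure, and one known focus
# distance per image in the stack. All names are hypothetical.

def local_contrast(image, row, col):
    """Sum of absolute differences between a pixel and its 4 neighbors."""
    height, width = len(image), len(image[0])
    center = image[row][col]
    total = 0.0
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        # Clamp to the image border so edge pixels reuse their own value.
        r = min(max(row + d_row, 0), height - 1)
        c = min(max(col + d_col, 0), width - 1)
        total += abs(center - image[r][c])
    return total


def depth_from_focus(focal_stack, focus_distances):
    """For each pixel, return the focus distance of the sharpest image."""
    if len(focal_stack) != len(focus_distances):
        raise ValueError("need exactly one focus distance per image")
    height, width = len(focal_stack[0]), len(focal_stack[0][0])
    depth_map = [[None] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            scores = [local_contrast(img, row, col) for img in focal_stack]
            # Index of the image with maximum sharpness at this pixel.
            best = max(range(len(scores)), key=scores.__getitem__)
            depth_map[row][col] = focus_distances[best]
    return depth_map
```

The depth resolution of such a sketch is limited to the set of captured focus settings, which is one reason the DFF method needs many input images.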
Patent Literature 1 discloses a single-lens camera system for recording depth information of a three-dimensional scene. FIG. 1 shows the system for capturing multi-focus images according to Patent Literature 1. The system moves the lens along its central axis to capture a subject (an object) at various distances in front of the lens system. As the lens moves, the object passes in and out of focus on the image sensor. Given the known focal length of the lens system, the depth (the distance between the lens system and the object) is computed from the distance between the lens system and the image sensor at which the object is in focus.
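One common way to relate these quantities is the thin-lens equation, 1/f = 1/u + 1/v, where f is the focal length, v is the lens-to-sensor distance at best focus, and u is the lens-to-object distance. The sketch below assumes this idealized model; whether Patent Literature 1 uses exactly this relation is not stated here, and the function name `object_distance` is hypothetical.

```python
# Minimal sketch, assuming an ideal thin lens: 1/f = 1/u + 1/v.
# The function name and the choice of units are hypothetical.

def object_distance(focal_length: float, sensor_distance: float) -> float:
    """Return the lens-to-object distance u for an in-focus object.

    focal_length and sensor_distance must use the same unit; the result
    is in that unit as well.
    """
    if sensor_distance <= focal_length:
        # At v == f the object is at infinity; v < f gives no real image.
        raise ValueError("sensor distance must exceed the focal length")
    return focal_length * sensor_distance / (sensor_distance - focal_length)


# Example: a 50 mm lens in focus with the sensor 52 mm behind the lens
# corresponds to an object 1300 mm in front of the lens.
example_depth_mm = object_distance(50.0, 52.0)
```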
Patent Literature 2 discloses a method for creating a depth map using an all-in-focus image and two-dimensional scale space matching. In the method, multi-focus images of a single scene are captured. Then, an all-in-focus image is constructed from the captured multi-focus images. Scale space blur images are then generated from the all-in-focus image. Finally, the depth map is created by matching the blur amount in each captured image against the blur amounts in the generated scale space blur images.
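The matching step can be illustrated with the following simplified sketch. It assumes, for illustration only, that the scale space is produced by Gaussian blurring at a fixed list of standard deviations and that matching minimizes the squared difference over a small window; the actual procedure of Patent Literature 2 may differ. All function names are hypothetical, and the output is a per-pixel blur amount, which would still need a calibration step (not shown) to be converted into distance.

```python
# Simplified scale-space matching sketch. Assumptions: Gaussian blur as the
# scale-space generator, squared-difference matching over a small window.
# All names are hypothetical.
import math


def gaussian_kernel(sigma):
    """Return a normalized 1D Gaussian kernel; sigma <= 0 means no blur."""
    if sigma <= 0:
        return [1.0]
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [[sum(k * row[min(max(c + i - radius, 0), width - 1)]
                 for i, k in enumerate(kernel))
             for c in range(width)]
            for row in image]


def transpose(image):
    return [list(column) for column in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable 2D Gaussian blur: rows first, then columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def match_blur_scale(captured, all_in_focus, sigmas, window=1):
    """For each pixel, return the sigma whose blurred image matches best."""
    scale_space = [gaussian_blur(all_in_focus, s) for s in sigmas]
    height, width = len(captured), len(captured[0])
    result = [[sigmas[0]] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            best_index, best_error = 0, float("inf")
            for index, blurred in enumerate(scale_space):
                error = 0.0
                for rr in range(max(r - window, 0), min(r + window + 1, height)):
                    for cc in range(max(c - window, 0), min(c + window + 1, width)):
                        error += (captured[rr][cc] - blurred[rr][cc]) ** 2
                if error < best_error:
                    best_index, best_error = index, error
            result[r][c] = sigmas[best_index]
    return result
```

In textureless regions every scale produces nearly the same window, so the match there is ambiguous; a practical system would need some way to handle such regions.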