This section is intended to provide a background to the various embodiments of the technology described in this disclosure. The description in this section may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and/or claims of this disclosure and is not admitted to be prior art by the mere inclusion in this section.
A light field is a concept from computer graphics and computer vision, defined as the set of all light rays travelling in every direction through every point in space. A light-field camera, also called a plenoptic camera, is a type of camera that uses a microlens array to capture 4D (four-dimensional) light field information about a scene, since every point in three-dimensional space is also attributed a direction. A light-field camera has a microlens array just in front of the imaging sensor. The array may consist of many microscopic lenses with tiny focal lengths, which split up what would have become a single 2D pixel (length and width) into individual light rays just before they reach the sensor. This differs from a conventional camera, which uses only the two available dimensions of the film/sensor. The resulting raw image captured by a plenoptic camera is a composition of many tiny images, one per microlens.
A plenoptic camera can capture the light field information of a scene. This information can then be post-processed, after capture, to reconstruct images of the scene from different points of view. It also permits a user to change the focus point of the images. As described above, a plenoptic camera contains extra optical components, compared to a conventional camera, to achieve this goal.
The plenoptic data captured by an unfocused plenoptic camera are known as the unfocused (type 1) plenoptic data, and those captured by a focused plenoptic camera are known as the focused (type 2) plenoptic data.
In a type 1 plenoptic camera (such as the Lytro camera), an array of micro-lenses is placed in front of the sensor. All the micro-lenses have the same focal length, and the array is placed one focal length away from the sensor. This configuration yields maximum angular resolution at the cost of low spatial resolution.
Because the type 1 plenoptic data captured by an unfocused plenoptic camera provide several aligned views of the scene, one intuitive application is to estimate the depth of the scene. Known depth estimation solutions usually work by estimating the disparity of pixels between the views.
One exemplary algorithm, the block-matching method, was discussed in the reference written by N. Sabater, V. Drazic, M. Seifi, G. Sandri, and P. Perez, “Light field demultiplexing and disparity estimation,” HAL, 2014 (hereinafter referred to as reference 1).
More specifically, in the algorithm of reference 1, images of the scene from different points of view are first extracted from the captured plenoptic data. Once all the views have been extracted, a matrix of views is reconstructed from the plenoptic data. This matrix of views is then used to estimate the depth of scene objects, in view of the fact that the displacement of every pixel between different views is proportional to the depth of the corresponding object.
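The view extraction step described above may be illustrated by the following minimal sketch. It is not the algorithm of reference 1. It assumes an idealized raw image in which each micro-lens covers exactly p by p sensor pixels on a square grid aligned with the pixel rows and columns, and it ignores calibration, vignetting and color demosaicing. The function name extract_views and the parameter p are hypothetical and are introduced only for illustration.

```python
def extract_views(raw, p):
    """Rearrange an idealized raw plenoptic image into a p x p matrix of views.

    raw: 2D list of pixel values whose height and width are multiples of p.
    p:   assumed number of sensor pixels under each micro-lens, per axis.
    """
    rows, cols = len(raw), len(raw[0])
    if rows % p or cols % p:
        raise ValueError("image size must be a multiple of p")
    # views[u][v] collects the pixel at offset (u, v) under every micro-lens,
    # so each view has one pixel per micro-lens.
    return [
        [
            [[raw[i + u][j + v] for j in range(0, cols, p)]
             for i in range(0, rows, p)]
            for v in range(p)
        ]
        for u in range(p)
    ]


# A 4 x 4 raw image with 2 x 2 pixels per micro-lens gives 2 x 2 views,
# each of size 2 x 2.
raw = [[0, 1, 2, 3],
       [4, 5, 6, 7],
       [8, 9, 10, 11],
       [12, 13, 14, 15]]
views = extract_views(raw, 2)
print(views[0][0])  # [[0, 2], [8, 10]]
print(views[1][1])  # [[5, 7], [13, 15]]
```

Under these assumptions, the number of views equals the number of pixels under each micro-lens, and the size of each view equals the number of micro-lenses, which is one way to see the trade-off between angular and spatial resolution.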
The known estimation methods for unfocused plenoptic data are usually time-consuming and not very accurate in non-textured areas.
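Both limitations can be seen in a generic block-matching sketch such as the one below. It is a simplified illustration rather than the method of reference 1: it works on a single scanline of two views, uses a sum of absolute differences (SAD) as the matching cost, and searches integer disparities exhaustively. The function names sad and match_disparity, and the parameters max_disp and half, are hypothetical.

```python
def sad(left, right, x, d, half):
    """SAD between the window centred at x in `left` and the window
    centred at x - d in `right`."""
    return sum(abs(left[x + k] - right[x - d + k])
               for k in range(-half, half + 1))


def match_disparity(left, right, x, max_disp, half=1):
    """Return (best disparity, list of costs) for pixel x of a scanline."""
    if x - max_disp - half < 0 or x + half >= len(left):
        raise ValueError("window or search range leaves the scanline")
    # Exhaustive search: one window comparison per candidate disparity.
    costs = [sad(left, right, x, d, half) for d in range(max_disp + 1)]
    return costs.index(min(costs)), costs


# Textured scanline: `left` is `right` shifted by two pixels.
right = [3, 9, 1, 7, 4, 8, 2, 6, 5, 0]
left = [0, 0, 3, 9, 1, 7, 4, 8, 2, 6]
print(match_disparity(left, right, 5, 3))  # (2, [6, 13, 0, 17])

# Non-textured scanline: every candidate has the same cost, so the
# returned disparity is arbitrary.
flat = [5] * 10
print(match_disparity(flat, flat, 5, 3))   # (0, [0, 0, 0, 0])
```

The cost of the search grows with the number of pixels, the number of candidate disparities and the window size, and in a flat region the minimum cost is not unique, which is consistent with the two drawbacks noted above.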
Another exemplary depth estimation method, which is based on epipolar images of the scene, is discussed in the reference written by S. Wanner and B. Goldluecke, “Variational light field analysis for disparity estimation and super-resolution”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013 (hereinafter referred to as reference 2). Reference 2 proposes to calculate the structure tensor (gradients) to decide which pixels are used to estimate the disparities.
However, the depth estimation method in reference 2 is proposed for plenoptic data captured by a focused camera and is therefore not optimal for unfocused plenoptic data, due to the low resolution of the epipolar images.
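The structure-tensor idea mentioned for reference 2 may be sketched as follows. This is a simplified illustration and not the formulation of reference 2: it accumulates the tensor over one whole small epipolar image instead of smoothing it per pixel, estimates the slope of the epipolar lines by least squares, and uses central differences for the gradients. The function name epi_disparity and the layout epi[s][x] (view index s, spatial position x) are assumptions.

```python
def epi_disparity(epi):
    """Estimate one disparity (pixels per view) for a small epipolar image.

    epi[s][x] is the intensity at spatial position x in view s.
    Returns (disparity, coherence), with coherence between 0 and 1.
    """
    n_views, width = len(epi), len(epi[0])
    jxx = jxs = jss = 0.0
    for s in range(1, n_views - 1):
        for x in range(1, width - 1):
            gx = (epi[s][x + 1] - epi[s][x - 1]) / 2.0  # spatial gradient
            gs = (epi[s + 1][x] - epi[s - 1][x]) / 2.0  # angular gradient
            jxx += gx * gx
            jxs += gx * gs
            jss += gs * gs
    if jxx == 0.0:
        return None, 0.0  # no spatial texture: the slope is undefined
    # Coherence is the normalized difference of the tensor eigenvalues;
    # values near 1 indicate a single dominant line orientation.
    coherence = ((jxx - jss) ** 2 + 4.0 * jxs ** 2) ** 0.5 / (jxx + jss)
    # Constant intensity along a line gives gx * d + gs = 0, hence
    # the least-squares slope d = -jxs / jxx.
    return -jxs / jxx, coherence


# Synthetic epipolar image: a pattern that shifts by 0.5 pixel per view.
epi = [[(x - 0.5 * s) ** 2 for x in range(9)] for s in range(5)]
disparity, coherence = epi_disparity(epi)
print(disparity)  # 0.5
```

In this sketch the angular gradient is computed across the rows of the epipolar image, so an epipolar image with only a few rows provides few samples for that gradient, and the coherence value can be used to discard pixels whose orientation is unreliable.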