Image manipulation programs are used to modify or otherwise use image content captured by a camera. For example, an image manipulation program can generate a depth map. A depth map describes the “depth” of different objects in an image. The depth of an object in an image is the distance between the object and a given viewpoint, such as the camera used to capture the image. By generating a depth map describing the depths of various objects in an image, an image manipulation application can generate three-dimensional image content from two or more images that are captured from one or more viewpoints and that depict at least some of the same objects.
For example, FIG. 1 is a modeling diagram depicting an image manipulation application 20 generating a depth map 22 from input images 10a, 10b. The images 10a, 10b can include image content captured by a camera from two perspectives. The difference in perspectives between images 10a, 10b causes a horizontal distance 16a between objects 12, 14 in image 10a to change to a horizontal distance 16b between objects 12, 14 in image 10b. The difference in perspectives also causes a vertical distance 18a between objects 12, 14 in image 10a to change to a vertical distance 18b between objects 12, 14 in image 10b. The image manipulation application 20 can generate the depth map 22 using the difference between horizontal distances 16a, 16b and the difference between vertical distances 18a, 18b. The depth map 22 can describe the distance between each of the objects 12, 14 and the camera used to capture images 10a, 10b. The depth map 22 can be used, for example, to generate three-dimensional content from images 10a, 10b.
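The link between displacement and depth can be illustrated in the simplest idealized case: a rectified pinhole stereo pair, in which displacement (disparity) is purely horizontal and triangulation gives depth Z = f·B/d, where f is the focal length in pixels, B is the baseline between the two viewpoints, and d is the disparity in pixels. The sketch below is a minimal example under those assumptions only; it ignores the vertical displacement discussed above, and the function name and camera parameters are hypothetical.

```python
import math


def depth_from_disparity(disparity, focal_px, baseline_m, min_disparity=1e-6):
    """Convert a horizontal disparity map (pixels) into a depth map (meters).

    Assumes an idealized, rectified pinhole stereo pair, for which
    triangulation gives Z = f * B / d. Pixels whose disparity is zero
    (or negligibly small) are treated as infinitely far away.
    """
    scale = focal_px * baseline_m  # f * B
    return [
        [scale / d if d > min_disparity else math.inf for d in row]
        for row in disparity
    ]


# Hypothetical camera: 700-pixel focal length, 10 cm baseline.
depth = depth_from_disparity([[70, 35], [14, 0]], focal_px=700, baseline_m=0.1)
print(depth)  # larger disparity -> nearer object -> smaller depth value
```

Because depth is inversely proportional to disparity, small errors in estimated displacement produce large depth errors for distant objects, which is one reason the accuracy of the displacement estimate matters.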
One existing solution for generating a depth map involves using a stereo algorithm to perform depth estimation. A stereo algorithm takes two images and estimates the displacement of each pixel only along an x axis, rather than along both an x axis and a y axis. Among other deficiencies, stereo algorithms require epipolar geometric correction to generate depth maps. Epipolar geometric correction requires that the two or more images be captured such that they are aligned in one dimension (i.e., the y axis), so that displacement occurs only in the other dimension (i.e., the x axis). Manufacturing constraints of consumer light-field cameras limit the feasibility of performing such calibration.
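The one-dimensional search described above can be sketched with simple block matching using the sum of absolute differences (SAD). This is a hypothetical illustration, not the method of any particular stereo algorithm: `patch_sad`, `stereo_disparity`, and the synthetic images are assumptions made for the example. The search moves only along the x axis (dy is fixed at 0), so it gives correct results only when the pair is rectified.

```python
def patch_sad(a, b, y, x, dy, dx, radius):
    """Sum of absolute differences between the (2r+1)x(2r+1) patch of `a`
    centred at (y, x) and the patch of `b` centred at (y + dy, x + dx)."""
    total = 0
    for j in range(-radius, radius + 1):
        for i in range(-radius, radius + 1):
            total += abs(a[y + j][x + i] - b[y + dy + j][x + dx + i])
    return total


def stereo_disparity(left, right, y, x, max_disparity, radius=1):
    """Block-matching stereo: search for the best match only along the x axis.

    Assumes a rectified pair, so a point at column x in `left` appears at
    column x - d in `right` on the same row. Returns (disparity, cost).
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if x - d - radius < 0:  # candidate patch would leave the image
            break
        cost = patch_sad(left, right, y, x, 0, -d, radius)  # dy is always 0
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost


H, W = 6, 10
left = [[x * x + 10 * y for x in range(W)] for y in range(H)]
# Rectified second view: content shifted 2 pixels to the left.
right = [[left[y][x + 2] if x + 2 < W else 0 for x in range(W)] for y in range(H)]
# Unrectified second view: the same shift plus a 1-row vertical offset.
skewed = [[left[y + 1][x + 2] if (y + 1 < H and x + 2 < W) else 0
           for x in range(W)] for y in range(H)]

print(stereo_disparity(left, right, y=2, x=6, max_disparity=4))   # (2, 0)
print(stereo_disparity(left, skewed, y=2, x=6, max_disparity=4))  # (3, 15)
```

With the rectified pair, the search recovers the true disparity of 2 with zero matching cost. With the vertically offset pair, no candidate on the same row matches, and the x-only search settles on a wrong disparity of 3, which is why such algorithms depend on epipolar correction.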
Another existing solution for generating a depth map involves using an optical flow algorithm to perform depth estimation. An optical flow algorithm receives two images as inputs and estimates the displacement of each pixel between the two images along both an x axis and a y axis. Among other deficiencies, the expanded search space used by optical flow algorithms (i.e., both the x axis and the y axis) results in poor-quality depth estimates.
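The growth of the search space can be illustrated with exhaustive two-dimensional block matching. This is a deliberately simplified stand-in: practical optical flow methods such as Lucas-Kanade or Horn-Schunck use gradient-based or variational formulations rather than brute-force search, and `flow_block_match`, `patch_sad`, and the synthetic images are hypothetical names chosen for this sketch. It shows only how many candidate offsets a 2-D search must consider per pixel.

```python
def patch_sad(a, b, y, x, dy, dx, radius):
    """Sum of absolute differences between the (2r+1)x(2r+1) patch of `a`
    centred at (y, x) and the patch of `b` centred at (y + dy, x + dx)."""
    total = 0
    for j in range(-radius, radius + 1):
        for i in range(-radius, radius + 1):
            total += abs(a[y + j][x + i] - b[y + dy + j][x + dx + i])
    return total


def flow_block_match(img_a, img_b, y, x, max_shift, radius=1):
    """Exhaustive 2-D block matching over every (dy, dx) offset in a
    (2*max_shift + 1) x (2*max_shift + 1) window around (y, x).

    Returns ((dy, dx), cost, number_of_candidates_evaluated).
    """
    h, w = len(img_b), len(img_b[0])
    best, best_cost, evaluated = (0, 0), float("inf"), 0
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cy, cx = y + dy, x + dx
            # Skip candidates whose patch would fall outside img_b.
            if cy - radius < 0 or cy + radius >= h or cx - radius < 0 or cx + radius >= w:
                continue
            evaluated += 1
            cost = patch_sad(img_a, img_b, y, x, dy, dx, radius)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost, evaluated


H, W = 8, 10
img_a = [[x * x + 10 * y for x in range(W)] for y in range(H)]
# Second view: content moves 1 row down and 2 columns left, so a point at
# (y, x) in img_a appears at (y + 1, x - 2) in img_b.
img_b = [[img_a[y - 1][x + 2] if (y >= 1 and x + 2 < W) else 0
          for x in range(W)] for y in range(H)]

print(flow_block_match(img_a, img_b, y=3, x=5, max_shift=2))  # ((1, -2), 0, 25)
```

On this clean synthetic pair the 2-D search finds the true offset, but it evaluates 25 candidates where an x-only search with the same range would evaluate at most 3. In textureless or repetitive image regions, each extra candidate is another opportunity for a wrong offset to score as well as the true one, which is one intuition for why a larger search space can degrade depth estimates.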
Accordingly, systems and methods are desirable for improving the accuracy of depth estimation.