Binocular viewing of a scene creates two slightly different images of the scene due to the different fields of view of the two eyes. These differences, referred to as binocular disparity (or parallax), provide information that can be used to calculate depth in the visual scene and are a major means of depth perception. The impression of depth associated with stereoscopic depth perception can also be obtained under other conditions, such as when an observer views a scene with only one eye while moving. In either case, the observed parallax can be utilized to obtain depth information for objects in the scene. Similar principles can be used in machine vision to gather depth information.
Two cameras separated by a distance can take pictures of the same scene and the captured images can be compared by shifting the pixels of two or more images to find parts of the images that match. The amount an object shifts between two different camera views is called the disparity, which is inversely proportional to the distance to the object. A disparity search that detects the shift of an object in the multiple images that results in the best match can be used to calculate the distance to the object based upon the baseline distance between the cameras and the focal length of the cameras involved (as well as knowledge of additional properties of the camera). The approach of using two or more cameras to generate stereoscopic three-dimensional images is commonly referred to as multi-view stereo.
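The relationship between disparity and distance described above can be illustrated with a minimal sketch. It assumes a rectified pair of pinhole cameras, a focal length expressed in pixels, and a simple sum-of-absolute-differences match along one image row. The function names `best_disparity` and `depth_from_disparity` and all numeric values are hypothetical and chosen only for illustration; practical systems use more robust matching costs and calibrated camera parameters.

```python
def best_disparity(left_row, right_row, x, half_window, max_disparity):
    """Return the shift d (in pixels) that best matches a patch of the left
    row, centered at x, to the patch centered at x - d in the right row.
    The cost is the sum of absolute differences (SAD); lower is better."""
    lo = x - half_window
    hi = x + half_window + 1
    if lo < 0 or hi > len(left_row):
        raise ValueError("patch does not fit inside the left row")
    patch = left_row[lo:hi]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if lo - d < 0:
            break  # shifted patch would fall off the right row
        candidate = right_row[lo - d:hi - d]
        cost = sum(abs(a - b) for a, b in zip(patch, candidate))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Distance to the object for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("zero disparity corresponds to a point at infinity")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical scanlines: the same feature appears 2 pixels further left
# in the right image than in the left image.
left_row = [0, 0, 0, 0, 0, 9, 5, 7, 0, 0]
right_row = [0, 0, 0, 9, 5, 7, 0, 0, 0, 0]

d = best_disparity(left_row, right_row, x=6, half_window=1, max_disparity=4)
z = depth_from_disparity(d, focal_length_px=800.0, baseline_m=0.1)
```

Because distance varies as the reciprocal of disparity, a one-pixel matching error matters far more for distant objects (small disparities) than for nearby ones.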
More recently, researchers have used multiple cameras spanning a wider synthetic aperture to capture light field images (e.g., the Stanford Multi-Camera Array). A light field, which is often defined as a 4D function characterizing the light from all directions at all points in a scene, can be interpreted as a two-dimensional (2D) collection of 2D images of a scene. Due to practical constraints, it is typically difficult to capture simultaneously the collection of 2D images of a scene that form a light field. However, the closer together in time the image data is captured by each of the cameras, the less likely it is that variations in light intensity (e.g., the otherwise imperceptible flicker of fluorescent lights) or object motion will result in time-dependent variations between the captured images. Processes involving capturing and resampling a light field can be utilized to simulate cameras with large apertures. For example, an array of M×N cameras pointing at a scene can simulate the focusing effects of a lens as large as the array. In many embodiments, cameras need not be arranged in a rectangular pattern and can have configurations including circular configurations and/or any arbitrary configuration appropriate to the requirements of a specific application. Use of camera arrays in this way can be referred to as synthetic aperture photography.
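One common way to resample the views of a camera array so that they simulate a large aperture is shift-and-add refocusing, sketched below. This is an illustrative technique and is not stated in the text above to be the method used. The sketch assumes a one-dimensional row of cameras, one-dimensional images, and whole-pixel shifts; the function name `refocus`, the `offsets` list, and the pixel values are hypothetical.

```python
def refocus(images, offsets, disparity_per_unit):
    """Shift-and-add synthetic aperture refocusing.

    images: one list of pixel values per camera, all of equal length.
    offsets: position of each camera along the array, in baseline units,
             relative to the reference camera (offset 0).
    disparity_per_unit: pixel shift per unit baseline of the depth plane
             to bring into focus.

    Each view is shifted so that objects on the chosen depth plane line up,
    then the views are averaged. Objects at other depths do not line up and
    are spread across several pixels, which imitates defocus blur.
    """
    width = len(images[0])
    result = []
    for x in range(width):
        total, count = 0.0, 0
        for image, offset in zip(images, offsets):
            src = x + round(offset * disparity_per_unit)
            if 0 <= src < width:  # ignore samples shifted outside the view
                total += image[src]
                count += 1
        result.append(total / count if count else 0.0)
    return result


# Three hypothetical cameras; a bright point seen at pixel 4 by the
# reference camera appears 2 pixels away per unit of baseline.
offsets = [-1, 0, 1]
views = [
    [0, 0, 1, 0, 0, 0, 0, 0, 0],  # camera at offset -1
    [0, 0, 0, 0, 1, 0, 0, 0, 0],  # reference camera
    [0, 0, 0, 0, 0, 0, 1, 0, 0],  # camera at offset +1
]

focused = refocus(views, offsets, disparity_per_unit=2)    # point is sharp
defocused = refocus(views, offsets, disparity_per_unit=0)  # point is smeared
```

In `focused` the point keeps its full intensity at pixel 4, while in `defocused` its energy is split across pixels 2, 4, and 6, much as a large physical lens would blur an object away from its plane of best focus.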
The larger the aperture of a camera, the more light is admitted, but the shallower the depth of field. Objects are well focused at a distance determined by the focal length of the camera lens and the distance between the lens and the imager plane. Objects at other distances are imaged as a blur, sometimes called the circle of confusion. If the object lies far enough from the plane of best focus that the circle of confusion is larger than some nominal diameter (called the maximum acceptable circle of confusion, representing the largest blur size for which the image is acceptably sharp and typically defined as the size of one pixel in the camera's sensor), the object can be referred to as outside the depth of field for the camera's current settings. Depth of field is defined as the distance between the nearest and farthest objects in the scene for which the circle of confusion is less than the maximum acceptable value. Introducing an aperture stop (diaphragm) into such an optical system and partially closing it reduces the effective diameter of the lens. This reduces the circle of confusion for objects off the plane of best focus, hence increasing the camera's depth of field. Conversely, opening the diaphragm expands the circle of confusion, decreasing depth of field. If the aperture is made extremely large (e.g., as wide as the distance to the plane of best focus), the depth of field becomes so shallow that only objects lying on the plane of best focus are sharp. When an object lying outside the depth of field is small enough that, for every point on the plane of best focus, at least some of the rays from that point still reach the lens, the object no longer obscures the camera's view of these points.
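The dependence of blur size on aperture can be made concrete with the standard thin-lens expression for the diameter of the circle of confusion, c = A · f / (s − f) · |d − s| / d, where A is the aperture diameter, f the focal length, s the distance to the plane of best focus, and d the distance to the object. The sketch below assumes an ideal thin lens with all lengths in millimetres; the function names and the numeric values (including the 0.1 mm acceptable blur) are hypothetical.

```python
def circle_of_confusion(aperture_diameter, focal_length,
                        focus_distance, object_distance):
    """Blur-spot diameter on the sensor for an ideal thin lens.

    All arguments share one length unit (millimetres here). The lens is
    focused at focus_distance; the object sits at object_distance.
    """
    if focus_distance <= focal_length:
        raise ValueError("focus distance must exceed the focal length")
    magnification = focal_length / (focus_distance - focal_length)
    defocus = abs(object_distance - focus_distance) / object_distance
    return aperture_diameter * magnification * defocus


def within_depth_of_field(blur_diameter, max_acceptable_blur):
    """True when the blur does not exceed the acceptable circle of confusion."""
    return blur_diameter <= max_acceptable_blur


# Hypothetical 50 mm lens focused at 2 m, with an object at 4 m.
wide_open = circle_of_confusion(25.0, 50.0, 2000.0, 4000.0)      # f/2
stopped_down = circle_of_confusion(6.25, 50.0, 2000.0, 4000.0)   # f/8
in_focus = circle_of_confusion(25.0, 50.0, 2000.0, 2000.0)       # on the plane

# With a hypothetical acceptable blur of 0.1 mm, only the stopped-down
# setting keeps the 4 m object inside the depth of field.
sharp_wide_open = within_depth_of_field(wide_open, 0.1)
sharp_stopped_down = within_depth_of_field(stopped_down, 0.1)
```

The blur diameter scales linearly with the aperture diameter, which is why partially closing the diaphragm extends the depth of field and why a very large synthetic aperture leaves only the plane of best focus sharp.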