The advent of digital imaging has enabled advances in the way that an image of a scene can be recorded and viewed. In particular, in modern cameras, an image is not typically formed by exposing film-based photographic emulsions but, rather, by digitally recording an optical image of the scene using an electronic sensor array. As a result, the recording surface on which the image is focused by the camera optics no longer needs to be a single continuous surface. This has enabled the development of array cameras that comprise a plurality of digital cameras—each including a separate sensor array (a.k.a., a focal-plane array) that outputs a digital sub-image of a portion of the scene. The sub-images can then be combined digitally to form a composite image of the entire scene. For the purposes of this Specification, individual modules of an array camera are referred to as “microcameras,” while the composite system is referred to as the “macrocamera,” “array camera” or “camera.” The fields of view of neighboring microcameras typically overlap such that some points in the scene are observed by multiple microcameras while other points may be observed by only a single microcamera.
A sensor array normally comprises a large two-dimensional array of optoelectronic detector pixels, such as charge-coupled device (CCD) elements, photodetectors, etc. The sensor array generates a digital image-data set based on the sub-image formed on its recording surface during image capture.
Pixel count is a basic measure of image quality and, for typical sensor arrays, is within the range of 1-20 million pixels. For video applications, each individual camera of an array camera outputs up to 500 frames per second. As a result, each microcamera provides 1000-10,000 megapixels of data every second, and a macrocamera may generate many gigapixels per second.
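The data-rate arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names and the microcamera count of 100 are assumptions made for the example and do not appear in this Specification. The upper figure of 10,000 megapixels per second follows from a 20-megapixel sensor read out at 500 frames per second; the lower figure of 1000 megapixels per second would correspond, for instance, to a 2-megapixel sensor at the same frame rate.

```python
def pixel_rate_mpx(sensor_megapixels: float, frames_per_second: float) -> float:
    """Pixel throughput of one microcamera, in megapixels per second."""
    return sensor_megapixels * frames_per_second


def array_rate_gpx(per_camera_mpx: float, num_microcameras: int) -> float:
    """Aggregate throughput of the array, in gigapixels per second (1 Gpx = 1000 Mpx)."""
    return per_camera_mpx * num_microcameras / 1000.0


# Upper end of the ranges stated in the text: 20 megapixels at 500 frames per second.
peak_mpx = pixel_rate_mpx(20, 500)

# Hypothetical array size, chosen only to illustrate the scale of the aggregate rate.
NUM_MICROCAMERAS = 100
total_gpx = array_rate_gpx(peak_mpx, NUM_MICROCAMERAS)

print(f"per microcamera: {peak_mpx:.0f} Mpx/s, array: {total_gpx:.0f} Gpx/s")
```

Under these assumed values, the aggregate rate is well into the "many gigapixels per second" regime described above, before any allowance for bit depth or color channels.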
Array cameras form images from composite data provided by one or more microcameras. The composite images enable creation of high-resolution panoramic images or digital fly effects in which the view point might move from one camera to the next. The full range of scene pixels captured by the array camera is called the “data cube.” In general, the full data cube is too large to display on a single display device. Rather, viewers or video analysis systems interactively pan or move their viewpoint through the data cube. The image or video on a display device is composited from pixels captured by one, several, or all of the microcameras. When the display device or analysis system renders images from all microcameras, only low-resolution streams from the individual microcameras are required. When the display device zooms to data from a smaller number of microcameras, higher-resolution streams from each of those microcameras are required. In the typical array-camera architecture, data is streamed from the array to storage and then read out of storage to render one or more display data streams. Unfortunately, assembling large composite images from multiple smaller sub-images is very computationally intensive because of the geometrical and radiometric processing required to stitch the sub-images together. Further, because the sub-images are often taken at different times, the illumination of the scene can change between them, or there can be motion artifacts associated with objects moving within the field of view. As a result, algorithms that compare neighboring images are required in order to mitigate seams between sub-images due to these variations. In addition, distortion, pointing, and non-linearity corrections must be applied to the sub-images.
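The relationship described above between the number of microcameras contributing to a displayed view and the resolution needed from each one can be illustrated with a rough calculation. The sketch below is an assumption-laden illustration, not a method taken from this Specification: the function name is hypothetical, the display's pixel budget is assumed to be shared equally among the contributing microcameras, and overlap between neighboring fields of view is ignored.

```python
import math


def linear_downsample_factor(sensor_pixels: int, display_pixels: int,
                             cameras_in_view: int) -> int:
    """Integer factor by which each sub-image can be reduced per axis.

    Assumes (for illustration only) that the display's pixels are divided
    equally among the contributing microcameras and that fields of view
    do not overlap.
    """
    per_camera_budget = display_pixels / cameras_in_view
    # Pixel count scales with the square of the linear factor, hence the root.
    factor = math.ceil(math.sqrt(sensor_pixels / per_camera_budget))
    # A factor below 1 would mean upsampling; full resolution is the limit.
    return max(1, factor)


SENSOR_PIXELS = 16_000_000   # hypothetical microcamera sensor
DISPLAY_PIXELS = 4_000_000   # hypothetical display

for n in (1, 4, 100):
    print(n, linear_downsample_factor(SENSOR_PIXELS, DISPLAY_PIXELS, n))
```

With these assumed values, a view drawn from all 100 microcameras needs each sub-image at only a small fraction of its native resolution, while a view zoomed to a single microcamera needs nearly the full sensor output.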
Such extensive processing imposes a severe time constraint, however, which has historically precluded using multiple sensor arrays for video-rate capture of high-resolution, high-pixel-count imagery. To date, therefore, high-definition video streams have been principally limited to single-sensor-array camera acquisition. As a result, in video-rate applications, numerous separately controlled cameras are typically used to capture a complete scene, where each camera provides only a small-area view of a portion of the scene. For example, a sports broadcast normally relies on the use of many different cameras that are strategically positioned and oriented throughout an arena or stadium. Each camera requires its own camera operator and the multiple camera views must be continuously analyzed in real time by a director who chooses which one camera view is broadcast. In addition to giving rise to inordinate capital and operational expense, such an approach limits the “richness” of the viewing experience.