An imager, such as a video or still camera, images a scene by receiving and detecting light emanating from the scene. The incoming light signal from a particular point in the scene has characteristics, such as an intensity, a wavelength spectrum, and a polarization. In addition, the entire light field received by the imager varies with the angle at which the light is received by the imager. Of course, the angle at which a particular light ray or light ray bundle is received depends upon the location of the scene point from which the light emanated.
A number of applications require precise and accurate measurement of the light field. For example, in Image-Based Rendering (IBR), a scene is imaged and then re-rendered to simulate navigation around the scene. Measurement of the entire light field with respect to both space and direction allows extraction of the geometric structure of the scene. As another example, light reflected from each material and emanating from each illumination source has its own characteristic spectral curve and polarization characteristics. With high spectral resolution it is possible to identify different types of material and illumination, and/or to re-render the scene under different, simulated illumination. Measuring the polarization of light from a scene point provides further information regarding the type of material present at the scene point, and regarding the illumination incident on the scene point. Polarization information has also been used to compensate for the effects of weather conditions when rendering outdoor scenes, and to help measure depth, i.e., the distance of a scene point from the imager. As can be seen from the above examples, a system which precisely and accurately measures the light field has a variety of useful applications.
However, conventional imagers are limited in their intensity resolution, spectral resolution, and polarization resolution—i.e., their ability to resolve differences in intensity, wavelength, and polarization—and are also limited in their spatial resolution—i.e., their ability to resolve differences in the locations of respective scene points. For example, there currently exist digital still cameras capable of capturing high spatial resolution images. However, because of the amount of data involved, these cameras are not capable of producing high resolution video. On the other hand, inexpensive cameras exist that can capture video at 30 frames/second—a respectable temporal resolution. However, such video cameras provide only low spatial resolution. It is particularly difficult to design an imager having high time resolution and high spatial resolution. In addition to the engineering problems associated with high resolution in multiple dimensions, there are often fundamental physical problems. For example, low light conditions require longer exposure times, resulting in coarser temporal resolution and, accordingly, more blurring in imaging of moving objects.
One approach for addressing the above-described problems uses multiple sensors which are “co-located” (i.e., have the same viewpoint) to measure different aspects of the light field. For example, it is possible to co-locate a thermal imager, a range finder, and a visible-light camera. In some cases a multiple-sensor approach can overcome some of the physical limits imposed on single sensors, such as the trade-off between exposure and temporal resolution. However, such an approach requires additional imaging resources. In a situation in which the available resources are finite—e.g., in which there is a fixed number of pixels, a fixed amount of memory, and trade-offs between exposure and time—it is desirable to use these resources as efficiently as possible.
If the light field were simply an unrelated and arbitrary set of intensities, there would be little hope of a solution other than building bigger, faster, and more densely packed sensors. However, there is tremendous structure and redundancy in the light field. For example, when the viewpoint is shifted slightly, the view of the scene typically changes in predictable ways. In addition, the spectral response across a material of a single color will often be relatively uniform. Furthermore, the motions of objects in a scene are often regular and predictable. For example, most objects are rigid, and in many cases, objects tend to move at nearly constant velocities. All of these factors create great redundancies in the light field. As a result, it is usually not necessary to sample the light field at every point in its domain to reconstruct, approximate, or predict the light field.
To exploit the above-described redundancy in the light field, assumptions can be made regarding the structure of this redundancy. For example, interpolation and sampling theory uses assumptions about the regularity of a signal to recover the signal from a limited number of samples. As a particularly well-known example, the Nyquist sampling theorem states that a band-limited signal, i.e., a signal whose frequency components lie within a finite range, can be recovered from samples taken at a rate greater than twice its highest frequency, so that the required sampling frequency is bounded. In the context of images, the requirement of finite frequency range essentially translates to a limit on the permissible sharpness of discontinuities such as edges and corners. The functions used in the Nyquist theorem are trigonometric functions, but polynomials can also be used for interpolation of images. Simple examples include bilinear and bi-cubic interpolation. Unfortunately, the improvement possible from simple interpolation techniques is limited. In particular, the resolution increases provided by such techniques are typically rather modest. Moreover, since natural images often do not conform to the mathematical assumptions inherent in interpolation techniques, such methods can produce aesthetically unpleasant artifacts.
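As an illustration of the polynomial interpolation mentioned above, the following is a minimal sketch of bilinear interpolation, assuming a grayscale image stored as a list of rows. The function names `bilinear_sample` and `upsample` are hypothetical and chosen only for this example. Each new value is a weighted average of the four nearest measured pixels, so no detail beyond what those pixels already contain is created.

```python
def bilinear_sample(img, x, y):
    """Estimate the image value at fractional coordinates (x, y).

    img is a list of rows of numbers (an assumed grayscale layout).
    """
    h, w = len(img), len(img[0])
    # Clamp the query point to the valid pixel grid.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two bracketing rows, then along y.
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def upsample(img, factor):
    """Return img resampled on a grid that is `factor` times denser."""
    h, w = len(img), len(img[0])
    out_h, out_w = (h - 1) * factor + 1, (w - 1) * factor + 1
    return [
        [bilinear_sample(img, c / factor, r / factor) for c in range(out_w)]
        for r in range(out_h)
    ]


if __name__ == "__main__":
    coarse = [[0, 10], [20, 30]]
    for row in upsample(coarse, 2):
        print(row)
```

For the 2x2 input shown, the 3x3 result contains only averages of the original four values, which is consistent with the modest resolution gains described above.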
Sparsely sampling an image and interpolating the resulting data effectively acts as a low-pass filter. Accordingly, increasing the spatial resolution of an image can be expressed as a problem of “de-blurring” the image. Sharpening filters, such as pseudo-inverse and Wiener filters, have been used to invert Gaussian blur. Other previously used approaches include Bayesian analysis, interpolation along edges, adaptive filtering, wavelet analysis, fractal interpolation, projection on convex sets, variational methods, and level sets. Such approaches improve on basic interpolation, but because they only use local image structure or apply a hypothesized global prior to the behavior of the light field (i.e., an assumption regarding the regularity of the light field), their ability to exploit redundancies is somewhat limited.
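The low-pass behavior of sparse sampling followed by interpolation can be seen in a minimal one-dimensional sketch. The helper names `downsample` and `linear_upsample` are hypothetical, and linear interpolation is assumed purely for simplicity. A slowly varying ramp survives the round trip, whereas a rapidly alternating signal is flattened to a constant.

```python
def downsample(signal, step):
    """Keep every `step`-th sample (sparse sampling)."""
    return signal[::step]


def linear_upsample(samples, step):
    """Fill in `step - 1` values between neighbors by linear interpolation."""
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(step):
            out.append(a + (b - a) * k / step)
    out.append(samples[-1])
    return out


if __name__ == "__main__":
    ramp = [0, 1, 2, 3, 4, 5, 6, 7, 8]        # low-frequency content
    fine = [1, -1, 1, -1, 1, -1, 1, -1, 1]    # highest-frequency content

    # The ramp is recovered; the alternating detail is lost.
    print(linear_upsample(downsample(ramp, 2), 2))
    print(linear_upsample(downsample(fine, 2), 2))
```

The second result contains no trace of the alternation, which is the loss that de-blurring approaches attempt to undo.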
Related to sampling and interpolation are techniques known as “super-resolution,” in which relatively coarse sampling is performed multiple times to improve the effective resolution of the sampling. As with the above-described interpolation methods, super-resolution makes assumptions about the regularity of the light field, and has recently been shown to have theoretical limits.
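The basic idea of combining several coarse samplings can be sketched in one dimension under strongly idealized assumptions: each capture is a pure point sampling with no blur or noise, and the offset of each capture is known exactly and falls on the fine grid. The names `coarse_sample` and `interleave` are hypothetical. Practical super-resolution must estimate the offsets and cope with blur, which is where the regularity assumptions and limits noted above come into play.

```python
def coarse_sample(signal, step, offset):
    """One coarse capture: every `step`-th sample, starting at `offset`."""
    return signal[offset::step]


def interleave(captures, step):
    """Merge `step` captures, taken at offsets 0..step-1, onto the fine grid."""
    n = sum(len(c) for c in captures)
    out = [None] * n
    for offset, capture in enumerate(captures):
        # Each capture fills the fine-grid positions it originally sampled.
        out[offset::step] = capture
    return out


if __name__ == "__main__":
    scene = [3, 1, 4, 1, 5, 9, 2]
    step = 3
    captures = [coarse_sample(scene, step, k) for k in range(step)]
    print(captures)                    # three coarse views of the scene
    print(interleave(captures, step))  # fine-grid signal rebuilt from them
```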
Various multi-camera systems have been proposed for capturing light fields over wide areas. Such systems typically use interpolation and image warping to fill in missing data. For example, in hybrid imaging, images are captured using multiple cameras with different characteristics, e.g., different frame rates or spatial resolutions. A larger part of the light field is then filled in based on computed camera geometry, using a combination of interpolation and image warping.
An additional approach is based on texture synthesis and scene statistics. Rather than make mathematical assumptions about the structure of the redundancy in a light field, statistics or pattern analysis are used to model and exploit the redundancy. One technique uses correlations of pixels at different scales. Another approach is to “train” the model using a variety of different textures and a variety of different images of everyday scenes. In the training approach, the training algorithm should be capable of extracting and utilizing the redundancies in the image to improve the image and increase its resolution. If the domain of image types is very limited, such as in the well-known “hallucinating faces” method, in which high resolution images of human faces are synthesized from low-resolution data, training approaches can dramatically improve resolution. However, attempts to model broader domains typically encounter standard problems of machine learning. If the model is trained on very specific domains, it becomes over-fitted to the particular training data, resulting in poor generalization. For example, if a resolution-enhancement algorithm is trained on faces and then applied to buildings, the algorithm will tend to produce artifacts and low quality enhancement results. On the other hand, if the model is trained on a very broad domain of image types, it learns only very general redundancies that occur in most images. As a result, although a broadly trained model will provide some benefit for most domains, it will not provide extremely good results for any domain.