1. Field of the Invention
The present invention relates to photographic image acquisition devices and methods thereof, and in particular to still and video cameras comprising at least one imaging lens optically coupled to at least one image sensor comprising a focal plane array of photosensitive elements.
2. Description of the Related Art
Spatial resolution of photographic image acquisition devices is limited by the spatial resolution of the image sensor comprising a focal plane array of photosensitive elements, and the point spread function (optical blur) of the imaging lens optically coupled to the sensor.
There are two basic approaches to increasing the spatial resolution of the image sensor known from the prior art. The first is by raising the spatial density of the focal plane array, and the second is by increasing the optical format of the image sensor and the lens to accommodate a larger number of photosensitive elements.
The first approach requires size reduction of the photosensitive elements, which causes a reduction in the number of photons collected by each photosensitive element per unit of time, and thus worsens the signal-to-noise ratio and the dynamic range of the image. Under low light, the effective image resolution may drop due to the elevated image noise that drowns small image details, and due to an increased motion blur caused by a longer exposure time required to compensate for the elevated image noise. There is a hard physical limit on size reduction of the photosensitive elements imposed by light diffraction that has already been reached by the current sensor technology.
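As an illustration of the photon-count argument above, the following minimal sketch assumes a purely shot-noise-limited pixel, for which the signal-to-noise ratio equals the square root of the collected photon count. The function name, the flux value, and the pixel pitches are hypothetical and chosen only for illustration; read noise, dark current, and quantum efficiency are ignored.

```python
import math

def shot_noise_snr(photon_flux, pixel_pitch_um, exposure_s):
    """SNR of an idealized shot-noise-limited pixel (hypothetical helper).

    photon_flux is in photons per square micrometer per second (assumed units).
    For Poisson-distributed photon arrivals, SNR = N / sqrt(N) = sqrt(N).
    """
    photons = photon_flux * pixel_pitch_um ** 2 * exposure_s
    return math.sqrt(photons)

# Same illumination and exposure, two pixel pitches (illustrative values).
snr_large = shot_noise_snr(1000.0, 2.0, 0.01)   # 2 um pixel
snr_small = shot_noise_snr(1000.0, 1.0, 0.01)   # 1 um pixel, one quarter of the area

# Exposure needed by the small pixel to recover the SNR of the large one.
snr_small_long = shot_noise_snr(1000.0, 1.0, 0.04)
```

Under these assumptions, halving the pixel pitch halves the signal-to-noise ratio, and a fourfold longer exposure is needed to restore it, which is the source of the additional motion blur noted above.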
The second approach leads to an exponential cost increase of the image sensor due to an exponential dependency between the physical dimensions of the focal plane array and the fabrication cost. The same exponential cost increase applies to the corresponding large-format lenses.
As an alternative to sensor resolution improvements, known in the prior art are post-acquisition computational methods that increase spatial resolution of the images post-capture. One such method is pansharpening, the fusion of at least two images acquired by at least two separate image sensors: a higher-resolution panchromatic image sensor and a lower-resolution multi-spectral sensor. As an example, the multi-spectral image sensor may comprise a Bayer color filter array deposited on top of its focal plane array, as practiced in the art.
In the prior art, pansharpening is generally considered as a global substitution of the luminosity component or an intensity component of the multi-spectral image with the higher-resolution panchromatic image. For the substitution to succeed, both images must be fully matched in scale and the field of view, and perfectly registered to each other globally and locally, with no parallax present. Historically, the pansharpening techniques have been developed for and applied in aerial and space imaging where the distance from the focal plane to the objects in the field of view is essentially infinite, and where the panchromatic image and the multispectral image are acquired sequentially while flying over the same land surface area, thus eliminating the parallax problem, so that both fields of view, the panchromatic and the multi-spectral, are fully matched and perfectly registered to each other globally and locally.
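The global intensity substitution described above can be sketched as follows. This is a minimal illustration, not the method of any particular prior-art reference: it assumes that the multi-spectral image has already been upsampled to the panchromatic grid and perfectly registered, and that the intensity component is the plain mean of the spectral bands. The function and variable names are hypothetical.

```python
import numpy as np

def pansharpen_intensity_substitution(ms_up, pan):
    """Replace the intensity of a multi-spectral image with a panchromatic image.

    ms_up: (H, W, B) multi-spectral image, already upsampled and registered
           to the panchromatic grid (assumption).
    pan:   (H, W) panchromatic image.
    """
    # Assumed intensity model: unweighted mean of the spectral bands.
    intensity = ms_up.mean(axis=2)
    # Adding (pan - intensity) to every band substitutes the intensity
    # while leaving the differences between bands unchanged.
    detail = pan - intensity
    return ms_up + detail[..., None]

rng = np.random.default_rng(0)
ms_up = rng.random((6, 6, 3))   # synthetic stand-in for an upsampled color image
pan = rng.random((6, 6))        # synthetic stand-in for the panchromatic image
sharp = pansharpen_intensity_substitution(ms_up, pan)
```

In this sketch the intensity of the result equals the panchromatic image by construction, which is why any misregistration or parallax between the two inputs translates directly into color artifacts.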
Also known in the art are pansharpening methods that admit a mismatch between the panchromatic and the multi-spectral images due to parallax, said methods comprising additional means for resolving said mismatch, for example, a light projection based depth of scene estimation device integrated into the photographic image acquisition device, and a related computational method as disclosed in U.S. Pat. No. 8,319,822, incorporated herein by this reference.
It would be evident to those skilled in the art that the task of pansharpening in the case of mismatched panchromatic and multi-spectral images is substantially complicated due to parallax and occlusions.
As yet another approach to post-capture image resolution improvement, known in the art are computational methods collectively known as super-resolution image reconstruction, or super-resolution, that aim at reversing the effects of blurring in the lens and downsampling in the focal plane array. In contrast to pansharpening, super-resolution does not fuse a separately acquired higher-resolution image with a lower-resolution image: it exploits the intrinsic properties of the lower-resolution image itself.
Early attempts at computational super-resolution reconstruction in the prior art relied on exploiting relative motion between the scene and the camera. By acquiring a sequence of multiple low-resolution images, each producing a generally different sub-pixel offset relative to the sampling grid of the image sensor due to motion, and then registering these multiple low-resolution images on a higher-resolution grid, attempts were made at reconstructing a single super-resolved image. However, said methods required precise sub-pixel motion estimation, which is generally hard to achieve when non-global motion is present, especially under image noise. Importantly, in the absence of relative motion between the scene and the camera, said techniques cannot produce any resolution improvement.
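The registration of multiple low-resolution frames on a higher-resolution grid can be illustrated with a simple shift-and-add sketch. It is an idealized example under stated assumptions: the sub-pixel offsets are known exactly and expressed as whole pixels of the high-resolution grid, and no optical blur or noise is modeled. In practice, estimating these offsets is the difficult step. All names are hypothetical.

```python
import numpy as np

def shift_and_add(frames, offsets, factor):
    """Place low-resolution frames on a grid that is `factor` times finer.

    frames:  list of (h, w) arrays.
    offsets: list of (dy, dx) offsets in high-resolution pixels, assumed known.
    Returns the accumulated image and a mask of grid positions that received data.
    """
    h, w = frames[0].shape
    acc = np.zeros((h * factor, w * factor))
    cnt = np.zeros_like(acc)
    for frame, (dy, dx) in zip(frames, offsets):
        acc[dy::factor, dx::factor] += frame
        cnt[dy::factor, dx::factor] += 1
    filled = cnt > 0
    acc[filled] /= cnt[filled]   # average samples that landed on the same position
    return acc, filled

rng = np.random.default_rng(1)
scene = rng.random((8, 8))                      # synthetic high-resolution scene
offsets = [(0, 0), (0, 1), (1, 0), (1, 1)]      # four distinct sub-pixel shifts
frames = [scene[dy::2, dx::2] for dy, dx in offsets]
recon, filled = shift_and_add(frames, offsets, 2)

# Without relative motion every frame samples the same positions.
_, filled_static = shift_and_add([frames[0]] * 4, [(0, 0)] * 4, 2)
```

With four distinct offsets the high-resolution grid is fully populated, whereas four identical frames populate only one quarter of it, which reflects the statement above that no resolution improvement is obtained in the absence of relative motion.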
In a more recent computational approach to super-resolution in the prior art, the desired sub-pixel offsets analogous to the sub-pixel offsets caused by the relative motion between the camera and the scene are found to exist in the low-resolution image itself due to a property of nonlocal self-similarity and redundancy at the scale of small image patches. In natural images, multiple similar or substantially similar image patches are typically present at different locations in the same image, each patch comprising a small group of pixels, for example, a square of 9 by 9 pixels.
Because similar image patches at different image locations are a product of sampling similar areas of the scene by a finite-resolution sampling grid of the image sensor, said image patches generally comprise random sub-pixel offsets relative to the grid, and thus provide additional sub-pixel resolution information. By exploiting multiple similar patches found at different image locations, the effective resolution enhancement factor up to a factor of three in each of the two image dimensions may be achieved, corresponding to an increase of the effective pixel count by a factor of nine.
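The search for similar patches at different locations of the same image can be sketched as an exhaustive comparison using the sum of squared differences. The sketch below is illustrative only: the similarity measure, the exhaustive scan, and the synthetic periodic test image are assumptions, and the function name is hypothetical.

```python
import numpy as np

def find_similar_patches(image, top, left, size=9, k=5):
    """Return the k patches most similar to the reference patch at (top, left).

    Similarity is measured by the sum of squared differences (SSD), an
    assumed criterion. Each result is a tuple (ssd, y, x).
    """
    ref = image[top:top + size, left:left + size]
    h, w = image.shape
    matches = []
    for y in range(h - size + 1):
        for x in range(w - size + 1):
            if (y, x) == (top, left):
                continue  # skip the reference patch itself
            ssd = float(np.sum((image[y:y + size, x:x + size] - ref) ** 2))
            matches.append((ssd, y, x))
    matches.sort()
    return matches[:k]

# Synthetic image with exact repetitions every 12 pixels (illustrative only).
rng = np.random.default_rng(2)
image = np.tile(rng.random((12, 12)), (3, 3))
matches = find_similar_patches(image, top=2, left=3)
```

In a natural image the matches are approximate rather than exact, and each match is displaced from the sampling grid by a generally different sub-pixel amount, which is the source of the additional resolution information discussed above.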
Also known are patch-based super-resolution methods that employ a database of examples of high-resolution images and their blurred and subsampled low-resolution copies. Said database of examples is utilized to extract a compact dictionary of pairs of low- and high-resolution image patches using a variety of learning techniques known in the art. Importantly, said dictionary of patch pairs is made substantially compact due to said property of self-similarity and redundancy at the scale of small image patches. A low-resolution image is then super-resolved using said dictionary of pairs.
In said example-based approach, the effective resolution enhancement factor may exceed that of the non-example based super-resolution approach, the approach based on sub-pixel sampling offsets. However, the super-resolved images may be less reliable in some applications because they are based on example-based predictions of what the high-resolution image might look like, as opposed to relying on the actual data present in the image itself.
Also known in the art are example-based super-resolution methods based on said patch-pair dictionary learning, in which the number of entries in the dictionary is substantially reduced. Said dictionaries are composed of elementary patch atoms, such that any small image patch is closely approximated by a linear sum of a very small subset of these atoms. The ability to approximate any image patch by a linear sum of a few elementary atoms is due to a fundamental property of sparsity of natural images in certain mathematical domains known to those skilled in the art. The property of sparsity and the property of nonlocal self-similarity and redundancy at the patch scale are closely related. Sparse coding methods are known in the art for their high computational cost due to the combinatorial nature of identifying the best combination of patch atoms that matches a given patch.
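The approximation of a patch by a linear sum of a few atoms can be illustrated with orthogonal matching pursuit, a greedy procedure that is commonly used in place of the exhaustive combinatorial search. The text above does not name a specific algorithm, so the choice of orthogonal matching pursuit, the random dictionary, and all names below are assumptions made for illustration.

```python
import numpy as np

def omp(dictionary, signal, n_atoms):
    """Greedy sparse coding by orthogonal matching pursuit (illustrative sketch).

    dictionary: (d, K) array whose columns are unit-norm atoms.
    signal:     (d,) vectorized image patch.
    Returns a length-K coefficient vector with at most n_atoms nonzero entries.
    """
    residual = signal.copy()
    support = []
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_atoms):
        # Select the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(dictionary.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(dictionary[:, support], signal, rcond=None)
        coeffs[:] = 0.0
        coeffs[support] = sol
        residual = signal - dictionary[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    return coeffs

rng = np.random.default_rng(3)
D = rng.standard_normal((25, 60))        # random stand-in for a learned dictionary
D /= np.linalg.norm(D, axis=0)           # normalize atoms to unit length
patch = 2.0 * D[:, 3] - 1.5 * D[:, 17] + 0.5 * D[:, 42]   # synthetic 3-sparse patch
coeffs = omp(D, patch, n_atoms=3)
```

Each iteration costs one matrix-vector product and one small least-squares solve, which is far cheaper than testing every subset of atoms, but the greedy selection is not guaranteed to find the best combination.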
Further, it is known from the prior art that similar image patches are found at different locations not only in the same image, but also in its downscaled copies created by blurring and subsampling of the image. This property of cross-scale nonlocal patch similarity is the basis of yet another example-based super-resolution method as disclosed in patent application US2012/0086850, incorporated herein by this reference, which in contrast to the other example-based methods does not require an external database of high-resolution images or the dictionary of patch pairs.
Said cross-scale method first locates a pair of similar patches A and B in the actual image and its downscaled copy, and then applies the coordinates of patch B in the downscaled copy as a pointer to a corresponding location in the actual (non-downscaled) image to extract a higher-resolution patch C corresponding to the downscaled patch B, which may not generally coincide with the location of patch A. Higher-resolution patch C is further used in formation of a super-resolved image on a pixel-by-pixel basis by applying C at the same location in the super-resolved image as the location of patch A in the actual image.
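The relationship among patches A, B, and C described above can be sketched for a single patch as follows. This is a simplified reading of the description, not the implementation of the cited application: downscaling by block averaging, exhaustive matching by the sum of squared differences, and all function names are assumptions.

```python
import numpy as np

def downscale(image, s):
    """Downscale by averaging s-by-s blocks (assumed blur-and-subsample model)."""
    h, w = image.shape
    cropped = image[:h - h % s, :w - w % s]
    return cropped.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def best_match(search, patch):
    """Exhaustive SSD search; returns (ssd, y, x) of the closest patch."""
    p = patch.shape[0]
    best = (np.inf, 0, 0)
    for y in range(search.shape[0] - p + 1):
        for x in range(search.shape[1] - p + 1):
            ssd = float(np.sum((search[y:y + p, x:x + p] - patch) ** 2))
            if ssd < best[0]:
                best = (ssd, y, x)
    return best

def cross_scale_patch(image, ay, ax, p=5, s=2):
    """For patch A at (ay, ax), find B in the downscaled copy and its parent C."""
    small = downscale(image, s)
    patch_a = image[ay:ay + p, ax:ax + p]
    _, by, bx = best_match(small, patch_a)
    # Coordinates of B, multiplied by the scale factor, point at C in the actual image.
    patch_c = image[s * by:s * by + s * p, s * bx:s * bx + s * p]
    # C is applied in the super-resolved image at the scaled location of A.
    destination = (s * ay, s * ax)
    return patch_c, destination, (by, bx)

rng = np.random.default_rng(4)
image = rng.random((32, 32))     # synthetic stand-in for the actual image
patch_c, destination, (by, bx) = cross_scale_patch(image, ay=4, ax=6)
```

A complete reconstruction would repeat this for overlapping patches and blend the results; the sketch shows only how the location of B serves as a pointer to the higher-resolution patch C.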
Importantly, the super-resolved image is thus formed using patches from locations in the actual image that are generally different from the “correct” locations in the unknown high-resolution image. However, due to the property of nonlocal self-similarity and redundancy at the scale of small patches, this substitution produces visually acceptable results. When this example-based method is combined with the method based on sub-pixel sampling offsets, the combined result is further improved.
In order to achieve a higher-scale resolution enhancement while avoiding image artifacts, the above technique may be applied over multiple iterations with a gradual increase in the cross-scale factor and using the intermediate results as a starting point for the next iteration. An additional step of back-projection of the newly formed high-resolution image onto the low-resolution image via blurring and subsampling is known in the art as a means for verification, regularization, and error avoidance.
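The back-projection step mentioned above can be sketched as an iterative correction that forces the degraded high-resolution estimate to agree with the observed low-resolution image. The sketch assumes block averaging as the blur-and-subsample model and pixel replication as the back-projection operator; both choices and all names are hypothetical.

```python
import numpy as np

def degrade(high, s):
    """Assumed acquisition model: average s-by-s blocks, then subsample."""
    h, w = high.shape
    return high.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def back_project(high_estimate, low, s, n_iter=3):
    """Refine a high-resolution estimate so that degrade(estimate) matches `low`."""
    high = high_estimate.astype(float).copy()
    for _ in range(n_iter):
        # Error between the observed image and the re-degraded estimate.
        error = low - degrade(high, s)
        # Spread each low-resolution error over its s-by-s block of pixels.
        high += np.kron(error, np.ones((s, s)))
    return high

rng = np.random.default_rng(5)
low = rng.random((4, 4))          # synthetic observed low-resolution image
initial = rng.random((8, 8))      # synthetic, deliberately inconsistent estimate
refined = back_project(initial, low, s=2)
```

With this particular pair of operators the constraint is satisfied after a single iteration; with a realistic point spread function several iterations are typically needed, and the result remains only one of many high-resolution images consistent with the observation.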
In the example-based methods, the degree of match depends on the choice of the database of examples used in dictionary learning, and the choice of the scale gap between the high- and low-resolution image pairs in the database. The smaller the scale gap is, and the closer the database of examples is to the category of images to be super-resolved, the closer is the match between the super-resolved image and the unknown high-resolution image. In particular, increasing the effective pixel count by a factor of four may generally produce a close match between the reconstructed image and an unknown high-resolution image, while increasing the effective pixel count by a factor of sixteen or higher may not produce a meaningful result due to the predictive nature of the example-based methods.
Super-resolution image reconstruction involves not only the step of upsampling of the low-resolution image, but also a step of reversing the optical blur incurred in the process of the photographic acquisition of the low-resolution image. Reversing the blur requires an estimation of the point-spread function (PSF) of the imaging lens, which is generally unknown, but is typically assumed to have a Gaussian shape. Methods of blur kernel estimation are known in the prior art: as an example, one approach is based on inferring the blur kernel by maximizing the match between the low-resolution patches and their blurred and down-sampled higher-resolution matches found in the same image, as disclosed in “Nonparametric Blind Super-Resolution,” a 2013 publication by Michaeli and Irani, incorporated herein by reference.
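The acquisition model that super-resolution seeks to reverse, namely optical blur followed by subsampling, can be sketched under the Gaussian point-spread-function assumption mentioned above. The kernel radius, the edge padding, and the function names are assumptions made for illustration and are not taken from the cited publication.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """One-dimensional Gaussian kernel normalized to unit sum."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur_and_subsample(image, sigma, factor):
    """Blur with a separable Gaussian PSF, then keep every `factor`-th pixel."""
    k = gaussian_kernel(sigma, radius=int(3 * sigma) + 1)
    # Replicate border pixels so that the output keeps the input size.
    padded = np.pad(image, len(k) // 2, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    both = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)
    return both[::factor, ::factor]

flat = np.full((16, 16), 0.5)                 # uniform test image
low_flat = blur_and_subsample(flat, sigma=1.0, factor=2)

rng = np.random.default_rng(6)
textured = rng.random((16, 16))               # synthetic detailed image
low_textured = blur_and_subsample(textured, sigma=1.0, factor=2)
```

A blur kernel estimation method of the kind referenced above would search for the kernel under which such degraded patches best match the low-resolution patches actually observed, instead of fixing the Gaussian width in advance.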
Importantly, post-capture computational super-resolution methods from the prior art as disclosed herein are computationally expensive. In particular, the amount of computational power that would be required to perform such computations in real-time during image acquisition (as opposed to post-capture) exceeds by orders of magnitude the computational power typically available in most cameras, including high-end cameras used in certain military applications. Moreover, even if sufficient computational power were available for a real-time super-resolution reconstruction, the additional bandwidth to transmit the super-resolved images and the additional storage space to record them would be impractical in many applications.
To summarize the foregoing, known in the prior art are example-based approaches to super-resolution, as well as the non-example based ones that rely on sub-pixel offsets present in multiple similar image patches. At least one category of said example-based approaches relies on a database of high-resolution images and their blurred and subsampled low-resolution copies, while another relies on the examples found in the low-resolution image itself, with no external database used. The effective resolution enhancement factor is typically lower in the non-example based methods relative to the example-based methods; however, the usage of the latter may be limited in some applications due to their predictive nature. In contrast to the super-resolution methods, the pansharpening methods rely on a separate higher-resolution image independently acquired by a panchromatic image sensor, but do not involve an increase of resolution beyond that of the panchromatic sensor.
Therefore, the main objective of the present invention is to provide a photographic image acquisition device and method thereof that raise the spatial resolution of the acquired images substantially beyond the individual capabilities of either the pansharpening approach or the super-resolution approach, and to advantageously produce a multiplicative effect on resolution enhancement unachievable by each of the two approaches separately. Said multiplicative effect is gained herein because the two components of the invention, the super-resolution reconstruction component and the pansharpening component, are made mutually interdependent through the property of nonlocal patch similarity and redundancy: without the panchromatic image, the super-resolution of the multi-spectral image by itself cannot achieve a high scaling factor, while the pansharpening by itself cannot reach beyond the resolution of the panchromatic sensor. It is another objective to provide a pansharpening method admitting a mismatch between the panchromatic and the multi-spectral images due to parallax without involving any additional hardware or methods for depth of scene estimation, as practiced in the art. It is yet another objective to reduce the computational burden, bandwidth and storage space typically required in managing extreme resolution images and video.