Digital images are currently used in many different applications. For example, new-generation acquisition devices, such as digital still cameras (DSCs), are commonly used for capturing such images. The availability of sensors of ever greater resolution and low-cost, low-consumption digital signal processors (DSPs) has led to considerable commercial availability of digital still cameras. Yet, it may still be relatively expensive to produce devices that are capable of capturing high-quality digital images.
The quality of an image depends substantially on the characteristics of the sensor with which the image is acquired. This is particularly true for image resolution. The sensor, which in digital still cameras is typically either a charge coupled device (CCD) sensor or a complementary metal oxide semiconductor (CMOS) sensor, is an integrated circuit including a matrix of photosensitive cells or elements, each associated with a corresponding pixel. When the image is acquired from a real scene, each cell produces an electric signal proportional to the light that strikes it. More particularly, each cell responds to the radiance (i.e., emitted quantity of light) of a particular portion of the real scene. This portion is the receptive field of the pixel.
The larger the number of photosensitive cells (or, equivalently, the greater the spatial resolution of the sensor), the denser the information relating to the real scene captured in the acquisition process will be. But obtaining a higher image resolution by increasing the sensor resolution in terms of number of pixels is not always feasible, for reasons that are both technological and economic in nature.
Moreover, when acquiring a digital photograph, a sensor, no matter how good its resolution, will always produce only an approximation of the scene that is to be captured. Further, the photosensitive cells of the sensor are always separated by a certain distance. This is because not all the sensor area can be uniformly covered with photosensitive elements, and technological reasons make it inevitable that there will be a certain minimum distance between adjacent cells. This spacing is the cause of a first loss of information in the acquisition process.
Another reason why a digital image acquired with a digital still camera sensor provides only an approximation of the real scene is the interpolation process applied to the data acquired by the sensor. As is well known, a digital image may be represented by a matrix of elements (i.e., pixels) corresponding to elementary portions of the image. Each of these elements has associated with it one or more digital values representing the optical components. In a monochromatic image, for example, only a single digital value is associated with each pixel. In this case, the image is made up of only a single channel or plane.
On the other hand, in a color image (which may be in red, green, blue (RGB) format, for example) each pixel has associated therewith three digital values that correspond, respectively, to the three components (red, green, blue) of the additive chromatic synthesis. In this case, the image can be broken down into three distinct planes, each including the information relating to just one of the chromatic components.
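By way of illustration only, the decomposition of an RGB image into three planes may be sketched as follows. The function name `split_planes` and the representation of the image as rows of `(r, g, b)` tuples are assumptions made for this example and are not taken from the description.

```python
def split_planes(image):
    """Split a row-major image of (r, g, b) tuples into three single-channel planes."""
    red = [[pixel[0] for pixel in row] for row in image]
    green = [[pixel[1] for pixel in row] for row in image]
    blue = [[pixel[2] for pixel in row] for row in image]
    return red, green, blue


# Hypothetical 2x2 color image: three digital values per pixel.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (10, 20, 30)],
]
red, green, blue = split_planes(image)
```

Each returned plane has the same dimensions as the image and carries one digital value per pixel, as in the monochromatic case.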
A typical sensor will dedicate a single and substantially monochromatic photosensitive cell to each pixel of the image. Furthermore, the sensor is provided with an optical filter including a matrix of filtering elements, each of which covers one photosensitive cell. Subject to a minimal absorption, each filtering element transmits to the photosensitive cell with which it is associated the luminous radiation corresponding solely to the wavelength of the red light, green light, or blue light. Thus, for each pixel only one of the three primary components (R,G,B) of the additive chromatic synthesis is available.
The type of filter used varies from one manufacturer to the next. Perhaps the most common of these filters is the Bayer filter. With this filter, the arrangement of the filtering elements is in the “Bayer” pattern, which is shown in the element matrix 10 illustrated in FIG. 2. The electric signals produced by the photosensitive cells are converted into digital values in accordance with conventional methodologies. The digital image obtained in this manner is incomplete. This is because the image is made up of only a single component (R, G or B) for each pixel. The format of this image is conventionally referred to as a color filter array (CFA).
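The manner in which a CFA image retains only one component per pixel may be sketched as follows. This is a minimal illustration, assuming a 2x2 repeating tile with green and red in the first row and blue and green in the second; the exact phase of the pattern in FIG. 2 may differ. The names `BAYER_TILE`, `bayer_color` and `to_cfa` are hypothetical.

```python
# Assumed 2x2 Bayer tile (G R / B G); the arrangement in FIG. 2 may be shifted.
BAYER_TILE = (("G", "R"), ("B", "G"))
CHANNEL_INDEX = {"R": 0, "G": 1, "B": 2}


def bayer_color(row, col):
    """Return the filter color assumed to cover the cell at (row, col)."""
    return BAYER_TILE[row % 2][col % 2]


def to_cfa(image):
    """Keep, for each pixel, only the component passed by its filtering element."""
    return [
        [pixel[CHANNEL_INDEX[bayer_color(r, c)]] for c, pixel in enumerate(row)]
        for r, row in enumerate(image)
    ]


# Hypothetical 2x2 scene with three components per pixel.
scene = [
    [(1, 2, 3), (4, 5, 6)],
    [(7, 8, 9), (10, 11, 12)],
]
cfa = to_cfa(scene)
```

The result is a single plane of the same dimensions as the scene. Two of the three components of every pixel are discarded, which is why the CFA image is described as incomplete.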
The CFA image is then subjected to a complex reconstruction process to produce a “complete” image (e.g., in RGB format) in which three digital values will be associated with each pixel. This reconstruction implies a passage from a representation of the image in a single plane (Bayer plane) to a representation in three planes (R,G,B). The reconstruction is accomplished through known interpolation algorithms.
It should be noted that the interpolation produces only an approximation of the image that would be obtained with a sensor capable of acquiring three optical components per pixel. In this sense, therefore, the interpolation process introduces yet another approximation into the acquired image.
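The passage from a single plane to three planes may be illustrated with a very simple interpolation, in which each missing component is taken as the average of the neighboring cells that carry that component. This is only one of many known interpolation schemes and is not asserted to be the one used in any particular device. The tile layout and the names `bayer_color` and `demosaic` are assumptions made for this sketch.

```python
# Assumed 2x2 Bayer tile (G R / B G); the arrangement in FIG. 2 may be shifted.
BAYER_TILE = (("G", "R"), ("B", "G"))


def bayer_color(row, col):
    """Return the filter color assumed to cover the cell at (row, col)."""
    return BAYER_TILE[row % 2][col % 2]


def demosaic(cfa):
    """Rebuild (r, g, b) pixels from a CFA plane by averaging 3x3 neighbors."""
    height, width = len(cfa), len(cfa[0])
    if height < 2 or width < 2:
        raise ValueError("the CFA plane must be at least 2x2")
    output = []
    for r in range(height):
        output_row = []
        for c in range(width):
            pixel = []
            for channel in "RGB":
                if bayer_color(r, c) == channel:
                    # This component was actually measured at this cell.
                    pixel.append(float(cfa[r][c]))
                    continue
                # Otherwise, average the neighbors that measured this component.
                samples = [
                    cfa[rr][cc]
                    for rr in range(max(0, r - 1), min(height, r + 2))
                    for cc in range(max(0, c - 1), min(width, c + 2))
                    if bayer_color(rr, cc) == channel
                ]
                pixel.append(sum(samples) / len(samples))
            output_row.append(tuple(pixel))
        output.append(output_row)
    return output


rgb = demosaic([[10, 20], [30, 40]])
```

Only one value per pixel in the output was measured; the other two are estimates derived from neighboring cells, which is the additional approximation noted above.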
Given these limitations on the quality of the acquired image introduced by the sensor characteristics and the interpolation process, it is often necessary to perform further processing operations to obtain a high resolution digital image. To this end, the prior art proposes numerous processing methods. These are generally based on the principle of reconstructing the original information of the real scene by combining the information in a plurality of initially acquired low resolution digital images that all represent the same scene.
To this end, it is necessary that the initially acquired images (which will be referred to as the “starting images” herein) should together provide some additional information that could not be obtained from identical images. Certain of the prior art methods operate in the spatial domain (i.e., in the pixel domain), and others in the frequency domain. The latter combine a certain number of low resolution starting images after having transformed them into the spatial frequency domain. After the image in the frequency domain obtained from this combination has been brought back into the spatial domain, it has a better resolution than the starting images. However, the methods operating in the frequency domain call for a very considerable computational effort.
The methods that operate in the spatial domain, on the other hand, use an approach known as “back projection”, which is similar to the one utilized, for example, in computer-aided axial tomography (CAT). According to this approach, a two-dimensional object is reconstructed from a series of one-dimensional projections thereof.
The back-projection approach assumes that the low resolution starting images of a real scene represent different projections of a high resolution image that reproduces the real scene. The projection operation is performed by the acquisition process itself, which depends to a large extent on the acquisition device, and is assumed to be known. The problem is thus reduced to reconstructing the high resolution image from its various projections.
In particular, the method used by M. Irani and S. Peleg, described in an article entitled “Super Resolution From Image Sequences” (IEEE, 1990), obtains an iterative reconstruction of the high resolution image by correcting/improving this image in several successive steps. This is done based upon differences between the starting images and images obtained by simulation from the projections of the high resolution image as corrected or improved from time to time (by iteration).
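The iterative principle described above may be illustrated with a deliberately simplified, one-dimensional sketch. It is not the method of the cited article: the acquisition model (averaging adjacent pairs of samples of a circular signal, starting at a known shift), the update rule, and the names `simulate`, `back_project_step` and `max_residual` are all assumptions chosen to keep the example short. The signal length is assumed to be even.

```python
def simulate(high_res, shift):
    """Assumed acquisition model: average adjacent sample pairs, starting at `shift`."""
    n = len(high_res)
    return [
        (high_res[(2 * i + shift) % n] + high_res[(2 * i + shift + 1) % n]) / 2.0
        for i in range(n // 2)
    ]


def back_project_step(estimate, observed, shifts):
    """Correct the estimate using the differences between observed and simulated images."""
    n = len(estimate)
    correction = [0.0] * n
    for low_res, shift in zip(observed, shifts):
        simulated = simulate(estimate, shift)
        for i, (seen, predicted) in enumerate(zip(low_res, simulated)):
            error = seen - predicted
            # Back-project the error onto the two samples that produced this value.
            for offset in (0, 1):
                correction[(2 * i + shift + offset) % n] += error
    return [value + delta / len(observed) for value, delta in zip(estimate, correction)]


def max_residual(estimate, observed, shifts):
    """Largest absolute difference between observed and simulated low resolution values."""
    return max(
        abs(seen - predicted)
        for low_res, shift in zip(observed, shifts)
        for seen, predicted in zip(low_res, simulate(estimate, shift))
    )


# Hypothetical high resolution scene and two shifted low resolution starting images.
scene = [0.0, 0.0, 10.0, 10.0, 10.0, 0.0, 0.0, 0.0]
shifts = [0, 1]
observed = [simulate(scene, shift) for shift in shifts]

# Initial guess: replicate each sample of the first starting image.
estimate = [value for value in observed[0] for _ in range(2)]
initial_residual = max_residual(estimate, observed, shifts)
for _ in range(200):
    estimate = back_project_step(estimate, observed, shifts)
final_residual = max_residual(estimate, observed, shifts)
```

In this sketch the residual shrinks toward zero, meaning the estimate becomes consistent with both starting images. With only two starting images under this model, an estimate that is consistent with the observations need not coincide with the original scene. The sketch also depends on the acquisition model being known exactly.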
This method has a first drawback in that obtaining high-quality images requires an accurate modelling of the acquisition process (or device) with which the low resolution images have been obtained. For this reason, the above-described approach results in a complicated method that does not lend itself to being implemented in a commercial acquisition device such as a digital still camera.
A second difficulty is that this method requires a considerable number of iterations at each iteration step. This, in turn, may be problematic in devices in which power, processing, and data storage resources are at a premium, and may affect the commercial success of a product.