An embodiment of the invention is directed to signal processing techniques to obtain a higher resolution, HR, image (or sequence of images) from multiple observed lower resolution images. Other embodiments are also described.
In most electronic imaging applications, images with higher resolution are generally more desirable. These are images that have greater pixel density and hence show greater detail than lower resolution images of the same scene. HR images have many applications, including medical imaging, satellite imaging, and computer vision.
An HR image may be obtained by simply increasing the number and/or density of pixel sensor elements in the electronic image sensor chip that is used to capture the image. This, however, may increase the size of the chip so much that capacitance effects will hamper the rapid transfer of pixel signal values, thereby making it difficult to obtain high-speed captures and video. Another possibility is to reduce the physical size of each pixel sensor element; however, doing so may increase the noise level in the resulting pixel signal value. Additionally, increasing the number of pixel sensor elements increases the cost of the device, which in many situations is undesirable (e.g., cameras mounted on mobile devices whose primary function is not image acquisition, like personal digital assistants (PDAs) and cellular phones), and in others is prohibitive (e.g., infrared sensors). Therefore, another approach to obtaining HR images (that need not modify the lower resolution sensor) is to perform digital signal processing upon multiple lower resolution (LR) images captured by the sensor, to enhance resolution (also referred to as super resolution, SR, image reconstruction).
With SR image reconstruction, multiple observed LR images or frames of a scene are obtained that in effect are different “looks” of the same scene. These may be obtained using the same camera, for example, while introducing small, so-called sub-pixel shifts in the camera location from frame to frame, or capturing a small amount of motion in the scene. Alternatively, the LR images may be captured using different cameras aimed at the same scene. A “result” HR image is then reconstructed by properly aligning and combining the LR images, so that additional information, e.g., an increase in resolution or de-aliasing, is obtained for the result HR image. The process may also include image restoration, where de-blurring and de-noising operations are performed as well, to yield an even higher quality result HR image.
The reconstruction of the result HR image, however, is a difficult problem because it belongs to the class of inverse, ill-posed mathematical problems. The needed signal processing may be interpreted as being the reverse of a so-called observation model, which is a mathematically deterministic way to describe the formation of LR images of a scene (based upon known camera parameters). Since the scene is approximated by an acceptable quality HR image of it, the observation model is usually defined as relating an HR discrete image of the scene (with a given resolution and pixel grid) to its corresponding LR images. This relationship (which may apply to the formation of both still images and video) may be given as the concatenation of a geometric transform, a blur operator, and a down-sampling operator, plus an additive noise term. Examples of the geometric transform include global or local translation and rotation, while the blur operator attempts to duplicate camera non-idealities, such as out-of-focus blur, diffraction limits, aberration, slow motion blur, and image sensor integration over a spatial region (sometimes combined all together in a point spread function). The down-sampling operator down-samples the HR image into aliased, lower resolution images. This observation model may be expressed by the mathematical relationship
Y = W*f + n,  (1)
where Y is the set of observed LR images and W represents the linear transformation of HR pixels in an HR image f to the LR pixels in Y (including the effect of down-sampling, geometric transform and blur). The n represents additive noise having random characteristics, which may represent, for example, the variation (or error) between LR images that have been captured by the same camera without any changes in the scene and without any changes to camera or lighting settings. Based on the observation model in Equation (1), SR image reconstruction estimates the HR image f that corresponds to a given set of LR images Y.
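The observation model of Equation (1) may be illustrated by a minimal one-dimensional sketch in Python. The function name observe, its parameters, the circular boundary handling, and the moving-average blur are assumptions chosen for illustration only; an actual W would reflect the known camera parameters.

```python
import random


def observe(f, shift, factor, blur_width=3, noise_sigma=0.0, rng=None):
    """Simulate one LR frame from a 1-D HR signal f (toy version of Y = W*f + n).

    All names and defaults here are hypothetical; blur_width is treated as odd.
    """
    rng = rng or random.Random(0)
    n = len(f)
    # Geometric transform: circular translation by `shift` HR pixels,
    # which is a sub-pixel shift when measured on the LR grid.
    shifted = [f[(i + shift) % n] for i in range(n)]
    # Blur operator: moving average, a crude stand-in for a point spread function.
    half = blur_width // 2
    blurred = [
        sum(shifted[(i + k) % n] for k in range(-half, half + 1)) / (2 * half + 1)
        for i in range(n)
    ]
    # Down-sampling operator: keep every `factor`-th HR sample.
    lr = blurred[::factor]
    # Additive noise term n, assumed Gaussian with standard deviation noise_sigma.
    return [v + rng.gauss(0.0, noise_sigma) for v in lr]


if __name__ == "__main__":
    hr = [float(i) for i in range(8)]
    # Two "looks" at the same scene, offset by one HR pixel.
    print(observe(hr, shift=0, factor=2, blur_width=1))
    print(observe(hr, shift=1, factor=2, blur_width=1))
```

With the blur disabled (blur_width=1) and no noise, the two frames in the usage example sample the even and the odd HR pixels respectively, which shows how sub-pixel shifts give each LR frame different information about the same scene.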
A Bayesian estimation process (also referred to as stochastic or probabilistic SR image reconstruction) may be used to estimate f, to get the “result” HR image mentioned above. In that case, an “a posteriori” probability function (typically, a probability density function) is mathematically defined as p(f|Y), which is the probability of a particular HR image f given the set of observed LR images Y. Applying a mathematical manipulation, known as Bayes' law, the optimization problem, which is finding a suitable HR image f, e.g., one that has the highest probability given a set of LR images or that maximizes p(f|Y), may be re-written in terms of
p(f|Y) ∝ p(Y|f)*p(f),  (2)
where the omitted normalizing factor 1/p(Y) does not depend on f and therefore does not affect the maximization, and where p(f) is called the “Prior” probability density function that gives the probabilities of a particular HR image prior to any observation. The Prior indicates what HR images are more probable to occur based on, for example, a statistical characterization of an ensemble of different HR images. The Prior probability may be a joint probability, defined over all of the pixels in an HR image, and should be based on statistical data from a large number of images. However, estimating and describing the Prior probability as a joint distribution over all pixels may not be computationally feasible. Accordingly, existing methods use approximate models, based on the fact that in many types of images, correlations among pixels decay relatively quickly with pixel distance. For example, the Prior may be based on a probabilistic construct called Markov Random Fields (MRFs). Rather than take the position that all HR images are equally likely, the MRF is tailored to indicate, for example, that certain pixel patterns (e.g., piece-wise continuous; text images) are more likely than others. An image may be assumed to be globally smooth in a mathematical sense, so the MRF typically used to define the Prior has a normal (Gaussian) probability distribution.
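The effect of a Gaussian MRF Prior may be illustrated by a small Python sketch. The function name gaussian_mrf_energy, the 4-neighbor (horizontal and vertical) clique structure, and the weight parameter are assumptions made for illustration and are not taken from the description above. The function returns an energy E(f) such that an unnormalized Prior would be p(f) ∝ exp(−E(f)), so that smoother images receive lower energy and hence higher prior probability.

```python
def gaussian_mrf_energy(image, weight=1.0):
    """Energy of a quadratic (Gaussian) MRF over horizontal and vertical neighbors.

    `image` is a list of equal-length rows of pixel values. The returned value is
    weight times the sum of squared differences between adjacent pixels
    (a hypothetical, simplified choice of clique potential).
    """
    rows, cols = len(image), len(image[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right-hand neighbor
                energy += (image[r][c] - image[r][c + 1]) ** 2
            if r + 1 < rows:  # neighbor below
                energy += (image[r][c] - image[r + 1][c]) ** 2
    return weight * energy


if __name__ == "__main__":
    flat = [[3.0, 3.0, 3.0] for _ in range(3)]
    edge = [[0.0, 0.0, 1.0] for _ in range(3)]
    print(gaussian_mrf_energy(flat), gaussian_mrf_energy(edge))
```

In the usage example, the flat image has zero energy while the image containing a vertical edge has a higher energy, so this Prior would rate the flat image as more probable. This also shows why a purely Gaussian Prior tends to penalize sharp edges.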
As to p(Y|f), that is called the “Likelihood” function; it is a probability density function that defines the probabilities of observing LR images that would correspond to a particular HR image. The Likelihood may be determined based on the observation model described above by the mathematical relationship in Equation (1), where the noise term is typically assumed to have a Gaussian probability distribution. The estimation process becomes one of iteratively determining trial HR images and stopping when there is convergence, which may signify that a maximum of the a posteriori probability function has been reached.
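One possible form of such an iterative estimation may be sketched in Python as gradient descent on the negative logarithm of the a posteriori probability. This is a one-dimensional toy under stated assumptions, not necessarily the procedure used in any particular embodiment: W is reduced to a circular shift followed by down-sampling (no blur), the noise and the Prior are both Gaussian, the noise variance is absorbed into the hypothetical weight lam, and the step size, tolerance, and function name map_estimate are illustrative choices.

```python
def map_estimate(frames, shifts, factor, lam=0.01, step=0.2, tol=1e-9, max_iter=5000):
    """Estimate a 1-D HR signal from LR frames by minimizing
    0.5 * sum_k ||Y_k - W_k f||^2 + 0.5 * lam * sum_i (f_i - f_{i+1})^2.

    LR sample j of a frame with shift s is assumed to observe HR pixel
    (j * factor + s) mod n. All names and defaults are hypothetical.
    """
    n = len(frames[0]) * factor
    f = [0.0] * n  # initial trial HR image
    for _ in range(max_iter):
        grad = [0.0] * n
        # Likelihood term (Gaussian noise): residual between trial image and data.
        for y, s in zip(frames, shifts):
            for j, yj in enumerate(y):
                i = (j * factor + s) % n
                grad[i] += f[i] - yj
        # Prior term (Gaussian smoothness, circular boundary).
        for i in range(n):
            grad[i] += lam * (2 * f[i] - f[i - 1] - f[(i + 1) % n])
        f_new = [fi - step * gi for fi, gi in zip(f, grad)]
        change = max(abs(a - b) for a, b in zip(f, f_new))
        f = f_new
        if change < tol:  # convergence: the trial image has stopped changing
            break
    return f


if __name__ == "__main__":
    truth = [1.0, 2.0, 4.0, 3.0, 1.0, 0.0, 2.0, 5.0]
    shifts = [0, 1]
    frames = [[truth[(j * 2 + s) % 8] for j in range(4)] for s in shifts]
    print([round(v, 3) for v in map_estimate(frames, shifts, factor=2)])
```

Because this objective is quadratic, a fixed step size is sufficient here provided it is small relative to the number of frames that observe the same HR pixel. With lam set to zero the sketch reduces to a maximum likelihood estimate that simply interleaves the LR samples.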