Modern mobile devices are usually equipped with photo and video cameras capable of capturing images of very high quality. However, the mobility requirement of such devices rules out optical systems with variable focal length (variable-magnification zoom lenses) for capturing visually magnified images, because such lenses are too large. Mobile devices therefore resort to digital zooming.
The following solutions are known in the field:
The digital-zooming method produces an enlarged image of low resolution. With this method of visual magnification, only the central part of the sensor is active. Then, to obtain an image with a number of pixels equal to the total number of pixels of the sensor, the reduced image from the sensor's central part is interpolated by one of the known methods of two-dimensional interpolation (bilinear or bicubic).
Digital-zooming limitations:
the linear image blur caused by motion during exposure is magnified during interpolation. The use of traditional stabilizing systems [David Sachs, Steven Nasiri, Daniel Goehl “Image Stabilization Technology Overview”] is difficult because of the mobility requirement;
an interpolated signal doesn't contain high-frequency components, which leads to indistinct edges and a lack of details.
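The digital-zooming procedure described above can be illustrated by the following minimal sketch. It assumes a single-channel image held in a NumPy array and an integer zoom factor; the function name `digital_zoom` and its parameters are hypothetical and are not taken from any cited source.

```python
import numpy as np

def digital_zoom(image: np.ndarray, factor: int) -> np.ndarray:
    """Crop the central 1/factor part of a 2-D image and bilinearly
    interpolate it back to the full sensor size (illustrative sketch)."""
    h, w = image.shape
    ch, cw = h // factor, w // factor            # size of the active central region
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = image[top:top + ch, left:left + cw].astype(np.float64)

    # Coordinates of every output pixel, expressed in the cropped region.
    ys = np.linspace(0, ch - 1, h)
    xs = np.linspace(0, cw - 1, w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, ch - 1)              # clamp at the lower border
    x1 = np.minimum(x0 + 1, cw - 1)              # clamp at the right border
    wy = (ys - y0)[:, None]                      # vertical interpolation weights
    wx = (xs - x0)[None, :]                      # horizontal interpolation weights

    # Bilinear blend of the four neighbouring crop pixels.
    upper = crop[y0][:, x0] * (1 - wx) + crop[y0][:, x1] * wx
    lower = crop[y1][:, x0] * (1 - wx) + crop[y1][:, x1] * wx
    return upper * (1 - wy) + lower * wy
```

The output has as many pixels as the full sensor, but every value is a weighted average of at most four pixels of the crop, so no detail finer than the crop's own sampling grid can appear.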
There is a method of image enhancement [Michal Irani, Shmuel Peleg “Super Resolution From Image Sequences”, ICPR, 2:115-120, June 1990] that uses several frames with small spatial shifts between them to increase resolution, or obtain super-resolution. In this method, convergence to an optimal high-resolution image is achieved iteratively. Iterations start with the creation of an initial (crude) version of the high-resolution image. As a rule, such an initial version is created by simple summation of interpolated low-resolution images. The second step of the iteration re-creates low-resolution images from this version of the high-resolution image, matches them against the initial low-resolution images, and evaluates a correction factor. Further iterations evaluate new versions of the high-resolution image, taking into account the correction from the previous iteration.
The limitation of this method is its extremely low speed, caused by the high number of iterations. Another limitation is the unpredictability of the necessary number of iterations.
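The iterative scheme described above can be sketched as follows. This is a simplified illustration, not the cited authors' implementation: it assumes that the shifts between frames are already known and are whole pixels on the high-resolution grid, that the imaging model is a plain block average, and that image borders wrap around. All function names are hypothetical.

```python
import numpy as np

def downsample(hr, s):
    """Assumed imaging model: average every s x s block of the high-res image."""
    h, w = hr.shape
    return hr.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def upsample(lr, s):
    """Nearest-neighbour enlargement by the integer factor s."""
    return np.kron(lr, np.ones((s, s)))

def simulate(hr, shift, s):
    """Re-create the low-res frame that a given (dy, dx) shift would produce."""
    dy, dx = shift
    return downsample(np.roll(hr, (-dy, -dx), axis=(0, 1)), s)

def back_project(lr_frames, shifts, s, iterations=20, step=1.0):
    # Initial crude version: average of enlarged, re-aligned low-res frames.
    hr = np.mean(
        [np.roll(upsample(f, s), sh, axis=(0, 1)) for f, sh in zip(lr_frames, shifts)],
        axis=0,
    )
    for _ in range(iterations):
        correction = np.zeros_like(hr)
        for f, sh in zip(lr_frames, shifts):
            error = f - simulate(hr, sh, s)          # mismatch with the observed frame
            correction += np.roll(upsample(error, s), sh, axis=(0, 1))
        hr += step * correction / len(lr_frames)     # corrected high-res version
    return hr
```

The number of passes through the outer loop is fixed in advance here; in practice it depends on the data, which is the unpredictability noted above.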
Another known method of image enhancement by increasing resolution [A.V.Nasonov and A.S.Krylov, Fast super-resolution using weighted median filtering // Proc. Intern. Conf. on Pattern Recognition. Istanbul, Turkey: IEEE Computer Society Press, pp. 2230-2233, 2010] uses Tikhonov regularization to ensure that the iterative approximations converge to the high-resolution result. This method is effective when several image frames are shot to obtain a visually magnified image of enhanced resolution.
The limitation is that, because of the inevitable pauses between photos when shooting by the traditional method, moving (unsteady) objects in the frame are captured blurred or with ghosting. This method also provides no means of correcting distortions (blur/indistinctness) of the camera's optical system. Besides, even though the median filtering used in this method preserves the sharpness of edges, it destroys small image details, whose enhancement is one of the purposes of super-resolution.
Another known method of resolution enhancement for sequences of images, which contain more information than a single 2D image, is described in [Jung-Hyun Hwang, Hweihn Chung, Sung-Ii Su, Yong-Chul Park, Chul-Ho Lee “High-resolution digital-zooming using temporal IIR filter”, IEEE Transactions on Consumer Electronics, Vol. 42, No. 3, August 1996]. It introduces movement detection at the subpixel level and IIR filtering along the time axis, both for visual image enlargement achieving high resolution and for digital image stabilization. Experimental results based on real image sequences are shown.
The processing steps of this method are: data acquisition from a sensor, alignment, magnification, and image multiplexing/filtering by means of a linear filter; each incoming frame is added to the previous result using different weights. An additional convolution with a rectangular window (i.e. post-filtering) is performed after image magnification but before multiplexing, so that the image shifted by a subpixel distance can be directly summed with (filtered by an IIR filter) the pixels of the previous result.
The first limitation of this method is that the simplicity of the output filter doesn't allow for an optimally sharp final image. Besides, the filter doesn't use adjacent, neighboring image pixels, which prevents the correction of distortions (blur/indistinctness) of the camera's optical system. Data acquisition from the sensor is performed by the standard low-speed method, leading to blurred images as well as to doubling of moving objects (ghosting).
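The temporal IIR processing chain described above can be sketched as follows. The sketch makes several assumptions that are not stated in the cited paper: the frame shifts are already estimated and given in pixels of the magnified grid, magnification is nearest-neighbour, borders wrap around, and a single weight `alpha` is used for every incoming frame. The function names are hypothetical.

```python
import numpy as np

def box_filter(img, size):
    """Convolution with a size x size rectangular window (circular borders)."""
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(size):
        for dx in range(size):
            out += np.roll(img, (dy - size // 2, dx - size // 2), axis=(0, 1))
    return out / size ** 2

def temporal_iir_zoom(frames, shifts, scale, alpha=0.25):
    """Accumulate magnified, aligned frames with a first-order IIR filter."""
    acc = None
    for frame, (dy, dx) in zip(frames, shifts):
        up = np.kron(frame.astype(np.float64), np.ones((scale, scale)))  # magnification
        up = box_filter(up, scale)                    # post-filtering before summation
        up = np.roll(up, (-dy, -dx), axis=(0, 1))     # alignment on the magnified grid
        # IIR step: weighted sum of the previous result and the incoming frame.
        acc = up if acc is None else (1 - alpha) * acc + alpha * up
    return acc
```

Each output pixel depends only on the co-located pixel of the previous result and of the incoming frame, which illustrates why such a filter cannot compensate for blur that spreads across neighboring pixels.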
Also known is a method of enhancing image sharpness [Masaaki Hayashi, “Neurofilter, and method of training to operate on image data so as to discriminate between text and image regions of an image which is expressed by image data” U.S. Pat. No. 6,301,381], in which one nonlinear filter, realized by means of a neural network, is used for dividing an image into areas containing text and areas containing diagrams, and another nonlinear filter, also realized by means of a neural network, is used for enhancing image sharpness. Both filters are designed as follows:
from the image area that includes the pixel being filtered, the value of this pixel and the values of the neighboring pixels are read;
values of the selected pixels are transferred to the input of the previously trained neural network;
in the case of a sharpness-enhancing filter, the neural network gives the value of the pixel for forming a sharp image;
in the case of a filter used to distinguish text from figures, the neural network gives a signal with a level proportional to the probability of text presence in this image area.
The limitations of this method are the following:
only one frame is used as the input, which doesn't allow the noise level in the final image to be decreased compared with the input;
the high dynamic range of pixel values prevents the effective operation of the neural network;
as a result of processing, the image sharpness is enhanced, but the image resolution is not.
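The structure of such a neighborhood-based neural filter can be sketched as follows. This is only an illustration of the data flow described above: the 3 x 3 neighborhood, the single hidden layer with a tanh transfer function, and the weight arrays `w1`, `b1`, `w2`, `b2` are assumptions, and the weights would have to come from prior training that is not shown here.

```python
import numpy as np

def neighborhoods(image, radius=1):
    """Return an (H, W, k*k) array holding the k x k neighbourhood of every
    pixel, k = 2*radius + 1; borders are padded by edge replication."""
    k = 2 * radius + 1
    h, w = image.shape
    padded = np.pad(image.astype(np.float64), radius, mode="edge")
    patches = [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    return np.stack(patches, axis=-1)

def neural_filter(image, w1, b1, w2, b2):
    """Apply a small pretrained network to each pixel neighbourhood.

    w1: (9, n_hidden), b1: (n_hidden,), w2: (n_hidden,), b2: scalar.
    """
    x = neighborhoods(image)          # pixel value plus neighbouring values
    hidden = np.tanh(x @ w1 + b1)     # nonlinear transfer function
    return hidden @ w2 + b2           # one output value per pixel
```

With weights trained for sharpening, the output would be read as the new pixel value; with weights trained for text detection, it would be read as a text-presence score for the area.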
Yet another known method of image resolution enhancement [Lin, et al. “Method for image resolution enhancement” U.S. Pat. No. 7,187,811] uses one image frame as the input. In this method, the areas of the input image are classified into two groups: areas that have edges, and areas that don't. The areas without edges are interpolated by simple bilinear interpolation. The areas with edges are interpolated by the neural network. This division into two categories, with separate interpolation of each, helps to avoid limitations common to traditional methods of interpolation (bilinear and bicubic), such as the “staircase” effect on inclined edges of the image.
In this method, a nonlinear digital filter (interpolator), designed as a neural network, is used for the image areas with edges. The neural network is pretrained on “field” natural images. Input data for the interpolator include the area coordinates, the “quality” of the edge, the declination of the edge, the value of the pixel being processed, and the values of neighboring pixels. The “quality” and declination of the edge are calculated from the data of the pixels included in the area. These data are transferred to the input layer of the neural network. The neural network multiplies the input data by weights determined during pretraining and applies predetermined nonlinear transfer functions. The output of the neural network is the value of the interpolated pixel. Thus the neural network acts as a nonlinear filter that receives these inputs directly and gives the value of the interpolated pixel immediately.
The limitations of this method are the following:
only one frame is used as the input, which doesn't allow the noise level in the final image to be decreased compared with the input;
the neural network is trained to distinguish a predetermined, limited set of patterns (variants of edge orientation), which leads to incorrect interpolation of images that don't resemble those of the training set;
the high dynamic range of pixel values prevents the effective operation of the neural network;
the images have to be divided into two groups during processing, which requires additional computational resources.
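The division of an image into areas with and without edges, which the last limitation refers to, can be illustrated by the sketch below. The cited patent's actual definitions of edge “quality” and declination are not reproduced in this text, so the sketch substitutes assumed ones: mean gradient magnitude for the quality and the dominant gradient direction for the declination. The block size, the threshold and the function names are likewise hypothetical.

```python
import numpy as np

def edge_features(area):
    """Assumed edge descriptors of one image area: (quality, declination)."""
    gy, gx = np.gradient(area.astype(np.float64))     # derivatives along rows, columns
    quality = float(np.hypot(gx, gy).mean())           # mean gradient magnitude
    declination = float(np.arctan2(gy.sum(), gx.sum()))  # dominant direction, radians
    return quality, declination

def classify_areas(image, block=4, threshold=1.0):
    """Label each block x block area True (has an edge) or False (flat)."""
    h, w = image.shape
    labels = np.zeros((h // block, w // block), dtype=bool)
    for i in range(h // block):
        for j in range(w // block):
            area = image[i * block:(i + 1) * block, j * block:(j + 1) * block]
            labels[i, j] = edge_features(area)[0] > threshold
    return labels
```

Areas labeled False would go to bilinear interpolation and areas labeled True to the neural interpolator; the extra pass over every area is the additional computation mentioned above.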
Having analyzed the information available, the author of the present invention found no technical solution that solves the task of obtaining high-resolution images under visual magnification in the way described in the present invention.