The present invention relates to increasing the quality of images obtained from a low resolution source.
It is often desirable to create a printed copy of an image obtained from a video source or camera, such as a digital still camera or digital video camera. However, digital still cameras and digital video cameras typically have low spatial resolution compared with the resolution of many current printers. Accordingly, printed images obtained from such sources are of low quality and generally unacceptable.
Increasing the resolution of the image beyond the resolution of the imaging sensor may be used to create an improved output image. One method to increase the resolution of a particular image is to use interpolation techniques based on a single image. Linear interpolation techniques based on a single image do not increase the actual information content of an image. In fact, linear interpolation of a single image simply increases the number of pixels and/or lines in the image. Non-linear interpolation techniques based on a single image utilize information about the image structure itself, such as the direction of edges and image object geometry, to increase the number of pixels and/or lines in the image. In some cases, non-linear techniques may result in improved image quality over linear techniques. Examples of single image techniques include U.S. Pat. Nos. 5,579,451; 5,579,445; and 5,880,767.
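By way of illustration only, the observation that linear interpolation adds pixels without adding information may be sketched in one dimension as follows. The function name upsample_linear and the sample values are hypothetical and do not appear in the cited references; every new sample is merely a weighted average of two existing samples.

```python
def upsample_linear(row, factor):
    """Linearly interpolate a 1-D row of samples by an integer factor.

    Hypothetical helper for illustration: each output sample is a weighted
    average of its two nearest input samples, so no new scene information
    is created.
    """
    if factor < 1 or len(row) < 2:
        raise ValueError("need factor >= 1 and at least two samples")
    out = []
    for i in range(len(row) - 1):
        for j in range(factor):
            t = j / factor  # fractional position between sample i and i + 1
            out.append((1 - t) * row[i] + t * row[i + 1])
    out.append(row[-1])  # keep the final original sample
    return out


row = [10.0, 20.0, 40.0]
print(upsample_linear(row, 2))  # [10.0, 15.0, 20.0, 30.0, 40.0]
```

The original samples survive unchanged and the inserted samples lie on straight lines between them, which is why the enlarged row carries no more detail than the original.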
Increasing the resolution of an image beyond the resolution of the imaging sensor by processing multiple images of the same or a similar scene provides a suitable technique to potentially create higher quality images from cameras equipped with inexpensive low resolution sensors. This also permits the creation of higher resolution images than the physical capability of any given sensor for any particular imaging device.
In general, multi-frame image enhancement techniques include a low resolution digital camera, either a still camera or video camera, that acquires two or more aliased images of the same scene in such a way that the only differences between the shots are due to camera motion. The motion may be from hand-held jitter or it may be artificially induced. The artificial inducement may be mechanical vibrations of the camera or movement of the scene relative to the sensor by electronic, mechanical, or optical techniques, or combinations thereof. In this way each captured frame is a slightly different low resolution sampling of the scene. Film based images may be used with subsequent sampling, such as with a digital scanner.
One image from the low resolution sequence of images is selected to be the reference image. This is typically the first image but the reference image may be another image, if desired. Depending on the motion estimation algorithm, it may turn out that the choice of a later image produces better motion estimation. The coordinate system of the reference sampling lattice is then used to define the high resolution sampling lattice on which the enlargement is constructed. The high resolution sampling lattice is in effect an up-sampled low resolution reference frame. In other words, the techniques typically use the coordinate system of the low resolution reference frame to define the high resolution reference frame.
Next, global motion between each low resolution sequence image and the reference image is estimated by any of several techniques including optical flow, and single- or multiple-model parametric methods. This results in one of two possible sets of motion vector “fields” that relate the low resolution sequence of frames to the reference frame and, therefore, to the derived high resolution sampling lattice. The motion vectors may be explicitly defined, as with an optical flow technique, or implicitly defined through motion model parameters associated with each low resolution frame. “Explicitly defined” generally refers to where individual pixels or groups of pixels are mapped from one frame to another. “Implicitly defined” generally refers to a mathematical model with parameters that relate the frames with respect to each other. Explicit motion vectors can always be generated by evaluating the motion model at every pixel location.
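The generation of explicit motion vectors from an implicit model may be sketched as follows. The sketch assumes, purely for illustration, a six-parameter affine motion model; the foregoing description does not limit the parametric model to this form, and the function name and parameter ordering are hypothetical.

```python
def dense_vectors_from_affine(params, width, height):
    """Evaluate an assumed affine motion model at every pixel location.

    params = (a, b, tx, c, d, ty) maps a pixel (x, y) to
    (a*x + b*y + tx, c*x + d*y + ty). The returned field holds the
    explicit motion vector (dx, dy) for each pixel, indexed [y][x].
    """
    a, b, tx, c, d, ty = params
    field = []
    for y in range(height):
        row = []
        for x in range(width):
            x_new = a * x + b * y + tx
            y_new = c * x + d * y + ty
            row.append((x_new - x, y_new - y))
        field.append(row)
    return field


# A pure translation by (0.5, -0.25) yields the same vector at every pixel.
field = dense_vectors_from_affine((1.0, 0.0, 0.5, 0.0, 1.0, -0.25), 3, 2)
print(field[0][0], field[1][2])  # (0.5, -0.25) (0.5, -0.25)
```

In this form only six numbers per frame need to be stored, and the dense field can be regenerated whenever explicit vectors are required.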
Once the relationship between the low resolution frames and the high resolution sampling lattice is established, construction of the enhanced resolution image proceeds. The principle underlying this technique is that each low resolution image contains unique information about the scene, and is not merely duplicative. This is because each image is an aliased representation of the scene, and each image is the result of sampling the scene on a different lattice.
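The principle that each aliased frame carries unique information may be illustrated with a one-dimensional sketch. The scene function, its frequency, and the exact half-pitch shift between the two frames are assumptions chosen for the example; real camera motion is generally neither known exactly nor a convenient fraction of the pixel pitch.

```python
import math


def sample(signal, start, step, count):
    """Sample a continuous 1-D signal on a lattice start, start+step, ..."""
    return [signal(start + k * step) for k in range(count)]


def scene(x):
    # 0.4 cycles per unit: aliased at a pitch of 2 (Nyquist limit 0.25),
    # but representable at a pitch of 1 (Nyquist limit 0.5).
    return math.sin(2 * math.pi * 0.4 * x)


frame_a = sample(scene, 0.0, 2.0, 4)  # low resolution lattice 0, 2, 4, 6
frame_b = sample(scene, 1.0, 2.0, 4)  # same pitch, shifted by one unit

# Interleaving the two aliased frames reproduces the finer sampling.
merged = [value for pair in zip(frame_a, frame_b) for value in pair]
fine = sample(scene, 0.0, 1.0, 8)
print(merged == fine)  # True
```

Neither frame alone can represent the 0.4 cycle-per-unit detail, yet the two together sample the scene at a pitch fine enough to do so.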
The reconstruction of the frame involves the combination of the information in the reference frame and additional unique information from the remaining low resolution frames to build a new image on the high resolution sampling grid that has more resolution than any one of the low resolution frames. The combination is driven by the vector data delivered from the motion estimator.
Referring to FIG. 1, a general technique is shown for purposes of illustration. A low resolution sequence of four frames 20a, 20b, 20c, and 20d, depicting a corresponding scene, is an input to a motion estimation module 22 and an enhanced resolution multi-frame reconstruction module 24. The scene is positioned slightly differently relative to the sampling grid of each low resolution frame 20a-20d due to camera motion or any other relative motion, as previously described. The sampling grid of dots shown is merely illustrative, with the actual sampling grid typically having a much finer pitch. Associated with each sampled scene is an underlying spatially continuous band-limited image which the samples represent via the Sampling Theorem. The continuous-space images 26a and 26c for the reference frames 20a and 20c are shown directly below their respective frames. There are small differences between the images 26a and 26c which are due to an assumed aliasing. These differences represent unique information in each frame that is used by the reconstruction algorithm for the high resolution frame. The differences are more visible in the magnified continuous-space images 28a and 28c shown below the reference frames 20a and 20c. The magnified images 28a and 28c represent single-frame linear-system enlargements of the associated sampled images, at the same scale as an enhanced enlargement 30. Because this enlargement process is linear, the aliasing artifacts are enlarged as well. Some of the differences are noted with small arrowheads in the enlarged continuous-space references 28a and 28c. Neither enlargement 28a nor 28c contains more resolution than the un-magnified images 26a and 26c. To the left of the high resolution enhanced enlargement 30 is the associated underlying continuous image 32, in which a definite resolution improvement may be clearly observed.
Referring to FIG. 2, one potential set of motion vector fields is illustrated that can result from motion estimation, as implemented by Tekalp, Ozkan, and Sezan, “High-Resolution Image Reconstruction From Lower-Resolution Image Sequences And Space-Varying Image Restoration”, IEEE International Conference on Acoustics, Speech, and Signal Processing, (San Francisco, Calif.), Volume III, pages 169-172, Mar. 23-26, 1992. Tekalp et al. teach that each pixel in each low resolution frame has associated with it a motion vector pointing back into the reference image to the (possibly non-pixel) location from where the pixel comes. The figure shows an N-frame sequence in which the scene has slightly differing positions relative to the sampling lattices in each frame (similar to FIG. 1). The motion vector 40 points out of a pixel in frame 42a and into the reference frame 44 at the same (non-pixel) scene location. Similarly there are motion vectors for the remaining pixels of frames 42b-42N. The collection of vectors from a single frame constitutes one vector field. The N−1 vector fields are abstractly represented by the curved arrows 46a, 46b, and 46N which emanate from the low resolution frames 42a-42N and point back to the reference frame 44. There is also the identity vector field pointing from each reference pixel to itself.
Since the high resolution sampling lattice is defined by the coordinate system of the reference frame 44, the motion vectors may be considered as pointing directly into the high resolution lattice associated with the reference frame 44. Referring to FIG. 3, an exemplary high resolution lattice 50 is shown with a magnification of two. The open circles are the high resolution pixel sites. Each cross “+” is the head of one motion vector whose tail starts at a pixel in one of the low resolution frames. Therefore, each cross represents an intensity value from one low resolution pixel. For this example, it is presumed that the motion is random so that most of the crosses are likewise randomly positioned. However, the field pointing from the reference to itself is shown by crosses inside circles with even coordinates. These crossed circle sites correspond to pixels from the reference frame. In general all, or part, of the reference pixels will map directly onto high resolution sites because the relationship between the high resolution lattice and the reference lattice is merely a scaling operation.
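The scaling relationship between the reference lattice and the high resolution lattice may be sketched as follows. The function names are hypothetical, and the sketch assumes that each motion vector is expressed as a displacement in reference-frame pixel units, so that adding it to a pixel position gives the location of the vector head in the reference frame.

```python
def map_to_high_res(pixel, motion_vector, magnification):
    """Return the high resolution coordinates of a motion vector head.

    pixel is the (x, y) position of a low resolution pixel and
    motion_vector is the assumed (dx, dy) displacement carrying it to its
    scene location in the reference frame.
    """
    x, y = pixel
    dx, dy = motion_vector
    return ((x + dx) * magnification, (y + dy) * magnification)


def is_lattice_site(position, tolerance=1e-9):
    """True when a mapped position coincides with a high resolution site."""
    return all(abs(c - round(c)) <= tolerance for c in position)


# A reference pixel (identity vector) lands on a site with even coordinates.
reference_head = map_to_high_res((3, 2), (0.0, 0.0), 2)
# A pixel from another frame generally lands between sites.
other_head = map_to_high_res((1, 1), (0.25, -0.5), 2)
print(reference_head, is_lattice_site(reference_head))  # (6.0, 4.0) True
print(other_head, is_lattice_site(other_head))          # (2.5, 1.0) False
```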
The resulting pixel sites are clearly scattered mostly at locations that do not match the high resolution lattice 50 sites (open circles and crossed circles). Accordingly, it is clear that a non-uniform interpolation technique must be used to reconstruct the resulting high resolution image. In any case, a non-uniform interpolation must be defined which can use whatever random low resolution pixels fall within a high resolution site's general region to interpolate the intensity of a high resolution pixel. This region may be referred to as the support or footprint 51 of the non-uniform interpolation kernel. In FIG. 3, the support of a typical kernel is shown in grey, centered on the high resolution pixel to be interpolated. Special attention must be paid to regions, such as the lower right corner of the frame, where low resolution pixel density is low. Given that low resolution pixels can fall within the kernel at arbitrary locations, the required non-uniform interpolation tends to be complex and computationally intensive. In addition, an entire low resolution sequence must be available throughout the reconstruction process to store all the pixels mapped onto the high resolution frame. Accordingly, products incorporating the technique shown in FIGS. 2 and 3 need a relatively large amount of on-line RAM to store data representing the required frames.
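One simple form of non-uniform interpolation within a kernel footprint may be sketched as follows. Inverse-distance weighting over a circular support is an assumption made for this example only; the technique of FIGS. 2 and 3 is not stated to use this particular kernel, and the function name is hypothetical.

```python
import math


def interpolate_site(site, samples, radius):
    """Interpolate one high resolution site from scattered samples.

    samples is a list of ((x, y), intensity) pairs located on the high
    resolution coordinate system. Only samples within the circular support
    of the given radius contribute, weighted by inverse distance (an
    assumed kernel). Returns None when the support is empty.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), intensity in samples:
        distance = math.hypot(x - site[0], y - site[1])
        if distance > radius:
            continue  # outside the kernel footprint
        if distance == 0.0:
            return intensity  # sample lies exactly on the site
        weight = 1.0 / distance
        weighted_sum += weight * intensity
        weight_total += weight
    if weight_total == 0.0:
        return None  # sparse region: no low resolution pixel in the support
    return weighted_sum / weight_total


samples = [((1.0, 0.0), 10.0), ((-1.0, 0.0), 30.0), ((5.0, 5.0), 99.0)]
print(interpolate_site((0.0, 0.0), samples, 1.5))  # 20.0
```

The None result corresponds to the low density regions noted above, where some fallback such as a larger support would be needed, and the full scan over all samples at every site hints at why the approach is computationally and memory intensive.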
It is noted that Patti et al. teach the use of the aforementioned motion vectors with monochrome low resolution frames sampled on a quincunx grid to reconstruct a high resolution image on an orthogonal lattice. However, Patti et al. do not teach a direct reconstruction technique, nor do Patti et al. teach how to process color frames. Instead of a direct non-uniform interpolation within the high resolution frame, Patti et al. teach an indirect technique of Projection onto Convex Sets to compute increasingly better approximations of the scene sampled on the high resolution grid, at substantial computational expense.
Another potential set of vector fields 172 from the motion estimator is illustrated in FIG. 4. It is noted that the illustrations and discussion related to FIGS. 4 and 5 are provided but no admission is made that in fact such material is “prior art”, unless explicitly stated as “prior art”. The vector fields of FIG. 4 point out of the reference frame and into each low resolution frame, in contrast to the technique shown in FIGS. 2 and 3. In particular, at each reference pixel there is one vector pointing to the corresponding scene location in each low resolution frame. Hence, not counting the identity vector pointing from a reference pixel to itself, there are N−1 outbound vectors at each reference pixel for an N length sequence of frames. In FIG. 4, one such group is shown on the reference lattice 171. The head of each numbered vector locates the position of the tail in the low resolution frame of the same number.
Since the high resolution lattice is defined by the coordinate system of the reference lattice, the vectors may be located directly on the high resolution lattice at those points which correspond to reference pixels, as shown in FIG. 5. FIG. 5 illustrates several interior vector groups with the remaining points having similar vector groups. When the motion estimator has delivered explicit vectors (as with optical flow) then the missing groups are interpolated from surrounding vectors by means of a multi-channel interpolation filter, such as a vector median filter. Alternatively, when a parametric motion estimator is used the derived motion models can be directly evaluated to obtain the missing vectors.
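A vector median filter of the kind mentioned above may be sketched as follows. The sketch uses the common definition in which the vector median of a set is the member of the set whose summed Euclidean distance to all other members is smallest; the choice of which surrounding vectors are supplied for a missing site is left to the caller and is an assumption of this example.

```python
import math


def vector_median(vectors):
    """Return the member of vectors with the smallest total distance
    to all other members (Euclidean norm assumed)."""
    if not vectors:
        raise ValueError("at least one vector is required")

    def total_distance(candidate):
        return sum(math.dist(candidate, other) for other in vectors)

    return min(vectors, key=total_distance)


# Vectors at neighbouring reference sites; one is an outlier.
neighbours = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (50.0, 50.0)]
print(vector_median(neighbours))  # (1.0, 0.0)
```

Because the output is always one of the input vectors, the filter rejects the outlier without averaging it into the result, and both vector components are handled jointly rather than channel by channel.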
After the vector groups have been defined at all high resolution grid points the intensity reconstruction may be performed. Associated with each site in the grid is an intensity value derived from precisely one low resolution frame. Next the computation of the intensity at each high resolution site is performed. Each of the N motion vectors at each site (including the identity vector) points to a position in one low resolution frame. This defines N distances $\{\|v_k - p_k\| : k \in \{1, \ldots, N\}\}$, where $v_k$ is the (generally non-pixel) position in the kth frame, $F_k$, to which the associated motion vector points and $p_k$ is the existing low resolution pixel in $F_k$ closest to $v_k$. That is, $p_k := \arg\min\{\|v_k - p\| : p \in F_k\}$, where $p$ denotes a lattice point in $F_k$. The norms (metrics), $\|\cdot\|$, in these expressions are left unspecified and depend on the geometry of the sampling grids involved.
The frame, $F_k$, which has associated with it the smallest of the N distances becomes the frame from which the desired high resolution intensity is computed. An intensity based on the closest pixels is now associated with $v_k$. This defines the intensity at the high resolution site to be reconstructed. Preferably the intensity at $v_k$ is that at $p_k$. That is, the technique simply copies the intensity at $p_k$ into the high resolution site. It may be observed that this scheme is consistent with copying reference intensities into those high resolution sites that correspond to reference lattice points.
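The selection and copying step at a single high resolution site may be sketched as follows. The sketch assumes square low resolution lattices with pixels at integer coordinates and a Euclidean norm, whereas the description above leaves the norm unspecified; the function name, the data layout, and the clamping at frame borders are likewise hypothetical.

```python
import math


def reconstruct_site(positions, frames):
    """Pick the intensity for one high resolution site.

    positions[k] is the (x, y) location in frame k to which the k-th
    motion vector of the site points; frames[k][row][col] holds the
    intensities of frame k. Returns (k, intensity) for the frame whose
    nearest lattice pixel is closest to its pointed-to location.
    """
    best = None
    for k, (x, y) in enumerate(positions):
        frame = frames[k]
        height, width = len(frame), len(frame[0])
        # Nearest lattice point on an integer grid, clamped to the frame.
        col = min(max(int(math.floor(x + 0.5)), 0), width - 1)
        row = min(max(int(math.floor(y + 0.5)), 0), height - 1)
        distance = math.hypot(x - col, y - row)
        if best is None or distance < best[0]:
            best = (distance, k, frame[row][col])
    _, k, intensity = best
    return k, intensity


frames = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
positions = [(0.4, 0.4), (0.9, 0.1)]
print(reconstruct_site(positions, frames))  # (1, 6)
```

In the example the vector into the second frame lands about 0.14 pixel from a lattice point, against about 0.57 pixel in the first frame, so the intensity 6 is copied from the second frame. When a vector is the identity vector of a reference lattice point, its distance is zero and the reference intensity is copied, consistent with the observation above.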
The present invention relates to a system for constructing a high resolution image from a plurality of low resolution images where each of the low resolution images includes a plurality of color planes. The system receives the plurality of low resolution images where the plurality of color planes of each of the low resolution images are spatially arranged in at least a partially non-coincident manner. Motion data representative of motion of at least a portion of the low resolution images is determined. The high resolution image is constructed, having a resolution greater than the low resolution images, based upon the non-coincident color planes of the low resolution images, together with the motion data.
The foregoing and other objectives, features and advantages of the invention will be more readily understood upon consideration of the following detailed description of the invention, taken in conjunction with the accompanying drawings.