1. Field of the Invention
The invention relates generally to digital image processing and, more particularly, to a method of using a sequence of low resolution frames that are subject to random jitter and contain undersampled features or scenes to produce one higher quality, higher resolution frame as a still image or to produce a sequence of such higher resolution frames as a video image.
2. Discussion of Related Art
Images may be defined in terms of both “resolution” and “detail” with relevance to this invention. “Resolution” is a term that has taken on two different meanings in the image processing community. Strictly speaking, resolution refers to the number of pixels packed into a certain length or area. This application, however, uses resolution in the more general and perhaps more common sense, i.e. to refer to the sheer number of pixels in an image. Accordingly, one might regard a low resolution frame as a low pixel count frame (LPC frame) and a high resolution frame as a high pixel count frame (HPC frame).
“Detail” generally refers to the fine image elements that can be distinctly perceived within an image. One image might distinctly display a series of tightly spaced lines. Another image might “blur” those lines into a patch of gray fuzz. The first image has more detail than the second, i.e. it has higher “spatial frequency.” The image resolution puts an upper cap on the amount of fine detail that can be displayed, but the bottleneck on detail is often unrelated to resolution.
Image data often does not contain as much fine detail as desired for an intended use. Numerous examples exist with respect to both video and still images. A news video of a car chase may be smeared and jittery because of the long distance between the airborne camera and the subject, and because of the instability of the helicopter that carries the camera. An image derived from a security video of a crime scene may be so coarse and so smeared that identification of the suspect is difficult, if not impossible. An x-ray image at an airport check-in facility may be so coarse that the operator misses the wires in an explosive triggering device. An image from a video camera at a law enforcement site or toll road entry may be too coarse to recognize the alphanumeric characters on a license plate. An infrared image from a military reconnaissance video may be too coarse and jittery to identify vehicles on the ground.
Light emitted or reflected from a subject is an “analog” phenomenon since the available range of colors and intensities is a smooth, continuous function. An imaging system (such as a camera) uses some sort of optical system to gather some of the light diverging from the subject and then form an image of the subject on an image sensor. The spatial details within the optical image are limited only by optical aberrations inserted by the optical imaging elements and by the finite wavelengths of the light involved. The optical image, therefore, is generally not the limiting factor in terms of reproducing spatial detail. A loss of detail generally occurs where an image sensor is interfaced to the optical image, i.e. at the point of converting the optical image into an analog or digital electronic image signal.
An exemplary, direct-to-digital image sensor is a CCD chip having a two-dimensional array of electrodes or, more generally speaking, sensor elements. Ideally, the sensor elements would be as small as possible in order to capture all of the available detail provided by the optical image and packed together as closely as possible to capture the image with as much efficiency as possible.
A real-world sensor, however, has sensor elements of measurable size and spacing that tend to cause certain imaging limitations. Simply put, the typical image sensor produces a low resolution frame (i.e. low pixel count frame) because its sensor elements are not infinitely small and not infinitely dense.
In many of the examples cited above, therefore, the high spatial frequencies (fine details) in the optical image are presented to relatively large sensor elements. As such, the image sensor is unable to capture all of the available detail in a single LPC frame, and until now, no practical video processing was available to analyze two or more LPC frames in order to recover such detail and transfer such detail to one or more high pixel count frames (HPC frames).
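The loss of detail at the sensor can be illustrated with a minimal one-dimensional sketch. It assumes a simple box model in which each sensor element averages the light falling across its width; the function name `sense`, the `offset` parameter standing in for jitter, and the sample scenes are hypothetical and are not taken from any particular sensor or from the method described in this application.

```python
def sense(scene, element_width, offset=0):
    """Average the scene over each sensor element (assumed box model).

    `offset` displaces the sensor grid relative to the scene, standing in
    for frame-to-frame jitter.
    """
    frame = []
    for start in range(offset, len(scene) - element_width + 1, element_width):
        window = scene[start:start + element_width]
        frame.append(sum(window) / element_width)
    return frame


# Tightly spaced lines: every element spans one dark and one bright line,
# so the LPC frame is uniform gray and the lines are lost.
lines = [0, 100, 0, 100, 0, 100, 0, 100]
gray_frame = sense(lines, 2)

# A bright bar sampled at two jitter positions yields two different LPC
# frames, neither of which alone locates the edges of the bar.
bar = [0, 0, 0, 100, 100, 100, 100, 0, 0]
frame_a = sense(bar, 2, offset=0)
frame_b = sense(bar, 2, offset=1)
```

In this sketch `gray_frame` contains only the value 50.0, while `frame_a` and `frame_b` differ from one another, which suggests why a jittered succession of LPC frames may carry detail that no single frame holds.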
There have been previous attempts to increase the stability of video sequences, but these have not exploited the jitter in the original scene to create higher resolution. Commercial video cameras available to consumers have crude electronic stabilization capability, but do not significantly increase the resolution of the video produced. Video editing systems may electronically “zoom” or “upsample” the frames, and they may stabilize the zoomed frames based on prior frames, but they do not use image data from the prior frames to improve the spatial details in the zoomed frame.
The post-production electronic zoom processes known to these inventors either magnify the pixels without upsampling the “zoomed” video at all, or they apply an ordinary interpolation algorithm to each independent LPC frame to produce a corresponding HPC frame of generally lesser quality. The first method results in a blocky image, an effect sometimes referred to as “pixelation” or “the jaggies”. The second method results in an excessively smoothed or blurred image lacking the spatial details and edge definition that were present in the optical image that was impinging on the image sensor during the formation of the LPC frame.
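The two single-frame zoom approaches described above can be sketched in one dimension as follows. This is an illustrative sketch only: the function names `replicate` and `linear_upsample` and the sample row are hypothetical, and linear interpolation is used here as one example of an "ordinary interpolation algorithm", not as the algorithm of any specific product.

```python
def replicate(row, factor):
    """Zoom by pixel magnification: repeat each pixel `factor` times."""
    return [p for p in row for _ in range(factor)]


def linear_upsample(row, factor):
    """Zoom by linear interpolation between neighboring pixels."""
    n = len(row)
    out = []
    for i in range(n * factor):
        x = i / factor                # position in source-pixel coordinates
        left = min(int(x), n - 1)     # nearest source pixel at or below x
        right = min(left + 1, n - 1)  # clamp at the right border
        t = x - left                  # fractional distance from `left`
        out.append((1 - t) * row[left] + t * row[right])
    return out


lpc_row = [0, 0, 100, 100]             # one row containing a sharp edge
blocky = replicate(lpc_row, 2)         # hard steps, same detail as before
smooth = linear_upsample(lpc_row, 2)   # edge spread into a ramp
```

Both outputs have twice the pixel count of `lpc_row`, but each is computed from that single row alone, so neither contains any detail that the LPC frame did not already hold.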
There are “up-converters” for HDTV (High Definition Television) that receive standard, lower resolution television sequences and produce higher resolution sequences. The inventors are unaware of any public literature regarding the operation of these up-converters. It is believed, however, that these up-converters create HPC frames on a frame-by-frame basis, interpolating the high resolution data in each HPC frame from the lower resolution data of a single LPC frame. Edges may be estimated and inserted to make the HPC frames appear of higher quality, but it does not appear that the edges are developed with image data hidden in a sequence of LPC frames.
The preceding consumer products try to improve jittery, low pixel count video, but they do not exploit spatial detail that is hidden within a jittery succession of LPC frames. However, various military projects have both stabilized and increased the spatial frequency of video sequences based upon the jitter of the source video. One approach developed by the Air Force is generally described in “High-resolution Image Reconstruction From a Sequence of Rotated and Translated Frames and its Application to an Infrared Image” by Russell C. Hardie, et al. in Opt. Eng., 37(1), 1998. The Air Force technique derives frame-to-frame motion from a complex series of multiple trial and error registrations. In essence, each new video frame is moved multiple times relative to an underlying “stack” of already aligned frames in order to finally ascertain the amount of frame-to-frame motion or jitter that was imparted to the new frame. Accordingly, detail from successive LPC frames may be combined, but only by trying to precisely register each new frame by using multiple iterations of full frame shift and compare operations that are time consuming and processor intensive. This trial and error approach is therefore of limited use in real-time applications.
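The cost of a shift and compare registration can be seen in a minimal one-dimensional sketch. It is an assumption-laden illustration of the general trial and error idea only, not a reproduction of the technique in the cited paper: the function names, the mean absolute difference error measure, and the restriction to whole-pixel shifts are all hypothetical simplifications.

```python
def mean_abs_diff(reference, frame, shift):
    """Compare reference[i] with frame[i + shift] over the overlapping pixels."""
    total, count = 0, 0
    for i, ref_val in enumerate(reference):
        j = i + shift
        if 0 <= j < len(frame):
            total += abs(ref_val - frame[j])
            count += 1
    return total / count


def register_by_trial(reference, frame, max_shift):
    """Try every whole-pixel shift and keep the one with the smallest error."""
    best_shift, best_err = 0, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        err = mean_abs_diff(reference, frame, shift)  # one full pass per trial
        if err < best_err:
            best_shift, best_err = shift, err
    return best_shift


reference = [0, 0, 10, 50, 10, 0, 0, 0]
jittered = [0, 0, 0, 0, 10, 50, 10, 0]   # same scene displaced by two pixels
estimated = register_by_trial(reference, jittered, max_shift=3)
```

Every trial shift requires a pass over the whole frame, so the work grows with the product of the frame size and the number of candidate shifts, and grows further when two-dimensional, rotational, or sub-pixel motion must be searched.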
There remains a need, therefore, for a method of image enhancement that addresses the above issues and provides a higher quality, higher pixel count frame without requiring multiple iterations of frame registration algorithms. In the context of still images, there is a need for images having a higher resolution than that provided by single frame capture from native camera video and containing more of the spatial detail that was present in the originally imaged scene, but not captured by any one LPC frame. In the context of moving images, there is a need for a higher quality, higher pixel count video sequence than that provided directly by the camera, for a video sequence that is relatively free of jitter, or for both.
Accordingly, several objects and advantages of our invention are:
1. To stabilize video and simultaneously allow for electronic zoom or upsampling at a higher pixel count;
2. To increase the quality and pixel count of frames in a video sequence using a simple and computationally efficient algorithm that operates in real-time;
3. To stabilize the frames in a video sequence using a simple and computationally efficient algorithm that operates in real-time;
4. To perform electronic zoom without pixelation while recovering detail that was contained in the original optical image, but not present in any single frame; and
5. To upsample portions of standard TV sequences to HDTV while recovering detail that was contained in the original optical image, but not present in any single frame.