The generation of super-resolution images from a sequence of low-resolution images has recently received great interest, and has led to the emergence of several compelling algorithms. Nevertheless, most of the existing algorithms rely on models that are relatively complex and difficult to implement. Moreover, many of the existing algorithms do not always achieve satisfactory results.
Physical constraints of imaging sensors can significantly limit the resolution, and consequently the quality, of a captured image. If the detector array is not sufficiently dense, the result can be aliased or under-sampled images. Much work has recently been done wherein a higher resolution and higher quality image is generated by considering multiple captured images of the same scene. This process is called super-resolution image generation. Super-resolution image generation is particularly suited to images that are captured sequentially, e.g., by using a video camera, since there is high temporal and spatial correlation between successive frames.
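The idea of combining multiple under-sampled captures of the same scene can be illustrated with a deliberately idealized one-dimensional sketch. It assumes that each low-resolution frame is an exact decimation of the scene at a known integer sample offset, with no noise, blur, or motion; practical methods must estimate these shifts through motion registration. The function names `decimate` and `interleave` are hypothetical and are used only for illustration.

```python
def decimate(signal, factor, offset):
    """Keep every `factor`-th sample starting at `offset` (an under-sampled capture)."""
    return signal[offset::factor]


def interleave(frames, factor):
    """Rebuild the dense signal from `factor` captures taken at offsets 0..factor-1."""
    length = sum(len(frame) for frame in frames)
    dense = [0] * length
    for offset, frame in enumerate(frames):
        # Each capture fills the sample positions it originally observed.
        dense[offset::factor] = frame
    return dense


# Illustrative scene and two low-resolution captures shifted by one sample.
scene = [0, 1, 4, 9, 16, 25, 36, 49]
captures = [decimate(scene, 2, offset) for offset in range(2)]
reconstructed = interleave(captures, 2)
```

Neither capture alone contains all of the samples of the scene, but together they cover every sample position, which is the intuition behind using several frames to raise resolution.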
Super-resolution image generation can have many different applications ranging from the generation of still images for printing purposes from a low resolution video, to the generation of High Definition Television (HDTV) from Standard Definition (SD) signals. Another application is in the area of security/surveillance systems and forensic sciences, especially after Sep. 11, 2001, where it is desirable to generate high resolution images from captured video sequences for the purpose of solving or even preventing crimes by, for example, identifying suspects. Other applications include aerial/satellite imaging, astronomy and even medical imaging. Although the generation of super-resolution images from still images often shown in movies and popular TV shows remains largely fiction, many of the recently proposed methods can produce rather impressive results.
Many of the previous techniques utilized Fourier domain methods, in which high frequency information is extracted from low frequency data in the low resolution images. Although such methods were relatively simple to implement, they were also rather limited in terms of performance and applicability since they could not handle local and global motion. Spatial domain techniques, by contrast, can produce considerably better results, although they are far more complicated since they require the consideration of motion registration and are in many cases iterative. Some of these methods have also been adapted to compressed video, thus making them more attractive in terms of applicability and usefulness. In general, most of these methods are deterministic, such as Projections Onto Convex Sets (POCS), and enhance resolution in the spatial domain without taking into account any source statistics, while others are based on statistical formulations such as maximum likelihood or maximum a-posteriori probability (MAP) estimates.
An application very similar to super-resolution generation is video de-noising. Video de-noising is a feature of many modern video encoding architectures since it can considerably enhance coding efficiency while, at the same time, improving objective and subjective quality. Digital still or video images can include noise introduced by the capturing or analog to digital conversion process, or even by transmission. Apart from its visually displeasing impact, noise can also have a severe adverse effect in many applications, especially video compression. Due to its random nature, noise can considerably decrease spatial and temporal correlation, thus limiting the coding efficiency of such noisy video signals. Thus, it is desirable to remove noise without removing any of the important details of the image, such as edges or texture.
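The effect of noise on temporal correlation can be illustrated with a small numerical sketch. It assumes a static scene, so that two noise-free frames are identical, and independent additive Gaussian noise in each frame. The synthetic ramp scene, the noise level, and the helper names `pearson` and `add_noise` are illustrative assumptions only.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sample lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def add_noise(signal, sigma, rng):
    """Add independent zero-mean Gaussian noise to every sample."""
    return [sample + rng.gauss(0.0, sigma) for sample in signal]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
scene = [float(i % 64) for i in range(4096)]  # flattened static test scene

# Two successive frames of the same static scene, each with its own noise.
frame_a = add_noise(scene, 20.0, rng)
frame_b = add_noise(scene, 20.0, rng)

clean_corr = pearson(scene, scene)      # noise-free frames are identical
noisy_corr = pearson(frame_a, frame_b)  # noise lowers the correlation
```

Under these assumptions the correlation between the noisy frames is well below that of the noise-free frames, which is why a predictive encoder finds less redundancy to exploit in noisy video.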
Several video de-noising architectures have been proposed in which de-noising is performed using spatial filtering methods, temporal filtering methods, or a combination thereof. Even the most advanced spatial methods, such as Wiener or wavelet filtering, tend to be more appropriate for still images, while temporal and spatio-temporal methods are more appropriate for video signals because of the temporal correlation that exists between adjacent pictures. Such methods are well-known in the art and can generally be classified into motion compensated and non-motion compensated filters, according to whether or not they consider motion estimation and compensation techniques for filtering the current picture.
In a first prior art approach, a spatio-temporal video de-noising architecture was presented combined with a video encoder compliant with the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 recommendation (hereinafter the “MPEG-4 AVC standard”). In the first prior art approach, spatial filtering is performed on all pixels using a threshold-based 3×3 pixel average, while the motion estimation process of the MPEG-4 AVC standard is reused for performing the temporal filtering. Because the MPEG-4 AVC standard allows multiple references to be used for predicting a block or macroblock, this strategy makes it possible to generate several possible temporal predictions for the current pixel. These temporal predictions are then averaged together to form the final filtered picture. It should be noted that in this approach, motion estimation and compensation are performed on previously filtered pixels. Although this could result in a more accurate motion field, it could also result in some cases in the removal of some of the finer details of a scene, such as texture or edges.
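The two filtering steps of the first prior art approach described above can be sketched as follows. This is a minimal illustration and not the actual implementation of that approach: it assumes that "threshold-based" means that only neighbors whose values lie within a threshold of the center pixel contribute to the average, and it assumes that the temporal predictions have already been motion compensated and are supplied as same-sized frames. The function names and parameters are hypothetical.

```python
def threshold_average_3x3(frame, threshold):
    """Average each pixel with the 3x3 neighbors that lie within
    `threshold` of the center value (assumed reading of "threshold-based")."""
    height, width = len(frame), len(frame[0])
    filtered = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = frame[y][x]
            total, count = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    # Skip positions outside the picture boundary.
                    if 0 <= ny < height and 0 <= nx < width:
                        value = frame[ny][nx]
                        if abs(value - center) <= threshold:
                            total += value
                            count += 1
            # count is at least 1 because the center always qualifies.
            filtered[y][x] = total / count
    return filtered


def average_predictions(predictions):
    """Pixel-wise mean of several same-sized temporal predictions,
    assumed to be already motion compensated."""
    n = len(predictions)
    height, width = len(predictions[0]), len(predictions[0][0])
    return [
        [sum(p[y][x] for p in predictions) / n for x in range(width)]
        for y in range(height)
    ]


# A vertical edge survives the spatial filter when the threshold is small.
edge = [[0, 0, 100], [0, 0, 100], [0, 0, 100]]
spatial = threshold_average_3x3(edge, 10)

# Two hypothetical temporal predictions of a 1x2 picture are averaged.
temporal = average_predictions([[[2, 4]], [[4, 8]]])
```

Under this reading, the threshold keeps strong edges out of the average, while small fluctuations are smoothed. A larger threshold would let dissimilar neighbors contribute and would blur detail, consistent with the loss of texture or edges noted above.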
An extension of this concept, proposed in a second prior art approach, involves performing more advanced motion compensation, using wavelet filtering instead of the threshold-based median. In the second prior art approach, a deblocking filter is also introduced and is applied to the motion compensated residuals, leading to fewer artifacts in the final de-noised video.