1. Field of the Invention
This invention relates generally to techniques for processing image data, and relates more particularly to a system and method for effectively performing local image similarity measurement.
2. Description of the Background Art
Measuring local image similarity is an important problem in image processing. Image similarity can be categorized into three classes: 1) Low-level similarity. Local image patches are considered to be similar if some distance metric (e.g. p-norm, Earth Mover's, Mahalanobis) is less than a given threshold; 2) Mid-level similarity. Here local image patches share some simple semantic property; and 3) High-level similarity. In this case, similarity is primarily defined by semantics. The properties that make two patches similar are not visual, but they can be inferred from visual information such as a gesture. More detailed information may be found in “Learning Task-Specific Similarity, PhD Thesis,” by Greg Shakhnarovich, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 2005.
Demosaicing is a digital image process to reconstruct a full color image from the incomplete color samples output from an image sensor overlaid with a color filter array (CFA). In a CFA, only one color per pixel is measured. There are several different configurations of CFA. The most popular CFA is the Bayer pattern as described by B. E. Bayer in “Color Imaging Array”, U.S. Pat. No. 3,971,065, Jul. 20, 1976. It consists of three colors: red, green, and blue. Among all pixels, there are 25% red, 50% green, and 25% blue pixels. In order to improve color reproduction accuracy, T. Mizukura et al. proposed a four-color CFA in “Image pick-up device and image pick-up method adapted with image pick-up sensitivity”, U.S. Pat. No. 7,489,346, Feb. 10, 2009. Yoshihara et al. proposed to arrange the Bayer colors in a zigzag arrangement instead of a rectangular array, which improves fill factor and pixel sensitivity as described in “A 1/1.8-inch 6.4 MPixel 60 frames/s CMOS Image Sensor With Seamless Mode Change”, IEEE J. Solid-State Circuits, Vol. 41, No. 12, December 2006, pp. 2998-3006. Most sophisticated demosaicing algorithms utilize the strong correlation of high-frequency information among different color channels. They may copy high frequency information from one color channel to other color channels that are unknown at a given pixel location. To do this effectively, demosaicing algorithms need to infer local image structure by identifying local image patches that share similar local geometry.
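As an illustration of the sampling described above, the following minimal sketch builds Bayer masks and reduces a full color image to one sample per pixel. The RGGB phase (red at the top-left pixel) and the function names bayer_masks and mosaic are assumptions made for this example; the text above fixes only the 25% red, 50% green, 25% blue proportions.

```python
import numpy as np

def bayer_masks(height, width):
    """Return boolean masks (red, green, blue) for an RGGB Bayer layout.

    The RGGB phase (red at the top-left pixel) is an assumption; the
    text only fixes the 25% / 50% / 25% proportions.
    """
    rows, cols = np.indices((height, width))
    red = (rows % 2 == 0) & (cols % 2 == 0)
    blue = (rows % 2 == 1) & (cols % 2 == 1)
    green = ~(red | blue)
    return red, green, blue

def mosaic(rgb):
    """Keep one color sample per pixel from an H x W x 3 image."""
    red, green, blue = bayer_masks(rgb.shape[0], rgb.shape[1])
    cfa = np.zeros(rgb.shape[:2], dtype=rgb.dtype)
    cfa[red] = rgb[..., 0][red]
    cfa[green] = rgb[..., 1][green]
    cfa[blue] = rgb[..., 2][blue]
    return cfa
```

A demosaicing algorithm would take the single-channel output of mosaic and estimate the two missing colors at every pixel location.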
Similar to demosaicing, denoising is another estimation problem that relies on local image similarity measurement. The objective of image denoising is to estimate a noise-free pixel from digital images degraded by noise. The key to achieving a good estimate at a pixel location is to find a set of pixels that share similar local structures in the degraded images. Once the set of similar pixels is found, there are various ways to obtain the denoised pixel value at the current pixel location, including the simple average, the median, and other appropriate statistics. More sophisticated algorithms often utilize a weighted average of the pixels in the similar pixel set. The weights can be determined in many ways, such as by proximity, similarity, noise level, or a combination thereof. For example, see F. Baqai, “System and method for denoising using signal dependent adaptive weights”, U.S. patent application Ser. No. 12/284,055, filed on Sep. 18, 2008.
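The similar-pixel-set approach can be sketched as follows. This is a minimal illustration, not the method of the cited application: the similarity rule (absolute intensity difference from the center pixel at most a threshold), the weight formula 1 / (1 + difference), and the name denoise_pixel are all assumptions chosen for simplicity.

```python
import numpy as np

def denoise_pixel(image, row, col, radius, threshold, weighted=False):
    """Estimate one pixel from the similar pixels in its search window.

    A neighbor counts as similar when its absolute intensity difference
    from the center pixel is at most `threshold` (an illustrative rule).
    """
    r0, r1 = max(row - radius, 0), min(row + radius + 1, image.shape[0])
    c0, c1 = max(col - radius, 0), min(col + radius + 1, image.shape[1])
    window = image[r0:r1, c0:c1].astype(float)
    diff = np.abs(window - float(image[row, col]))
    similar = diff <= threshold  # the center pixel is always included
    if not weighted:
        # Simple average over the similar pixel set.
        return float(window[similar].mean())
    # Hypothetical similarity weights: closer intensities count more.
    weights = np.where(similar, 1.0 / (1.0 + diff), 0.0)
    return float((weights * window).sum() / weights.sum())
```

Pixels across an edge fall outside the threshold and are excluded, which is why the average does not blur the edge as a plain box filter would.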
It is interesting to note that demosaicing and denoising share a common process of finding similar local image structures in the presence of various degradations such as blur, distortions, and noise. In order to utilize local image similarity measurement more efficiently, some methods perform joint demosaicing and denoising by first estimating the basic structure and then iteratively fine-tuning the result, as described by A. Buades et al. in “Self-similarity driven color demosaicing”, IEEE TIP, Vol. 18, No. 6, June 2009, pp. 1192-1202 and K. Hirakawa and T. Parks in “Joint demosaicing and denoising”, IEEE TIP, Vol. 15, No. 8, August 2006, pp. 2146-2157.
One important part of local image similarity measurement is to select an appropriate similarity metric. There are many existing metrics for low-level image similarity in the literature. For instance, one quite popular metric is based on Euclidean distance (L2 norm) between pixels as described by C. Tomasi and R. Manduchi in “Bilateral Filtering for Gray and Color Images,” Proc. of IEEE International Conference on Computer Vision, pp. 841-846, 1998. However, this metric is very sensitive to lighting conditions and noise. More robust patch-based Euclidean distances have been proposed in “Self-similarity driven color demosaicing,” cited above. F. Baqai et al. proposed patch-based L1 distances to measure local image similarity in “A Method to Measure Local Image Similarity Based on the L1 Distance Measure”, U.S. patent application Ser. No. 12/567,454, filed on Sep. 25, 2009.
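The difference between pixel-based and patch-based distances can be illustrated with a short sketch. The helper names patch and patch_distance, and the normalization by patch size, are assumptions made for this example and are not taken from the cited references. With half=0 the function reduces to a pixel-to-pixel distance; with half greater than zero it compares whole neighborhoods, which averages out part of the noise.

```python
import numpy as np

def patch(image, row, col, half):
    """Extract the (2*half+1) x (2*half+1) patch centered at (row, col).

    The center must lie at least `half` pixels away from the image border.
    """
    return image[row - half:row + half + 1, col - half:col + half + 1].astype(float)

def patch_distance(image, a, b, half=1, p=1):
    """Mean p-norm-style distance between patches centered at a and b.

    p=1 gives a mean absolute difference (L1); p=2 gives a root mean
    squared difference (L2). Normalizing by patch size is an assumption.
    """
    diff = np.abs(patch(image, *a, half) - patch(image, *b, half))
    return float(np.mean(diff ** p) ** (1.0 / p))
```

Two patches would then be declared similar when patch_distance falls below a chosen threshold.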
Another critical part of the local image similarity measure is the threshold at which a pixel or an image patch is considered to be similar. The selection of a threshold is based on various factors such as an estimate of the degree of degradation in the image, the similarity criterion, the distance metric (L1, L2, and others), and the patch size. If the threshold is too large, the similarity measure may include some pixels that are not similar. If it is too small, the similarity measure will not find a statistically significant number of similar pixels. An incorrectly selected threshold may cause several artifacts in demosaicing, such as zipper effect, blur, and false colors. Similarly, denoising may not adequately remove noise (under-smoothing), or it may blur edges and texture (over-smoothing), if the threshold is not set at an appropriate level.
F. Baqai et al. utilize a relationship between distance measures to estimate appropriate thresholds in “A Method to Measure Local Image Similarity Based on the L1 Distance Measure”, cited above. Such a threshold can be represented as the product of the standard error sigma of local image intensities and a constant t that is independent of sigma and is determined by factors such as, but not limited to, patch size and pixel-similarity rate. Note that estimation of the standard error sigma of local image intensities is generally imperfect. The stronger the degradation of an image, the larger the estimation error. If the estimated sigma is significantly larger than the true value, many pixels that are not similar will be included in the similar pixel set. On the other hand, if the estimated sigma is significantly smaller than the true value, many pixels that are similar will be excluded from the similar pixel set. In both of the above-mentioned scenarios, the performance of local similarity measurement deteriorates severely. Therefore, reducing the estimation error of the standard error sigma of local image intensities is key to further improving local similarity measurement under strong image degradation.
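The sensitivity of the similar pixel set to the estimated sigma can be seen in a small numerical sketch. The distances, the value t = 2.0, and the function name similar_set are hypothetical values chosen for illustration only; the cited application derives its own constants.

```python
import numpy as np

def similar_set(distances, sigma_hat, t):
    """Select candidates whose distance is below threshold = t * sigma_hat."""
    return np.asarray(distances) < t * sigma_hat

# Hypothetical distances from a reference patch to six candidates.
distances = np.array([0.5, 1.0, 1.5, 2.5, 4.0, 6.0])
t = 2.0  # assumed constant; in practice it depends on patch size, etc.

# Under-estimated, moderate, and over-estimated sigma values.
for sigma_hat in (0.5, 1.0, 2.5):
    count = int(similar_set(distances, sigma_hat, t).sum())
    print(f"sigma_hat={sigma_hat}: threshold={t * sigma_hat}, similar={count}")
```

Because the threshold scales linearly with the estimated sigma, an estimate that is too small admits very few candidates and an estimate that is too large admits most of them, regardless of how carefully t was chosen.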