The present invention relates to computer vision. More specifically, it relates to the use of histogram-based local descriptors (HBLDs) in various computer vision tasks.
Histogram-based local descriptors (HBLDs) are used in various computer vision tasks. These tasks include shape matching. See, for example, S. Belongie, J. Malik and J. Puzicha, “Shape Matching and Object Recognition Using Shape Contexts”, IEEE Trans. on PAMI, 24(4):509-522, 2002; K. Grauman and T. Darrell, “Fast Contour Matching Using Approximate Earth Mover's Distance”, CVPR, I:220-227, 2004; H. Ling and D. W. Jacobs, “Using the Inner-Distance for Classification of Articulated Shapes”, CVPR, II:719-726, 2005; and A. Thayananthan, B. Stenger, P. H. S. Torr and R. Cipolla, “Shape Context and Chamfer Matching in Cluttered Scenes”, CVPR, I:1063-6919, 2003. The tasks also include image retrieval. See, for example, D. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, IJCV, 60(2), pp. 91-110, 2004; K. Mikolajczyk and C. Schmid, “A Performance Evaluation of Local Descriptors”, IEEE Trans. on PAMI, 27(10):1615-1630, 2005; and E. N. Mortensen, H. Deng, and L. Shapiro, “A SIFT Descriptor with Global Context”, CVPR, I:184-190, 2005. The tasks further include texture analysis. See, for example, S. Lazebnik, C. Schmid, and J. Ponce, “A Sparse Texture Representation Using Affine-Invariant Regions”, IEEE Trans. on PAMI, 27(8):1265-1278, 2005.
HBLDs are very effective for these tasks because the distributions they encode capture rich information about local regions of objects. In practice, however, HBLDs often suffer from distortion due to deformation, illumination change and noise, as well as from quantization effects. See Y. Rubner, C. Tomasi, and L. J. Guibas, “The Earth Mover's Distance as a Metric for Image Retrieval”, IJCV, 40(2):99-121, 2000.
Histogram comparison measures quantify how similar or dissimilar two histograms are. These measures can be categorized into bin-to-bin distances and cross-bin distances.
The most commonly used bin-to-bin distances between HBLDs (e.g., the χ² statistic, the L2 distance and the Kullback-Leibler divergence) assume that the histograms are already aligned, so that a bin in one histogram is compared only with the corresponding bin in the other histogram. These distances are sensitive to distortion in HBLDs as well as to quantization effects. For example, in FIG. 1 they incorrectly indicate that 102 is closer to 103 than to 101, where 104, 105 and 106 show the corresponding histograms computed using the same 2D bins. Cross-bin distances, such as the Earth Mover's Distance (EMD) described in Y. Rubner, C. Tomasi, and L. J. Guibas, “The Earth Mover's Distance as a Metric for Image Retrieval”, IJCV, 40(2):99-121, 2000, allow bins at different locations to be (partially) matched and therefore alleviate the quantization effect. However, most cross-bin distances are efficient only for one-dimensional histograms, which limits their application to multi-dimensional HBLDs such as shape context, described in S. Belongie, J. Malik and J. Puzicha, “Shape Matching and Object Recognition Using Shape Contexts”, IEEE Trans. on PAMI, 24(4):509-522, 2002, and SIFT, described in D. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, IJCV, 60(2), pp. 91-110, 2004.
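To make the shift sensitivity of bin-to-bin distances concrete, the following Python sketch compares a toy 1D histogram with a copy shifted by one bin and a copy shifted by three bins. The histograms and the function names `l2_distance` and `chi2_distance` are hypothetical illustrations, not the data of FIG. 1, and the χ² form used here, 0.5 Σ (a - b)² / (a + b), is one common variant among several.

```python
import math
from typing import Sequence


def l2_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Bin-to-bin L2 distance: bin i of h1 is compared only with bin i of h2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def chi2_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """One common form of the chi-square statistic: 0.5 * sum (a - b)^2 / (a + b).

    Bins that are empty in both histograms are skipped to avoid division by zero.
    """
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


# Hypothetical toy histograms: all mass in bin 1, then shifted by one and by three bins.
ref = [0.0, 1.0, 0.0, 0.0, 0.0]
near = [0.0, 0.0, 1.0, 0.0, 0.0]
far = [0.0, 0.0, 0.0, 0.0, 1.0]

print(l2_distance(ref, near), l2_distance(ref, far))      # both equal sqrt(2)
print(chi2_distance(ref, near), chi2_distance(ref, far))  # both equal 1.0
```

Both distances report the same value for the small and the large shift, because no bin of the reference overlaps a non-empty bin of either shifted copy.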
The approach of the present invention falls into the latter category, cross-bin distances. The cross-bin distances most closely related to the present invention are discussed below.
The Earth Mover's Distance (EMD), proposed by Rubner et al. in Y. Rubner, C. Tomasi, and L. J. Guibas, “The Earth Mover's Distance as a Metric for Image Retrieval”, IJCV, 40(2):99-121, 2000, formulates the computation of the distance between two distributions as a transportation problem. EMD is very effective for distributions with sparse structure, e.g., color histograms in the CIE-Lab space, as in Y. Rubner, C. Tomasi, and L. J. Guibas, “The Earth Mover's Distance as a Metric for Image Retrieval”, IJCV, 40(2):99-121, 2000. However, the time complexity of EMD is higher than O(N³), where N is the number of histogram bins. This hinders its application to multi-dimensional histogram-based descriptors such as HBLDs.
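For one-dimensional histograms of equal total mass, with ground distance |i - j| between bins i and j, the transportation problem behind EMD has a well-known closed form: the L1 distance between the two cumulative histograms, computable in O(N). The following Python sketch (the function name `emd_1d` is hypothetical) shows this special case only; it does not cover multi-dimensional histograms, where a general transportation problem must be solved.

```python
from itertools import accumulate
from typing import Sequence


def emd_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """EMD between two 1D histograms of equal length and equal total mass.

    Assumes unit spacing between adjacent bins and ground distance |i - j|.
    In this special case EMD equals the L1 distance between cumulative sums.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


ref = [0.0, 1.0, 0.0, 0.0, 0.0]
near = [0.0, 0.0, 1.0, 0.0, 0.0]  # mass moved by one bin
far = [0.0, 0.0, 0.0, 0.0, 1.0]   # mass moved by three bins

print(emd_1d(ref, near))  # 1.0
print(emd_1d(ref, far))   # 3.0
```

Unlike a bin-to-bin distance, EMD grows with the size of the shift, which is the behavior that makes cross-bin distances robust to small misalignments.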
Indyk and Thaper, in “Fast Image Retrieval via Embeddings”, 3rd Workshop on Statistical and Computational Theories of Vision, Nice, France, 2003, proposed a fast EMD algorithm that embeds the EMD metric into a normed space. The embedding is computed by a hierarchical analysis of the distributions over increasingly coarse grids, and EMD is then approximated by the L1 distance between the embedded vectors. The time complexity of the embedding is O(Nd log Δ), where N is the size of the feature sets, d is the dimension of the feature space and Δ is the diameter of the union of the two feature sets being compared. The embedding approach has been applied effectively to image retrieval, as described in P. Indyk and N. Thaper, “Fast Image Retrieval via Embeddings”, 3rd Workshop on Statistical and Computational Theories of Vision, Nice, France, 2003, and to shape comparison, as described in K. Grauman and T. Darrell, “Fast Contour Matching Using Approximate Earth Mover's Distance”, CVPR, I:220-227, 2004.
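The hierarchical embedding can be sketched in Python as follows. The sketch assumes integer point coordinates, equal-size point sets, grid cells of side 2^i at level i, and a weight of 2^i on the L1 histogram difference at that level. Unlike the method of Indyk and Thaper, the grids here are not randomly shifted, and the names `grid_histogram` and `embedded_l1_distance` are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple

Point = Tuple[int, ...]


def grid_histogram(points: Sequence[Point], cell: int) -> Counter:
    """Count points per grid cell of side `cell` (integer coordinates assumed)."""
    return Counter(tuple(c // cell for c in p) for p in points)


def embedded_l1_distance(a: Sequence[Point], b: Sequence[Point], levels: int) -> int:
    """Sum over levels i = 0..levels of 2^i * ||H_i(a) - H_i(b)||_1.

    H_i is the histogram of a point set over a grid of side 2^i. Each level
    visits every coordinate of every point once, so the cost is O(N * d * L),
    with L on the order of log(diameter). Grids are NOT randomly shifted here.
    """
    total = 0
    for i in range(levels + 1):
        cell = 2 ** i
        ha, hb = grid_histogram(a, cell), grid_histogram(b, cell)
        # Counter returns 0 for missing cells, so the union of keys suffices.
        total += cell * sum(abs(ha[k] - hb[k]) for k in ha.keys() | hb.keys())
    return total


print(embedded_l1_distance([(0, 0)], [(1, 0)], levels=3))  # 2
print(embedded_l1_distance([(0, 0)], [(3, 0)], levels=3))  # 6
```

For a single point moved by 1 and by 3 along one axis, the true EMD is 1 and 3; the embedded distance (2 and 6) follows it only up to a distortion factor, which is the price paid for avoiding the transportation problem.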
Most recently, Grauman and Darrell, in “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features”, ICCV, 2005, proposed the pyramid match kernel for feature set matching. In that work, a pyramid of histograms of a feature set is extracted as a description of an object. The similarity between two objects is then defined by a weighted sum of histogram intersections at each scale, where histogram intersection is as described in M. J. Swain and D. H. Ballard, “Color Indexing”, IJCV, 7(1):11-32, 1991.
The diffusion process has been widely used for data smoothing and scale-space analysis in the computer vision community. Some early work introducing this idea can be found in A. P. Witkin, “Scale-space filtering”, IJCAI, pp. 1019-1022, 1983, and in J. J. Koenderink, “The structure of images”, Biol. Cybern., 50:363-370, 1984.
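The pyramid match idea of Grauman and Darrell can be sketched in Python under stated assumptions: integer coordinates, grid cells of side 2^i at level i, and a weight of 1/2^i on the matches that first appear at level i, counted as the increase in histogram intersection over the previous level. The names `histogram_intersection` and `pyramid_match` are hypothetical, and details of the published kernel, such as normalization, may differ.

```python
from collections import Counter
from typing import Sequence, Tuple

Point = Tuple[int, ...]


def grid_histogram(points: Sequence[Point], cell: int) -> Counter:
    """Count points per grid cell of side `cell` (integer coordinates assumed)."""
    return Counter(tuple(c // cell for c in p) for p in points)


def histogram_intersection(h1: Counter, h2: Counter) -> int:
    """Swain-Ballard intersection: sum over shared bins of min(h1[k], h2[k])."""
    return sum(min(h1[k], h2[k]) for k in h1.keys() & h2.keys())


def pyramid_match(x: Sequence[Point], y: Sequence[Point], levels: int) -> float:
    """Weighted count of new matches found at each level of a grid pyramid."""
    score, previous = 0.0, 0
    for i in range(levels + 1):
        cell = 2 ** i
        current = histogram_intersection(grid_histogram(x, cell), grid_histogram(y, cell))
        score += (current - previous) / cell  # matches first found at level i, weight 1/2^i
        previous = current
    return score


x = [(0,), (5,)]
y = [(1,), (5,)]
print(pyramid_match(x, y, levels=3))  # 1.5
print(pyramid_match(x, x, levels=3))  # 2.0
```

The points at 5 match at the finest level with weight 1, while 0 and 1 first share a cell at level 1, so that match contributes only 1/2.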
The works of Witkin and Koenderink demonstrated axiomatically that a PDE model of linear heat dissipation, or diffusion, has a unique solution given by Gaussian convolution. More recent well-known diffusion-based methods include anisotropic diffusion for edge-preserving data smoothing, as described in P. Perona and J. Malik, “Scale-Space and Edge Detection Using Anisotropic Diffusion”, IEEE Trans. on PAMI, 12(7):629-639, 1990, and automatic scale selection with the γ-normalized Laplacian, as described in T. Lindeberg, “Feature Detection with Automatic Scale Selection”, IJCV, 30(2):79-116, 1998. The diffusion process also provides a theoretical foundation for other vision techniques, such as Gaussian pyramids and the SIFT feature detector described in D. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, IJCV, 60(2), pp. 91-110, 2004. Despite its ubiquity, to the best of our knowledge, this is the first attempt to exploit the diffusion process to compute a histogram distance.
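To illustrate the link between linear diffusion and Gaussian smoothing, the following Python sketch applies explicit finite-difference steps of the 1D heat equation to a histogram. It illustrates the diffusion process only and is not the histogram distance of the present invention; the function name `diffuse`, the step parameter `r` and the zero-flux boundary handling are assumptions.

```python
from typing import List


def diffuse(h: List[float], steps: int, r: float = 0.25) -> List[float]:
    """Explicit finite-difference steps of the 1D heat equation u_t = u_xx.

    r = dt / dx^2 must satisfy 0 < r <= 0.5 for stability. Zero-flux
    (reflecting) boundaries keep the total mass of the histogram unchanged.
    """
    if not 0 < r <= 0.5:
        raise ValueError("r must be in (0, 0.5]")
    u = list(h)
    n = len(u)
    for _ in range(steps):
        padded = [u[0]] + u + [u[-1]]  # ghost bins copy the boundary values
        u = [
            padded[i + 1] + r * (padded[i] - 2 * padded[i + 1] + padded[i + 2])
            for i in range(n)
        ]
    return u


delta = [0.0, 0.0, 1.0, 0.0, 0.0]
print(diffuse(delta, steps=1))  # [0.0, 0.25, 0.5, 0.25, 0.0]
```

Away from the boundaries, each step convolves the histogram with the kernel [r, 1 - 2r, r], whose variance is 2r (in squared bins), so k repeated steps approach convolution with a Gaussian of variance 2rk, consistent with the Gaussian solution of the heat equation.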
Other histogram dissimilarity measures, and an evaluation of them, can be found in Y. Rubner, J. Puzicha, C. Tomasi, and J. M. Buhmann, “Empirical Evaluation of Dissimilarity Measures for Color and Texture”, CVIU, 84:25-43, 2001. In that article, the authors also describe two other cross-bin distances: early work by Peleg et al., “A Unified Approach to the Change of Resolution: Space and Gray-level”, IEEE Trans. on PAMI, 11:739-742, 1989; and the quadratic-form distance, a heuristic approach described in W. Niblack, R. Barber, W. Equitz, M. Flickner, B. Glasman, D. Petkovic, P. Yanker, C. Faloutsos, and G. Taubin, “The QBIC project: querying images by content using color, texture and shape”, Proc. of SPIE Storage and Retrieval for Image and Video Databases, pp. 173-187, 1993, and in J. Hafner, H. S. Sawhney, W. Equitz, M. Flickner, and W. Niblack, “Efficient color histogram indexing for quadratic form distance functions”, IEEE Trans. on PAMI, 17(7):729-736, 1995.
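The quadratic-form distance compares every pair of bins through a similarity matrix A, as sqrt((h1 - h2)^T A (h1 - h2)). The Python sketch below assumes 1D bin centers and the choice a_ij = 1 - d_ij / d_max, where d_ij is the distance between bin centers i and j and d_max is its maximum. This choice is commonly associated with Hafner et al. but is an assumption here, as are the function names `similarity_matrix` and `quadratic_form_distance`.

```python
import math
from typing import List, Sequence


def similarity_matrix(centers: Sequence[float]) -> List[List[float]]:
    """a_ij = 1 - d_ij / d_max, with d_ij = |c_i - c_j| for 1D bin centers."""
    d_max = max(abs(ci - cj) for ci in centers for cj in centers)
    return [[1.0 - abs(ci - cj) / d_max for cj in centers] for ci in centers]


def quadratic_form_distance(
    h1: Sequence[float], h2: Sequence[float], a: List[List[float]]
) -> float:
    """sqrt((h1 - h2)^T A (h1 - h2)); every pair of bins contributes via a_ij."""
    diff = [x - y for x, y in zip(h1, h2)]
    n = len(diff)
    s = sum(diff[i] * a[i][j] * diff[j] for i in range(n) for j in range(n))
    # For arbitrary ground distances A need not be positive semidefinite,
    # so clamp before taking the square root.
    return math.sqrt(max(s, 0.0))


a = similarity_matrix([0, 1, 2, 3, 4])
ref = [0.0, 1.0, 0.0, 0.0, 0.0]
near = [0.0, 0.0, 1.0, 0.0, 0.0]
far = [0.0, 0.0, 0.0, 0.0, 1.0]
print(quadratic_form_distance(ref, near, a))  # sqrt(0.5), about 0.707
print(quadratic_form_distance(ref, far, a))   # sqrt(1.5), about 1.225
```

Because neighboring bins have high similarity in A, a one-bin shift yields a smaller distance than a three-bin shift, which a purely bin-to-bin comparison cannot express.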
Accordingly, a new and improved method and system for comparing histograms is needed.