1. Field of the Invention
The invention relates in general to use of object features to ascertain a perceptual distance between the objects.
2. Description of the Related Art
Perceptual distance is a quantitative measure of human-perceived similarity between objects. Research in content-based image/video retrieval has steadily gained momentum in recent years as a result of the dramatic increase in the volume of digital images and videos. To achieve effective retrieval, an image/video system should be able to accurately characterize and quantify perceptual similarity. However, a fundamental issue, namely how best to measure perceptual similarity, remains a challenge. Various perceptual distance measurements, such as the Minkowski metric (See, M. W. Richardson, Multidimensional psychophysics, Psychological Bulletin, 35:659–660, 1938), including the recently proposed fractional distance (See, C. C. Aggarwal, A. Hinneburg, and D. A. Keim, On the surprising behavior of distance metrics in high dimensional space, ICDT Conference Proceedings, 2001), histogram Cosine distance (See, I. Witten, A. Moffat, and T. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, Van Nostrand Reinhold, New York, N.Y., 1994), and fuzzy logic (See, J. Li, J. Z. Wang, and G. Wiederhold, IRM: Integrated region matching for image retrieval, Proceedings of ACM Multimedia, October 2000), have been used to measure similarity between feature vectors representing images (and hence video frames). A problem addressed by these distance measurement processes has been to accurately measure the degree to which objects are perceptually similar to each other. Conversely, the same problem can be characterized as accurately measuring the degree to which objects are perceptually different from each other.
The Minkowski metric is one example of a process that has been used in the past for measuring similarity between objects (e.g., images). Suppose two objects X and Y are represented by two p dimensional vectors (or expressions) (x1, x2, . . . , xp) and (y1, y2, . . . , yp), respectively. The Minkowski metric d(X, Y) is defined as:
$$d(X, Y) = \left( \sum_{i=1}^{p} \left| x_i - y_i \right|^{r} \right)^{1/r} \tag{1}$$
where r is the Minkowski factor for the norm. Particularly, when r is set as 2, it is the well known Euclidean distance; when r is 1, it is the Manhattan distance (or L1 distance); when r is set to less than 1, it is the fractional distance. An object located a smaller distance from a query object is deemed more similar to the query object. Measuring similarity by the Minkowski metric is based primarily on the assumption that similar objects should be similar to the query object in all features.
Parameter r can also help determine the separation between similar objects and dissimilar objects. In principle, in high dimensional spaces, r should be small (e.g., less than one). In low dimensional space, r can be large (e.g., 2 or 3).
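By way of illustration only, the Minkowski metric and the effect of the factor r may be sketched in Python as follows. The function name minkowski_distance and the two feature vectors are hypothetical and are not taken from any cited reference; the sketch merely evaluates the metric for r = 2 (Euclidean), r = 1 (Manhattan), and r = 0.5 (fractional).

```python
def minkowski_distance(x, y, r):
    """Minkowski distance between equal-length feature vectors x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of features")
    if r <= 0:
        raise ValueError("r must be positive")
    # Sum the r-th powers of the per-feature absolute differences,
    # then take the r-th root of the sum.
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)


# Hypothetical 2-dimensional feature vectors (illustrative values only).
query = [0.0, 0.0]
candidate = [3.0, 4.0]

euclidean = minkowski_distance(query, candidate, 2)     # r = 2
manhattan = minkowski_distance(query, candidate, 1)     # r = 1
fractional = minkowski_distance(query, candidate, 0.5)  # r < 1
```

Every feature contributes to the sum for every value of r, which reflects the assumption that similar objects are similar to the query object in all features.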
In other words, parameter r is a scaling factor whose optimal value is dataset dependent and can be learned in the same way that m is learned.
Different r values are tried, and the r that achieves the maximum separation between similar and dissimilar objects is selected.
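The selection of r described above may be sketched, under stated assumptions, as a simple search over candidate values. The separation measure is not defined in this description; the ratio used below (mean distance to dissimilar objects divided by mean distance to similar objects) is an assumption made solely for illustration, as are the function names and the toy data.

```python
def minkowski_distance(x, y, r):
    """Minkowski distance between equal-length feature vectors x and y."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)


def separation(query, similar, dissimilar, r):
    """Assumed separation score (larger is better): mean distance to the
    dissimilar objects divided by mean distance to the similar objects."""
    mean_sim = sum(minkowski_distance(query, s, r) for s in similar) / len(similar)
    mean_dis = sum(minkowski_distance(query, d, r) for d in dissimilar) / len(dissimilar)
    if mean_sim == 0:
        return float("inf")  # similar objects coincide with the query
    return mean_dis / mean_sim


def select_r(query, similar, dissimilar, candidates):
    """Return the candidate r with the largest assumed separation score."""
    return max(candidates, key=lambda r: separation(query, similar, dissimilar, r))


# Hypothetical labeled examples (illustrative values only).
query = [0.0, 0.0, 0.0, 0.0]
similar = [[1.0, 1.0, 1.0, 1.0]]
dissimilar = [[3.0, 0.0, 0.0, 0.0]]
best_r = select_r(query, similar, dissimilar, [0.5, 1, 2, 3])
```

Other separation measures, such as a margin between the two distance distributions, could be substituted for the ratio without changing the search itself.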
A variant of the Minkowski function, the weighted Minkowski distance function, has also been applied to measure image similarity. The basic idea is to introduce weighting to identify important features. Assigning each feature a weighting coefficient wi (i=1 . . . p), the weighted Minkowski distance function is defined as:
$$d_w(X, Y) = \left( \sum_{i=1}^{p} w_i \left| x_i - y_i \right|^{r} \right)^{1/r} \tag{2}$$
By applying a static weighting vector for measuring similarity, the weighted Minkowski distance function assumes that similar images resemble the query image in the same features. For example, when the function weights color features high and ignores texture features, this same weighting is applied to all pair-wise distance computations with the query image.
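The static nature of the weighting may be illustrated by the following minimal Python sketch. The function name, the two-feature layout (one color feature followed by one texture feature), and all numeric values are hypothetical; the point shown is only that a single weight vector is reused for every comparison against the query.

```python
def weighted_minkowski_distance(x, y, w, r):
    """Weighted Minkowski distance with one weight per feature."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    # Each per-feature difference is scaled by its weight before summation.
    return sum(wi * abs(a - b) ** r for a, b, wi in zip(x, y, w)) ** (1.0 / r)


# Hypothetical layout: index 0 is a color feature, index 1 a texture feature.
# The static weights emphasize color and ignore texture entirely.
weights = [1.0, 0.0]

query = [0.0, 0.0]
database = [[3.0, 4.0], [1.0, 9.0]]

# The same weight vector is applied to every pair-wise computation.
distances = [weighted_minkowski_distance(query, obj, weights, 2) for obj in database]
```

Because the texture weight is zero for every comparison, the second object is ranked closer to the query despite its large texture difference.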
The weighted Minkowski function, described by J. Rocchio, Relevance feedback in information retrieval, In G. Salton, editor, The SMART retrieval system: Experiments in automatic document processing, Prentice-Hall, 1971, and the quadratic-form distances, described by M. Flickner, H. Sawhney, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, Query by image and video content: The QBIC system, IEEE Computer, 28(9):23–32, 1995, and by Y. Ishikawa, R. Subramanya, and C. Faloutsos, Mindreader: Querying databases through multiple examples, VLDB, 1998, are the two representative distance functions that match the spirit of Equation (3). The weights of the distance functions can be learned via techniques such as relevance feedback (See, K. Porkaew, S. Mehrotra, and M. Ortega, Query reformulation for content based multimedia retrieval in MARS, ICMCS, pages 747–751, 1999, and J. Rocchio, Supra) and discriminative analysis (See, X. S. Zhou and T. S. Huang, Comparing discriminating transformations and SVM for learning during multimedia retrieval, Proc. of ACM Conf. on Multimedia, pages 137–146, 2001). Given some similar and some dissimilar objects, the weights can be adjusted so that similar objects can be better distinguished from other objects.
An assumption made by these distance functions is that all similar objects are similar in the same respects. See, X. S. Zhou and T. S. Huang, Comparing discriminating transformations and SVM for learning during multimedia retrieval, Proc. of ACM Conf. on Multimedia, pages 137–146, 2001. Specifically, a Minkowski-like metric accounts for all feature channels when it is employed to measure similarity. However, there are a large number of counter-examples demonstrating that this assumption is questionable. For instance, the psychology studies of D. L. Medin, R. L. Goldstone, and D. Gentner, Respects for similarity, Psychological Review, 100(2):254–278, 1993, and A. Tversky, Features of similarity, Psychological Review, 84:327–352, 1977, present examples showing that the Minkowski model appears to run counter to human perception of similarity.
Substantial work on similarity has been carried out by cognitive psychologists. The most influential work is perhaps that of Tversky, Id., who suggests that similarity is determined by matching features of compared objects, and integrating these features by the formula,
$$S(A, B) = \theta f(A \cap B) - \alpha f(A - B) - \beta f(B - A) \tag{3}$$
The similarity of A to B, S(A, B), is expressed as a linear combination of the common and distinct features. The term (A∩B) represents the common features of A and B. (A−B) represents the features that A has but B does not; (B−A) represents the features that B has but A does not. The terms θ, α, and β reflect the weights given to the common and distinctive components, and function f is often assumed to be additive, see, D. L. Medin, R. L. Goldstone, and D. Gentner, Supra.
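For illustration, the feature-matching formula above may be sketched in Python by representing each object as a set of discrete features and taking f to be the set size, which is one additive choice. The function name, the default weights, and the example feature sets are hypothetical and are not drawn from the cited works.

```python
def tversky_similarity(a, b, theta=1.0, alpha=1.0, beta=1.0, f=len):
    """Feature-matching similarity S(A, B) over feature sets a and b.

    f defaults to set size, an additive measure (illustrative assumption).
    """
    a, b = set(a), set(b)
    common = a & b   # features shared by A and B
    only_a = a - b   # features A has but B does not
    only_b = b - a   # features B has but A does not
    return theta * f(common) - alpha * f(only_a) - beta * f(only_b)


# Hypothetical feature sets (illustrative values only).
A = {"red", "round", "smooth"}
B = {"red", "round", "striped", "large"}

s_ab = tversky_similarity(A, B, theta=1.0, alpha=0.5, beta=0.5)

# With unequal alpha and beta the measure is not symmetric in A and B.
s_ab_asym = tversky_similarity(A, B, theta=1.0, alpha=0.8, beta=0.2)
s_ba_asym = tversky_similarity(B, A, theta=1.0, alpha=0.8, beta=0.2)
```

Unlike the Minkowski metric, this measure depends on which features the two objects share, and it need not be symmetric when the distinctive components are weighted differently.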
Murphy and Medin, (See, G. Murphy and D. Medin, The role of theories in conceptual coherence, Psychological Review, 92:289–316, 1985), provide early insights into how similarity works in human perception: “The explanatory work is on the level of determining which attributes will be selected, with similarity being at least as much a consequence as a cause of a concept coherence.” Goldstone (See, R. L. Goldstone, Similarity, interactive activation, and mapping, Journal of Experimental Psychology: Learning, Memory, and Cognition, 20:3–28, 1994), explains that similarity is the process that determines the respects for measuring similarity. In other words, a distance function for measuring a pair of objects is formulated only after the objects are compared, not before the comparison is made. The relevant respects for the comparison are identified in this formulation process. The identified respects are more likely to be those that can support coherence between the compared objects. Although Goldstone had the right intuition, no one has been able to formulate a process that can measure similarity by selecting features in a partial and dynamic way.
Thus, there has been a recognized need for improvements in measurement of perceptual distance between objects. The present invention meets this need.