When searching for relevant objects within a computer network, various techniques and algorithms have been used to locate relevant information, images, or objects, including techniques based on sets of local features, kernels and support vector machines (SVMs) for recognition, and multi-resolution image representations.
Kernel-based learning algorithms, which include SVMs, kernel PCA (principal component analysis), and kernel LDA (linear discriminant analysis), have become well-established tools that are useful in a variety of contexts, including discriminative classification, regression, density estimation, and clustering. However, conventional kernels (such as the Gaussian RBF (radial basis function) or polynomial) are designed to operate on R^N vector inputs, where each vector entry corresponds to a particular global attribute for that instance. As a result, initial approaches using SVMs for recognition were forced to rely on global image features, i.e. ordered features of equal length measured from the image as a whole, such as color or grayscale histograms or vectors of raw pixel data. Such global representations are known to be sensitive to real-world imaging conditions, such as occlusions, pose changes, or image noise.
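The conventional-kernel setting described above can be illustrated with a small sketch: a Gaussian RBF kernel comparing fixed-length global feature vectors, here toy grayscale histograms. The function name, bin values, and bandwidth are illustrative assumptions, not drawn from any particular system.

```python
import numpy as np

def gaussian_rbf_kernel(x, y, sigma=1.0):
    """Conventional kernel on fixed-length R^N vectors:
    k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))

# Two toy 4-bin grayscale histograms serving as global image features.
h1 = [0.4, 0.3, 0.2, 0.1]
h2 = [0.1, 0.2, 0.3, 0.4]

k_same = gaussian_rbf_kernel(h1, h1)  # identical inputs give similarity 1.0
k_diff = gaussian_rbf_kernel(h1, h2)  # differing inputs give a value in (0, 1)
```

Because the kernel subtracts the vectors entrywise, both inputs must have the same length and ordering; this is exactly the restriction that makes such kernels awkward for unordered sets of local features.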
More recently, it has been shown that local features invariant to common image transformations are a powerful representation for recognition, because the features can be reliably detected and matched across instances of the same object or scene under different viewpoints, poses or lighting conditions. Most approaches, however, perform recognition with local feature representations using nearest neighbor or voting-based classifiers followed by an alignment step. Both may be impractical for large training sets, since their classification times increase with the number of training examples. A support vector classifier or regressor, on the other hand, identifies a sparse subset of the training examples (the support vectors) to delineate a decision boundary or approximate function of interest.
In order to more fully leverage existing kernel-based learning tools for situations where the data cannot be naturally represented by a Euclidean vector space, such as graphs, strings, or trees, researchers have developed specialized similarity measures. Due to the increasing prevalence of data that is best represented by sets of local features, several researchers have recently designed kernel functions that can handle unordered sets as input. Nonetheless, current approaches are prohibitively computationally expensive, make impractical assumptions regarding the parametric form of the features, discard information by replacing inputs with prototypical features, ignore semantically important co-occurrence information by considering features independently, are not positive-definite, and/or are limited to sets of equal size. In addition, to our knowledge none has shown the ability to learn a real-valued function from sets of features; results have been shown only for classification tasks.
Approaches which fit a parametric model to feature sets in order to compare their distributions can be computationally costly and have limited applicability, since they assume both that features within a set will conform to the chosen distribution, and that sets will be large enough to yield an accurate estimate of the distribution's parameters. Real data regularly violate these assumptions: a single bag of features (e.g., patches from an image) will often exhibit complex variations, and cardinalities vary widely across instances (e.g., the title of a document yields just a few word features).
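The parametric route criticized above can be sketched as: fit a distribution to each feature set, then compare sets by comparing the fitted distributions. A minimal version, assuming diagonal Gaussians and a symmetrized KL divergence as the comparison (both choices are illustrative assumptions, not a specific published method):

```python
import numpy as np

def fit_diag_gaussian(S, eps=1e-6):
    """Fit a diagonal Gaussian (mean, variance per dimension) to a bag of
    feature vectors. The variance floor `eps` masks, but does not solve, the
    limitation noted above: tiny sets give unreliable parameter estimates."""
    S = np.asarray(S, float)
    return S.mean(axis=0), S.var(axis=0) + eps

def symmetric_kl(p, q):
    """Symmetrized KL divergence between two diagonal Gaussians (mu, var)."""
    def kl(ma, va, mb, vb):
        return 0.5 * np.sum(np.log(vb / va) + (va + (ma - mb) ** 2) / vb - 1.0)
    (m1, v1), (m2, v2) = p, q
    return float(kl(m1, v1, m2, v2) + kl(m2, v2, m1, v1))

# Two bags of 2-D features; identical bags yield (near-)zero divergence.
bag_a = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]]
bag_b = [[5.0, 5.0], [6.0, 4.0], [7.0, 5.5]]
d_same = symmetric_kl(fit_diag_gaussian(bag_a), fit_diag_gaussian(bag_a))
d_diff = symmetric_kl(fit_diag_gaussian(bag_a), fit_diag_gaussian(bag_b))
```

Note that with a set of one or two features (the document-title case mentioned above), the variance estimate collapses toward the floor value, illustrating why small cardinalities break this approach.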
Kernel methods which use explicit correspondences between two sets' features search one set for the best matching feature for each member in the other, and then define set similarity as a function over those component similarity values. These methods have complexities that are quadratic in the number of features, hindering usage for kernel-based learning when feature sets are large. Furthermore, matching each input feature independently ignores useful information about intra-set dependencies. In one known method, similarity is measured in terms of the principal angle between the linear subspaces spanned by two sets' vector elements. The kernel has a cubic complexity and is only positive-definite for sets of equal cardinality. In another known method, an algebraic kernel is used to combine similarities given by local (vector-based) kernels, with the weighting chosen to reflect whether the features are in alignment (ordered). When set cardinalities vary, inputs must be padded with zeros so as to form equal-size matrices; results are only shown for a classification task with input sets whose features' ordering is known.
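The explicit-correspondence idea just described can be sketched as follows: each feature in one set searches the other set for its best-matching feature under a local (vector-based) kernel, and set similarity is a function (here, the mean, an illustrative assumption) of those best-match values. The nested search makes the quadratic cost visible.

```python
import numpy as np

def local_kernel(f, g, sigma=1.0):
    """Local (vector-based) kernel between two individual features."""
    return float(np.exp(-np.sum((f - g) ** 2) / (2.0 * sigma ** 2)))

def match_kernel(X, Y, sigma=1.0):
    """Sketch of a correspondence-based set kernel: every feature in X is
    compared against every feature in Y to find its best match, so the cost
    is O(|X| * |Y|), i.e. quadratic in set size."""
    X = [np.asarray(f, float) for f in X]
    Y = [np.asarray(g, float) for g in Y]
    best = [max(local_kernel(f, g, sigma) for g in Y) for f in X]
    return float(np.mean(best))

# Two small sets of 2-D local features.
set_a = [[0.0, 0.0], [1.0, 1.0]]
set_b = [[0.1, 0.0], [1.0, 0.9], [3.0, 3.0]]
sim = match_kernel(set_a, set_b)
```

Matching each feature to its single best counterpart independently, as here, is exactly the step the text criticizes: it discards intra-set dependencies, and such one-directional best-match similarities are not guaranteed to be symmetric or positive-definite.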
An alternative approach to discriminative classification when dealing with unordered set data is to designate prototypical examples from each class, and then represent examples by a vector giving their distances to each prototype. Standard algorithms that handle vectors in a Euclidean space are then applicable. One technique is to build such a classifier for handwritten digits, and use a shape context distance as the measure of similarity. The issues faced by such a prototype-based method are determining which examples should serve as prototypes, choosing how many there should be, and updating the prototypes properly when new types of data are encountered. Another method uses a hybrid generative-discriminative approach for object recognition, combining a Fisher kernel and a probabilistic constellation model.
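The prototype-based representation above can be sketched in a few lines: an unordered feature set is mapped to a fixed-length vector of its distances to each designated prototype, after which standard Euclidean-space classifiers apply. The set distance below is a hypothetical symmetrized mean nearest-neighbor measure, used purely as a stand-in for a measure such as the shape context distance mentioned in the text.

```python
import numpy as np

def set_distance(A, B):
    """Stand-in set-to-set measure (NOT the shape context distance): the
    symmetrized mean nearest-neighbor distance between two point sets."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    d_ab = np.mean([min(np.linalg.norm(a - b) for b in B) for a in A])
    d_ba = np.mean([min(np.linalg.norm(b - a) for a in A) for b in B])
    return 0.5 * (d_ab + d_ba)

def prototype_embedding(example, prototypes, dist=set_distance):
    """Map an unordered feature set to the vector of its distances to each
    prototype, yielding a fixed-length Euclidean representation."""
    return np.array([dist(example, p) for p in prototypes])

# Two tiny prototype "shapes" (sets of 2-D points) and a query set.
protos = [np.array([[0.0, 0.0], [1.0, 0.0]]),
          np.array([[5.0, 5.0], [6.0, 5.0]])]
query = np.array([[0.1, 0.0], [0.9, 0.0]])
vec = prototype_embedding(query, protos)  # fixed-length vector, len(protos)
```

The sketch also makes the stated drawbacks concrete: the quality of `vec` depends entirely on which examples serve as prototypes and how many there are, and adding new types of data requires revisiting the prototype list.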