The following relates to the information processing arts, information storage arts, classification arts, and related arts. It finds particular application in connection with the development and use of a linear classifier which operates in a different space than labeled training sample vectors that are representative of images or other objects, and will be described with particular reference thereto.
Digital objects, such as images, speech segments, text documents, and the like, are commonly represented as digital files or digital representations, for example as bitmaps or grayscale or color pixel maps in the case of images, audio files in the case of speech segments, or text or word processing files in the case of text documents. In the processing and/or storage of such objects, it is useful to classify the objects automatically with respect to one or more classes. For example, pictorial images can be classified by subject matter, e.g., images of cats, images of dogs, images of vehicles, images of people.
To facilitate classification, a signature of the object is generated, which may be in the form of a vector having a relatively high dimensionality, i.e., a dimensionality which is sufficient to provide a unique signature for each object, but which incorporates substantially less data than the original object. Thus, for example, an image containing millions of pixels may be represented by a vector having perhaps 128 to 10,000 dimensions. For images, a suitable vector can be generated by computing features of selected image patches or sampling regions distributed across the image, and employing the computed features as elements of the feature vector or as inputs to a model which assigns a vector based thereon. For example, a Fisher vector or “bag-of-visual-words” vector representation can be used as a suitable vector representation of an image. In the case of text documents, a “bag-of-words” vector representation is sometimes used, in which each vector element corresponds to a word and has a value indicative of a count of occurrences of that word in the text document.
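By way of illustration only, the “bag-of-words” representation described above can be sketched as follows. The function name `bag_of_words`, the four-word vocabulary, the sample sentence, and the simple lowercase tokenization are all assumptions introduced for this example and are not taken from any particular implementation.

```python
import re
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return a count vector with one element per vocabulary word.

    Element i holds the number of occurrences of vocabulary[i] in text.
    """
    # Assumed tokenization: lowercase the text and keep runs of letters.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # A Counter returns 0 for words that never occur in the text.
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary; a practical vocabulary would be far larger.
vocabulary = ["cat", "dog", "vehicle", "person"]
vector = bag_of_words("The cat chased the dog; the dog barked.", vocabulary)
print(vector)  # [1, 2, 0, 0]
```

The resulting vector has a fixed length equal to the vocabulary size regardless of the length of the document, which is what allows documents of different sizes to be input to the same classifier.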
The classifier receives the vector representation of the object and outputs a classification based on the vector representation. Where there are multiple classes, this can be considered as a series of two-class decision problems in which each class is evaluated against the rest. The classification may be hard (e.g., “1” if the object is assigned to the class or “0” otherwise), or can be soft (e.g., the classification output is a value between 0 and 1 inclusive, with higher values indicating a higher likelihood of membership in the class). A soft classification can be converted to a hard classification by thresholding. Typically, the classifier has adjustable parameters whose values are determined by training with a labeled training set. The objective of the training is to select the adjustable parameters such that the output of the classifier substantially agrees with the classification labels assigned to the objects of the training set.
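By way of illustration only, the relationship between soft and hard classification in a one-class-against-the-rest arrangement can be sketched as follows. The sketch assumes a linear score per class passed through a logistic function; the function names `soft_scores` and `harden`, the hand-picked weights and biases, and the threshold of 0.5 are hypothetical values chosen for the example and are not the result of any training.

```python
import math


def soft_scores(x, weights, biases):
    """Return one soft score per class, each strictly between 0 and 1.

    Each class has its own two-class (class versus rest) linear scorer.
    """
    scores = {}
    for label, w in weights.items():
        # Linear function of the input vector for this class.
        z = sum(wi * xi for wi, xi in zip(w, x)) + biases[label]
        # Logistic function maps the linear score to a soft output.
        scores[label] = 1.0 / (1.0 + math.exp(-z))
    return scores


def harden(scores, threshold=0.5):
    """Convert soft scores to hard 0/1 decisions by thresholding."""
    return {label: int(s >= threshold) for label, s in scores.items()}


# Hypothetical, untrained parameters for two classes and a 2-D input vector.
weights = {"cat": [2.0, -1.0], "dog": [-1.0, 2.0]}
biases = {"cat": 0.0, "dog": 0.0}

soft = soft_scores([1.0, 0.0], weights, biases)
hard = harden(soft)
print(hard)  # {'cat': 1, 'dog': 0}
```

In this sketch the weights and biases play the role of the adjustable parameters; in practice their values would be determined by training with a labeled training set rather than set by hand.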
In general, classifiers may be linear or nonlinear. Linear classifiers, such as those using some form of logistic regression, are typically more computationally efficient than nonlinear classifiers, such as those employing kernel learning. However, nonlinear classifiers are typically more accurate than linear classifiers. When the dataset used in learning the classifier parameters is large, the training cost of a nonlinear classifier, in terms of computation time, can be significant. It would be advantageous to train a classifier whose accuracy approaches that typical of nonlinear classifiers, but whose computational cost is more typical of linear classifiers.