The following relates to the information processing arts, information storage arts, classification arts, and related arts.
Objects such as images, speech segments, text documents, or the like are commonly represented as digital files or digital representations, for example as bitmaps or grayscale or color pixel maps in the case of images, audio files in the case of speech segments, text or word processing files in the case of text documents, or so forth. In the processing and/or storage of such objects, it is useful to classify the objects respective to one or more classes. For example, images can be classified by subject matter, e.g. images of cats, images of dogs, images of vehicles, images of people, or so forth.
To facilitate classification, a vector representation of an object may be generated. For images, a suitable vector can be generated by computing features at selected image patches or sampling regions distributed across the image, and employing the computed features as elements of a feature vector. For example, a Fisher vector or “bag-of-visual-words” vector can serve as the vector representation of an image. In the case of text documents, a “bag-of-words” vector representation is sometimes used, in which each vector element corresponds to a word and has a value indicative of a count of occurrences of that word in the text document.
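By way of illustrative example only, the “bag-of-words” representation described above can be sketched as follows. The function name `bag_of_words`, the simplistic tokenizer, and the four-word vocabulary are assumptions made for illustration and are not taken from any particular system.

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Return a count vector with one element per vocabulary word."""
    # Simplistic tokenizer (an assumption): lowercase, keep runs of letters.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Element i holds the number of occurrences of vocabulary[i] in the text.
    # Words outside the vocabulary (here, "on") are ignored.
    return [counts[word] for word in vocabulary]

# Hypothetical vocabulary and document, chosen only for illustration.
vocabulary = ["cat", "dog", "sat", "the"]
vector = bag_of_words("The cat sat. The dog sat on the cat.", vocabulary)
print(vector)  # [2, 1, 2, 3]
```

In this sketch the vector length equals the vocabulary size, so documents of any length map to vectors of the same dimension, which is what allows a single classifier to operate on all of them.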
A classifier receives the vector representation of the object and outputs a classification based on the vector representation. The classification may be hard (e.g., “1” if the object is assigned to the class or “0” otherwise) or soft (e.g., a value between 0 and 1 inclusive, with higher values indicating a higher likelihood of membership in the class). A soft classification can be converted to a hard classification by thresholding. Typically, the classifier has adjustable parameters whose values are determined by training on a labeled training set. The objective of the training is to select the adjustable parameters such that the output of the classifier substantially agrees with the classification labels assigned to the objects of the training set.
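By way of illustrative example only, the following sketch shows a soft classification, its conversion to a hard classification by thresholding, and the training of adjustable parameters on a labeled training set. The choice of a logistic function of a linear score, the use of batch gradient descent, the learning rate, the epoch count, and the four one-dimensional training samples are all assumptions made for illustration; the text above does not prescribe any particular classifier or training method.

```python
import math

def soft_classify(x, weights, bias):
    """Soft output strictly between 0 and 1: logistic function of a linear score."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-score))

def hard_classify(soft_value, threshold=0.5):
    """Hard output: 1 if the soft value meets the threshold, 0 otherwise."""
    return 1 if soft_value >= threshold else 0

def train(samples, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    dim = len(samples[0])
    n = len(samples)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(samples, labels):
            # Difference between the soft output and the training label.
            error = soft_classify(x, weights, bias) - y
            for j in range(dim):
                grad_w[j] += error * x[j] / n
            grad_b += error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

# Hypothetical labeled training set: one feature per object, two classes.
samples = [[0.0], [1.0], [3.0], [4.0]]
labels = [0, 0, 1, 1]

weights, bias = train(samples, labels)
soft_outputs = [soft_classify(x, weights, bias) for x in samples]
hard_outputs = [hard_classify(p) for p in soft_outputs]
print(hard_outputs)  # [0, 0, 1, 1]
```

The threshold is itself a design choice: lowering it below 0.5 assigns more objects to the class, while raising it assigns fewer.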
In general, classifiers may be linear or nonlinear. Linear classifiers are typically more computationally efficient than nonlinear classifiers. On the other hand, nonlinear classifiers are typically more accurate than linear classifiers. It would be advantageous to construct a nonlinear classifier that retains the accuracy typical of nonlinear classifiers but has the runtime efficiency typical of linear classifiers.
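By way of illustrative example only, the tradeoff between efficiency and accuracy can be sketched by comparing a linear score with one common kind of nonlinear score, namely a kernel expansion over stored training vectors. The radial basis function kernel, the value of `gamma`, the coefficients, and the four-point “exclusive or” data set are assumptions chosen for illustration; other nonlinear classifiers behave differently in detail.

```python
import math

def linear_score(x, weights, bias):
    # A single dot product: cost grows with the vector dimension D only.
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def rbf_kernel(a, b, gamma):
    # Radial basis function kernel: exp(-gamma * squared distance).
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def kernel_score(x, stored_vectors, coefficients, bias, gamma=1.0):
    # One kernel evaluation per stored vector: cost grows with N * D,
    # where N is the number of stored training vectors.
    return sum(c * rbf_kernel(v, x, gamma)
               for c, v in zip(coefficients, stored_vectors)) + bias

# Hypothetical "exclusive or" data: no single linear boundary separates
# the two classes, but the kernel expansion below does.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
in_class = [False, True, True, False]
coefficients = [-1.0, 1.0, 1.0, -1.0]  # +1 for class members, -1 otherwise

predictions = [kernel_score(p, points, coefficients, bias=0.0) > 0
               for p in points]
print(predictions)  # [False, True, True, False]
```

In this sketch the nonlinear score classifies all four points correctly, at the cost of evaluating the kernel against every stored vector for each new object, whereas the linear score requires only one dot product per object.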