The following relates generally to methods, apparatus and articles of manufacture therefor, for categorizing images.
Since the advent of digital image sensors, the number of collections of digital images has continued to rise. Generic visual categorization provides access to high-level class information about objects contained in images for managing, searching, and mining such collections. Categorization of image content through generic visual categorization involves generalizing over natural variations in appearance inherent in a category of elements (e.g., objects, animals, etc.), and over viewing and imaging conditions. Unlike categorization methods for individual categories or object types, such as faces or cars, generic visual categorization systems handle multiple object types simultaneously.
One existing approach for performing generic visual categorization is an example-based machine learning approach known as the “bag of keypoints” approach, which makes use of a “visual vocabulary” to provide a mid-level characterization of images for bridging the semantic gap between low-level features and high-level concepts. The visual vocabulary is estimated in an unsupervised manner by clustering a set of training samples (i.e., low-level features extracted from training images). To characterize an image, each of its feature vectors is assigned to its closest cluster and a single occupancy histogram is built. The image is classified by providing the single occupancy histogram to a set of Support Vector Machine (SVM) classifiers (i.e., one per class), trained in a one-versus-all manner.
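By way of illustration only, the vocabulary estimation and histogram construction described above may be sketched as follows. The sketch assumes that feature vectors have already been extracted and uses plain k-means as the clustering step, which is one possible choice and is not mandated by the description; the function names `nearest`, `kmeans` and `occupancy_histogram` are hypothetical. The SVM classification stage is omitted.

```python
import random

def nearest(vec, centres):
    """Index of the centre closest to vec (squared Euclidean distance)."""
    return min(range(len(centres)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, centres[k])))

def kmeans(samples, k, iters=20, seed=0):
    """Plain Lloyd's k-means; an assumed stand-in for the clustering step."""
    rng = random.Random(seed)
    centres = [list(s) for s in rng.sample(samples, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for s in samples:
            groups[nearest(s, centres)].append(s)
        for j, g in enumerate(groups):
            if g:  # keep the previous centre if a cluster receives no samples
                centres[j] = [sum(col) / len(g) for col in zip(*g)]
    return centres

def occupancy_histogram(features, centres):
    """Count how many feature vectors of one image fall in each visual word."""
    hist = [0] * len(centres)
    for f in features:
        hist[nearest(f, centres)] += 1
    return hist
```

For example, with a fixed two-word vocabulary `[[0, 0], [10, 10]]`, an image whose feature vectors are `[[1, 0], [0, 1], [9, 9]]` places two vectors in the first word and one in the second.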
Additional details and background of the bag of keypoints approach, and alternatives thereof, are disclosed in the following publications, which are incorporated herein by reference in their entirety: Csurka, Dance, Fan, Willamowski and Bray, “Visual Categorization With Bags-Of-Keypoints”, Proc. ECCV International Workshop on Statistical Learning in Computer Vision, 2004; and Farquhar, Szedmak and Meng, “Improving Bag-Of-Keypoints Image Categorization: Generative Models And P-Kernels”, LAVA Report, available on the Internet at www.ecs.soton.ac.uk, dated Feb. 17, 2005.
In accordance with the disclosure herein, there is provided an improved generic visual categorizer in which a vocabulary of visual words and an occupancy histogram are derived for each class. Each class vocabulary is derived by merging a general vocabulary and an adapted vocabulary for the class, which adapted vocabulary is adapted from the general vocabulary. The occupancy histogram developed for each class is employed to classify an image as being more suitably described either by the general vocabulary or by the adapted class vocabulary.
In accordance with the various embodiments disclosed herein, there is provided a method, apparatus and article of manufacture therefor, for assigning one of a plurality of classes to an input image by: identifying a plurality of key-patches in the input image; computing a feature vector for each of at least some of the plurality of key-patches; computing a histogram for each of at least some of the plurality of classes using the plurality of feature vectors computed; and assigning at least one of the plurality of classes to the input image using the plurality of histograms computed as input to a classifier.
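By way of illustration only, the per-class histogram computation and class assignment recited above may be sketched as follows. The sketch assumes that key-patch identification and feature extraction have already produced a list of feature vectors, that each vocabulary is a list of cluster centres, and that hard nearest-centre assignment is used. The names `class_histogram`, `assign_class` and `scorers` are hypothetical, and each scorer is merely a placeholder for a trained per-class classifier.

```python
def nearest(vec, words):
    """Index of the visual word closest to vec (squared Euclidean distance)."""
    return min(range(len(words)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, words[k])))

def class_histogram(features, general_vocab, adapted_vocab):
    """Normalised histogram over the merged vocabulary for one class.

    The general words occupy the first bins and the adapted words the rest.
    """
    merged = general_vocab + adapted_vocab
    hist = [0.0] * len(merged)
    for f in features:
        hist[nearest(f, merged)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def assign_class(features, general_vocab, adapted_vocabs, scorers):
    """Compute one histogram per class, score each, and return the best label.

    adapted_vocabs and scorers are dicts keyed by class label (assumed layout).
    """
    scores = {}
    for label, adapted in adapted_vocabs.items():
        hist = class_histogram(features, general_vocab, adapted)
        scores[label] = scorers[label](hist)
    return max(scores, key=scores.get), scores

def adapted_mass(hist):
    """Toy scorer (assumption): fraction of the histogram in the adapted half."""
    return sum(hist[len(hist) // 2:])
```

In this sketch the toy scorer favours the class whose adapted vocabulary absorbs most of the image's feature vectors; a trained classifier would take its place in practice.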
In accordance with others of the various embodiments disclosed herein, there is provided a method, apparatus and article of manufacture therefor, for training a classifier, that includes: identifying key-patches in images of a plurality of class training sets; computing feature vectors for the identified key-patches; computing a general vocabulary by clustering the computed feature vectors; for each of a plurality of classes, computing an adapted vocabulary using the general vocabulary; computing a histogram for each of the plurality of classes; and training the classifier using the histograms for each of the plurality of classes.
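By way of illustration only, the training steps recited above may be sketched as follows, starting from feature vectors that have already been computed for the key-patches. Several choices in the sketch are assumptions made purely for the example and are not taken from the description: k-means is used for clustering, each adapted vocabulary is obtained by refining the general centres on the features of one class for a few iterations, and a simple perceptron stands in for the classifier. All function names are hypothetical.

```python
import random

def nearest(vec, words):
    """Index of the visual word closest to vec (squared Euclidean distance)."""
    return min(range(len(words)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, words[k])))

def lloyd(samples, centres, iters=10):
    """Refine the given centres on samples with Lloyd (k-means) updates."""
    centres = [list(c) for c in centres]
    for _ in range(iters):
        groups = [[] for _ in centres]
        for s in samples:
            groups[nearest(s, centres)].append(s)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centre
                centres[j] = [sum(col) / len(g) for col in zip(*g)]
    return centres

def histogram(features, vocab):
    """Normalised occupancy histogram of one image over vocab."""
    hist = [0.0] * len(vocab)
    for f in features:
        hist[nearest(f, vocab)] += 1.0
    n = sum(hist) or 1.0
    return [h / n for h in hist]

def train_perceptron(xs, ys, epochs=50):
    """Linear one-versus-all stand-in for the classifier; ys are +1 or -1."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def train(training_sets, k, seed=0):
    """training_sets: {label: [image, ...]}, each image a list of feature vectors."""
    rng = random.Random(seed)
    pooled = [f for imgs in training_sets.values() for img in imgs for f in img]
    general = lloyd(pooled, rng.sample(pooled, k))       # general vocabulary
    models = {}
    for label, imgs in training_sets.items():
        class_feats = [f for img in imgs for f in img]
        adapted = lloyd(class_feats, general, iters=3)   # assumed adaptation rule
        merged = general + adapted                       # class vocabulary
        xs, ys = [], []
        for other, other_imgs in training_sets.items():
            for img in other_imgs:
                xs.append(histogram(img, merged))
                ys.append(1 if other == label else -1)
        models[label] = (merged, train_perceptron(xs, ys))
    return general, models
```

Each entry of `models` pairs a merged class vocabulary with the weights of the classifier trained on histograms computed over that vocabulary, so that the same vocabulary can be reused when histograms are computed for a new input image.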