The exemplary embodiment relates to the information processing arts, information storage arts, classification arts, and related arts. It finds particular application in connection with the categorization of images using variable test-based measures, and will be described with particular reference thereto.
Digital objects, such as images, speech segments, text documents, and the like, are commonly represented as digital files or digital representations, for example as bitmaps or grayscale or color pixel maps in the case of images, audio files in the case of speech segments, text or word processing files in the case of text documents, or hybrid files containing text and images. In the processing and/or storage of such objects, it is useful to categorize the objects automatically with respect to one or more classes or categories. For example, pictorial images can be classified by subject matter, e.g., images of cats, images of dogs, images of vehicles, images of people, and the like.
To facilitate classification, a signature of the object is generated. The signature may be in the form of a vector having a relatively high dimensionality, i.e., one which is sufficient to provide a unique signature for each object, but which incorporates substantially less data than the original object. Thus, for example, an image containing millions of pixels may be represented by a vector having perhaps 128 to 10,000 dimensions. For images, a suitable vector can be generated by computing features of selected image patches or sampling regions distributed across the image, and employing the computed features as elements of the feature vector or as inputs to a model which assigns a vector based thereon. A Fisher vector or “bag-of-visual-words” vector is a suitable representation of this kind. In the case of text documents, a “bag-of-words” vector representation is sometimes used, in which each vector element corresponds to a word and has a value indicative of a count of occurrences of that word in the text document.
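By way of illustration only, the “bag-of-words” representation described above may be sketched as follows. The function name `bag_of_words`, the tokenization rule, and the four-word vocabulary are assumptions introduced for this example and are not taken from any referenced method.

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Return a count vector with one element per vocabulary word.

    The lowercase alphanumeric tokenization is an assumption made for
    illustration; other tokenizers could be substituted.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(tokens)
    # Element i holds the number of occurrences of vocabulary[i] in the text.
    return [counts[word] for word in vocabulary]

# Hypothetical vocabulary and document, chosen only for the example.
vocabulary = ["cat", "dog", "vehicle", "person"]
document = "The cat chased the dog; the dog barked at the cat. Cat!"
vector = bag_of_words(document, vocabulary)
print(vector)
```

In this sketch, the vector has one dimension per vocabulary word, so its dimensionality is fixed by the vocabulary rather than by the length of the document.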
The categorizer receives the vector representation of the object and outputs a classification based on the vector representation. Where there are multiple categories, this can be considered as a series of two-class decision problems in which each class is evaluated against the rest with a separate categorizer. The classification may be hard (e.g., “1” if the object is assigned to the category or “0” otherwise), or can be soft (e.g., the classification output is a value between 0 and 1 inclusive, with higher values indicating a higher confidence of belonging to the category). A soft classification can be converted to a hard classification by thresholding the confidence level. Typically, the categorizer has adjustable parameters whose values are determined by training with a labeled training set. The objective of the training is to select the adjustable parameters such that the output of the categorizer substantially agrees with the classification labels assigned to the objects of the training set.
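The one-against-the-rest arrangement and the conversion of a soft classification to a hard classification may be sketched as follows. This is a minimal illustration which assumes, for the sake of example, that each categorizer is a linear model followed by a logistic function; the form of the categorizer is not specified above. The names `soft_scores` and `harden`, the weights, and the 0.5 threshold are hypothetical, and the weights are fixed by hand rather than obtained by training.

```python
import math

def soft_scores(vector, categorizers):
    """Apply one binary categorizer per category (one against the rest).

    Each categorizer is assumed to be a (weights, bias) pair; the logistic
    function maps its linear output to a soft score between 0 and 1.
    """
    scores = {}
    for category, (weights, bias) in categorizers.items():
        z = sum(w * x for w, x in zip(weights, vector)) + bias
        scores[category] = 1.0 / (1.0 + math.exp(-z))
    return scores

def harden(scores, threshold=0.5):
    """Convert soft scores to hard 0/1 labels by thresholding."""
    return {c: 1 if s >= threshold else 0 for c, s in scores.items()}

# Hypothetical, hand-set parameters; in practice these would be
# determined by training on a labeled training set.
categorizers = {
    "cat": ([1.0, -1.0], 0.0),
    "dog": ([-1.0, 1.0], 0.0),
}
vector = [2.0, 0.5]
soft = soft_scores(vector, categorizers)
hard = harden(soft)
print(soft)
print(hard)
```

Because each category has its own categorizer, the soft scores need not sum to one, and an object may be assigned to several categories or to none, depending on the threshold.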
One problem which arises is that, as the number of categories increases, an image may be labeled with a large number of categories, each with an associated confidence that the image is assigned to that category. Providing a user with all of this information may not be useful if the user is only interested in the most probable categories. However, the capabilities of the categorizer tend to vary from category to category, and so establishing an arbitrary threshold confidence level may result in some categories being more prominent in the output than would be expected based on visual examination. In the case of image retrieval, failure to establish a threshold may result in the retrieval of a large number of tagged images for review, with the associated problems of data transmission and storage.
Incorporation by Reference
The following references, the disclosures of which are incorporated herein by reference in their entireties, are mentioned:
The following references disclose systems and methods for categorizing images based on content: U.S. Pat. No. 7,680,341, issued Mar. 16, 2010, entitled GENERIC VISUAL CLASSIFICATION WITH GRADIENT COMPONENTS-BASED DIMENSIONALITY ENHANCEMENT by Florent Perronnin; U.S. Pub. No. 2007/0005356, entitled GENERIC VISUAL CATEGORIZATION METHOD AND SYSTEM by Florent Perronnin; U.S. Pub. No. 2008/0069456 entitled BAGS OF VISUAL CONTEXT-DEPENDENT WORDS FOR GENERIC VISUAL CATEGORIZATION, by Florent Perronnin; U.S. Pub. No. 2009/0144033, published Jun. 4, 2009, entitled OBJECT COMPARISON, RETRIEVAL, AND CATEGORIZATION METHODS AND APPARATUSES, by Yan Liu, et al.; U.S. Pub. No. 2010/0098343, published Apr. 22, 2010, entitled MODELING IMAGES AS MIXTURES OF IMAGE MODELS, by Florent Perronnin, et al.; U.S. application Ser. No. 12/483,391, filed Jun. 12, 2009, entitled ACCURATE AND EFFICIENT CLASSIFICATION OF FISHER VECTORS, by Florent Perronnin, et al.; and U.S. application Ser. No. 12/541,636, filed Aug. 14, 2009, entitled TRAINING A CLASSIFIER BY DIMENSION-WISE EMBEDDING OF TRAINING DATA, by Florent Perronnin, et al. See also, Florent Perronnin, Christopher R. Dance, Gabriela Csurka, Marco Bressan: Adapted Vocabularies for Generic Visual Categorization. ECCV (4) pp. 464-475 (2006); Florent Perronnin, Christopher R. Dance: Fisher Kernels on Visual Vocabularies for Image Categorization. IEEE CVPR, (2007); Gabriela Csurka, Jutta Willamowski, Christopher R. Dance, Florent Perronnin: Incorporating Geometry Information with Weak Classifiers for Improved Generic Visual Categorization. ICIAP pp. 612-620 (2005).
U.S. application Ser. No. 12/820,647, filed Jun. 22, 2010, entitled PHOTOGRAPHY ASSISTANT AND METHOD FOR ASSISTING A USER IN PHOTOGRAPHING LANDMARKS AND SCENES, discloses one application of a visual classifier.