The exemplary embodiment relates to the information processing arts, information storage arts, document classification arts, and related arts. It finds particular application in connection with the selection and labeling of a training set for training a categorizer to categorize images or other digital objects and will be described with particular reference thereto.
Digital objects, such as images, speech segments, text documents, and the like, are commonly represented as digital files or digital representations, for example as bitmaps or grayscale or color pixel maps in the case of images, audio files in the case of speech segments, text or word processing files in the case of text documents, or hybrid files containing text and images. In the processing and/or storage of such objects, it is useful to categorize the objects automatically with respect to one or more classes or categories. For example, pictorial images can be classified by subject matter, e.g., images of cats, images of dogs, images of vehicles, images of people, and the like.
To facilitate classification, a signature of the object is generated, which may be in the form of a vector whose dimensionality is high enough to provide a unique signature for each object, but which incorporates substantially less data than the original object. Thus, for example, an image containing millions of pixels may be represented by a vector having perhaps 128 to 10,000 dimensions. For images, a suitable vector can be generated by computing features of selected image patches or sampling regions distributed across the image, and employing the computed features as elements of the feature vector or as inputs to a model which assigns a vector based thereon. A Fisher vector or “bag-of-visual-words” representation can be used as a suitable vector representation for images. In the case of text documents, a “bag-of-words” vector representation is sometimes used, in which each vector element corresponds to a word and has a value indicative of a count of occurrences of that word in the text document.
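By way of illustration only, a “bag-of-words” representation of the kind described above may be sketched as follows. The function name `bag_of_words`, the three-word vocabulary, and the whitespace tokenization are assumptions made for this example and are not prescribed by the exemplary embodiment.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return a count vector with one element per vocabulary word."""
    # Assumption: naive tokenization by lowercasing and splitting on whitespace.
    counts = Counter(text.lower().split())
    # Each vector element is the number of occurrences of the corresponding word.
    return [counts[word] for word in vocabulary]


# Hypothetical vocabulary chosen for illustration.
vocabulary = ["cat", "dog", "vehicle"]
vector = bag_of_words("The cat chased the dog and the other cat", vocabulary)
print(vector)
```

In this sketch, the length of the vector equals the size of the vocabulary, so a practical vocabulary would yield a vector of much higher dimensionality than the three elements shown.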
The categorizer receives the vector representation of the object and outputs a classification based on the vector representation. Where there are multiple categories, this can be considered as a series of two-class decision problems in which each class is evaluated against the rest with a separate categorizer. The classification may be hard (e.g., “1” if the object is assigned to the category or “0” otherwise), or can be soft (e.g., the classification output is a value between 0 and 1 inclusive, with higher values indicating a higher confidence of belonging to the category). A soft classification can be converted to a hard classification by thresholding the confidence level. Typically, the categorizer has adjustable parameters whose values are determined by training with a labeled training set. The objective of the training is to select the adjustable parameters such that the output of the categorizer substantially agrees with the classification labels assigned to the objects of the training set.
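The relationship between per-category categorizers, soft classifications, and hard classifications may be sketched as follows. The sketch assumes, purely for illustration, that each category has a linear categorizer with a logistic output; the functions `soft_score` and `classify`, the weights, and the default threshold of 0.5 are hypothetical, and the weights stand in for adjustable parameters that would in practice be determined by training.

```python
import math


def soft_score(weights, bias, vector):
    """Soft classification: a confidence value between 0 and 1."""
    # Assumption: a linear score passed through a logistic function.
    z = bias + sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-z))


def classify(categorizers, vector, threshold=0.5):
    """Evaluate each category against the rest with its own categorizer."""
    soft = {name: soft_score(w, b, vector) for name, (w, b) in categorizers.items()}
    # Hard classification: threshold the confidence level to obtain 1 or 0.
    hard = {name: int(score >= threshold) for name, score in soft.items()}
    return soft, hard


# Hypothetical parameters: one (weights, bias) pair per category.
categorizers = {
    "cat": ([1.5, -1.0], 0.0),
    "dog": ([-1.0, 1.5], 0.0),
}
soft, hard = classify(categorizers, [2.0, 0.5])
print(soft, hard)
```

Lowering the threshold assigns the object to more categories and raising it assigns the object to fewer, so the threshold may be selected according to the relative cost of false positives and false negatives.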
Broad categorizers have been developed which have been trained to categorize digital objects such as images into a large number of pre-defined categories. Training of such classifiers takes a considerable amount of time and training data. However, even with a large number of trained classes, such classifiers may not always meet a specific user's needs. It is therefore desirable to create custom classifiers for specific users. The training of a custom categorizer, as with any classifier, is computationally intensive. Additionally, the performance of the trained custom categorizer may not be satisfactory, resulting in extensive retraining.
There remains a need for a method for evaluating a training set of labeled objects so that problems are detected prior to training of the custom categorizer.