The field of machine learning seeks to construct computer systems capable of adapting to and learning from their experiences. The field has spawned several different types of learning systems, one of which is the classifier. Classifiers are typically decision-making programs that take an input element and label it as a member of a particular class. For instance, a classifier trained to classify recipes by cuisine type would take an input recipe and label it according to the type of cuisine it represents.
Classifiers typically operate by storing, for each class, a list of features, or descriptive attributes, that are deemed characteristic of that class. The features of an input are then compared to this list to determine how many features match and how close the matches are. An input can be deemed to fall into a particular class if a sufficient number of its features match the features of that class closely enough. Thus, in the example above, an input recipe may be classified as a particular type of cuisine if a sufficient number of its ingredients, cooking steps, or other features match the classifier's features well enough. A classifier's features are often determined by a tedious process that involves manually constructing a training set of pre-labeled inputs. In essence, a number of inputs are selected, their features are manually highlighted, and they are labeled as belonging to a particular class or classes. Classifiers are then “trained” to recognize these features and classify new inputs accordingly.
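By way of illustration only, the feature-matching step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the names `CLASS_FEATURES`, `classify`, and `min_matches` are hypothetical, the feature lists are invented for the recipe example, and exact set membership stands in for the "how close the matches are" comparison, which a practical classifier would likely replace with a similarity measure.

```python
# Hypothetical per-class feature lists; the feature names are illustrative only.
CLASS_FEATURES = {
    "italian": {"olive oil", "basil", "tomato", "parmesan", "simmer"},
    "japanese": {"soy sauce", "mirin", "nori", "rice vinegar", "steam"},
}


def classify(input_features, class_features=CLASS_FEATURES, min_matches=3):
    """Return every class whose stored features match enough input features.

    Assumption: a feature "matches" only if it is identical to a stored
    feature. Closeness of match is not modeled in this sketch.
    """
    observed = set(input_features)
    labels = []
    for label, features in class_features.items():
        # Count how many of the input's features appear in this class's list.
        matches = len(observed & features)
        if matches >= min_matches:
            labels.append(label)
    return labels


recipe = ["tomato", "basil", "olive oil", "garlic", "simmer"]
print(classify(recipe))  # prints ['italian']
```

In this sketch the input recipe shares four features with the hypothetical "italian" list and none with the "japanese" list, so only the former meets the threshold of three. The function returns a list because, as noted above, an input may be labeled as belonging to more than one class.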
The accuracy of a classifier depends in part on the number of features it is trained to recognize and the number of inputs in the training set it has to “learn” with. In general, the greater the number of inputs and features in the training set, the better a classifier will be at recognizing features and classifying inputs accordingly. Reliable classifiers thus require a substantial training set with a large number of manually highlighted features and labels. Because the number of inputs and features in a training set is typically large, the manual labeling and highlighting process is often time-consuming and costly.
In view of the foregoing, it would be highly desirable to identify features for a classifier in a manner that does not require manual labeling or highlighting of features. Such an improvement could yield significant savings in time and effort for classifier architects.