A classifier (also referred to as a categorizer) is often used in data mining applications to make a decision about cases. The decision is typically either a “yes” or “no” decision about whether a case has a particular property, or a decision regarding which of multiple classes (or categories) a case belongs to. Classifiers that are able to make decisions with respect to multiple classes are referred to as multiclass classifiers. Classifiers that make decisions regarding whether cases belong to a single class are referred to as binary classifiers.
Classifiers make decisions by considering features associated with cases. These features may be Boolean values (whether the case has or does not have some property), numeric values (e.g., cost of a product or number of times a word occurs in a document), or some other type of feature. In one technique of feature identification, textual data in cases is decomposed into a “bag of words,” and each word seen in any string associated with a case becomes a feature, reflecting either the word's presence (Boolean) or its prevalence (numeric).
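The bag-of-words decomposition described above can be illustrated with a minimal sketch. The tokenization rule (lowercasing and splitting on non-letter characters), the function names `tokenize` and `bag_of_words`, and the sample strings are assumptions made for illustration only; they are not taken from any particular system.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of letters/apostrophes as words.

    This tokenization rule is an illustrative assumption.
    """
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(strings, numeric=False):
    """Turn all strings associated with one case into word features.

    With numeric=False each word maps to True (presence).
    With numeric=True each word maps to its occurrence count (prevalence).
    """
    counts = Counter()
    for s in strings:
        counts.update(tokenize(s))
    if numeric:
        return dict(counts)
    return {word: True for word in counts}


# Hypothetical case with two associated text strings.
case_strings = ["Printer will not print", "Replaced the printer cartridge"]
boolean_features = bag_of_words(case_strings)
numeric_features = bag_of_words(case_strings, numeric=True)
```

In this sketch the word "printer" appears in both strings, so its Boolean feature is `True` and its numeric feature is `2`.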
In general, especially when dealing with textual data, there can be a very large number of features available. Feature selection is thus conventionally used to narrow the set of features used to build (train) the classifier. Feature selection algorithms typically look at how well individual candidate features (or groups of candidate features) perform in classifying a training set of cases that have been labeled with the correct answer.
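As one possible illustration of scoring individual candidate features against a labeled training set, the sketch below ranks Boolean word features by information gain and keeps the top `k`. Information gain is only one commonly used scoring measure and is an assumption here; the text above does not prescribe a particular measure. The function names and the toy training data are likewise hypothetical.

```python
import math


def entropy(pos, neg):
    """Binary entropy (in bits) of a set with pos positive and neg negative cases."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for n in (pos, neg):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h


def information_gain(cases, labels, feature):
    """Reduction in label entropy from splitting the cases on one Boolean feature."""
    with_pos = with_neg = without_pos = without_neg = 0
    for features, label in zip(cases, labels):
        if feature in features:
            if label:
                with_pos += 1
            else:
                with_neg += 1
        else:
            if label:
                without_pos += 1
            else:
                without_neg += 1
    n = len(cases)
    n_with = with_pos + with_neg
    n_without = without_pos + without_neg
    base = entropy(with_pos + without_pos, with_neg + without_neg)
    conditional = (n_with / n) * entropy(with_pos, with_neg) \
        + (n_without / n) * entropy(without_pos, without_neg)
    return base - conditional


def select_features(cases, labels, k):
    """Return the k highest-scoring features; ties are broken alphabetically."""
    vocabulary = set().union(*cases)
    ranked = sorted(vocabulary,
                    key=lambda f: (-information_gain(cases, labels, f), f))
    return ranked[:k]


# Hypothetical labeled training set: each case is the set of words it contains,
# and each label says whether the case belongs to the class of interest.
training_cases = [
    {"paper", "jam", "the"},
    {"paper", "tray", "the"},
    {"screen", "dark", "the"},
    {"screen", "cracked", "the"},
]
training_labels = [True, True, False, False]

selected = select_features(training_cases, training_labels, k=2)
```

In this toy data, "paper" and "screen" each separate the two labels perfectly and are selected, while "the" occurs in every case and receives a score of zero.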
A conventional technique of building a multiclass classifier is to train multiple binary classifiers on training sets for each of the classes. The multiple individually trained binary classifiers are then combined to form the multiclass classifier. However, conventional techniques of building multiclass classifiers usually ignore many sources of information (possible features) that may be helpful to build more accurate classifiers.
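The conventional construction described above, in which one binary classifier is trained per class and the results are combined, is often called one-vs-rest. A minimal sketch follows. The binary learner used here (a sum of smoothed log-odds weights over the words present in a case) is a deliberately simple stand-in chosen for illustration, and combining the classifiers by picking the highest score is one common choice; neither is asserted to be the method of any particular system. All names and the training data are hypothetical.

```python
import math
from collections import Counter


class BinaryWordClassifier:
    """Toy binary classifier over Boolean word features (illustrative stand-in)."""

    def fit(self, cases, labels):
        pos, neg = Counter(), Counter()
        n_pos = n_neg = 0
        for words, label in zip(cases, labels):
            if label:
                pos.update(set(words))
                n_pos += 1
            else:
                neg.update(set(words))
                n_neg += 1
        # Weight of a word = log-odds of seeing it in a positive vs. a negative
        # case, with add-one smoothing so that unseen counts do not divide by zero.
        self.weights = {}
        for w in set(pos) | set(neg):
            p = (pos[w] + 1) / (n_pos + 2)
            q = (neg[w] + 1) / (n_neg + 2)
            self.weights[w] = math.log(p / q)
        return self

    def score(self, words):
        """Higher scores mean 'more likely in the class'; unknown words add 0."""
        return sum(self.weights.get(w, 0.0) for w in set(words))


def train_one_vs_rest(cases, class_labels):
    """Train one binary classifier per class (that class vs. all the others)."""
    classifiers = {}
    for c in sorted(set(class_labels)):
        binary_labels = [label == c for label in class_labels]
        classifiers[c] = BinaryWordClassifier().fit(cases, binary_labels)
    return classifiers


def predict(classifiers, words):
    """Combine the binary classifiers by choosing the highest-scoring class."""
    return max(classifiers, key=lambda c: classifiers[c].score(words))


# Hypothetical training set with three classes.
train_cases = [
    {"paper", "jam"}, {"paper", "tray"},
    {"screen", "dark"}, {"screen", "cracked"},
    {"battery", "dead"}, {"battery", "drain"},
]
train_labels = ["printer", "printer", "display", "display", "power", "power"]

multiclass = train_one_vs_rest(train_cases, train_labels)
predicted = predict(multiclass, {"paper", "stuck"})
```

Each binary classifier in this sketch sees only the word features of the cases, which reflects the limitation noted above: other potentially useful sources of information are not used.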