A classifier (also referred to as a categorizer) is often used in data mining applications to make decisions about cases. The decision is typically either a “yes” or “no” decision about whether a case belongs to a particular class (e.g., spam email or not), or a decision about which of multiple classes (or categories) a case belongs to. Classifiers that make decisions with respect to multiple classes are referred to as multiclass classifiers. Classifiers that decide whether cases belong to a single class are referred to as binary classifiers.
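To make the binary/multiclass distinction concrete, here is a minimal sketch. It assumes each case is a dictionary of features and that each class has a hypothetical scoring function with hand-picked toy weights. The binary decision thresholds a single score, and the multiclass decision uses a one-vs-rest scheme (choose the class with the highest score). One-vs-rest is only one common way to build a multiclass classifier and is an assumption here, not something the text prescribes.

```python
from typing import Callable, Dict

Case = Dict[str, float]
# Hypothetical scorer type: maps a case's features to a real-valued score.
Scorer = Callable[[Case], float]


def binary_decide(score: Scorer, case: Case, threshold: float = 0.0) -> bool:
    """Binary classifier: True ("yes") if the case belongs to the class, else False ("no")."""
    return score(case) > threshold


def multiclass_decide(scorers: Dict[str, Scorer], case: Case) -> str:
    """Multiclass classifier (one-vs-rest assumption): pick the class whose scorer rates the case highest."""
    return max(scorers, key=lambda label: scorers[label](case))


def spam_score(case: Case) -> float:
    # Toy, made-up weights for illustration only.
    return 2.0 * case.get("free", 0) + 1.5 * case.get("winner", 0) - 1.0


scorers: Dict[str, Scorer] = {
    "spam": spam_score,
    "work": lambda c: 1.0 * c.get("meeting", 0) + 0.5 * c.get("report", 0),
    "personal": lambda c: 1.0 * c.get("dinner", 0),
}

email = {"free": 1, "winner": 1}
print(binary_decide(spam_score, email))   # yes/no: is this email spam?
print(multiclass_decide(scorers, email))  # which of several classes?
```

A binary classifier answers one yes/no question per class, while a multiclass classifier must choose among several classes, so in this sketch it compares all class scores for the same case.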
Classifiers make decisions by considering features associated with cases. These features may be Boolean values (e.g., whether the case has or does not have some property), numeric values (e.g., cost of a product or number of times a word occurs in a document), or some other type of feature. In one technique of feature identification, textual data in cases is decomposed into a “bag of words,” and each word seen in any string associated with a case becomes a feature, reflecting either the word's presence (Boolean) or its prevalence (numeric).
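A bag-of-words decomposition can be sketched as follows. The tokenization rule (lowercase, then split on runs of letters, digits, and apostrophes) and the helper names `tokenize` and `bag_of_words` are illustrative assumptions. The `boolean` flag switches between recording a word's presence and recording its prevalence as an occurrence count.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Union


def tokenize(text: str) -> List[str]:
    # Assumption: a simple lowercase, alphanumeric tokenizer; real systems vary.
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(strings: Iterable[str], boolean: bool = False) -> Dict[str, Union[int, bool]]:
    """Turn every word seen in any of a case's strings into a feature."""
    counts: Counter = Counter()
    for s in strings:
        counts.update(tokenize(s))
    if boolean:
        # Boolean feature: the word is present in the case.
        return {word: True for word in counts}
    # Numeric feature: how many times the word occurs across the case's strings.
    return dict(counts)


case_strings = ["Free offer: claim your free prize", "Winner announced"]
print(bag_of_words(case_strings))
print(bag_of_words(case_strings, boolean=True))
```

Words that never appear in a case are simply absent from its dictionary, which keeps the representation sparse even when the overall vocabulary is large.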
A classifier is built by training it on a set of training cases, each of which includes a set of features and a label with respect to a particular class. The label indicates which class the training case belongs to. It can be a binary label with two values: positive or negative with respect to the particular class.
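As one illustration of training from labeled cases, the sketch below uses a perceptron, a simple linear learning rule chosen here only as an example. The text does not name a training algorithm. Each training case is assumed to be a `(features, label)` pair, where the features are a bag-of-words dictionary and `True`/`False` stand for the positive/negative label. The data and the helper names `train_perceptron` and `predict` are hypothetical.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]
# A training case: its features plus a binary label (True = positive, False = negative).
TrainingCase = Tuple[Features, bool]


def score(weights: Dict[str, float], bias: float, features: Features) -> float:
    return bias + sum(weights.get(f, 0.0) * v for f, v in features.items())


def predict(weights: Dict[str, float], bias: float, features: Features) -> bool:
    return score(weights, bias, features) > 0


def train_perceptron(cases: List[TrainingCase], epochs: int = 10) -> Tuple[Dict[str, float], float]:
    weights: Dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for features, label in cases:
            if predict(weights, bias, features) != label:
                # On a mistake, push the score up for positive cases and down for negative ones.
                step = 1.0 if label else -1.0
                for f, v in features.items():
                    weights[f] = weights.get(f, 0.0) + step * v
                bias += step
    return weights, bias


training: List[TrainingCase] = [
    ({"free": 1, "prize": 1}, True),
    ({"winner": 1, "free": 1}, True),
    ({"meeting": 1, "report": 1}, False),
    ({"lunch": 1, "meeting": 1}, False),
]
w, b = train_perceptron(training)
print(predict(w, b, {"free": 1}))
print(predict(w, b, {"meeting": 1}))
```

The labels drive every weight update, which is why each training case must carry a positive or negative label for the class being learned.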