Machine learning is a scientific discipline directed toward the design and development of algorithms that allow computers to improve their decisions on the basis of data. For example, a system can use observed data to capture characteristics of interest, where the data serve as examples that illustrate relations between observed variables. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data. The difficulty is that the set of all possible behaviors given all possible inputs is too large to be covered by the set of observed examples (the training data). Hence the learner must generalize from the given examples in order to produce a useful output in new cases.
Statistical classification utilizes a training set of data containing known observations for a sub-population to identify the sub-population to which new observations belong. Executable instructions in the form of a classifier perform these operations. New individual items are placed into groups by the classifier based upon quantitative information on one or more measurements, traits or characteristics established by the training set.
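The classification step described above can be illustrated with a minimal sketch. It assumes a nearest-centroid rule, which is only one of many possible classifiers and is chosen here for brevity. The function names `train_centroids` and `classify`, the labels, and the measurements are all hypothetical.

```python
from collections import defaultdict
from math import dist


def train_centroids(examples):
    """Compute one mean feature vector (centroid) per label.

    `examples` is a list of (features, label) pairs, i.e. the training set.
    """
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(centroids, features):
    """Assign a new observation to the label whose centroid is nearest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training set: two measurements per item, with known labels.
training_set = [
    ((1.0, 1.2), "small"),
    ((0.8, 1.0), "small"),
    ((4.9, 5.1), "large"),
    ((5.2, 4.8), "large"),
]

centroids = train_centroids(training_set)
prediction = classify(centroids, (4.5, 5.0))  # a new, unlabeled observation
```

The new observation is placed in a group solely on the basis of its quantitative measurements and the groups established by the training set.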
In contrast, cluster analysis evaluates a single data set to decide whether and how the observations in the data set can be divided into groups. Clustering is a form of unsupervised learning, whereas classification is a form of supervised learning.
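By way of contrast, the following minimal sketch groups a single unlabeled data set. It assumes a basic k-means procedure with caller-supplied starting centers, which is one illustrative clustering method among many. The function name `k_means` and the data points are hypothetical.

```python
from math import dist


def k_means(points, initial_centers, iterations=10):
    """Group unlabeled points around centers using basic k-means.

    Returns the final centers and the clusters from the last assignment pass.
    """
    centers = list(initial_centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for point in points:
            nearest = min(range(len(centers)),
                          key=lambda i: dist(point, centers[i]))
            clusters[nearest].append(point)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(column) / len(cluster) for column in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical observations with no labels attached.
points = [(1.0, 1.2), (0.8, 1.0), (4.9, 5.1), (5.2, 4.8)]

centers, clusters = k_means(points, initial_centers=[points[0], points[2]])
```

No labels are supplied at any point. The groups emerge only from the distances between the observations, which is what distinguishes this unsupervised procedure from a classifier trained on labeled examples.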
A problem arises when a previously unused label is introduced. In this case, no a priori set of labeled examples exists for the new label, yet supervised learning cannot proceed without training examples. Consequently, there is a need to minimize the inconvenience of selecting training examples while maximizing the benefit of the selected training examples.