1. Technical Field
The present invention relates to supervised machine learning, and more particularly to a system and method for supervised machine learning of binary classification models in the presence of noise.
2. Discussion of Related Art
In the field of binary classification (classifying the members of a given set of objects into two groups on the basis of whether they have some property or not), formulations of supervised learning seek a predictor that maps input x to output y. The predictor is constructed from a set of training examples {(xi,yi)}. A hidden underlying assumption is that errors are confined to the output y. That is, the input data are not corrupted with noise; or even when noise is present in the data, its effect is ignored in the learning formulation.
However, for many applications, this assumption is unrealistic. Sampling errors, human errors, modeling errors and instrument errors may preclude the possibility of knowing the data matrix X exactly, where X=[x1 . . . xn]T (T denoting the transpose) has the n training points xi as its rows. Hence the observed input xi is not accurate.
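The setting described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the present disclosure: the perceptron is a stand-in for a generic predictor, the data are synthetic, and the Gaussian noise model with the hypothetical parameter sigma merely represents one way the observed inputs could differ from the true ones. The learner sees only the corrupted rows of X and treats them as exact.

```python
import random

def train_perceptron(X, y, epochs=20):
    """Fit a linear predictor sign(w.x + b) from pairs (x_i, y_i), with y_i in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # misclassified (or on the boundary): update
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
    return w, b

def predict(w, b, x):
    """Map an input x to a label in {-1, +1}."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1

def corrupt_inputs(X, sigma, rng):
    """Return a copy of X with zero-mean Gaussian noise added to every entry (assumed noise model)."""
    return [[xj + rng.gauss(0.0, sigma) for xj in xi] for xi in X]

rng = random.Random(0)

# True inputs (never seen by the learner): 50 points per class around -2 and +2.
X_true = [[rng.gauss(c, 0.3), rng.gauss(c, 0.3)]
          for c in (-2.0, 2.0) for _ in range(50)]
y = [-1] * 50 + [1] * 50

# Observed inputs: the rows of X as actually available for training.
X_obs = corrupt_inputs(X_true, sigma=1.0, rng=rng)

# A conventional formulation trains on X_obs as if it were exact.
w, b = train_perceptron(X_obs, y)
```

Nothing in this training loop uses sigma: the learner has no way to distinguish a reliable input from an unreliable one, which is the limitation discussed here.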
For example, consider the problem of classifying sentences from speech recognition output for call-routing applications. A speech recognition system may make errors, so that the observed text is corrupted with noise. Speech recognition systems can provide an estimate of the confidence for their output, which measures how uncertain each element of that output is. This confidence information is typically ignored in learning formulations.
Therefore, a need exists for a system and method for supervised machine learning of binary classification models that accounts for the underlying input uncertainty.