In the classical supervised machine learning paradigm, training examples are represented by vectors of attributes, a teacher supplies labels for each training example, and a learning machine learns a decision rule using this data.
In practice, however, the teacher can often supply additional information with the training data that will not be available at the test stage. Consider, for example, an algorithm that learns a decision rule for predicting the outcome of a disease one year ahead, given the current symptoms of a patient. In this example, the patient's symptoms at six months can be provided along with the training data, which otherwise contains only the current symptoms and the outcome at one year. This additional information may help in predicting the outcome at one year, even though it cannot be observed when the rule is applied to a new patient.
Accordingly, a machine learning method is needed that can use such hidden information, which is available during training but not at the test stage.
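The data layout in the disease-prognosis example can be sketched as follows. This is only an illustration of which fields exist at which stage, not a learning method; the class names, field names, and numeric values are hypothetical and do not come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainingExample:
    """One training record (all names here are hypothetical)."""
    current_symptoms: List[float]        # available at training and test time
    symptoms_at_six_months: List[float]  # hidden information, training only
    outcome_at_one_year: int             # label supplied by the teacher


@dataclass(frozen=True)
class InferenceExample:
    """One test-stage record: only the current symptoms are known."""
    current_symptoms: List[float]


def to_inference_example(example: TrainingExample) -> InferenceExample:
    """Drop every field that would not be observable at the test stage."""
    return InferenceExample(current_symptoms=list(example.current_symptoms))


# Made-up values, purely to show the shape of the data.
training_set = [
    TrainingExample([0.2, 1.0], [0.4, 0.9], 1),
    TrainingExample([0.1, 0.0], [0.1, 0.1], 0),
]

# What a learned decision rule would receive for a new patient.
inference_view = [to_inference_example(e) for e in training_set]
```

A classical learner would use only `current_symptoms` and `outcome_at_one_year`; the method called for above would additionally use `symptoms_at_six_months` during training while still producing a rule that takes only `current_symptoms` as input.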