Several methods for improving discriminative classifiers using unlabeled data have been developed in the last few years. Perhaps the two most popular ways of utilizing unlabeled data are: maximizing the margin on the unlabeled data, as in Transductive Support Vector Machines (TSVM), so that the decision boundary lies in a region of low density; and learning the cluster or manifold structure from the unlabeled data, as in cluster kernels, label propagation, and Laplacian SVMs. Both approaches can be seen as making the same structural assumption about the data: that its cluster or manifold structure is correlated with the class labels of interest.
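As an illustration only, the two ways of using unlabeled data described above can be written as penalty terms on a classifier's real-valued outputs. The following is a minimal sketch under stated assumptions, not the formulation of any particular published method: the function names `tsvm_unlabeled_loss` and `manifold_penalty`, the unit margin, and the dense list-of-lists similarity matrix `weights` are all hypothetical choices made for the example.

```python
def hinge(z):
    """Standard hinge loss with a unit margin (the margin value is an assumption)."""
    return max(0.0, 1.0 - z)


def tsvm_unlabeled_loss(scores):
    """Margin-style penalty on unlabeled points.

    Each score is the classifier output f(x) for one unlabeled example.
    Points with |f(x)| < 1 fall inside the margin and are penalized, which
    pushes the decision boundary away from the unlabeled data.
    """
    return sum(hinge(abs(s)) for s in scores)


def manifold_penalty(scores, weights):
    """Graph-smoothness penalty: sum over ordered pairs of W[i][j] * (f_i - f_j)^2.

    weights[i][j] is a hypothetical similarity between examples i and j
    (for instance, 1 for neighbors and 0 otherwise). Similar examples that
    receive different outputs are penalized.
    """
    n = len(scores)
    return sum(
        weights[i][j] * (scores[i] - scores[j]) ** 2
        for i in range(n)
        for j in range(n)
    )
```

In this sketch, an unlabeled point scored exactly 0 (on the boundary) contributes the maximum margin penalty of 1, while two neighboring points scored +1 and -1 contribute to the smoothness penalty because they are similar yet placed on opposite sides of the boundary.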
The Low Density Separation (LDS) algorithm is a two-stage algorithm that combines both of these approaches and improves on the results obtained using only one of the techniques; however, the combination method is somewhat ad hoc.
One problem with these methods is that, apart from the linear case, none of them scales to very large datasets. This is ironic because the potential gain of semi-supervised learning lies in the vast amounts of readily available unlabeled data. This performance gain is never attained simply because of the computational burden of calculating the result.
Accordingly, a new learning method is needed that uses unlabeled data and overcomes the problems associated with existing methods.