Embedding data into a lower-dimensional space and the related task of clustering data are unsupervised dimensionality reduction techniques that have been intensively studied. Most algorithms for these tasks were developed with the motivation of producing a useful analysis and visualization tool.
Recently, the field of semi-supervised learning, which addresses the task of improving generalization on a supervised task using unlabeled data, has made use of many of the above-mentioned techniques. For example, nonlinear embeddings or cluster representations have been used as features for a supervised classifier, with improved results.
Most of these architectures are disjoint and shallow (four layers or fewer). By disjoint, we mean that the unsupervised dimensionality reduction algorithm is trained on unlabeled data separately as a first step, and its results are then fed to a supervised classifier. By shallow, we mean that this classifier has a shallow architecture, such as a (kernelized) linear model.
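By way of illustration only, the disjoint arrangement described above may be sketched as follows. The sketch assumes Python with NumPy, uses principal component analysis as a stand-in for the unsupervised step and a least-squares linear model as the shallow classifier, and runs on synthetic data. The function names and the data are hypothetical and are not drawn from any particular prior system.

```python
import numpy as np

def fit_pca(X_unlabeled, n_components):
    """Step 1 (unsupervised): learn a linear embedding from unlabeled data only."""
    mean = X_unlabeled.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_unlabeled - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def embed(X, mean, components):
    """Project data onto the embedding learnt in step 1."""
    return (X - mean) @ components

def add_bias(Z):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([Z, np.ones((Z.shape[0], 1))])

def fit_linear_classifier(Z, y):
    """Step 2 (supervised): least-squares linear model on embedded features."""
    w, *_ = np.linalg.lstsq(add_bias(Z), y.astype(float), rcond=None)
    return w

def predict(Z, w):
    """Return labels in {-1, +1}."""
    return np.where(add_bias(Z) @ w >= 0, 1, -1)

# Synthetic data (an assumption for this sketch): two classes in 10 dimensions,
# separated along the first axis.
rng = np.random.default_rng(0)

def sample(n):
    y = np.where(np.arange(n) % 2 == 0, -1, 1)  # alternating labels
    X = rng.normal(size=(n, 10))
    X[:, 0] += 3.0 * y
    return X, y

X_unlabeled, _ = sample(500)        # labels discarded: used for step 1 only
X_labeled, y_labeled = sample(10)   # small labeled set for step 2
X_test, y_test = sample(200)

# The two steps are trained separately; step 2 never updates the embedding.
mean, components = fit_pca(X_unlabeled, n_components=2)
w = fit_linear_classifier(embed(X_labeled, mean, components), y_labeled)
accuracy = np.mean(predict(embed(X_test, mean, components), w) == y_test)
```

In this sketch the embedding is fixed before any label is seen, so the supervised step cannot adapt the features to the task, which is the sense in which the two stages are disjoint.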
Typically, the quantity of labeled data is insufficient to perform well on hard artificial intelligence (AI) tasks, such as scene or language understanding. In addition, the sharing of information learnt across sub-tasks (multi-task learning) seems a more economical use of data, but necessitates a deep learning network architecture in which presumably all tasks are learnt jointly. A typical example is a multi-layer feed-forward network in which the first layers learn levels of feature extraction or processing that are useful to all tasks.
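As a non-limiting illustration of such a multi-task feed-forward network, the following sketch shows a forward pass in which one shared first layer feeds two task-specific output layers. It assumes Python with NumPy. The layer sizes, the task names `task_a` and `task_b`, the tanh nonlinearity, and the random weights are all hypothetical choices made for the example, and no training procedure is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden = 8, 16
task_outputs = {"task_a": 3, "task_b": 2}  # hypothetical tasks and output sizes

# Shared first layer: its parameters are used by every task.
W_shared = rng.normal(scale=0.1, size=(n_inputs, n_hidden))
b_shared = np.zeros(n_hidden)

# One task-specific output layer ("head") per task.
heads = {
    name: (rng.normal(scale=0.1, size=(n_hidden, k)), np.zeros(k))
    for name, k in task_outputs.items()
}

def shared_features(x):
    """Feature extraction common to all tasks."""
    return np.tanh(x @ W_shared + b_shared)

def forward(x):
    """Compute the shared features once, then apply each task's head to them."""
    h = shared_features(x)
    return {name: h @ W + b for name, (W, b) in heads.items()}

x = rng.normal(size=(5, n_inputs))  # a batch of 5 hypothetical inputs
outputs = forward(x)
```

Because every head reads the same hidden features, a gradient from any one task would update `W_shared`, which is how jointly trained tasks can share what they learn.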
The aim is that the unsupervised method will improve accuracy on the task at hand. However, existing unsupervised methods for deep learning network architectures are somewhat complicated and restricted.
Accordingly, there remains a need for an improved method for training a deep multi-layered learning network with labeled and unlabeled training data.