A neural network is a powerful discriminative modeling tool. Neural networks can be used to solve problems of prediction, classification, structured recognition, and time series analysis, just to name a few. Neural networks are applicable in situations where a relationship between the predictor variables (inputs) and the predicted variables (outputs) exists, even when that relationship is complex and temporally varying.
One type of neural network is a deep neural network (DNN), which contains many layers that are built one upon the other. Each higher layer receives as input the output from the immediate lower layer. However, it is quite difficult to parallelize the prevailing training algorithms of such networks because of the challenge of spreading the large model out over multiple machines for each minibatch. This leads to a lack of scalability and parallelization in the learning algorithms for the DNN architecture.
In an effort to overcome these obstacles, a Deep Convex Network, or Deep Stacking Network (DSN) was developed. The DSN architecture is different from a normal neural network. It is a deep classification architecture that is built on blocks of single hidden layer neural networks (or SHLNN). The DSN architecture allows one to perform unsupervised learning on the bottom layers and supervised learning at the top layers. It is thus possible to develop more efficient, batch-mode, parallelizable learning algorithms for DSNs.
To further simplify the learning process, each building block of the DSN can have a non-linear hidden layer and a linear output (also called prediction) layer, rather than two non-linear layers. Because of this simplification, the connection between the hidden layer and the output layer can be determined using a closed-form solution and the connection between the input layer and the hidden layer can be estimated more effectively using a batch-mode algorithm that is easier to parallelize.