Image classification for object recognition is among the most important and well-known applications of convolutional neural networks. In such applications, a neural network is trained with a set of images that have been classified with labels. Once the network has learned this training data, it can then quickly analyze and classify arbitrary images. Training a network, however, can be time consuming and computationally complex.
A computational neural network contains a set of connected nodes organized in layers, where each layer provides output to a next layer based on input from a prior layer. The first layer is the input layer. The last layer is the output layer. Intermediate layers are called hidden layers. A neural network with more than one hidden layer is called a deep neural network. Typically, deep neural networks today have 5-150 hidden layers. Deep neural networks are able to learn more complex and abstract features of data (e.g., not only low level features such as edges and lines, but also higher level features such as shapes and arrangements of shapes). A common type of deep neural network is a convolutional neural network (CNN), where each layer generates a feature map. This feature map is convolved with filters to produce an output feature map. Conventionally, pooling is used to reduce the dimensionality of the feature maps.
Training of neural networks can use supervised learning or unsupervised learning. In unsupervised learning, the training data is not classified. The training clusters data, which corresponds to discovered features and classes. These classes, however, do not necessarily correspond to desired classifications. In supervised learning, the network is provided with classified training data. During training, the classified images are used to determine the weights in the network, e.g., using back propagation to adjust the weights to reduce the output error. Using back propagation to adjust weights through all levels of a network, however, can be computationally expensive, especially in the case of large training sets and convolutional networks having many layers. To make the training more manageable, the explosion of data to the next layer can be reduced with a ‘max pooling’ or ‘mean pooling’ technique. Even with these existing pooling techniques, however, back-propagation training of a CNN is still extremely complex and time-consuming.