In recent years, Deep Neural Networks (DNNs) have been used in a range of machine learning and data-mining applications. These networks comprise sequences of signal processing units, such as convolutional layers or fully connected layers, typically accompanied by pooling or regularization operations:
A Convolutional Layer convolves, for example, an image “I” (in general n-dimensional) with a kernel “W” (in general (n+1)-dimensional) and adds a bias term “b” (in general n-dimensional) to the result. The output is given by P = I * W + b, where the * operator denotes (n+1)-dimensional convolution in general. Typically n=3, but for time-series applications n could be 4. The convolution output P is then typically passed through an activation function. During training, the kernel and bias parameters are selected to optimize an error function of the network output.
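Purely by way of illustration (the function and variable names below are not taken from any embodiment described herein), the 2-D single-channel case of this operation may be sketched as follows; note that deep-learning “convolution” is conventionally implemented as cross-correlation:

```python
import numpy as np

def conv2d(I, W, b):
    """Valid-mode 2-D cross-correlation of image I with kernel W,
    plus a scalar bias b (the common deep-learning convention)."""
    kh, kw = W.shape
    oh, ow = I.shape[0] - kh + 1, I.shape[1] - kw + 1
    P = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            P[i, j] = np.sum(I[i:i + kh, j:j + kw] * W) + b
    return P

I = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
W = np.ones((3, 3)) / 9.0                      # 3x3 averaging kernel
P = conv2d(I, W, b=0.5)
print(P.shape)   # (2, 2)
```

In a trained network, W and b would be learned parameters rather than the fixed values used here, and P would then be passed through an activation function.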
A Fully Connected Layer is similar to a classical Neural Network (NN) layer in that every neuron in the layer is connected to every neuron in the subsequent layer. Each neuron computes a weighted sum of its inputs, which is then passed through its activation function.
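As an illustrative sketch only (names and the choice of tanh activation are assumptions, not part of any embodiment), a fully connected layer reduces to a matrix-vector product plus a bias, followed by an activation:

```python
import numpy as np

def dense(x, W, b, activation=np.tanh):
    """Fully connected layer: each output neuron applies an activation
    to the weighted sum of all inputs plus a bias."""
    return activation(W @ x + b)

x = np.array([1.0, -2.0, 0.5])       # 3 input neurons
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])      # weights for 2 output neurons
b = np.zeros(2)
y = dense(x, W, b)
print(y)   # tanh([1, -2]) ≈ [ 0.7616 -0.9640 ]
```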
Both convolutional layers and fully connected layers are especially useful in pattern recognition because of their nonlinear activation functions.
A Pooling Layer applies a (usually) non-linear transform (note that “average pooling” is a linear transform, whereas the more popular “max-pooling” operation is non-linear) to an input image to reduce the size of the data representation after a previous operation. It is common to place a pooling layer between two consecutive convolutional layers. Reducing the spatial size lowers the computational load, helps prevent over-fitting, and adds a degree of translation invariance to the problem.
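The max-pooling operation referred to above may be sketched as follows for the 2-D, non-overlapping case (the function name and window size are illustrative assumptions):

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling: keep the maximum of each
    size x size block, shrinking each spatial dimension by `size`."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]      # drop any ragged border
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 0],
              [5, 6, 3, 1]], dtype=float)
print(max_pool2d(x))
# [[4. 8.]
#  [9. 3.]]
```

Replacing `max` with `mean` over the same blocks would yield the linear “average pooling” variant mentioned above.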
Regularization helps prevent over-fitting within a network. With regularization, one can train a more complex network (using more parameters) without over-fitting, where the same network would over-fit if trained without regularization. Different kinds of regularization have been proposed, including weight regularization, the drop-out technique and batch normalization. Each has its own advantages and drawbacks, making each more suitable for specific applications.
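Of the techniques listed, drop-out is perhaps the simplest to sketch. The following is a minimal illustration of the common “inverted” formulation (the function name, drop probability and seed are assumptions for the sketch):

```python
import numpy as np

def dropout(x, p=0.5, training=True, seed=0):
    """Inverted drop-out: during training, zero each activation with
    probability p and rescale survivors by 1/(1-p), so the expected
    activation is unchanged; at inference, pass activations through."""
    if not training:
        return x
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

a = np.ones(10_000)
y = dropout(a, p=0.5)
print(y.mean())   # approximately 1.0: the rescaling preserves the mean
```

Because a different random subset of neurons is silenced on each training step, the network cannot rely on any single activation, which discourages over-fitting.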
In ensemble classifiers, such as disclosed in A. Rahman and S. Tasnim, “Ensemble Classifiers and Their Applications: A Review”, International Journal of Computer Trends and Technology (IJCTT) 10(1):31-35, April 2014, different classifiers (or models) are individually trained for a given problem and placed in parallel to form a larger classifier. The results from all of the classifiers can then be combined to make a final decision.
Take for example, the problem of low quality iris image segmentation:
A first network which might be employed for this task is a 5-layer (including the output layer) fully convolutional neural network such as shown in FIG. 1(a). The first two layers have 8 channels and the second two layers have 16 channels. The kernel size increases from layer to layer, starting at 3×3 for the first layer and reaching 11×11 for the output layer. No pooling is used in this network, and batch normalization is applied after each convolutional layer.
The second model is a reduced size SegNet basic model such as shown in FIG. 1(b), see V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation,” CoRR, vol. abs/1511.00561, 2015. This model is a fully convolutional architecture comprising 8 layers. Each layer has 10 channels and max pooling is used in the first four layers. The last four layers use the indices from their corresponding pooling layer in order to accomplish an un-pooling operation. A 7×7 kernel size is used in all layers and batch normalization is again used after each convolutional layer to avoid overfitting and provide faster convergence.
The third network designed for the problem at hand is a 6-layer fully convolutional network shown in FIG. 1(c). Each layer has 28 channels and a 3×3 kernel size is used in all layers. No pooling is used in the network and, again, batch normalization is applied after each convolutional layer.
An extended CASIA 1000 dataset available from http://biometrics.idealtest.org can be used to train the three different models, each having approximately the same number of parameters.
So while each of these networks alone can provide results of limited usefulness, it will be seen that deploying the three networks together, with a view to combining their outputs and so availing of their combined approaches, requires resources that increase in proportion to the number of component networks.
“Going Deeper with Convolutions”, Christian Szegedy et al, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015; “Rethinking the Inception Architecture for Computer Vision”, Christian Szegedy et al, Computer Vision and Pattern Recognition, arXiv:1512.00567, December 2015; and “Deep Learning with Separable Convolutions”, Francois Chollet, Computer Vision and Pattern Recognition, arXiv:1610.02357, October 2016 discuss GoogLeNet and Xception, deep neural networks based on the Google Inception architecture. These architectures aim to improve the utilization of computing resources in a neural network by increasing the depth and width of the network.
Using this approach, the above three networks might be combined into a 22 layer convolutional network with minimally sized kernels as shown in FIG. 2 where each node represents a convolutional layer, pooling layer, fully connected layer etc.
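One rationale behind such minimally sized kernels, discussed in the Inception papers cited above, is that a large kernel can be factorized into a stack of smaller ones: two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution while using fewer weights. The following self-contained sketch (illustrative names; a toy single-channel reference implementation, not production code) demonstrates the shape equivalence and the parameter saving:

```python
import numpy as np

def conv2d_valid(x, k):
    """Toy valid-mode 2-D cross-correlation, for shape comparison only."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return np.array([[np.sum(x[i:i + kh, j:j + kw] * k) for j in range(ow)]
                     for i in range(oh)])

rng = np.random.default_rng(0)
x = rng.random((8, 8))

# One 5x5 convolution vs two stacked 3x3 convolutions:
y5 = conv2d_valid(x, rng.random((5, 5)))
y33 = conv2d_valid(conv2d_valid(x, rng.random((3, 3))), rng.random((3, 3)))

print(y5.shape, y33.shape)   # both (4, 4): same receptive field
print(5 * 5, 2 * 3 * 3)      # 25 vs 18 weights per channel pair
```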
However, such rationalization of the component networks can mean that beneficial characteristics of those networks, such as their large kernel sizes, are lost.
It is an object of the present invention to provide an improved method of synthesizing a neural network which better preserves the characteristics of the component networks while rationalizing the resources required for the synthesized network.