Neural networks pertain to computational approaches that are loosely modeled after the neural structures of biological brain processing that can be used for solving complex computational problems. Neural networks are normally organized as a set of layers, where each layer includes a set of interconnected nodes that include various functions. Weighted connections implement functions that are processed within the network to perform various analytical operations. Learning processes may be employed to construct and modify the networks and the associated weights for the connectors within the network. By modifying the connector weights, this permits the network to learn over time from past analysis to improve future analysis results.
Neural networks may be employed to perform any appropriate type of data analysis, but is particularly suitable to be applied to complex analysis tasks such as pattern analysis and classification. Direct application of these techniques are therefore suitable, for example, to implement machine vision functions such as recognition and classification of specific objects and object classes from image data captured by digital imaging devices.
There are numerous types of neural networks that are known in the art. Deep neural networks is a type of neural network where deep learning techniques are applied to implement a cascade of many layers of nonlinear processing to perform analytical functions. Deep learning algorithms transform their inputs through more layers than shallow learning algorithms. At each layer, the signal is transformed by a processing unit, such as an artificial neuron, whose parameters are learned through training.
A convolutional neural network is a type of neural network where the connectivity pattern in the network is inspired by biological visual cortex functioning. Visual fields are constructed through the network, where the response of an individual artificial neuron to an input stimulus can be approximated mathematically by a convolution operation.
Convolutional deep neural networks have been implemented in the known art. LeNet (LeCun et al. (1998), AlexNet (Krizhevsky et al. (2012), GoogLeNet (Szegedy et al. (2015), and VGGNet (Simonyan & Zisserman (2015) are all examples of ConvNet architectures that implement different types of deep neural networks. These models are quite different (e.g., different depth, width, and activation functions). However, these models are all the same in one key respect—each one is a hand designed structure which embodies the architects' insights about the problem at hand.
These networks follow a relatively straightforward recipe, starting with a convolutional layer that learns low-level features resembling Gabor filters or some representations thereof. The later layers encode higher-level features such as object parts (parts of faces, cars, and so on). Finally, at the top, there is a layer that returns a probability distribution over classes. While there approach provide some structure, in the label space, to the output that is produced by a trained network, the issue is that this structure is seldom utilized when these networks are designed and trained.
Structure learning in probabilistic graphical models have been suggested, where the conventional algorithms for structure learning in deep convolutional networks typically fall into one of two categories: those that make the nets smaller, and those that make the nets better. One suggested approach focuses on taking unwieldy pretrained networks and squeezing them into networks with a smaller memory footprint, thus requiring fewer computational resources. This class of techniques follows the “teacher-student” paradigm where the goals is to create a student network which mimics the teacher. This means that one needs to start with both an Oracle architecture and its learned weights—training the student only happens later. When distilling an ensemble of specialists on very large datasets, the computationally expensive ensemble training step must be performed first.
Feng et al, “Learning the Structure of Deep Convolutional Networks” is an example of a technique for automatically learning aspects of the structure of a deep model. This approach uses an Indian Buffet Process to propose a new convolutional neural network model to identify a structure, where after the structure is determined, pruning is performed to create a more compact representation of the network. However, one drawback with this approach is that the number of layers remain static, where it is only the known individual layers within the static number of layers that is augmented to be more or less complex through the structure learning process. As such, this approach is unable to identify any new layers that may be needed to optimize the structure.
Therefore, there is a need for an improved approach to implement structure learning for convolutional neural networks.