Current state-of-the-art neural networks may be utilized to identify patterns in large quantities of data (e.g., images, video, audio, etc.). For example, deep learning neural networks may be used for image processing. Deep learning neural networks may also be used in different aspects of data science, such as machine translation, speech recognition, facial recognition, etc.
Convolutional neural networks (CNNs) are deep learning neural networks that provide translational independence via a convolutions of a filter over an input. Assume an input occupying a three-dimensional (3D) space with three dimensions: x-dimension, y-dimension, and z-dimension. Caffe is an example library for CNNs. In Caffe, traditional 3D convolution is implemented utilizing traditional 3D filters that span the depth of the input along the z-dimension but cover only a small section of the input along the x-dimension and the y-dimension. Each slice of the input along the z-dimension may represent a different homogenous feature with a unique semantic definition (i.e., meaning). The 3D filters are slidable across the input along the x-dimension and the y-dimension to identify patterns.
AlexNet is an example traditional CNN architecture. AlexNet provides grouped convolutions where multiple convolutional layers are stacked on top of each other. Input is sliced along the z-dimension into a number of equally sized groups. For example, the input may be sliced into two groups, and maximally tall filters on the two groups are independently learned using two independent pieces of computation hardware. The learned filters are shorter than the input and do not enforce homogeneity among the input.