In machine learning, a convolutional neural network (CNN) is a class of deep, feedforward artificial neural networks, which may be applied to analyzing visual imagery. The neuron connectivity pattern in a CNN is based on the organization of the visual cortex. Neurons may respond to input in a restricted region of space known as the receptive field. Receptive fields may partially overlap, over-covering the entire visual field. Neuron response may be approximated mathematically by a convolution operation. A CNN may include an input and an output layer, as well as multiple hidden layers. The hidden layers of a CNN may include convolutional layers, pooling layers, fully connected layers and normalization layers. Convolutional layers may apply a convolution operation to the input, passing the result to the next layer. The convolution models the response of an individual neuron to visual input. Each convolutional neuron may process data only for its receptive field.
The term “convolution” in neural networks may refer to a cross-correlation operation in which a filter, which may be represented as a matrix, is applied to input data, e.g., an image, which may also be represented as a matrix, using a sliding-window dot-product. For example, a filter matrix, analogous to a receptive field in a larger visual field, may be moved across an input matrix in a sequence of steps. A dot-product operation between the input matrix and the filter matrix may be performed at each step to generate a numeric value. The numeric value may then be stored in an output matrix. The filter matrix may be moved by a distance of one position, e.g., to the next adjacent element in a particular direction at each step, or by a larger distance, e.g., two or more positions. The distance by which the filter matrix is moved is referred to as the stride of the convolution. The network may learn the filters, unlike traditional algorithms, in which filter values may be determined by hand.
Although fully-connected feedforward neural networks can be used to learn features as well as classify data, the number of neurons used for image processing in a feedforward network is large because each pixel in an image may be a variable, and the number of weights for each neuron in a fully-connected network may be based on the size of the image, which may result in thousands of parameters. The convolution operation reduces the number of parameters. For example, regardless of image size, using the sliding-window filter to tile regions of size 6×6, each with the same shared weights, may involve 36 learnable parameters. CNNs may thus be deeper and have fewer parameters than fully-connected networks.