An artificial neural network, or simply “neural network,” is a computer model, resembling a biological network of neurons. Neural networks are a family of methods within machine learning, under artificial intelligence. A traditional NN has an input later, multiple middle or hidden layer(s), and an output layer. Each layer has a plurality (e.g., 100s to 1000s) of artificial “neurons.” Each neuron in a layer (N) may be connected by an artificial “synapse” to all neurons in a prior (N−1) layer and subsequent (N+1) layer to form a “fully-connected” NN.
As the number of neurons increases, the number of synapses connecting those neurons grows exponentially. A fully-connected NN for recognizing a standard image can have many billions of synapses. Solving this many connections is impractical and time consuming for many problems. Additionally, fully-connected NN do not use any location related information in the input, and by connecting all the neurons from one layer to another, retain only the numerical values, but not the relative positional information (e.g., as if all the pixels in the image are randomly shuffled in their places). This leads to the omission of potentially valuable information that can guide the training of the NN much more efficiently.
Convolutional NN (CNN) take advantage of local correlations by connecting an entire region of neurons (e.g., representing a 3×3 pixel image region) of a layer to a single neuron (e.g., a transformation or convolution of the region) in a convolution layer. Connecting entire regions of multiple neurons to each single convolution neuron forms synapses having a many-to-one neuron connection, which reduces the number of synapses in CNNs as compared to the one-to-one neuron connected synapses in a fully-connected NN. With fewer synapses, CNNs can be trained in significantly less time than fully-connected NNs. Additionally, CNNs preserve the local positional information of the input values, and implicitly use the relative location of inputs and patterns to guide their training.
A NN is trained based on a leaning dataset to solve or learn a weight of each synapse indicating the strength of that connection. The weights of the synapses are generally initialized, e.g., randomly. Training is performed by iteratively inputting a sample dataset into the NN, outputting a result of the NN applied to the dataset, calculating errors between the expected (e.g., target) and actual outputs, and adjusting NN weights to minimize errors. Training may be repeated until the error is minimized or converges. Typically multiple passes (e.g., tens or hundreds) through the training set is performed (e.g., each sample is input into the NN multiple times). Each complete pass over the entire training set is referred to as one “epoch”.
Genetic algorithms (GA) have been used to train NNs. GAs represent the set of weights of a NN as an artificial “chromosome,” e.g., where each chromosome represents one NN. The NN may be initialized by a random population of such chromosomes. Genetic algorithms then evolve the population of chromosomes by performing the steps of (a) measuring the fitness or accuracy of each chromosome (e.g., the lower the average loss over the training set, the better the fitness), (b) selecting the fitter chromosomes for breeding, (c) performing recombination or crossover between pairs of parent chromosomes (e.g., randomly choose weights from the parents to create the offspring), and (d) mutating the offspring. These methods of (c) recombination and (d) mutation have led to too much variability and disruption in offspring populations and volatility in training, resulting in reduced accuracy and convergence as the size of the NN grows. The result is that genetic algorithms do not converge in practical or finite evolutionary time and, given the same training time have inferior accuracy as compared to traditional training methods (e.g., relying on backpropagation alone) for all but the smallest NNs. As current technology trends towards using “deep” neural networks with immense sizes, genetic algorithms have largely been abandoned for training NNs, and have been replaced by traditional backpropagation algorithms.
Accordingly, there is a need in the art to increase the speed and accuracy of genetic algorithms to converge in training NN and the accuracy of the resulting NNs.