The present invention relates to artificial neural networks, and more particularly to artificial neural networks having an architecture optimized by the use of a genetic algorithm.
The term artificial neural network is used herein to describe a highly connected network of artificial neurons. For simplicity, the modifier “artificial” will usually be omitted.
The behavior of an individual artificial neuron is simple to describe. It is a threshold circuit that receives input signals and develops one or more output signals. The input-output relationship of a neuron is determined by a neuron activation or threshold function and the sum of the input signals. The activation function may be a simple step function, a sigmoid function or some other monotonically increasing function.
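The input-output relationship just described can be illustrated with a minimal sketch. The function names `step`, `sigmoid` and `neuron_output`, and the default threshold of zero, are assumptions chosen for illustration and are not taken from the specification.

```python
import math


def step(total, threshold=0.0):
    """Step activation: output 1.0 once the summed input reaches the threshold."""
    return 1.0 if total >= threshold else 0.0


def sigmoid(total):
    """Sigmoid activation: a smooth, monotonically increasing function."""
    return 1.0 / (1.0 + math.exp(-total))


def neuron_output(inputs, activation=step):
    """Apply the activation function to the sum of the input signals."""
    return activation(sum(inputs))
```

Either activation may be passed to `neuron_output`; the step function gives a hard threshold, while the sigmoid gives a graded output between 0 and 1.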
Neurons are combined in highly connected networks by signal transmission paths to form neural networks. The signal transmission paths have weights associated with them, so that a signal applied to a neuron has a signal strength equal to the product of the signal applied to the signal path and the weight of that signal path. Consequently the signals received by the neurons are weighted sums determined by the weight values of the signal transmission paths and the applied signal values.
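As a hedged illustration of the weighted sum described above, the following sketch computes the signal received by one neuron and by a group of neurons sharing the same applied signals. The names `weighted_input` and `layer_inputs`, and the representation of weights as nested lists, are assumptions made for this example only.

```python
def weighted_input(signals, weights):
    """Sum of each applied signal multiplied by the weight of its path."""
    if len(signals) != len(weights):
        raise ValueError("each signal needs exactly one path weight")
    return sum(s * w for s, w in zip(signals, weights))


def layer_inputs(signals, weight_matrix):
    """Weighted sums received by several neurons fed by the same signals.

    weight_matrix[j][i] is the assumed weight of the path from
    signal i to neuron j.
    """
    return [weighted_input(signals, row) for row in weight_matrix]
```

A weight of zero in this representation is equivalent to the absence of a signal transmission path between the two neurons.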
The interconnectivity of the neurons in a neural network gives rise to behavior substantially more complex than that of individual neurons. This complex behavior is determined by which neurons have signal transmission paths connecting them, and the respective values of the signal transmission path weights. Desired network behavior can be obtained by the appropriate selection of network topology and weight values. The process of selecting weight values to obtain a particular network characteristic is called training. Different neural network architectures and techniques for training them are described in Parallel Distributed Processing, Vol. 1, D. E. Rumelhart, J. L. McClelland and the PDP Research Group, Editors, MIT Press, 1986.
Properly trained neural networks exhibit interesting and useful properties, such as pattern recognition functions. A properly trained neural network having the correct architecture will possess the ability to generalize. For example, if an input signal is corrupted by noise, the application of the noisy input signal to a neural network trained to recognize the input signal will cause it to generate the appropriate output signal. Similarly, if the set of training signals has shared properties, the application of an input signal not belonging to the training set, but having the shared properties, will cause the network to generate the appropriate output signal. This ability to generalize has been a factor in the considerable interest and activity now under way in neural network research.
Trained neural networks having an inappropriate architecture for a particular problem do not always correctly generalize after being trained. They can exhibit an “over training” condition in which the input signals used for training will cause the network to generate the appropriate output signals, but an input signal not used for training, and having a shared property with the training set, will not cause the appropriate output signal to be generated. The emergent property of generalization is lost by over training.
It is an object of the invention to optimize the architecture of a neural network so that over training will not occur, and yet have a network architecture such that the trained network will exhibit the desired emergent property.
According to the invention a neural network is defined, and its architecture is represented by a symbol string. A set of input-output pairs for the network is provided, and the input-output pairs are divided into a training set and an evaluation set. The initially defined network is trained with the training set, and then evaluated with the evaluation set. The best performing networks are selected.
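The division of the input-output pairs into a training set and an evaluation set might be sketched as follows. The function names, the 25 percent evaluation fraction, the random shuffle and the use of mean squared error as the performance measure are all assumptions for illustration; the specification does not fix any of them.

```python
import random


def split_pairs(pairs, eval_fraction=0.25, seed=0):
    """Divide input-output pairs into (training_set, evaluation_set)."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    # Hold out at least one pair for evaluation.
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def evaluation_error(predict, evaluation_set):
    """Mean squared error of a trained network on pairs not used for training.

    `predict` is a hypothetical callable standing in for the trained network.
    """
    total = sum((predict(x) - y) ** 2 for x, y in evaluation_set)
    return total / len(evaluation_set)
```

Because the evaluation pairs are never used for training, a low evaluation error indicates that the network generalizes rather than merely reproducing its training signals.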
The symbol strings representing the selected network architectures are modified according to a genetic algorithm to generate new symbol strings representing new neural network architectures. These new neural network architectures are then trained by the training set, evaluated by the evaluation set, and the best performing networks are again selected. Symbol strings representative of improved networks are again modified according to the genetic algorithm and the process is continued until a sufficiently optimized network architecture is realized.
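The iterative process described above can be sketched in outline. This is an illustration under stated assumptions, not the claimed method: the symbol strings are taken to be equal-length strings over the alphabet "01", single-point crossover and random mutation are assumed as the genetic operators, and `score` is a hypothetical callable standing in for the step of training a network with the training set and evaluating it with the evaluation set.

```python
import random


def crossover(a, b, rng):
    """Single-point crossover of two equal-length symbol strings (length >= 2)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(s, rng, rate=0.05, alphabet="01"):
    """Redraw each symbol from the alphabet with probability `rate`."""
    return "".join(rng.choice(alphabet) if rng.random() < rate else c for c in s)


def evolve(population, score, generations=20, keep=4, seed=0):
    """Repeatedly select the best strings and breed new ones from them.

    `score(s)` is assumed to return a higher value for a better network.
    """
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        # Select the best performing architectures.
        parents = sorted(population, key=score, reverse=True)[:keep]
        # Modify the selected strings to generate new architectures.
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(size - keep)
        ]
        population = parents + children
    return max(population, key=score)
```

Because the selected parents are carried forward unchanged in this sketch, the best score in the population never decreases from one generation to the next. A fixed generation count is used here in place of the "sufficiently optimized" stopping condition, which the text leaves open.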