1. Field of the Invention
This invention relates to the field of information processing, and in particular to machine learning, neural networks, and evolutionary algorithms.
2. Description of Related Art
Neural networks are commonly employed as learning systems. Neural networks can be structured in a variety of forms; for ease of understanding, a feed-forward neural network architecture is used herein as a paradigm for neural networks, although the application of the principles presented herein will be recognized by one of ordinary skill in the art to be applicable to a variety of other neural network architectures. A typical feed-forward neural network comprises one or more input nodes, one or more output nodes, and a plurality of intermediate, or hidden, nodes that are arranged in a series of layers between the input and output nodes. In a common neural net architecture, each input node is connected to one or more hidden nodes in a first layer of nodes, each hidden node in the first layer of nodes is connected to one or more hidden nodes in a second layer of nodes, and so on until each node of the last layer of hidden nodes is connected to each output node. The output of each node is typically a nonlinear function of a weighted combination of each input to the node. In a feed-forward neural net, when a set of input values is applied to the input nodes, the weighted values are propagated through each layer of the network until a resultant set of output values is produced. Other configurations of nodes, interconnections, and effect propagation are also common. For example, in some architectures, a node may be connected to one or more other nodes beyond its immediately adjacent layer.
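By way of illustration only, the propagation of input values through the layers of a feed-forward network can be sketched as follows. The function name `forward`, the representation of a node as a (weights, bias) pair, the use of a bias term, and the choice of the hyperbolic tangent as the nonlinear function are assumptions made for this sketch and are not taken from the referenced art.

```python
import math

def forward(layers, inputs):
    """Propagate a set of input values through a feed-forward network.

    layers: list of layers; each layer is a list of nodes, and each node is
            a (weights, bias) pair with one weight per input to the node.
    Returns the resultant set of output values.
    """
    activations = list(inputs)
    for layer in layers:
        # Each node outputs a nonlinear function (tanh, assumed here)
        # of the weighted combination of its inputs.
        activations = [
            math.tanh(sum(w * a for w, a in zip(weights, activations)) + bias)
            for weights, bias in layer
        ]
    return activations

# Hypothetical network: 2 input nodes, 2 hidden nodes, 1 output node.
hidden = [([0.5, -0.4], 0.1), ([0.3, 0.8], -0.2)]
output = [([1.0, -1.0], 0.0)]
result = forward([hidden, output], [1.0, 0.5])
```

In this sketch every node of a layer receives every output of the preceding layer; architectures with connections that skip layers would require a different representation.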
In a learning mode, the resultant set of output values is compared to the set of output values that a properly trained network should have produced, to provide an error factor associated with each output node. In the case of pattern matching, for example, each output node may represent a likelihood that the input pattern corresponds to a particular class. Each input pattern is pre-categorized to provide an “ideal” set of likelihood factors, and the error factor is a measure of the difference between this “ideal” set and the set of output node values that the neural network produced. The error factor is propagated back through the network to modify the weights of each input to each node so as to minimize a composite of the error factors. The composite is typically the sum of the squares of the error factors at the output nodes. Conceptually, the node weights that contributed to the outputs of the incorrect class are reduced, while those that contributed to the output of the correct class are increased.
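The composite error and a single weight adjustment can be sketched as follows. This is a simplified illustration that assumes a single layer of sigmoid output nodes and a plain gradient step; the propagation of the error factor back through hidden layers is omitted, and the names `composite_error`, `update_output_weights`, and the learning rate `rate` are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def node_outputs(weights, inputs):
    """Output of each node: sigmoid of the weighted combination of inputs."""
    return [sigmoid(sum(w * x for w, x in zip(node, inputs))) for node in weights]

def composite_error(outputs, ideal):
    """Sum of the squares of the error factors at the output nodes."""
    return sum((o - t) ** 2 for o, t in zip(outputs, ideal))

def update_output_weights(weights, inputs, ideal, rate=0.5):
    """One gradient step on the composite error (output layer only)."""
    updated = []
    for node, out, target in zip(weights, node_outputs(weights, inputs), ideal):
        # Derivative of the squared error through the sigmoid.
        delta = 2.0 * (out - target) * out * (1.0 - out)
        updated.append([w - rate * delta * x for w, x in zip(node, inputs)])
    return updated

# Hypothetical two-class example: the input belongs to the first class.
inputs = [1.0, 0.5]
ideal = [1.0, 0.0]
weights = [[0.2, -0.1], [-0.3, 0.4]]
before = composite_error(node_outputs(weights, inputs), ideal)
weights = update_output_weights(weights, inputs, ideal)
after = composite_error(node_outputs(weights, inputs), ideal)
```

With these values the step raises the output of the node for the correct class and lowers the output of the node for the incorrect class, so `after` is smaller than `before`.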
Although the error factor can be propagated back based on each comparison of the ideal output and the result of processing each input set, preferably, a plurality, or batch, of input sets of values is applied to the network, and an accumulated error factor is back-propagated to readjust the weights. Depending upon the training technique employed, this process may be repeated for additional sets or batches of input values. The entire process is repeated for a fixed number of iterations or until subsequent iterations demonstrate a convergence to the “ideal”, or until some other termination criterion is achieved. Once the set of weights is determined, the resultant network can be used to process other items, items that were not part of the training set, by providing the corresponding set of input values from each of the other items, to produce a resultant output corresponding to each of the other items.
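A batch training loop with the termination criteria described above can be sketched as follows. For brevity the sketch assumes a single linear node rather than a multi-layer network; the names `train` and `batch_error`, the learning rate, the iteration limit, and the convergence tolerance are all assumptions of the sketch.

```python
def batch_error(weights, batch):
    """Accumulated squared error of a linear node over a batch."""
    return sum(
        (sum(w * x for w, x in zip(weights, inputs)) - target) ** 2
        for inputs, target in batch
    )

def train(weights, batch, rate=0.1, max_iters=5000, tol=1e-12):
    """Adjust weights from the error accumulated over the whole batch.

    Stops after max_iters iterations, or earlier when an iteration no
    longer improves the accumulated error by at least tol.
    Returns the weights and the number of updates applied.
    """
    weights = list(weights)
    error = batch_error(weights, batch)
    for iteration in range(max_iters):
        grad = [0.0] * len(weights)
        for inputs, target in batch:
            diff = sum(w * x for w, x in zip(weights, inputs)) - target
            for i, x in enumerate(inputs):
                grad[i] += 2.0 * diff * x  # accumulate over the batch
        candidate = [w - rate * g / len(batch) for w, g in zip(weights, grad)]
        new_error = batch_error(candidate, batch)
        if error - new_error < tol:  # convergence criterion
            return weights, iteration
        weights, error = candidate, new_error
    return weights, max_iters

# Hypothetical training set generated by target = 2*x1 - x2.
batch = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
         ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
trained, iterations = train([0.0, 0.0], batch)

# Apply the trained weights to an item that was not in the training set.
unseen = [3.0, 2.0]
prediction = sum(w * x for w, x in zip(trained, unseen))
```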
The performance of the neural network for a given problem set depends upon a variety of factors, including the number of network layers, the number of hidden nodes in each layer, and so on. A particular set of network factors, or network architecture, will perform differently on different problem sets. U.S. Pat. No. 5,140,530 “GENETIC ALGORITHM SYNTHESIS OF NEURAL NETWORKS”, issued Aug. 18, 1992 to Guha et al., and incorporated by reference herein, presents the use of a genetic algorithm to construct an optimized custom neural network architecture. U.S. Pat. No. 5,249,259 “GENETIC ALGORITHM TECHNIQUE FOR DESIGNING NEURAL NETWORKS”, issued Sep. 28, 1993 to Robert L. Harvey, and incorporated by reference herein, presents the use of a genetic algorithm to select an optimum set of weights associated with a neural network.
Genetic algorithms are a specific class of evolutionary algorithms, and the term evolutionary algorithm is used hereinafter. Evolutionary algorithms are commonly used to provide a directed trial and error search for an optimum solution, wherein the samples selected for each trial are based on the performance of samples in prior trials. In a typical evolutionary algorithm, certain attributes, or genes, are assumed to be related to an ability to perform a given task, different combinations of genes resulting in different levels of effectiveness for performing that task. The evolutionary algorithm is particularly effective for problems wherein the relation between the combination of attributes and the effectiveness for performing the task does not have a closed form solution.
In an evolutionary algorithm, the offspring production process is used to determine a particular combination of genes that is most effective for performing a given task. A combination of genes, or attributes, is termed a chromosome. In the genetic algorithm class of evolutionary algorithms, a reproduction-recombination cycle is used to propagate generations of offspring. Members of a population having different chromosomes mate and generate offspring. These offspring have attributes passed down from the parent members, typically as some random combination of genes from each parent. In a classic genetic algorithm, the individuals that are more effective than others in performing the given task are provided a greater opportunity to mate and generate offspring. That is, the individuals having preferred chromosomes are given a higher opportunity to generate offspring, in the hope that the offspring will inherit whichever genes allowed the parents to perform the given task effectively. The next generation of parents is selected based on a preference for those exhibiting effectiveness for performing the given task. In this manner, the number of offspring having attributes that are effective for performing the given task will tend to increase with each generation. Paradigms of other methods for generating offspring, such as asexual reproduction, mutation, and the like, are also used to produce offspring having an increasing likelihood of improved abilities to perform the given task.
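One reproduction-recombination cycle of a classic genetic algorithm can be sketched as follows. The task used here (maximizing the number of 1 genes in a binary chromosome), the function names, the uniform recombination of parent genes, and the mutation rate are all assumptions chosen to keep the sketch small.

```python
import random

def fitness(chromosome):
    """Hypothetical task: effectiveness is the count of 1 genes."""
    return sum(chromosome)

def next_generation(population, rng, mutation_rate=0.01):
    """Produce one generation of offspring from the current population."""
    # More effective individuals get a greater opportunity to mate.
    # Adding 1 keeps every weight positive even when fitness is zero.
    weights = [fitness(c) + 1 for c in population]
    offspring = []
    for _ in range(len(population)):
        mother, father = rng.choices(population, weights=weights, k=2)
        # Each gene is taken at random from one of the two parents.
        child = [rng.choice(pair) for pair in zip(mother, father)]
        # Occasional mutation flips a gene.
        child = [1 - g if rng.random() < mutation_rate else g for g in child]
        offspring.append(child)
    return offspring

rng = random.Random(0)  # fixed seed so the run is repeatable
population = [[rng.randint(0, 1) for _ in range(16)] for _ in range(20)]
for _ in range(30):
    population = next_generation(population, rng)
best = max(population, key=fitness)
```

Because selection is probabilistic, effective genes tend to spread over the generations, but any single generation may perform worse than its parents.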
As applied to neural networks, the chromosome of the referenced '530 (Guha) patent represents the architecture of a neural network. Alternative neural networks, those having different architectures, each have a corresponding different chromosome. After a plurality of neural networks have been trained, each of the networks is provided evaluation input sets, and the performance of each trained neural network on the evaluation input sets is determined, based on a comparison with an “ideal” performance corresponding to each evaluation input set. The chromosomes of the better performing trained neural networks are saved and used to generate the next set of sample neural networks to be trained and evaluated. By determining each next generation of samples based on the prior successful samples, the characteristics that contribute to successful performance are likely to be passed down from generation to generation, such that each generation tends to contain successively better performers.
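The step of saving the better performing chromosomes and generating the next set of sample networks from them can be sketched as follows. This is not the encoding of the referenced patent: the chromosome is assumed here to be a tuple of hidden-layer sizes, the performance scores are assumed to have been measured already, and offspring are produced by a hypothetical mutation that changes one layer size by one node.

```python
import random

def next_architectures(scored, keep, size, rng):
    """Build the next set of architecture chromosomes.

    scored: list of (performance, chromosome) pairs, higher is better;
            each chromosome is a tuple of hidden-layer sizes (assumed).
    keep:   number of better performing chromosomes to save.
    size:   total number of chromosomes in the next generation.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    survivors = [chromosome for _, chromosome in ranked[:keep]]
    children = []
    while len(survivors) + len(children) < size:
        parent = rng.choice(survivors)
        layer = rng.randrange(len(parent))
        child = list(parent)
        # Grow or shrink one layer by a node, never below one node.
        child[layer] = max(1, child[layer] + rng.choice([-1, 1]))
        children.append(tuple(child))
    return survivors + children

# Hypothetical evaluation results for four trained networks.
scored = [(0.9, (4, 3)), (0.2, (8,)), (0.7, (2, 2)), (0.1, (1,))]
generation = next_architectures(scored, keep=2, size=6, rng=random.Random(0))
```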
The speed at which a particular neural network converges to an optimal set of weights is highly dependent upon the initial value of the weights in the neural network. Similarly, the likelihood of a particular neural network converging on a “global” optimum, rather than a “local” optimum, is highly dependent upon the initial value of the weights in the neural network. In like manner, the success of a particular neural network may be dependent upon the number of training cycles applied, whereas the cost of applying an unbounded set of training cycles may exceed the benefits derived. Globally, the likelihood of evolving to an optimal architecture may be highly dependent upon the selection of the original ancestral chromosomes. Because of these dependencies on initial conditions, conventional evolutionary algorithms employ random values to initialize most states and conditions of each network, to avoid the introduction of biases that could affect the accuracy of the results. As such, the determination of an optimal neural network architecture via an evolutionary algorithm is an inherently “noisy” process. Potentially better performing architectures may score poorly because of the particular evaluation test set used, or because of inadequate training compared to a less robust architecture that is easily trained, and so on. In like manner, the use of randomly selected training sets or evaluation sets among the evaluated neural networks may cause potentially worthwhile architectures to be rejected prematurely, obviating the advantages realizable by a directed trial and error process.
It is an object of this invention to provide a method for improving neural network architectures via an evolutionary algorithm that reduces the adverse effects of the noise that is introduced by the network initialization process. It is a further object of this invention to reduce the noise that is introduced by the network initialization process. It is a further object of this invention to provide an optimized network initialization process. It is a further object of this invention to reduce the noise that is introduced by the use of randomly selected training or evaluation input sets.
These objects and others are achieved by including parameters that affect the initialization of a neural network architecture within the encoding that is used by an evolutionary algorithm to optimize the neural network architecture. The example initialization parameters include an encoding that determines the initial nodal weights used in each architecture at the commencement of the training cycle. By including the initialization parameters within the encoding used by the evolutionary algorithm, the initialization parameters that have a positive effect on the performance of the resultant evolved network architecture are propagated and potentially improved from generation to generation. Conversely, initialization parameters that, for example, cause the resultant evolved network to be poorly trained, will not be propagated. In accordance with a second aspect of this invention, the encoding also includes parameters that affect the training process, such as the duration of the training cycle, the training inputs applied, and so on. In accordance with a third aspect of this invention, the noise effects caused by the random selection of training or evaluation sets are reduced by applying the same randomly selected training or evaluation set to all architectures that are directly compared with each other.
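One possible form of such an encoding, together with the application of a common evaluation set, can be sketched as follows. The sketch is illustrative only: the field names of `Chromosome`, the use of a random-number seed as the gene that determines the initial nodal weights, and the `score` callable that stands in for training and evaluating a network are all assumptions, not the encoding of any particular embodiment.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Chromosome:
    hidden_nodes: int     # architecture gene
    init_seed: int        # initialization gene (assumed: a seed)
    training_epochs: int  # training gene: duration of the training cycle

def initial_weights(chromosome, n_inputs):
    """Derive the initial nodal weights from the initialization gene.

    The same chromosome always yields the same initial weights, so an
    initialization that trains well can be inherited by offspring.
    """
    rng = random.Random(chromosome.init_seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
            for _ in range(chromosome.hidden_nodes)]

def evaluate_generation(population, data_pool, sample_size, score, rng):
    """Score every chromosome against one shared, randomly selected set.

    score(chromosome, data_set) is a caller-supplied (hypothetical)
    function that trains and evaluates the corresponding network.
    """
    shared_set = rng.sample(data_pool, sample_size)  # drawn once only
    return [(score(c, shared_set), c) for c in population]
```

Because the evaluation set is drawn once per comparison rather than once per network, differences in score among the compared architectures are not attributable to differences in the sets applied.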