The present invention relates generally to artificial neural networks, and more particularly to a method and apparatus for evolving an artificial neural network.
An artificial neural network (ANN) is used herein to refer to an analysis paradigm that is roughly modeled after the massively parallel structure of a biological neural network such as the human brain. An ANN is typically implemented with many relatively simple processing elements (PEs) that are interconnected by many weighted connections in order to obtain a computational structure that simulates the highly interconnected, parallel computational structure of biological neural networks. Hereinafter, the terms network and neural network are used interchangeably with the term artificial neural network.
The terms neural network topology and topology are used herein to refer to the number of PE layers of a neural network, number of PEs per PE layer of the neural network, and the interconnections between PEs of the neural network.
The term neural network architecture is used herein to refer to the neural network topology, activation functions implemented by the PEs of the neural network, and learning algorithms specified for a neural network.
The term evolutionary computation is used herein to refer to machine learning optimization and classification paradigms that are roughly based on evolutionary mechanisms such as biological genetics and natural selection. The evolutionary computational field includes genetic algorithms, evolutionary programming, genetic programming, and evolution strategies.
A swarm is used herein to refer to a population of interacting elements that collaboratively search through a problem space in order to optimize some global objective. Interactions between relatively local (topologically) swarm elements are often emphasized. Moreover, a swarm tends to have a general stochastic (or chaotic) characteristic that causes swarm elements to move toward a center of mass in the population located on critical dimensions, thus resulting in convergence on an optimum for the global objective of the swarm.
A particle swarm, as used herein, is similar to a genetic algorithm (GA) in that the system is initialized with a population of randomized positions in hyperspace that represent potential solutions to an optimization problem. However, each particle of a particle swarm, unlike a GA, is also assigned a randomized velocity. The particles (i.e. potential solutions) are then "flown" through hyperspace based upon their respective velocities in search of an optimum solution to a global objective.
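By way of illustration only, the initialization and "flying" of a particle swarm may be sketched as follows. The function names init_swarm and fly, the dictionary layout of a particle, and the ranges used for the random positions and velocities are assumptions made for this sketch and are not taken from the description above.

```python
import random

def init_swarm(num_particles, dims, pos_range=1.0, vel_range=0.1, seed=0):
    """Create particles with randomized positions and randomized velocities.

    pos_range and vel_range are illustrative defaults, not prescribed values.
    """
    rng = random.Random(seed)
    swarm = []
    for _ in range(num_particles):
        position = [rng.uniform(-pos_range, pos_range) for _ in range(dims)]
        velocity = [rng.uniform(-vel_range, vel_range) for _ in range(dims)]
        swarm.append({"position": position, "velocity": velocity})
    return swarm

def fly(swarm):
    """Move each particle one step through hyperspace along its velocity."""
    for particle in swarm:
        particle["position"] = [
            x + v for x, v in zip(particle["position"], particle["velocity"])
        ]

swarm = init_swarm(num_particles=5, dims=3)
fly(swarm)
```

Each position is one candidate solution; the velocity is what distinguishes a particle from a member of a GA population.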
Artificial neural networks and evolutionary computational techniques are effective in solving certain classes of problems. For example, artificial neural networks are good at mapping input patterns to output patterns in such applications as diagnostic systems. Moreover, evolutionary computational techniques are good at optimizing an objective in such applications as scheduling systems. In light of the fact that artificial neural networks and evolutionary computational techniques excel at different classes of problems, engineers and scientists have combined artificial neural networks and evolutionary computational techniques in order to develop hybrid computational tools that are even more effective than either methodology by itself.
For example, in Russ Eberhart, et al., Computational Intelligence PC Tools (1996), a particle swarm technique is described which evolves weights for weighted connections of a neural network. Use of the particle swarm technique has proven to be highly successful and efficient at accurately evolving neural network weights. However, the particle swarm technique described in Computational Intelligence PC Tools does not evolve the activation functions used by PEs of the neural network structure, nor does the described particle swarm technique evolve aspects of the neural network topology such as the number of PEs used to implement the neural network. Accordingly, while the described particle swarm technique may successfully train a neural network, a simpler neural network (i.e. fewer PEs and/or less complex activation functions) may be obtainable if the particle swarm technique were extended to evolve additional aspects of the neural network.
A need, therefore, exists for a method and apparatus which evolve neural network weights and other neural network parameters to obtain a simpler neural network than achievable by evolving only neural network weights.
In accordance with one embodiment of the present invention, there is provided a method of evolving a neural network that includes a plurality of processing elements interconnected by a plurality of weighted connections. One step of the method includes obtaining a definition for the neural network by evolving a plurality of weights for the plurality of weighted connections, and evolving a plurality of activation function parameters associated with the plurality of processing elements. Another step of the method includes determining whether the neural network definition may be simplified based upon at least one activation function parameter of the plurality of activation function parameters. The method also includes the step of updating the definition for the neural network in response to determining that the neural network definition may be simplified.
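The determining and updating steps of the above method may be pictured with the following minimal sketch. The summary above does not state the criterion by which a definition is found to be simplifiable, so the sketch assumes, purely for illustration, that each PE uses a sigmoid activation function with an evolved slope parameter: a PE whose slope is near zero produces an almost constant output and is treated as removable, and a PE whose slope is very large is treated as replaceable by a step function. The function name simplify, the dictionary keys, and the thresholds small and large are hypothetical.

```python
def simplify(definition, small=0.01, large=50.0):
    """Return (updated_definition, changed) for a list of PE descriptions.

    Assumed criterion (not taken from the source text):
      |slope| < small  -> output is nearly constant, so the PE is dropped
      |slope| > large  -> sigmoid behaves like a step, so it is replaced
    A fuller implementation would fold the constant output of a dropped PE
    into the bias of the downstream PEs; that is omitted here.
    """
    simplified, changed = [], False
    for pe in definition:
        slope = abs(pe["slope"])
        if slope < small:
            changed = True      # PE removed from the definition
            continue
        new_pe = dict(pe)       # copy so the input definition is not mutated
        if slope > large and pe["activation"] == "sigmoid":
            new_pe["activation"] = "step"
            changed = True
        simplified.append(new_pe)
    return simplified, changed

definition = [
    {"id": 0, "slope": 0.001, "activation": "sigmoid"},
    {"id": 1, "slope": 100.0, "activation": "sigmoid"},
    {"id": 2, "slope": 1.0, "activation": "sigmoid"},
]
updated, changed = simplify(definition)
```

Under these assumptions, the first PE is removed, the second is given a step activation, and the third is left unchanged.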
Pursuant to another embodiment of the present invention, there is provided another method of evolving a neural network that includes a plurality of processing elements interconnected by a plurality of weighted connections. The method includes the step of initializing a swarm of particles in which each particle includes (i) a velocity vector that represents motion of the particle through a hyperspace, and (ii) a position in the hyperspace that represents a plurality of weights for the plurality of weighted connections and a plurality of activation function parameters associated with the plurality of processing elements. The method also includes the step of determining, for each particle of the swarm, a fitness value for a respective definition of the neural network that includes the respective plurality of weights defined by the particle and the plurality of activation function parameters defined by the particle. Another step of the method includes determining, based upon the fitness values, whether termination criteria have been satisfied. Moreover, the method includes the steps of (i) updating, for each particle of the swarm, a personal best value and a personal best position based upon the respective fitness value for the particle, (ii) updating, for each particle of the swarm, a local best value and a local best position based upon fitness values associated with a respective group of the particles, and (iii) updating, for each particle of the swarm, the position and the velocity vector for the particle based upon the personal best position for the particle, the local best position for the particle, and the velocity vector for the particle. Finally, the method includes the step of repeating the above determining and updating steps until the termination criteria have been satisfied.
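The steps of the above method may be sketched as follows, under several assumptions that are not part of the description: a small 2-2-1 network trained on the XOR mapping, a sigmoid activation function with one slope parameter per PE as the evolved activation function parameter, a ring neighborhood of three particles as the "respective group", a summed squared error as the fitness value (lower is better), and conventional inertia and acceleration constants in the velocity update. The personal best is updated just before the termination check so that the returned position is always current. All names are hypothetical.

```python
import math
import random

XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
DIMS = 12  # 9 weights (including biases) + 3 slope parameters, one per PE

def sigmoid(x, slope):
    z = max(-60.0, min(60.0, slope * x))  # clamp to keep exp() in range
    return 1.0 / (1.0 + math.exp(-z))

def network_error(pos):
    """Fitness of the network definition encoded by one particle position."""
    w, slopes = pos[:9], pos[9:]
    err = 0.0
    for (a, b), target in XOR:
        h0 = sigmoid(w[0] * a + w[1] * b + w[2], slopes[0])
        h1 = sigmoid(w[3] * a + w[4] * b + w[5], slopes[1])
        out = sigmoid(w[6] * h0 + w[7] * h1 + w[8], slopes[2])
        err += (out - target) ** 2
    return err

def evolve(num_particles=20, max_iters=200, target_error=0.01, seed=1,
           inertia=0.7, c1=1.5, c2=1.5, v_max=2.0):
    rng = random.Random(seed)
    n = num_particles
    pos = [[rng.uniform(-1, 1) for _ in range(DIMS)] for _ in range(n)]
    vel = [[rng.uniform(-0.1, 0.1) for _ in range(DIMS)] for _ in range(n)]
    pbest_pos = [p[:] for p in pos]
    pbest_val = [float("inf")] * n
    history = []
    for _ in range(max_iters):
        # Determine fitness and update each particle's personal best.
        for i in range(n):
            f = network_error(pos[i])
            if f < pbest_val[i]:
                pbest_val[i], pbest_pos[i] = f, pos[i][:]
        history.append(min(pbest_val))
        # Termination criterion (assumed): error at or below a target.
        if history[-1] <= target_error:
            break
        for i in range(n):
            # Local best: best personal best within the particle's ring group.
            group = [(i - 1) % n, i, (i + 1) % n]
            lb = min(group, key=lambda j: pbest_val[j])
            for d in range(DIMS):
                r1, r2 = rng.random(), rng.random()
                v = (inertia * vel[i][d]
                     + c1 * r1 * (pbest_pos[i][d] - pos[i][d])
                     + c2 * r2 * (pbest_pos[lb][d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # limit step size
                pos[i][d] += vel[i][d]
    k = min(range(n), key=lambda j: pbest_val[j])
    return pbest_pos[k], pbest_val[k], history

best_pos, best_val, history = evolve(max_iters=50)
```

Because the weights and the slope parameters share a single position vector, one velocity update moves both, which is the sense in which the weights and the activation function parameters are evolved together.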
Pursuant to yet another embodiment of the present invention, there is provided a computer readable medium for evolving a neural network that includes a plurality of processing elements interconnected by a plurality of weighted connections. The computer readable medium includes code which when executed by a network evolution system causes the network evolution system to obtain a definition for the neural network by evolving a plurality of weights for the plurality of weighted connections, and evolving a plurality of activation function parameters associated with the plurality of processing elements. Moreover, the code of the computer readable medium when executed by the network evolution system further causes the network evolution system to determine whether the neural network definition may be simplified based upon at least one activation function parameter of the plurality of activation function parameters. The code of the computer readable medium when executed by the network evolution system also causes the network evolution system to update the definition for the neural network in response to determining that the neural network definition may be simplified.
Pursuant to a further embodiment of the present invention, there is provided a computer readable medium for evolving a neural network that includes a plurality of processing elements interconnected by a plurality of weighted connections. The computer readable medium includes code which when executed by a network evolution system causes the network evolution system to initialize a swarm of particles in which each particle includes (i) a velocity vector that represents motion of the particle through a hyperspace, and (ii) a position in the hyperspace that represents a plurality of weights for the plurality of weighted connections and a plurality of activation function parameters associated with the plurality of processing elements. The code of the computer readable medium when executed by the network evolution system further causes the network evolution system to determine, for each particle of the swarm, a fitness value for a respective definition of the neural network that includes the respective plurality of weights defined by the particle and the plurality of activation function parameters defined by the particle, and to determine, based upon the fitness values, whether termination criteria have been satisfied.
Furthermore, the code of the computer readable medium when executed by the network evolution system causes the network evolution system to (i) update, for each particle of the swarm, a personal best value and a personal best position based upon the respective fitness value for the particle, (ii) update, for each particle of the swarm, a local best value and a local best position based upon fitness values associated with a respective group of the particles, and (iii) update, for each particle of the swarm, the position and the velocity vector for the particle based upon the personal best position for the particle, the local best position for the particle, and the velocity vector for the particle. Moreover, the code of the computer readable medium when executed by the network evolution system causes the network evolution system to repeat the above determining and updating actions until the termination criteria have been satisfied.
Pursuant to yet a further embodiment of the present invention, there is provided a network evolution system for evolving a neural network that includes a plurality of processing elements interconnected by a plurality of weighted connections. The network evolution system includes a network evolver and a network simplifier. The network evolver is operable to obtain a definition for the neural network by evolving a plurality of weights for the plurality of weighted connections, and evolving a plurality of activation function parameters associated with the plurality of processing elements. The network simplifier is operable to (i) determine whether the definition may be simplified based upon at least one activation function parameter of the plurality of activation function parameters, and (ii) update the definition for the neural network in response to determining that the neural network definition may be simplified.
It is an object of the present invention to provide a new and useful method and apparatus for evolving neural networks.
It is also an object of the present invention to provide an improved method and apparatus for evolving neural networks.
It is another object of the present invention to provide a method and apparatus for evolving both connection weights and processing element activation functions of neural networks.
It is yet another object of the present invention to provide a method and apparatus for simplifying a neural network topology.
It is a further object of the present invention to provide a method and apparatus for simplifying processing element activation functions of a neural network architecture.
It is a further object of the present invention to provide a method and apparatus for evolving a neural network that may directly process non-normalized (i.e. raw) data input signals.