Biological neural networks are known to have many desirable characteristics. For example, they are able to perform complex, nonlinear tasks using large numbers of relatively simple building blocks. Biological neural networks are robust, able to extrapolate information from a specific setting to apply to a more general setting, and adaptable to change. For these reasons and many others, it has been a goal of the machine learning community to produce networks with capabilities similar to those of biological central nervous systems and brains, and in particular the human brain.
In order to appreciate the neuroscience-inspired artificial neural network of the present invention, a brief introduction to the neural components, by example, of the human brain and the larger components of the human brain itself is provided. Biological neurons are the nerve cells present in the brain. The human brain consists of about 10^11 neurons, each of which operates in parallel with the others. A typical biological neuron is shown in FIG. 1. A process in neuroscience usually refers to a physical feature. The various processes of the neuron are called neurites; henceforth, the term neurite will be used rather than process to avoid confusion with the computer science notion of process. The neuron receives information through neurites called dendrites 110, which also communicate the information to the neuron's cell body 120. The cell body 120 has a nucleus 130. The neurite that transmits information out of the neuron to other targets is called the axon 140 having axon terminals 190. A myelin sheath 160 comprises a Schwann cell 170. Signals between neurons are usually transferred across synapses, although direct connections that allow ion exchange have been observed. Typically, the communication is done chemically via neurotransmitters.
Dendrites 110 are usually shorter than axons 140 and arise from the cell body 120 of the neuron. They generally branch off into dendritic spines, which receive information from axons of other neurons. The dendritic spines are typically where communication from axons across synapses takes place, although sometimes communication is direct from cell body to cell body, or between dendrites.
Although information is transmitted from an axon 140 to a dendrite 110 in a typical synapse, there are also synapses between two axons, synapses between two dendrites, and synapses in which information travels from dendrite 110 to axon 140. Because of these differences, connections between neurons in the artificial neural networks defined herein will all be referred to only as synapses, with no distinction between dendrites and axons. The synapses as known in biological systems are uni-directional in that information travels from one neuron to another via a synapse connection, but not in the opposite direction along that synapse.
There are two ways for synaptic transmission to take place in the brain: electrical transmission and chemical transmission. Electrical transmission occurs when the current generated by one neuron spreads to another neuron on a pathway of low electrical resistance. Electrical synapses are relatively rare in the mammalian brain; evidence suggests that they occur in regions where the activities of neighboring neurons need to be highly synchronized. In chemical transmissions, neurotransmitters are transmitted from one neuron to another.
A neurotransmitter is a chemical substance that is typically synthesized in a neuron and is released at a synapse following depolarization of at least a portion of the neuron's cell membrane (typically near the synapse). The neurotransmitter then binds to receptors at a postsynaptic cell and/or postsynaptic terminal to elicit a response. This response may excite or inhibit the neuron, meaning neurotransmitters play a major role in the way the brain operates. Some of the known neurotransmitters are acetylcholine, glutamate, GABA, glycine, dopamine, norepinephrine, serotonin and histamine.
Neurotransmitters are released according to action potentials in the neuron. An action potential is a fluctuation in the membrane potential of the neuron, which is the voltage difference across the cell membrane caused by differences in ion concentrations between the outside and inside of the neuron. Neurons have a particular membrane potential in which they are at rest. Typically, a neuron is “at rest” when the potential inside the neuron's cell membrane is approximately −70 mV compared to the outside of the neuron. When positively charged ions flow out of the cell, the membrane potential becomes more negative, while positive ionic current flowing into the cell changes the membrane potential to a less negative or positive value. Negative ions have an opposite effect. Each neuron has an associated threshold level. If the membrane potential rises above this threshold level, the neuron generates an action potential. The generation of the action potential is called a “firing” of the neuron.
The generation of an action potential relies not only on the threshold of the neuron but also on the recent firing history. Each neuron has an associated refractory period. For a short period of time after a neuron has fired, it is highly unlikely that that neuron will fire again. This period is called the absolute refractory period. For a slightly longer period of time after the absolute refractory period, it is difficult, but more likely, for the neuron to fire again. This period is called the relative refractory period.
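By way of illustration only, the threshold and absolute refractory behavior described above can be sketched in a few lines of Python. The class name, the discrete time step, the threshold of −55 mV and the two-step refractory period are assumptions chosen for the example and are not taken from this description; the relative refractory period is omitted for brevity.

```python
class ThresholdNeuron:
    """Toy neuron: fires when its membrane potential exceeds a threshold,
    then ignores all input during an absolute refractory period."""

    def __init__(self, rest_mv=-70.0, threshold_mv=-55.0, refractory_steps=2):
        self.rest_mv = rest_mv                    # resting potential
        self.threshold_mv = threshold_mv          # assumed firing threshold
        self.refractory_steps = refractory_steps  # assumed refractory length
        self.potential_mv = rest_mv
        self.refractory_left = 0

    def step(self, input_mv):
        """Advance one time step; return True if the neuron fires."""
        if self.refractory_left > 0:
            # Absolute refractory period: input is ignored, no firing.
            self.refractory_left -= 1
            return False
        self.potential_mv += input_mv  # positive input depolarizes the cell
        if self.potential_mv > self.threshold_mv:
            self.potential_mv = self.rest_mv  # reset to rest after firing
            self.refractory_left = self.refractory_steps
            return True
        return False


neuron = ThresholdNeuron()
fired = [neuron.step(10.0) for _ in range(6)]
```

With a constant input of 10 mV per step, this toy neuron fires on the second step, stays silent for the two refractory steps, and needs two further steps of input before it can fire again.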
In the central nervous system, multiple types of cells provide myelin sheaths 160 along axons 140. Myelin is a fat that provides an insulating layer for the axon 140. The thickness of the myelin sheath 160 controls the propagation delay of signals along the axon 140. Myelin sheaths 160 are separated along the axon by nodes of Ranvier 150. The action potential traveling along the axon is regenerated at each of the nodes of Ranvier. Having described a typical neuron, the parts of the human brain will now be discussed with reference to FIG. 2.
The basal ganglia (corpus striatum) 210 is one of the most important layers of the brain 200 for emotion processing and generation; it is also known as the reptilian brain. The basal ganglia connects the cerebral cortex and the cerebellum. The basal ganglia 210 is the portion of the brain that contains innate behavioral knowledge, including motor functions and primal emotions such as fear, anger, and sexuality. It is also responsible for motor integration in the cerebral cortex, i.e. it helps regulate movement. The next layer of the brain, known as the limbic system or the visceral brain, is where many of the various social emotions are processed. It processes most affective knowledge, generating more sophisticated emotional responses. The limbic system also appears to mediate or control memory processes. Both the amygdala 220 and the hippocampus 230 are part of the limbic system. The hippocampus 230 plays an important role in memory formation in the brain, particularly short-term memory (memory of new information and recent events). The amygdala 220 is important for learning associations between stimuli and emotional value (emotional responses and aggressive behavior). For example, the amygdala may associate fear with a stimulus that causes pain.
The neocortex 240 is the structure in the brain that is more evolved in human brains than in other mammal brains. The neocortex 240 is responsible for associating a diversity of sensations and innate ideas, such as a sense of causality and spatial referencing, into perception, concepts and attributions. This is the portion of the brain that contains what we think of as the rational mind and the imagination and the part of the brain that generates ideas (higher mental functions, general movement, perception and behavioral responses). The neocortex 240 in humans is organized in six layers, which are parallel to the surface of the cortex. The neurons in the neocortex are organized in cylindrical columns (cortical columns), which are perpendicular to the cortical surface. Axons 140 that traverse vertically in the neocortex 240 typically form connections to neurons within a column, but among the neurons in different layers. Axons 140 that traverse horizontally in the neocortex 240 allow communication between neurons in different columns.
There are two types of memory in the brain: declarative memory and non-declarative memory. Declarative memory is explicit memory and typically depends on the hippocampus 230 and other areas of the brain. Declarative memory includes episodic memory (memory of events from one's life) and semantic memory (general knowledge of the world). The hippocampus 230 retains context-dependent memories until they are consolidated in neocortical structures, but there is evidence that these memories are stored differently in the two structures. Non-declarative memory, on the other hand, is implicit, procedural memory and depends mostly on the basal ganglia and parts of the cerebral cortex (including the neocortex 240). Non-declarative memory is needed to learn skills, such as swimming. For the most part, however, it is still unclear precisely how learning and memory work in the human brain. It is clear that in order for the brain to learn, the structure of the brain must be somewhat plastic; that is, the structure must be able to adapt. Synaptic plasticity dependent on the activity of the synapses is widely thought to be the mechanism through which learning and memory take place. The Hebb rule comprises the idea that if the action potential from one neuron causes another neuron to fire, then the synapse along which the action potential travels should be strengthened (or when a synapse is not used, a decrease in strength). These decreases take place when a particular synapse repeatedly fails to be involved in the firing of a neuron and are supported by experiment.
The effects of these increases and decreases of strength in the synapses can be both short-term and long-term. If the effects last a significant period of time, they are called long-term potentiation (LTP) and long-term depression (LTD). Synaptic plasticity is seen as a process that occurs gradually over time, and the rate of the change can be specified by one or more time constant(s).
Now, the development of artificial neural networks will be discussed, for example, in the context of efforts to simulate the wonders of the human brain. Artificial neural networks can be thought of as directed weighted graphs, where the neurons are the nodes and the synapses are the directed edges. Known neural network architectures are typically made up of input neurons, output neurons and so-called “hidden” neurons. The hidden neurons are those that are neither input neurons nor output neurons in such a network. The structural types include feed-forward neural networks, recurrent neural networks and modular neural networks.
Referring to prior art FIG. 3, there is shown a fully-connected feed-forward neural network comprising input neurons 310-1, 310-2, 310-3, . . . , 310-N to the left and output neurons 330-1, 330-2, 330-3, . . . , 330-P to the right with hidden neurons 320-1, 320-2, 320-3, . . . , 320-M between input and output neurons. It is not shown but one hidden neuron may connect to another hidden neuron. In feed forward neural networks, there is a layer of input neurons, zero or more layers of hidden neurons, and an output layer. Input layers only contain outgoing edges, and the edges of one layer are only connected to the next layer (whether it be a hidden layer or the output layer). Networks may either be fully connected as seen in FIG. 3, in the sense that every neuron in a layer has a directed edge to every neuron in the next layer, or they may only be partially connected, where some of these edges are missing.
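As a non-limiting sketch, the forward flow of signals through a feed-forward network such as that of FIG. 3 may be written as follows. The function name, the nested-list weight layout and the choice of a sigmoid activation are assumptions made for the example; a weight of zero can stand in for a missing edge in a partially connected network.

```python
import math


def forward(layers, inputs):
    """Propagate inputs through successive layers.

    layers[k][j][i] is the weight of the directed edge from neuron i in
    layer k to neuron j in layer k + 1 (assumed layout).
    """
    activations = list(inputs)
    for weights in layers:
        activations = [
            # Weighted sum of incoming signals, then a sigmoid activation.
            1.0 / (1.0 + math.exp(-sum(w * a for w, a in zip(row, activations))))
            for row in weights
        ]
    return activations


# Two input neurons -> three hidden neurons -> one output neuron.
hidden_weights = [[0.5, -0.5], [1.0, 1.0], [0.0, 0.0]]
output_weights = [[1.0, -1.0, 0.5]]
result = forward([hidden_weights, output_weights], [1.0, 0.0])
```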
Referring now to prior art FIG. 4, there is shown an example of a known recurrent neural network. Recurrent neural networks contain at least one loop, cycle, or feedback path. FIG. 4 shows the input neurons 410-1 to 410-N, output neurons 430 and hidden neurons 420-1, 420-2, . . . , 420-M with the same shading as in FIG. 3. Delay elements 440 are indicated with boxes labeled D. A loop in a directed graph is when there is an edge from a node to itself. Cycles in a directed graph occur when there is a path from a node to itself that contains other nodes. Feedback loops and paths typically involve delay elements D 440. Feedback allows for storage to take place in the neurons; it gives the network a sense of memory from one instance to the next. Recurrent neural networks can be divided further into discrete-time and continuous-time neural networks. In discrete-time neural networks, charge is applied at the inputs at moments in time, either periodically or after randomly spaced intervals, and propagates through the network, producing an output no earlier than when the charge is applied. Continuous-time neural networks model behaviors such as spikes in the network at infinitesimally small time steps. These spikes are typically modeled using a differential equation rather than as discrete events, and the equation may not have a stable solution, especially for networks that contain loops.
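The role of a delay element in a feedback loop can be illustrated, purely as a sketch, with a single discrete-time neuron whose output is fed back to itself through one delay element. The function name, the linear (identity) activation and the weight values are assumptions made for the example.

```python
def run_recurrent(inputs, w_in=1.0, w_feedback=0.5):
    """Single neuron with a self-loop through one delay element D."""
    delayed = 0.0  # content of the delay element: the previous output
    outputs = []
    for x in inputs:
        # Current output combines the new input with the delayed output.
        y = w_in * x + w_feedback * delayed
        outputs.append(y)
        delayed = y  # stored for use at the next time step
    return outputs


outputs = run_recurrent([1.0, 0.0, 0.0])
```

A single input pulse continues to influence later outputs (1.0, then 0.5, then 0.25), which is the sense in which feedback gives the network memory from one instance to the next.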
A neural network is modular if the computation performed by the network can be decomposed into two or more subsystems that operate on distinct inputs without communication. The outputs of these modules are then combined to form the outputs of the network. A known modular neural network may be one of a recurrent neural network or a feed-forward neural network or other artificial neural network.
Neurons in neural networks are the information processing units of the network. Neurons usually accumulate, combine, or sum signals they receive from their connections, and an activation function is applied to the result. A neuron in the network is said to fire if the output value is non-zero. Several different activation functions are commonly used. There may be a threshold function when the charge reaches a threshold value, a piecewise-linear function sometimes called saturation of a neuron and a sigmoid function related to the slope of increase of charge.
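The three kinds of activation function named above might be sketched as follows. The function names, the threshold value of zero and the particular slope and offset of the piecewise-linear function are assumptions chosen for illustration.

```python
import math


def threshold(x, theta=0.0):
    """Step function: output 1 once the input reaches the threshold."""
    return 1.0 if x >= theta else 0.0


def piecewise_linear(x):
    """Linear near zero, saturating at 0 and 1 (assumed slope and offset)."""
    return max(0.0, min(1.0, x + 0.5))


def sigmoid(x):
    """Smooth, S-shaped function bounded between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))
```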
Training in a neural network has canonically meant changing the weights of the connections and/or the threshold values. Relatively recently, training has also referred to changes in the architecture of the network. Neural networks with training algorithms that cannot change the architecture of networks may be considered fixed-structure. Similarly, networks with training algorithms that can change the architecture may be considered variable-structure.
There are two main methods of training: gradient-based methods and evolutionary methods. Back-propagation is the most widely used algorithm for training neural networks in a supervised way. The algorithm is supervised because it requires a set of inputs and their corresponding outputs, called a training set. Back-propagation has two distinct phases: a forward pass and a backward pass. In the forward pass, input signals are propagated through the network, to produce an output. This output is compared with the expected output, producing an error. The error signals are then propagated backwards through the network, where the weights of the networks are adjusted in order to minimize the mean-squared error. Back propagation is a gradient-based optimization technique. It makes use of the gradient of an error function, evaluated using a training data set, with respect to the weights in the network. That is, back propagation uses the gradient of an error to determine how the weights in the network should be changed to reduce the error.
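The forward pass and backward pass may be sketched, under stated assumptions, for a small network with one hidden layer. The network size, the sigmoid activations, the learning rate, the initial weights and the logical-OR training set are all assumptions made for the example; a constant input of 1.0 serves as a bias.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w_hidden, w_out, x):
    """Forward pass: return hidden activations and the network output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))
    return h, y


def mean_squared_error(w_hidden, w_out, data):
    return sum((forward(w_hidden, w_out, x)[1] - t) ** 2 for x, t in data) / len(data)


def train_epoch(w_hidden, w_out, data, rate=0.5):
    """One pass over the training set, updating the weights in place."""
    for x, target in data:
        h, y = forward(w_hidden, w_out, x)
        # Backward pass: error signal at the output neuron ...
        delta_out = (y - target) * y * (1.0 - y)
        # ... propagated back to each hidden neuron through its output weight.
        delta_hidden = [delta_out * w_out[j] * h[j] * (1.0 - h[j])
                        for j in range(len(h))]
        # Gradient step on every weight.
        for j in range(len(h)):
            w_out[j] -= rate * delta_out * h[j]
            for i in range(len(x)):
                w_hidden[j][i] -= rate * delta_hidden[j] * x[i]


# Training set for logical OR; the last input component is a constant bias.
data = [([0.0, 0.0, 1.0], 0.0), ([0.0, 1.0, 1.0], 1.0),
        ([1.0, 0.0, 1.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]
w_hidden = [[0.1, -0.2, 0.05], [-0.1, 0.2, -0.05]]
w_out = [0.3, -0.3]

error_before = mean_squared_error(w_hidden, w_out, data)
for _ in range(2000):
    train_epoch(w_hidden, w_out, data)
error_after = mean_squared_error(w_hidden, w_out, data)
```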
One of the known limitations of back propagation and other supervised learning algorithms is that they typically do not scale well. Gradient-based optimization algorithms have several known limitations as well. Because the weights are changed so that the error follows the steepest direction (in the space of weights) of descent, the results of the optimization algorithm depend largely on the initial starting point. If the initial starting point is located near local optima and far away from the global optimum, the back-propagation algorithm will likely converge to one of the local optima. This is a drawback for the back propagation algorithm because complex systems often have many local optima with significantly different (poorer) performance than a global optimum.
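The dependence on the initial starting point can be seen in a one-dimensional sketch. The error function below is not a neural network error surface; it is a hypothetical polynomial chosen only because it has one global minimum and one poorer local minimum. The step size and iteration count are likewise assumptions.

```python
def error(w):
    """Hypothetical error surface with two minima."""
    return w ** 4 - 3.0 * w ** 2 + w


def error_gradient(w):
    return 4.0 * w ** 3 - 6.0 * w + 1.0


def gradient_descent(w, rate=0.01, steps=1000):
    """Follow the direction of steepest descent from the starting point w."""
    for _ in range(steps):
        w -= rate * error_gradient(w)
    return w


from_left = gradient_descent(-2.0)   # settles near the global minimum
from_right = gradient_descent(2.0)   # settles in the poorer local minimum
```

Starting at −2 the descent settles near w ≈ −1.30, while starting at +2 it settles near w ≈ 1.13, where the error is noticeably higher.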
Another known type of training is Hebbian learning. Hebbian learning is analogous to long-term potentiation (LTP) and long-term depression (LTD) that occurs in the brain. In LTP, if the firing of one neuron occurs before the firing of a receiving neuron, then the synapse between these two is strengthened. That is, in LTP, the possibility of a causal relationship between the two neurons (i.e. that the firing of one directly leads to the firing of another), influences how synaptic changes are made. In LTD, the strength of the synapse is decreased when the firing of one neuron does not lead to the firing of its connected neurons, or when the firing of one neuron occurs while the receiving neuron is in a refractory state or has recently fired. In LTD, the possibility of a non-causal relationship between the two neurons influences how synaptic changes are made. For example, if a receiving neuron fired immediately prior to the firing of a transmitting neuron, it may be appropriate to decrease the strength of the synapse.
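A timing-based weight update of the kind described above may be sketched as follows. The function name, the fixed step size and the use of None to represent a receiving neuron that did not fire are assumptions made for the example; practical rules typically scale the change by the time difference rather than using a fixed step.

```python
def hebbian_update(weight, pre_time, post_time, rate=0.25):
    """Return the new synapse weight given the firing times of the
    transmitting (pre) and receiving (post) neurons.

    post_time is None when the receiving neuron did not fire.
    """
    if post_time is not None and pre_time < post_time:
        # Possible causal relationship: strengthen (LTP-like).
        return weight + rate
    # No firing, or the receiver fired first: weaken (LTD-like).
    return weight - rate


strengthened = hebbian_update(1.0, pre_time=3, post_time=4)
weakened = hebbian_update(1.0, pre_time=4, post_time=3)
unused = hebbian_update(1.0, pre_time=4, post_time=None)
```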
There are four characteristics of Hebbian synapses. Modifications to a Hebbian synapse depend heavily on time in that increases are made if neurons are activated at the same time, and decreases are made if two neurons are activated at different times. All information required to determine if a change to a Hebbian synapse should be made is local information. That is, the only information required to know if a synapse should change is the activities of the neurons that are connected by that synapse. Changes in the weight of a Hebbian synapse are determined by the firing patterns of the two neurons connected by the weight. Lastly, an increase in the strength of the synapse is caused by the conjunction of presynaptic and postsynaptic activity. Hebbian learning has been observed in biological neural networks. However, applying learning in biological systems to development of learning methods in artificial neural networks is significantly more complicated than these four characteristics imply.
So-called evolutionary algorithms are presently surpassing known, more conventional artificial network architectures. The evolution of the structure of the brain and evolution within the brain can be categorized in four forms. First, at the highest level, there is evolution via speciation, and the brain structure in particular, which has occurred over millions of years. This long-term evolution has affected every aspect of the brain, but most notably, it is the level of evolution where the gross structure of the brain has developed. Following typical evolutionary theory, the complex structures from the human brain evolved from simpler structures that underwent three evolutionary mechanisms: mutation, the introduction of new structures or pieces of structures; recombination, the combination or re-use of existing structures in novel ways; and natural selection, the dying off of unsuccessful structures.
The general structure of the brain does not differ greatly from person to person; there are certain parts of the brain that are present in nearly every individual, though as the evolution of species has occurred these structures have become more complex. These are the types of structures that are of concern at the level of long-term evolution.
A shorter term evolution of the brain, what will be referred to in this work as moderate-term evolution, has been recently discovered. This evolution, referred to as epigenesis, also affects the structure of the brain, but at a finer level. Epigenesis is caused by modifications to the structure of proteins that regulate the transcription of genes; these modifications are often caused by the environment, but unlike other environmental effects, these modifications can be inherited by future generations through methylation of DNA. The modifications can lead to changes in the structure of the brain and thus far, have been seen to primarily affect the social and affective aspects of the brain.
The evolution (or perhaps more aptly, development and adaptation) that occurs within a single human's brain over the course of a lifetime, from conception through adulthood, will be referred to in this work as short-term evolution. The morphology of the brain is shaped partly through genetics, influenced by both long-term and moderate-term evolution, but also through experience (or by environmental effects). Neurons proliferate and die over the course of an individual's development. One of the factors that affects the formation and survival of neurons in this stage is the way connections are formed, that is, the types of neurons that a particular neuron's axon connects during development. The connections of a neuron affect the way that neuron behaves and operates in the future, and these connections are initially determined during this short-term evolutionary stage. An example of this type of evolution is found in London taxi drivers who have been found to develop significant brain areas for storing road maps of London.
There is a certain amount of plasticity during development that allows an individual to adapt the different parts of the brain (determined by long-term evolution) to his or her particular role. There are certain portions of the brain, such as the neocortex, in which the local structure (i.e. connection strengths) appears to mostly depend on the environment, rather than genetics.
Another major structural aspect of the brain that is evolved or developed over the course of a single person's lifetime is myelination. Myelination affects the efficiency and rapidity of transmissions of signals in the brain. Myelination in humans continues well into the second decade of life.
Finally, very short-term evolution (development or learning, in this case) occurs on a day-to-day basis in the brain. This evolution affects synapses; this type of evolution is what is typically referred to as plasticity in the brain. There are four known major types of synaptic plasticity: long-term potentiation, long-term depression, sensitization, and axonal sprouting and formation of new synapses. Long-term potentiation and long-term depression were discussed above within the context of Hebb's rule. Long-term potentiation (LTP) is a permanent or semi-permanent change in the way a neuron fires and is caused by repeated activation with stimulation; it is associated with memory in the brain. Long-term depression (LTD) refers to any form of depression in synaptic transmission, such as the lowering of signal transmission efficacy. Long-term potentiation (LTP) occurs only when a synapse is active, but long-term depression can occur whether a synapse is active or inactive.
Sensitization refers to enhancement of a response as a result of applying a novel stimulus. Finally, axons can sprout, both during initial formation and after transection, in the brain. Axon sprouting occurs most commonly during neonatal development, but it also can occur in adulthood.
Evolutionary algorithms are optimization algorithms that are often used in large, complex state spaces. Biological evolution is a method for searching a huge number of possibilities for solutions, where solutions are the organisms themselves. The biological inspiration of evolutionary algorithms is described in Flake's “The Computational Beauty of Nature” as follows: Adaptation = Variation + Selection + Heredity.
In evolutionary algorithms, a population of potential solutions is maintained. The members of the population are usually distinct and maintain variety. Evolutionary algorithms are inherently random, and the random influences contribute to the variety in the population. Selection is perhaps the most important component of the formula given above. Selection refers to the concept of “survival of the fittest.” For evolutionary algorithms, some concept of fitness must exist, where fitness is typically a function or algorithm mapping members of the population to numerical values. It is worth noting that the fitness function can be based on simulated values, so it may generate a different value each time it is applied to a member of the population. The fitness of a member of a population should represent the relative ability of that member of the population to perform a particular task. The fittest members of the population are those that are most likely to be selected to reproduce and to express traits that are kept over multiple generations. Members of the population that are the least fit are those that are more likely to be allowed to die off. Heredity is emulated in evolutionary algorithms by producing “offspring” from existing members of a population. The offspring can be produced in a variety of algorithm-specific ways. The typical sequence of operations for producing offspring is reproduction, crossover and mutation.
For reproduction, one or more relatively fit members of the population may be selected to reproduce. Members of the population that have a higher fitness level may be more likely to have offspring in the next generation of the population. The selection of these members of the population can be done in a variety of ways. One of the ways this is done is using Roulette selection. In Roulette selection, a member of the population is randomly selected, where the probability that a given member of the population is selected is based on that population member's fitness. That is, if a member has a high fitness, it is more likely to be selected. Another selection algorithm is tournament selection. In tournament selection, a fixed percentage of the population is randomly selected. From that smaller group, the member with the highest fitness is selected. The percentage selected from the original population is a parameter of this method. For example, if you select 100 percent of the population to be this parameter, then the fittest member of the population would always be selected. However, if you had a population size of 100 and selected one percent of the population, then the selection would be entirely random (i.e. not based on fitness at all).
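The two selection schemes may be sketched as follows. The function names and the representation of the population as a list are assumptions made for the example, and the Roulette sketch further assumes that fitness values are non-negative and not all zero.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one member with probability proportional to its fitness."""
    weights = [fitness(member) for member in population]
    return rng.choices(population, weights=weights, k=1)[0]


def tournament_select(population, fitness, fraction, rng):
    """Randomly draw a fraction of the population and return the fittest
    member of that smaller group."""
    size = max(1, round(fraction * len(population)))
    contestants = rng.sample(population, size)
    return max(contestants, key=fitness)


rng = random.Random(0)
population = list(range(1, 101))  # 100 members; fitness is the value itself


def fitness(member):
    return float(member)


whole_population_winner = tournament_select(population, fitness, 1.0, rng)
one_percent_winner = tournament_select(population, fitness, 0.01, rng)
roulette_winner = roulette_select(population, fitness, rng)
```

With a fraction of 1.0 the tournament always returns the fittest member (here, 100), whereas with a fraction of 0.01 and a population of 100 the tournament contains a single randomly drawn member, so the choice does not depend on fitness at all.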
In crossover, attributes of two or more members of the population are combined to form a new member of the population. Finally, mutation can occur, in which some attribute of the new member is randomly changed in some way. Different types of mutations can be employed, depending upon the complexity of the representation of each member of the population. Both crossover and mutation have associated rates in an evolutionary algorithm. The crossover rate is the percentage of time in which selected members of the parent population are crossed over or combined to produce members of the child population, whereas the mutation rate is the rate at which members of the parent population are mutated to produce members of the child population. Assuming neither of these rates is 1, there may be some propagation of identical members of the parent population to the child population.
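One possible way of applying crossover and mutation rates when producing a child is sketched below. Representing each member as a list of real numbers, using single-point crossover, and mutating one attribute with Gaussian noise are assumptions made for the example; other representations call for other operators.

```python
import random


def make_child(parent_a, parent_b, crossover_rate, mutation_rate, rng):
    """Produce one child from two parents of equal length."""
    child = list(parent_a)  # default: identical copy of the first parent
    if rng.random() < crossover_rate:
        # Single-point crossover: head from parent_a, tail from parent_b.
        point = rng.randrange(1, len(parent_a))
        child = list(parent_a[:point]) + list(parent_b[point:])
    if rng.random() < mutation_rate:
        # Mutation: randomly perturb one attribute of the child.
        index = rng.randrange(len(child))
        child[index] += rng.gauss(0.0, 0.1)
    return child


rng = random.Random(1)
parent_a = [1.0, 1.0, 1.0, 1.0]
parent_b = [2.0, 2.0, 2.0, 2.0]
copied = make_child(parent_a, parent_b, 0.0, 0.0, rng)
crossed = make_child(parent_a, parent_b, 1.0, 0.0, rng)
```

When both rates are zero the child is an identical copy of a parent, which corresponds to the propagation of identical members from the parent population to the child population noted above.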
Neuroevolution algorithms use evolutionary algorithms to train neural networks. The first neuroevolution algorithms that were developed only evolved the strength of the connections between the neurons; they did not affect the structure by adding or deleting connections or neurons. They only dealt with one form of evolution described above: very short term evolution.
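A weight-only neuroevolution loop of this early kind may be sketched as follows, with the structure held fixed and only the connection strengths evolved. The single linear neuron, the truncation selection that keeps the fitter half of the population, the Gaussian mutation and all parameter values are assumptions made for the example.

```python
import random


def predict(weights, x):
    """Fixed structure: a single linear neuron."""
    return sum(w * xi for w, xi in zip(weights, x))


def fitness(weights, data):
    """Negated mean squared error, so that higher is better."""
    return -sum((predict(weights, x) - t) ** 2 for x, t in data) / len(data)


def evolve_weights(data, n_weights, generations=50, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, data), reverse=True)
        history.append(fitness(population[0], data))
        survivors = population[: pop_size // 2]  # fitter half is kept as-is
        children = [[w + rng.gauss(0.0, 0.1) for w in rng.choice(survivors)]
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = max(population, key=lambda w: fitness(w, data))
    return best, history


# Hypothetical target relationship: t = 2 * x0 - x1.
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
best, history = evolve_weights(data, n_weights=2)
```

Because the fitter half of each generation is carried over unchanged, the best fitness in this sketch never decreases from one generation to the next.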
The training of the connection weights in neural networks is typically formulated as an optimization problem. In particular, some error is minimized, or equivalently, a measure of performance or a goal is maximized. These approaches are equivalent because if f(x) is an error function, then 1/f(x) and −f(x) are suitable candidates for goal functions, and vice versa. The error used can be the mean squared error between the actual output and the expected output in supervised learning or the temporal difference error as used in reinforcement learning. Another example goal function is the length of time of successful operation. The weights of the networks are then trained using algorithms such as back propagation or conjugate gradient. These algorithms rely on gradient-based optimization algorithms using steepest or gradient related descent directions. There are many drawbacks to using these gradient-based optimization algorithms. In particular, gradient-based algorithms rely on the differentiability of error or goal functions, and they are likely to converge to local optima.
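The equivalence between minimizing an error and maximizing a goal can be checked with a short sketch. The candidate names and error values are hypothetical, the reciprocal form assumes a strictly positive error, and the small epsilon guarding against division by zero is an added assumption.

```python
def negated_goal(error):
    """Goal function -f(x)."""
    return -error


def reciprocal_goal(error, epsilon=1e-12):
    """Goal function 1/f(x); assumes error > 0 (epsilon guards zero)."""
    return 1.0 / (error + epsilon)


# Hypothetical errors of three candidate networks.
errors = {"net_a": 0.50, "net_b": 0.10, "net_c": 0.25}

best_by_error = min(errors, key=errors.get)
best_by_negated = max(errors, key=lambda name: negated_goal(errors[name]))
best_by_reciprocal = max(errors, key=lambda name: reciprocal_goal(errors[name]))
```

All three criteria select the same candidate, since both transformations reverse the ordering of positive error values.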
Evolutionary algorithms had been applied in the field of optimization to similarly complex problems, as they are less likely to become trapped in non-optimal solutions. It was a natural extension to apply evolutionary algorithms to weight training in neural networks, as this problem can be formulated as an optimization problem through which an error is minimized. Xin Yao reviews works, up to that date, that use evolutionary algorithms (EAs) to evolve/train artificial neural networks (ANNs), including using EAs to find weights, structure, learning rules, and input features, in his “Evolving Artificial Neural Networks,” Proceedings of the IEEE, Vol. 87, No. 9, pp. 1423-1447, September 1999. Yao cites results that indicate the combination of an EA and an ANN results in better systems than EAs or ANNs in isolation. Yao presents a thorough overview of algorithms that use evolutionary algorithms to train the weights of neural networks in “Evolving Artificial Neural Network Ensembles,” IEEE Computational Intelligence Magazine, pp. 31-42, 2008. Yao notes four advantages of evolutionary algorithms over gradient-based algorithms. First, evolutionary algorithms do not depend on gradient information, which may be unavailable or difficult to calculate. Second, evolutionary algorithms can be applied to any neural network architecture, whereas gradient-based algorithms have to be adapted for different architectures. Third, evolutionary algorithms are much less sensitive to initial conditions. Fourth, evolutionary algorithms always search for global optima, rather than local optima. It is also important to note that evolutionary algorithms typically rely on a fitness function, rather than an error. This fitness function can often be easily translated to reinforcement learning problems, where the fitness function is the reward received. As noted previously, however, goal, or fitness, functions can be used to determine error functions, and vice versa. The most straightforward way to do this is to reverse the sign.
Many known evolutionary algorithms deal with only one form of evolution: very short term evolution. For this type of evolution, the structure of the network is fixed. The structure of the network includes the general architecture (e.g. feed-forward, recurrent, etc.), the number and layout of neurons (e.g. how many neurons should be included in a particular layer), and the number and nature of the connections (i.e. how the neurons should be connected). For these types of algorithms, the structure of the neural network is mostly determined via experimentation. That is, a certain structure is tested, and if that structure does not work, more neurons or connections are added manually, increasing the complexity, until the network is able to handle the problem. This requires significant hand-tuning by the experimenter/researcher. Knowledge about the problem can be applied and intuition developed to decide what sort of structure is required by certain problems. For each problem, a new structure needs to be determined, and the selection of this structure relies entirely upon the knowledge of the structure designer. Networks with and without bias parameters and networks with different numbers of hidden neurons perform very differently. Because the structure has such a large effect on the efficacy of the network, an algorithm that learns what structure is needed to solve a particular problem is much more attractive than an algorithm that relies on prior knowledge or hand-tuning to design a structure. Constructive and destructive algorithms are algorithms that attempt to deal with this drawback. Both constructive and destructive algorithms attempt to learn a network structure, rather than relying on the trial and error approach. Constructive algorithms start with very small networks and increase their size by adding neurons and connections as needed for a particular problem. Destructive algorithms such as pruning begin with overly complex networks. Connections and neurons are then deleted to yield a minimal structure. These constructive and destructive algorithms would seem to solve the problem of finding a neural network architecture to use. However, there is a fundamental issue with these algorithms. Constructive and destructive algorithms follow strict sets of rules; for example, a constructive algorithm may only be able to add a single neuron at a time to a hidden layer. These algorithms therefore only explore a strict subset of possible architectures.
There are several drawbacks to using conventional evolutionary algorithms. Although the final overall solution may be closer to optimal than the solution reached by a gradient-based algorithm, evolutionary algorithms typically take longer to find a solution. Applying evolutionary algorithms to neural networks in particular comes with a variety of issues. Important factors include how to represent the networks in the population, how to measure performance, and how to create offspring in a population. Evolutionary algorithms usually work with strings of real or binary numbers. There has to be a performance metric to gauge how “fit” a member of the population is. Creating offspring is usually done through mutation, crossover (recombination), or both.
Representations of a network need to maintain a link to the functionality of the network; otherwise, operations such as crossover will have no meaning. Performance is a key metric and is a problem-specific issue. For example, supervised learning problems have an associated error, which would need to be converted into an appropriate fitness function and associated value, while reinforcement learning problems have associated rewards, which would also need to be converted to an appropriate fitness function and have an associated fitness value. The mechanisms of offspring creation are usually closely related to the representation of the networks in populations.
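The three factors discussed above (representation, performance measure, and offspring creation) may be illustrated by the following minimal sketch. It is not the algorithm of any cited reference: the function evolve, its parameters, the one-point crossover, the Gaussian mutation, and the toy fitness function are all assumptions chosen for brevity.

```python
import random


def evolve(fitness, genome_length, population_size=20, generations=50,
           mutation_scale=0.1, seed=0):
    """Evolve real-valued strings (e.g. network weights); genome_length must be >= 2."""
    rng = random.Random(seed)
    # Representation: each member is a list of real numbers.
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Performance: rank members from most fit to least fit.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:population_size // 2]
        children = [ranked[0][:]]  # keep an unmodified copy of the best member
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)      # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [gene + rng.gauss(0.0, mutation_scale) for gene in child]  # mutation
            children.append(child)
        population = children
    return max(population, key=fitness)


# Toy problem (an assumption): recover a hypothetical target weight vector.
target = [0.5, -0.3, 0.8]


def fitness(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


best = evolve(fitness, genome_length=3)
```

Because an unmodified copy of the best member is carried into each new generation, the best fitness in this sketch never decreases from one generation to the next.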
If a network is not performing well enough using just back-propagation (i.e. the error between the expected and produced value has not lowered significantly), simulated annealing can be used. Finally, if it is still not performing well, the architecture can be mutated. Yao referenced above (and Liu) used this approach to attempt to reduce the computational cost of the evolutionary algorithm. They successfully apply their algorithm to several parity tasks. This approach is similar to the proposed hierarchical evolutionary strategy discussed above, in that different types of evolution (very short term, short term, and moderate term) are tried. In particular, the combination of a genetic algorithm at a higher level and another algorithm, such as simulated annealing, numerical optimization methods such as non-linear programming, gradient, generalized gradient, and/or Newton's method, at a lower level can be used.
Montana and Davis in “Training Feedforward Neural Networks Using Genetic Algorithms,” Machine Learning, pp. 762-767, 1989 use genetic algorithms to evolve the weights in a feed-forward neural network. They represent their networks as a list of real numbers and use mutation, crossover and gradient operators to create offspring. They successfully apply their algorithm to classification of sonar data, compare to back-propagation and incorporate domain-specific knowledge. However, their application to some real-world problems is hampered by the lack of a training algorithm for finding an optimal set of weights in a relatively short time.
D. B. Fogel et al. in “Evolving Neural Networks,” Biological Cybernetics 63, pp. 487-493, 1990, use genetic algorithms (GAs) to evolve the weights in a feed-forward neural network, and note that GAs will also work for other models, such as recurrent neural networks. They represent their networks as a list of real numbers and use only mutation to create offspring. They apply their algorithm to exclusive-or and a blending problem and compare it to back-propagation, with favorable results.
Xin Yao and Yong Liu introduce an evolutionary system called EPNet for evolving the architecture and weights of feed-forward artificial neural networks in “A New Evolutionary System for Evolving Artificial Neural Networks,” IEEE Transactions on Neural Networks, 8, pp. 694-713, 1997. Yao and Liu attempt to maintain a behavioral link between parent and child by using node splitting rather than adding a fully connected node to a layer. EPNet also encourages simplicity in the network by always testing to see if a deletion will improve the network before testing an addition. They applied EPNet successfully to parity problems, medical diagnosis problems and time series prediction problems. They found that their networks generalized better than other networks developed or trained using other methods. This is one of the reasons a neuroevolution approach was selected for an embodiment of the present invention.
Yao and Liu introduce five mutation operations that, again, are chosen in succession to maintain simpler networks if possible. The five mutation operators they introduce (given in the order they are tried) are: hybrid training (train using a modified back propagation algorithm), neuron deletion, connection deletion, connection addition, and neuron addition.
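The ordering of the five operators may be sketched as follows. This is a simplified illustration and not Yao and Liu's implementation: the function apply_first_successful, the acceptance test (an offspring is kept only if it lowers the error), the dictionary representation of a network, and the toy error function are all assumptions.

```python
def apply_first_successful(network, operators, error):
    """Try the operators in order and return the first offspring that lowers the error."""
    baseline = error(network)
    for name, operator in operators:
        candidate = operator(network)
        if error(candidate) < baseline:
            return name, candidate
    return None, network  # no operator improved the network


# Toy stand-ins (assumptions): a "network" is only a pair of size counters.
operators = [
    ("hybrid training", lambda net: dict(net)),
    ("neuron deletion", lambda net: {**net, "neurons": net["neurons"] - 1}),
    ("connection deletion", lambda net: {**net, "connections": net["connections"] - 1}),
    ("connection addition", lambda net: {**net, "connections": net["connections"] + 1}),
    ("neuron addition", lambda net: {**net, "neurons": net["neurons"] + 1}),
]


def toy_error(net):
    return abs(net["neurons"] - 3)  # hypothetical problem that needs three neurons


network = {"neurons": 5, "connections": 12}
name, child = apply_first_successful(network, operators, toy_error)
```

Placing the deletion operators ahead of the addition operators is what biases such a search toward simpler networks: an addition is only reached when training and both deletions have failed to help.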
Dario Floreano et al. in “Neuroevolution: from architectures to learning,” Evol. Intel. 1, pp. 47-62, 2008, apply artificial neural networks to many real-world problems ranging from pattern classification to robot control. A generic architecture shown in their FIG. 1 is similar to that depicted in FIG. 3 wherein the external environment is connected to input neurons and output units impact the external environment. They describe a continuous-time recurrent neural network or CTRNN. These CTRNNs represent a first approximation of the time-dependent processes that occur at the membrane of biological neurons.
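For illustration, one commonly used form of the CTRNN state equation is tau_i * dy_i/dt = -y_i + sum_j w_ji * sigmoid(y_j + b_j) + I_i. The following sketch integrates that form with a forward-Euler step. The equation form, the function names, and the weight indexing convention (weights[j][i] is the connection from neuron j to neuron i) are assumptions and are not quoted from the cited reference.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ctrnn_step(states, weights, biases, taus, inputs, dt):
    """Advance all neuron states by one forward-Euler step of size dt."""
    n = len(states)
    activations = [sigmoid(states[j] + biases[j]) for j in range(n)]
    new_states = []
    for i in range(n):
        # Weighted sum of the activations arriving at neuron i.
        synaptic = sum(weights[j][i] * activations[j] for j in range(n))
        derivative = (-states[i] + synaptic + inputs[i]) / taus[i]
        new_states.append(states[i] + dt * derivative)
    return new_states


# Two neurons at rest; neuron 0 excites neuron 1 with weight 2.0.
next_states = ctrnn_step(states=[0.0, 0.0],
                         weights=[[0.0, 2.0], [0.0, 0.0]],
                         biases=[0.0, 0.0],
                         taus=[1.0, 1.0],
                         inputs=[0.0, 0.0],
                         dt=0.1)
```

The time constant of each neuron sets how quickly its state follows its input, which is the time-dependent membrane behavior that the model approximates.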
Randall D. Beer and J. C. Gallagher in “Evolving Dynamical Neural Networks for Adaptive Behavior,” Adaptive Behavior, pp. 91-122, 1992, use evolutionary algorithms (EAs) to train continuous-time recurrent neural networks (CTRNNs). They use dynamical parameter encoding to encode the chromosome representing the network and use both crossover and mutation operators. They apply their CTRNNs to a food-finding task and a locomotion task (with six-legged agents).
A. P. Wieland in “Evolving Neural Network Controllers for Unstable Systems,” Neural Networks, 2, pp. 667-673, July, 1991, uses a recurrent neural network model that learns weights and connections between neurons. A binary representation is used to represent the network, and mutation, crossover, and inversion operations are used to produce offspring. This method is applied to variations on the pole balancing problem (single pole, double pole, jointed pole, and two-legged walker).
S. Dominic et al. in “Genetic Reinforcement Learning for Neural Networks,” Neural Networks, 2, pp. 71-76, 1991, compare genetic algorithms to reinforcement learning techniques. They use a feed-forward neural network, and real-valued strings are used to represent the networks. They apply their network and algorithm to the pole balancing problem and compare their results to a reinforcement learning method (Adaptive Critic Heuristic).
K. Stanley and R. Miikkulainen in “Evolving neural networks through augmenting topologies,” Evolutionary Computation, 10(2):99-127, 2002, introduce Neuroevolution of Augmenting Topologies (NEAT), which has several innovations, including speciation to protect structural innovation, global innovation numbers to do historical tracking of network structure and help avoid the competing conventions problem, and makes use of incremental growth to avoid unneeded complexity in the networks. NEAT is applied to exclusive-or and to two pole balancing problems (with and without velocities). They demonstrate that NEAT performs better than other neuroevolution methods on these tasks and demonstrate that the improvement in performance is due to those innovations.
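The idea of global innovation numbers may be illustrated by the following simplified sketch, in which every distinct connection receives one number that is reused whenever the same connection arises again. The class InnovationTracker and its method are hypothetical names, and the sketch omits details of the published method, such as how node additions are recorded.

```python
class InnovationTracker:
    """Assign one global number to each distinct (source, target) connection."""

    def __init__(self):
        self._numbers = {}

    def number_for(self, source, target):
        key = (source, target)
        if key not in self._numbers:
            # A structural change seen for the first time gets the next number.
            self._numbers[key] = len(self._numbers) + 1
        return self._numbers[key]


tracker = InnovationTracker()
first = tracker.number_for(1, 4)    # new connection
second = tracker.number_for(2, 4)   # another new connection
repeat = tracker.number_for(1, 4)   # same connection arising in a different network
```

Because matching connections carry matching numbers, two networks can be aligned gene by gene during crossover without analyzing their topologies, which is how the historical tracking helps with the competing conventions problem.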
K. Stanley, et al. in “Evolving adaptive neural networks with and without adaptive synapses,” Evolutionary Computation, 2003, CEC '03, The 2003 Congress on, 4: 2557-2564, 2003, augment NEAT by including adaptation of learning rules (such as local Hebbian learning rules) for each connection as part of the evolution. This allows for adaptation of networks to changes in the environment and is related to the ability of the network to do real-time learning. They apply this version of NEAT to a dangerous foraging example.
Jeff Hawkins et al. in “Sequence memory for prediction, inference and behavior,” Phil. Trans. Royal Soc. B, pp. 1203-1209, 2009, describe a mechanism for storing sequences of patterns necessary for making predictions, recognizing time-based patterns and generating behavior. They suggest that the ability to store and recall time-based sequences is probably a key attribute of many, if not all, cortical areas. They propose that the neocortex may be modeled as a hierarchy of memory regions, each of which learns and recalls sequences.
Artificial neural networks are known to be implemented in “hardware,” as may be distinguished from more “software” embodiments. For example, Glackin et al. in “A Novel Approach for the Implementation of Large Scale Spiking Neural Networks on FPGA Hardware,” IWANN 2005, LNCS 3512, pp. 552-563, 2005, implemented a large scale spiking neural network on field programmable gate array (FPGA) hardware. Neuron, synapse, and spike timing dependent plasticity (STDP) blocks are implemented in FPGA logic, and neural network data are held in SRAM that is external to the FPGA device. Synapse weights are determined by STDP.
In 2007, Cassidy et al. in “FPGA Based Silicon Spiking Neural Array,” Biomedical Circuits and Systems Conference (BIOCAS 2007), pp. 75-78, IEEE, 2007, presented an FPGA-based array of Leaky-Integrate and Fire (LIF) artificial neurons. Their neurons and synapses were fixed, and each synapse supported a “single” event and a delay function associated with the event. The synapses were able to implement STDP.
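By way of background, the behavior of a leaky integrate-and-fire neuron may be sketched in discrete time as follows. This is a generic textbook-style model and not the circuit of the cited reference; the function simulate_lif and the parameters leak, threshold, and reset are assumed names and values.

```python
def simulate_lif(input_currents, leak=0.9, threshold=1.0, reset=0.0):
    """Return a list of 0/1 spikes, one per time step of input current."""
    potential = reset
    spikes = []
    for current in input_currents:
        # Leak: the stored potential decays; integrate: the input is added.
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)      # fire
            potential = reset     # and return to the resting value
        else:
            spikes.append(0)
    return spikes


spike_train = simulate_lif([0.5, 0.5, 0.5, 0.5])
```

With a constant input of 0.5 and the assumed default parameters, the potential needs three time steps to reach the threshold, so the neuron fires on the third step and then starts integrating again.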
U.S. Pat. No. 7,533,071, entitled “Neural Modeling and Brain-based Devices Using Special Purpose Processor” and issued to Snook on May 12, 2009, discloses a further FPGA hardware embodiment. Snook uses a special purpose processor and FPGAs to model a large number of neural elements. Each core of the FPGA could do presynaptic, postsynaptic, and plasticity calculations in parallel. It could also implement multiple neural elements of the neural model. The network was used to control a robot.
Sharp et al. in “Power-efficient simulation of detailed cortical microcircuits on SpiNNaker,” Journal of Neuroscience Methods, 201, pp. 110-118, 2012 simulate an anatomically-inspired cortical microcircuit of ten thousand neurons and four million synapses using four SpiNNaker chips and less than two watts. The neuron model was very basic but consumed little power. Each chip consisted of 18 homogeneous processors.
It is known to utilize or implement central pattern generators with artificial neural networks. M. Anthony Lewis et al. in “Control of a robot leg with an adaptive a(nalog)VLSI CPG chip,” Neurocomputing, 38-40, 2001, pp. 1409-1421 constructed an adaptive central pattern generator (CPG) in an analog VLSI chip and used the chip to control a running robot leg. A pacemaker neuron was used to control the firing of two motor neurons. Sensors excited and inhibited the pacemaker, allowing the robot to adapt to changing conditions.
Thereafter, M. Anthony Lewis et al. in “CPG Design Using Inhibitory Networks,” Proc. of the 2005 IEEE International Conference on Robotics and Automation, (ICRA 2005), pp. 3682-3687, 2005, implemented CPGs that were designed and optimized manually. A four-neuron, mutual inhibitory network formed the basic coordinating pattern for locomotion. This network then inhibited an eight-neuron network used to drive patterned movement.
It is also known to utilize analog circuitry for the construction of artificial neural networks. Simon Friedmann et al. in “Reward-based learning under hardware constraints—using a RISC processor embedded in a neuromorphic substrate,” Frontiers in Neuroscience, 7, p. 160, 2013 proposed and analyzed in simulations a flexible method of implementing spike time dependent plasticity (STDP) in a single layer network on a wafer-scale, accelerated neuromorphic hardware system. Flexibility was achieved by embedding a general-purpose processor dedicated to plasticity into the wafer. It was possible to flexibly switch between synaptic learning rules or use different ones in parallel for different synapses.
U.S. Pat. No. 8,311,965 entitled “Area Efficient Neuromorphic Circuits Using Field Effect Transistors and Variable Resistance Material” issued to Breitwisch et al., Nov. 13, 2012, provides details for analog neuromorphic circuits using field effect transistors. Manually programmable resistances are implemented using phase change material.
U.S. Published Patent App. No. 2012/0109863 entitled “Canonical Spiking Neuron Network for Spatiotemporal Associative Memory,” on May 3, 2012, to Esser et al. presents a layered neural net of electronic neurons configured to detect the presence of a spatiotemporal pattern in a real-time data stream, and extract the spatiotemporal pattern. The plurality of electronic neurons stored the spatiotemporal pattern using learning rules (STDP). Upon being presented with a version of the spatiotemporal pattern, they retrieved the stored spatiotemporal pattern.
U.S. Pat. No. 8,600,919 entitled “Circuits and Methods Representative of Spike Timing Dependent Plasticity of Neurons,” to Poon et al., Dec. 3, 2013, describes a circuit and a method that could emulate STDP in a way that closely replicated biochemical processes, that could emulate all of the different types of STDP, and that could provide a relationship between the Bienenstock-Cooper-Munro rule and STDP.
U.S. Published Patent App. 2009/0292661 entitled “Compact Circuits and Adaptation Techniques for Implementing Adaptive Neurons and Synapses with Spike Timing Dependent Plasticity (STDP)” on Nov. 26, 2009, to Hass implements STDP using a simple analog circuit.
U.S. Pat. No. 8,510,239 entitled “Compact Cognitive Synaptic Computing Circuits with Crossbar Arrays Spatially in a Staggered Pattern” issued to Dharmendra S. Modha, Aug. 13, 2013, implements STDP using electronic neurons interconnected in a compact crossbar array network. Neurons could be implemented to include a “leak” function. The invention could be realized in an entirely hardware form, an entirely software form, or a hybrid software/hardware form.
U.S. Published Patent Application No. 2012/0036099 entitled “Methods and Systems for Reward-Modulated Spike-Timing-Dependent Plasticity” on Feb. 9, 2012, to Venkatraman et al. describes an area-efficient implementation of reward-modulated STDP. Three separate memories with entries for each synapse were used. The first two memories stored current and updated synapse weights, and the third was used to determine if the weight needed to be updated.
U.S. Pat. No. 8,433,665 entitled “Methods and Systems for Three-Memristor Synapse with STDP and Dopamine Signaling” issued to Tang et al., Apr. 30, 2013, proposes implementation of a three-memristor synapse where an adjustment of synaptic strength is based on Spike-Timing-Dependent Plasticity (STDP) with dopamine signaling. One memristor could be utilized for long-term potentiation (LTP), another for long-term depression (LTD), and the third as a synaptic connection between a pair of neurons with a variable strength.
U.S. Pat. No. 8,515,885 entitled “Neuromorphic and Synaptronic Spiking Neural Network with Synaptic Weights Learned Using Simulation” issued to Modha, Aug. 20, 2013, used computer simulation to determine synaptic weights which were loaded onto chips. Simulation was abstract and could be done using spike-timing dependent plasticity (STDP) or reinforcement learning. External learning allowed for small, efficient neuromorphic hardware systems.
U.S. Published Patent App. No. 2013/0073497 entitled “Neuromorphic Event-Driven Neural Computer Architecture in a Scalable Neural Network” on Mar. 21, 2013, to Filipp Akopyan et al. presents a spike event driven network where axons are connected to neurons by a synapse array. It uses a scheduler to deliver spike events to axons. Each neuron maintains an STDP variable that encodes the time of the most recent fire. This variable is used to implement LTP/LTD.
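The manner in which recorded firing times can drive LTP and LTD may be illustrated with a generic pair-based STDP rule. The sketch below is not the method of the cited application; the function stdp_weight_change, the exponential window, and the constants a_plus, a_minus, and tau are assumptions.

```python
import math


def stdp_weight_change(pre_fire_time, post_fire_time,
                       a_plus=0.01, a_minus=0.012, tau=20.0):
    """Weight change computed from the most recent pre- and post-synaptic fire times."""
    delta = post_fire_time - pre_fire_time
    if delta > 0:
        # Pre-synaptic neuron fired first: potentiate (LTP).
        return a_plus * math.exp(-delta / tau)
    if delta < 0:
        # Post-synaptic neuron fired first: depress (LTD).
        return -a_minus * math.exp(delta / tau)
    return 0.0


potentiation = stdp_weight_change(pre_fire_time=10.0, post_fire_time=15.0)
depression = stdp_weight_change(pre_fire_time=15.0, post_fire_time=10.0)
```

Only the two most recent firing times are needed for each update, which is why storing a single time-of-last-fire value per neuron can suffice for a rule of this kind.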
B. V. Benjamin et al. in “Neurogrid: A mixed-analog-digital multichip system for large-scale neural simulations.” Proceedings of the IEEE, 102, pp. 699-716, 2014 created Neurogrid, an entirely clockless system with sixteen mixed-analog-digital chips that simulated a million neurons with billions of synaptic connections in real time using sixteen Neurocores integrated on a board that consumed three watts. STDP was possible, but at a high cost to area, time, and energy efficiency.
Giacomo Indiveri et al. in “Neuromorphic silicon neuron circuits.” Frontiers in Neuroscience, 5, 2011 described “the most common building blocks and techniques used to implement” silicon neuron circuits and “compare[d] the different design methodologies used for each silicon neuron design described, and demonstrate[d] their features with experimental results, measured from a wide range of fabricated VLSI chips.”
Cassidy et al. in “Cognitive Computing Building Block: A Versatile and Efficient Digital Neuron Model for Neurosynaptic Cores,” IBM Research, 2013, presented TrueNorth, a scalable neurosynaptic computer architecture, which used leaky integrate-and-fire neurons. The input, the state, and the output were implemented with configurable and reproducible stochasticity. The neuron model has four leak modes that bias the internal state dynamics, deterministic and stochastic thresholds, and six reset modes for rich finite-state behavior.
Preissl et al. in “Compass: A scalable simulator for an architecture for cognitive computing,” Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, p. 54. IEEE Computer Society Press, 2012 presented Compass, a multi-threaded, parallel functional simulator of the TrueNorth architecture. It successfully simulated 10^9 neurons and 10^12 synapses at 388 times slower than real time. It is event driven, not clock driven.
WO Patent App. 2004/027704 published Apr. 1, 2004, entitled “Spiking Neural Network Device,” by Dario claims a device that stores a genotypic representation of a spiking neural network. Evolutionary algorithms are used to tailor networks to be used in control systems.
Gomez et al. in “Efficient Non-linear Control Through Neuroevolution,” Machine Learning: ECML 2006, LNCS 4212, pp. 654-662, 2006, introduce CoSyNE, a neuroevolution method that evolves recurrent neural networks at the weight level. Networks are represented as a vector of real-valued weights, children networks are created using crossover and mutation, and networks are co-evolved by permuting subpopulations to allow for an increase in diversity. CoSyNE is compared with a large number of reinforcement learning and neuroevolution methods on the one and two pole balancing tasks. In their follow-up “Accelerated Neural Evolution through Cooperatively Coevolved Synapses,” J. Mach. Learn. Res., 9: pp. 937-965, 2008, Gomez et al. discuss CoSyNE in detail and compare it with several reinforcement learning and neuroevolution methods. This work presents results for sixteen methods in total (including CoSyNE) on one pole and two pole balancing tasks, with and without velocities provided as input. The results demonstrated that neuroevolution methods perform better than reinforcement learning methods, and that CoSyNE performed the best of the neuroevolution methods tested.
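The permutation of subpopulations may be illustrated as follows, under the assumption that the population is stored as one weight vector per network and that subpopulation j consists of the j-th weight of every network. The function permute_subpopulations is a hypothetical name, and the sketch shuffles every subpopulation completely, which is a simplification of the published method.

```python
import random


def permute_subpopulations(population, rng):
    """Shuffle each weight position independently across the networks."""
    num_weights = len(population[0])
    # Column j holds the candidates for weight j (one subpopulation).
    columns = [[network[j] for network in population] for j in range(num_weights)]
    for column in columns:
        rng.shuffle(column)
    # Reassemble complete networks from the shuffled subpopulations.
    return [[columns[j][i] for j in range(num_weights)]
            for i in range(len(population))]


population = [[1, 10], [2, 20], [3, 30]]
shuffled = permute_subpopulations(population, random.Random(0))
```

Each weight value survives the permutation but is combined with different partners, so every candidate weight is evaluated in the context of several complete networks.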
Notwithstanding the advances in evolutionary artificial network architectures and algorithms, there remains a need for an improved neuroscience-inspired network architecture which overcomes the problems exhibited by known architectures.