1. Reference to Computer Program
Appendix I of this application contains two attached files with program listings written in the C programming language, which implement a first embodiment (Embod1_c.txt) and a second embodiment (Embod2_c.txt) as described in this specification.
2. Field
The disclosure relates to the field of artificial neural networks, specifically to an improved neuron component for such networks. Such components provide long- and short-term learning capabilities suitable for pattern recognition, continuously adapting automation tasks, and associative memory tasks.
3. Prior-Art
Artificial neural networks (ANNs), and the neurons that comprise them, are electronic circuits or software circuit simulations that roughly attempt to mimic the functions of biological neurons, sometimes referred to as biological brain cells, and of interconnected collections of such biological brain cells. They have a considerable history in fields relating to automation, pattern recognition, artificial intelligence, and computation. Today they are in wide use.
In general, ANNs are systems in which some number of neurons, or neuron units (artificial brain cell simulations) are connected together. Input signals from outside of the ANN may also be connected to the inputs of some of the neurons making up the ANN. Also, some of the outputs of neurons within the ANN will generally serve as the desired output signals produced by the ANN. ANNs differ from conventional logic circuits in that they respond to input signals (input to output mappings) by altering the relative strengths of the connections between neurons. Logic circuits, on the other hand, map input signals to outputs based on static logic functions. In an ANN, the strength of each connection made to a neuron can be altered by a learning algorithm to produce a more desired response to a given set of inputs. A logic circuit will produce an output that is the equivalent of performing a Boolean function of its input signals. An AND gate, for example, will produce an output that represents the result of performing a Boolean AND operation on its inputs, i.e., a one output only when both of two inputs are ones.
Neurons have input synapses which are roughly analogous to synapses in biological brain cells. A synapse is an input connection from the neuron to an axon (a signal-carrying nerve or means) that possesses elasticity. That is, it is a connection to the neuron in which the strength of the connection may be altered. Artificial neurons contain connection weights within their input synapses whose values can be changed to alter the connection strength to the synapse. Such input synapses effectively modulate the strength of connections between neurons or between neurons and external signal sources. Neurons and networks of neurons can be “trained” to respond correctly to outside signals by changing the values of such connection weights. This effectively changes the strengths of those connections, which alters how the neuron's output responds to a given set of inputs.
Neural networks consisting of neurons are typically trained ahead of time with a representative set of mappings between a set of expected inputs, and the desired outputs the network should produce in response to each of those inputs. The neural network, once trained, is able to provide responses to novel input patterns that were not known at the time it was trained. It possesses this ability because it is able to generate a new output from the patterns in the original representative training set. This leads to one of the primary advantages of neural networks, which is that they are particularly suited for situations where the exact inputs that will be present when the network runs cannot be known ahead of time, while it is being trained.
Neuron Structure—FIG. 1
The most important element within a conventional neural network is a neuron (sometimes called a processing unit, or neurode, to distinguish it from its biological counterpart). Multiple input values are conveyed to the neuron via its synapses. Its single output, in turn, conveys the result of the neuron's processing of inputs. Its output is sometimes referred to as its axon. The signals or values connected to a neuron's inputs can be the outputs of other neurons, or they can originate from external sources, such as sensors, or databases. In digital neuron systems, signals that are used as inputs and outputs are represented numerically. They are usually a positive number. In floating point representations (number of decimal places not fixed) the number is usually between 0.0 and 1.0 (inclusive), e.g., 0.1374. Other representations are possible, such as integer values or voltage levels. The values that are supplied to a neuron's input will sometimes be referred to here as axon levels (AL), because they are the value levels which the neurons permit to be conveyed on input and output axons.
Neurons also use connection weights, most often simply referred to as weights here and in the prior-art. Such weights are used to modulate, or gate, the values on their inputs. In floating-point representation, the value on a given input is gated or modulated by the weight value by simply multiplying the input value by the weight value. The term ‘gate’ is sometimes used here as a more general term for a modulation effect that is performed with a simple multiplication in floating-point arithmetic. The results of modulating each input value by the weight value in all the synapses in the neuron are then summed to produce a preliminary output or internal sum of all the weighted inputs for the neuron. The preliminary sum is further passed through a transfer function in order to limit it to a predetermined range (usually 0.0 to 1.0) permitted for axon levels. The result is then made available on the output of the neuron. It can be connected to the inputs of other neurons, or used by external processes, such as motors, display indicators, or databases.
FIG. 1 shows a schematic depiction of a typical prior-art neuron along with some of its primary components. The components making up the neuron are the neuron's body 100, an output axon 102, input synapses or connections 104, 104b, . . . 104n containing weights 106, 106b, . . . 106n holding weight values (labeled W1 through Wn in FIG. 1). The neuron body calculates an output value (X), which is conveyed on its axon 102. The neuron's axon can be used to connect the neuron's output value to other neurons in a neural network, or to external processes. Other axons can be connected to the neuron's input synapses 104, 104b, . . . 104n. Axons 108, 108b, . . . 108n connected to the neuron via its synapses can originate at other neurons in a neural network, or from external processes. In some neural networks, they can even be fed back from their own neuron's output axon 102. In many of the most popular neural network configurations, however, no feedback is used. There is no typical number of inputs to a neuron. Depending on the application, a neuron may have as few as one input, a few, thousands, millions, or even more.
Each weight value 106, 106b, . . . 106n is depicted as a box at the synapse between the incoming axon and the neuron body.
Modulating Input Value by Weight Value—FIG. 1
When processing and propagating input signals (signal propagation phase), the values supplied to the neuron's synapses 104, 104b . . . 104n are each modulated by the synapses' respective weight values 106, 106b . . . 106n. The effect of this process is to pass, or gate, a portion of the input value through the synapse, which is proportional to the value of the weight. In this way, the weight value modulates the connection strength of the synapse. The result is then summed with the other similarly processed input values 110. Using conventional floating point math, the modulating function is performed by multiplying the signal value by the weight value. It is expressed simply as:
$r_i = A_i W_i$
In this formula, for each of the neuron's synapses $i$, $r_i$ is the result of that synapse's modulating function, $A_i$ is an axon level (AL), which is the value carried by the axon that is connected to the synapse, and $W_i$ is the weight value for modulating the input signal at the synapse. In typical neural network configurations, the weight may be negative or positive (often from −1.0 to +1.0). A negative weight value produces a negative result ($r_i$), which will reduce the sum, thereby acting as an inhibitory synapse. A positive weight value will produce a positive result ($r_i$), which will contribute to an increase in the sum of all the results, thereby acting as an excitatory synapse.
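The modulate-and-sum step described above can be sketched in C as follows. This is only a minimal illustration under stated assumptions, not the listing from Appendix I: the function name weighted_sum and its parameter names are hypothetical, and double-precision floating point is assumed for axon levels and weights.

```c
#include <assert.h>
#include <stddef.h>

/* Weighted sum of a neuron's inputs (hypothetical helper).
 * Each axon level A_i is modulated (multiplied) by its synapse weight W_i,
 * giving r_i = A_i * W_i, and the results are accumulated into the
 * neuron's internal sum. */
double weighted_sum(const double *axon_levels, const double *weights, size_t n)
{
    assert(axon_levels != NULL && weights != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double r = axon_levels[i] * weights[i]; /* r_i = A_i * W_i */
        sum += r; /* negative weights inhibit, positive weights excite */
    }
    return sum;
}
```

For example, with axon levels 1.0, 0.5, and 0.25 and weights 0.5, −1.0, and 1.0, the second synapse is inhibitory and cancels the contribution of the first, leaving an internal sum of 0.25.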
Weights are usually adjusted in a separately performed training procedure to bring the outputs produced in the signal propagation phase closer to a desired response for a given set of input signals. During the learning phase, the neuron is trained using an external learning algorithm 114. A set of input patterns is presented to the neuron or neural network being trained. For neurons at the output of the network, the external learning algorithm produces an error value 116 by comparing the neuron's output to a predetermined set of desired outputs (responses) for the pattern. The error value represents the difference between the desired output and the output produced by the neuron. The error value is used to train the weights within the neuron so that the next time the same input pattern is presented to it, the response will be a little closer to the desired response.
Output Functions
Functionally, a typical neuron can be described at a high level of abstraction as a device that accepts multiple input values and processes them to produce a single representative output value. Generally, the output value produced is the sum of all the neuron's inputs, after they have each been multiplied by their respective synapses' weight values. The neuron's output value is then made available for connection to other neurons or processes through its output axon.
The value carried on an axon is sometimes referred to here as an axon level. As mentioned, the single output value produced by a neuron is a weighted sum representation of the values that are connected to the neuron's input synapses through other axons. As the values connected to its inputs change, so will the neuron's single representative output.
At a more practical level, the internally produced sum of a neuron's multiple weighted inputs will be restricted before being output on its axon. Typically, axon levels will be restricted to positive values between 0.0 and 1.0. Floating-point arithmetic is typically used, though other representations, such as percentages, or integer representations are also acceptable. The process of restricting the internal sum of a neuron's weighted inputs is often referred to as a squashing function. It is used to maintain the values produced by neurons to a reasonable range. The neuron's output value (its axon level) can be connected to other neurons where it may then be summed together with other axon levels. These sums can become infinitely large if left to propagate unchecked. It is essential therefore, that the level at the output of each neuron be restricted, limited, or clipped in some way so that it remains in a workable range.
There are a variety of squashing functions that can be used to limit or clip the neuron's output level. Simply clipping the weighted sum of the input values to maximum and minimum values, for example a range of 0.0 to 1.0, is one of the simplest methods. Here, any sums of weighted inputs that exceed 1.0 will be made 1.0, and any sums of weighted inputs that fall below 0.0 will be made 0.0 on the neuron's axon.
This simple clipping technique will work well as long as the levels produced by summing the weighted inputs stay below the level where they will be clipped. Once the internal sum exceeds the clipping maximum, differences in the input signals will not be reflected as a difference on the neuron's output signal. That is, the output will be identical for all input values that cause the weighted sums to exceed the maximum axon level value. Since most weight-training algorithms assume and require that differences in inputs will be represented as differences at the neuron's output, this situation should be avoided.
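The simple clipping scheme, and the loss of information once the sum saturates, can be illustrated with a short C sketch. The function name hard_clip and its bound parameters are assumptions made for illustration only.

```c
#include <assert.h>

/* Clip an internal sum to the permitted axon-level range [lo, hi]
 * (hypothetical helper; the text above uses lo = 0.0 and hi = 1.0). */
double hard_clip(double sum, double lo, double hi)
{
    assert(lo <= hi);
    if (sum < lo) return lo;   /* sums below the minimum become the minimum */
    if (sum > hi) return hi;   /* sums above the maximum become the maximum */
    return sum;                /* sums inside the range pass unchanged */
}
```

With a range of 0.0 to 1.0, internal sums of 1.3 and 2.7 both yield an output of 1.0, so the difference between the two input conditions is no longer visible on the neuron's axon.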
Most prior-art neural network methods use a sigmoid squashing function as their transfer function. A sigmoid squashing function causes the output value to increase more slowly as its input approaches the maximum allowable level. As the maximum is approached, large increases in the internal sum will produce successively smaller increases in the resulting output value. Near the maximum side of the allowable range, this ensures that different input values will be represented as different output values on the neuron's axon (though the differences will be much smaller). Its advantage over simpler schemes is that it provides at least a small amount of representative change in its output as long as the variable has enough resolution to represent it.
The sigmoid squashing function also has benefits for internal sum values near the minimum AL value, which is usually 0.0. In this case, relatively large changes at lower values will produce smaller changes in the output. This may be a benefit in some prior-art designs. On the other hand, it may be desirable to have large changes at lower values to help effect early learning. For this reason, prior-art neurons may sometimes bias the sigmoid function in order to speed learning at lower output levels.
The sigmoid function is computationally intensive, so simpler schemes, such as approximations based on a table lookup are sometimes used. This is especially true in applications where the computational costs of the sigmoid function will tend to outweigh its benefits.
Two Main Phases or Modes of Operation
As discussed above, there are generally two main phases, or modes of functional operation for a neuron. These are a signal propagation mode and a weight adjustment mode. In the signal propagation mode, input stimuli, sometimes called signals or axon levels, are supplied to the neuron, and are processed to produce the single output signal for the neuron. This mode of operation is sometimes referred to as the execution phase or run-time mode of a neural network. The other general operational mode of a neural network is the learning mode, which is sometimes called the weight-training, or weight-adjusting mode. Usually, a neural network is fully trained initially to perform some task, and is then placed into service in its signal propagation mode and no further training commences.
Learning Algorithms
A neuron will map a set of input stimuli or signals to a desired set of output responses for any given set of input signals. A neuron “learns” to respond correctly to a given set of input values by having its weight values adjusted, or trained, by a learning algorithm (114 in FIG. 1). When a neuron or neural network is having its weights adjusted by a learning algorithm, it is said to be in learning mode, or weight-training mode. A learning algorithm is sometimes referred to as a weight training algorithm or just a training algorithm because it is the set of functional methods that are used to “train” weights in the neurons of a neural network.
During this process, the weight values are adjusted higher or lower to bring the neuron's output value X closer to a desired output. The output is predetermined for the specific set of values that are present on the neuron's input synapses. The first step is to produce an error term δ 116 for the neuron i, from which proportional weight changes at the neuron's connection synapses can be calculated. For a neuron i that is directly connected to the output of the network, the error term is simply the difference between the output produced by the neuron, $X_i^{\text{actual}}$, and the output we desire, $X_i^{\text{desired}}$. It is expressed as:
$\delta_i = X_i^{\text{desired}} - X_i^{\text{actual}}$
The error term δ 116 for the neuron is then used to adjust each individual weight value in the neuron in an effort to move the neuron's output closer to its ideal value. How these error terms are applied to adjustments of individual weight values will be discussed in more detail below.
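For neurons connected directly to the network's output, the error-term calculation described above reduces to one subtraction per neuron. A minimal C sketch follows; the function name output_error_terms and its array parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Error term for each output neuron i (hypothetical helper):
 * delta[i] = desired[i] - actual[i].
 * A positive delta means the neuron's output should be raised,
 * a negative delta means it should be lowered. */
void output_error_terms(const double *desired, const double *actual,
                        double *delta, size_t n)
{
    assert(desired != NULL && actual != NULL && delta != NULL);
    for (size_t i = 0; i < n; i++) {
        delta[i] = desired[i] - actual[i];
    }
}
```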
Neural Networks with Hidden Layers
The method of training a neuron in a neural network that has been described above breaks down when there is no direct output connection. That is, neurons in a neural network that connect to other neurons only may contribute to the output, but in ways that are difficult to compute. Such neurons are called hidden neurons because their outputs are hidden “behind” other neurons. Because they are usually configured in networks that use no feedback, they are almost always part of an entire layer of neurons that are hidden. For this reason, related groups of hidden neurons are generally referred to as hidden layers, or hidden slabs.
Back Propagation
Networks that do not permit a neuron's output signal to feed back to any previous or upstream neurons feeding the instant neuron are called feed-forward networks. The distinction is made in the prior art primarily because a family of gradient descent learning algorithms have been developed for feed-forward networks, which propagate error values back to hidden neurons. These algorithms are called back-propagation learning algorithms. The feed-forward neural networks they run on are often classified as back-propagation neural networks. While there are other types of networks, back-propagation networks have experienced considerable success. They have been widely used and are generally well-known.
Back propagation uses a special set of calculations to produce error values 116 for hidden layer neurons. The expression for calculating the error value at a given hidden neuron i may be expressed in terms of the error values that have been calculated at the subsequent (post-synaptic) neurons j to which neuron i is connected, along with the weight value W between the two neurons. The calculation is expressed as:
$$\delta_i = \left[ \sum_j \delta_j W_{ij} \right] X_i$$
Note that the output of the neuron for which the error value is being calculated Xi is used in the calculation as well. Here, it represents the result of the output transfer function. It can be seen that the same error value must have been calculated for the neuron of each forward connection δj prior to producing this neuron's error value. This is what restricts back propagation to feed-forward-only networks.
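The hidden-neuron error calculation can be sketched in C exactly as the formula above is written. The function and parameter names are hypothetical. Note as a caveat that many textbook formulations of back propagation multiply by the derivative of the transfer function rather than by the output itself; the sketch follows the expression given in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Error value for hidden neuron i (hypothetical helper):
 *   delta_i = [ sum over j of delta_j * W_ij ] * X_i
 * delta_next[j] : error value already computed at post-synaptic neuron j
 * w_to_next[j]  : weight W_ij on the connection from neuron i to neuron j
 * output_i      : X_i, the output of hidden neuron i after its
 *                 transfer function */
double hidden_error_term(const double *delta_next, const double *w_to_next,
                         size_t n_next, double output_i)
{
    assert(delta_next != NULL && w_to_next != NULL);
    double back_sum = 0.0;
    for (size_t j = 0; j < n_next; j++) {
        back_sum += delta_next[j] * w_to_next[j];
    }
    return back_sum * output_i;
}
```

Because delta_next must already hold the error values of every forward-connected neuron, the layers have to be processed from the output backward, which is the ordering constraint that ties this method to feed-forward networks.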
The error value δi calculated for neuron i in the above formula is then incorporated in making the individual weight adjustment calculations. There are a variety of ways the calculated neuron error values are used to adjust the neuron's weights. One example is given by the equation:
$W_{ij} = W_{ij} + \eta \, \delta_j A_i$
Here, i represents the pre-synaptic neuron or process that is connected to the neuron j whose weights are currently being adjusted. In this calculation, $W_{ij}$ is the weight value between i and j to be adjusted by the calculation. The weight is adjusted by the neuron's error value $\delta_j$, which is further modulated by the learning rate $\eta$ determined as part of the learning algorithm. The learning rate is generally used to slow the rate at which the weights are altered so as to reduce the amount each weight adjustment will corrupt weight values that have already been trained on previously learned patterns.
Finally, the individual weight adjustment value is also proportional to the output value produced by the pre-synaptic neuron or process modulated by this weight Ai (i.e., the neuron connecting to the target neuron). If the value on the output of the pre-synaptic neuron is small, the weight adjustment will be small in the target neuron. This particular weight-adjustment method, based on these three factors, is sometimes referred to as the GDR, or generalized delta rule.
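The three-factor weight adjustment described above (learning rate, error value of the target neuron, and pre-synaptic output) might look like this in C. The function name gdr_update_weights and its parameters are hypothetical, and the sketch covers only the example update rule given in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Generalized-delta-rule style update for one target neuron j
 * (hypothetical helper):
 *   W_ij = W_ij + eta * delta_j * A_i
 * weights[i]     : W_ij, weight on the synapse fed by pre-synaptic source i
 * pre_outputs[i] : A_i, the axon level arriving at that synapse
 * delta_j        : error value of the neuron being trained
 * eta            : learning rate */
void gdr_update_weights(double *weights, const double *pre_outputs,
                        size_t n, double delta_j, double eta)
{
    assert(weights != NULL && pre_outputs != NULL);
    for (size_t i = 0; i < n; i++) {
        weights[i] += eta * delta_j * pre_outputs[i];
    }
}
```

A synapse whose pre-synaptic axon level is 0.0 receives no adjustment at all, which reflects the statement that a small pre-synaptic output produces a small weight change.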
Weights Encode Mappings Between Inputs and Desired Responses
Any given weight in a neuron or in a neural network can, and likely will, contribute to a multitude of different trained responses to different input combinations. This characteristic of neural networks is both a strength and a weakness. It is a strength because it allows the neural network to generalize and apply lessons learned, when responding to one set of inputs, to a new set of similar inputs. That is, if the network has learned a desired output for one set of inputs, and it is then presented with a set of inputs that are almost, but not quite, identical, it will produce an output that is conceptually similar to the output it learned in response to the training set.
A neuron's use of the same set of weight values to encode multiple responses doesn't necessarily eliminate its ability to discern two very similar input vectors that require very different responses. It does make such discernment difficult though, requiring more and more resolution, or bit width, from the variables used to hold weight values. In this case, the small number of inputs that differ between the two sets of inputs will be responsible for all the difference in the output value. In other words, the weights for the inputs that aren't common to both sets of inputs will be adjusted deeply to compensate for the values produced by the weight calculations performed on the common inputs. From this it can be seen that the ability of a neural network to discern between similar input sets is directly related to the dynamic range of the weight values.
How Neurons are Used
As stated, a neuron is the primary component in a neural network. To perform a useful task, a typical neural network may be composed of tens of thousands, or even millions of individual neurons connected together in a variety of ways and trained. Information is represented in a neural network according to the strengths of the connections between the individual neurons comprising the network. Connection strengths between neurons are represented by the weight values at each neuron's input synapses. Information is represented in these connection strengths between each neuron in a highly distributed way across all, or at least many, of the neurons and connections making up the neural network.
One typical example application of neural networks is in optical character recognition. In such a neural network printed characters are scanned in and represented as an array of pixel values to be used as inputs. Internally after training this neural network, connection strengths may end up adjusted such that one or more hidden neurons are activated whenever a set of pixels having a single horizontal line are present (such as for ‘A’, ‘H’, ‘L’, or ‘T’). Another internal neuron may end up representing letters in which two lines are connected by a single horizontal line (‘A’ or ‘H’). Yet another hidden neuron may end up representing letters with vertical slanted lines that come together with other lines (‘A’, ‘K’, ‘M’, ‘N’, ‘V’, ‘W’, ‘X’, ‘Y’, or ‘Z’). On the other hand, the trained network may end up representing the various attributes and features found in the input pixels completely differently.
How the neural network ends up representing the various letters within its connection weights between neurons may not be specified or known by those designing and training the network. In many instances, only representations of the trained input patterns to the network, and the desired output from the network for each of those input patterns, are known and presented by the trainer. How the network produces those desired responses, and how that information is represented internally by the connections within the neural network, is a product of many factors. Such factors include the initial neural network structure, the informational structure of the training data, the initial weight values, the training algorithm and learning rate used, small random changes made to weights as the network is trained, the order that the training set is presented to the network, and any imperfections in the training set that may be presented during training.
Some Prior-Art Neurons Employ Multiple Weights for Each Connection
Some prior-art neurons employ multiple weights for each connection. Some use two weights to allow a decision as to which is the better trained weight. This is primarily a technique for avoiding local minima, a problem encountered in some weight adjustment methods, such as back-propagation. It can also be used to help determine when the network is optimally trained, so that the trainer will know the best time to terminate training. In prior-art neurons, everything learned is generally permanent, so knowing when to stop training is a real concern. An example of this type of usage can be found in U.S. Pat. No. 4,918,618 to Tomlinson, Jr., issued 11 Apr. 1990.
Another prior-art device that stores multiple weights for each connection can be found in U.S. Pat. No. 5,671,337 to Yoshihara, issued 23 Sep. 1997. Yoshihara's input signals are modulated by multiplying them by a single weight value when the neuron is in signal propagation phase, just as in more conventional neurons. The single representative weight used for modulating the input value is either selected, or derived, from the set of weights stored for each connection. It is then used conventionally as the single weight value by which a given input signal is multiplied for propagation (see the discussion on weights above). In some embodiments, the single weight is selected from the set of weights for the connection synapse, based on the magnitude of the signal at the input for which the weight is being selected. Other embodiments use the set of weights as parameters for a stored function, which itself produces a single weight value. In any case, the single weight value produced is then multiplied by the input signal in conventional fashion. It is also important to note that learning is performed with any number of conventional learning algorithms which, once chosen, are applied to all the weights without differences in learning rates or algorithm. Back propagation is the exemplary method within these embodiments. Importantly, in every embodiment, all weights are adjusted by the same learning algorithm and learning rate.
Separate Sub-Networks have Provided Responses for Different Short-Term Adaptation Needs.
An interesting technique for accommodating some of the problems prior-art ANNs encounter in dealing with minute-by-minute details is presented in a paper by Charles Hand to NASA's Jet Propulsion Laboratory, Hand, Improved Autoassociative Neural Networks, JPL New Technology Report NPO-21224, October 2003, Jet Propulsion Laboratory, Pasadena, Calif. A hexapod robot is shown to have a need to be trained with different walking gaits depending upon its circumstances. This was done using a dynamically selectable sub-network, which was selected based on which walking gait was required of the hexapod robot at the moment. The sub-networks were built with binary (1-bit) weights that made them simple. More environmental moment-by-moment walking gait details could be stored in Hand's network by adding and training more sub-networks for each new walking gait that might be needed. Hand demonstrates a smart technique to work around a shortcoming of current neural network technology. Hand also helps to demonstrate a need for a neural network employing a neuron which is able to overcome this shortcoming by specifying a means of storing long-term general information separately from short-term, moment-by-moment response adaptations.
Learning is Usually Permanent in Prior-Art Neural Networks
Most current neurons used in artificial neural networks have no means for explicitly forgetting anything. For this reason they must have noise-free training data or the weight memory will become corrupted with the accumulated effects of learned noise that will never be forgotten. Also, current neurons cannot continue to learn once trained, because new lessons will increasingly interfere with and corrupt previously learned information. For these reasons, a typical neural network is trained on a given set of pristine, representative input patterns, and is then put in service. Training does not normally continue once the initial set of training patterns has been presented and learned.
Learning Must Commence at Slow Pace in Prior-Art Neurons
To train existing neural networks, sets of signals representing desired exemplary input patterns are usually successively applied to the primary inputs and allowed to propagate through the neural network to the output. This has been referred to here as the signal propagation phase. The differences between the actual and desired output values, determined by an external teacher, are then calculated to arrive at an error signal for each neuron. The calculated error is then used to adjust the neuron's synapse weights.
The process of presenting exemplary patterns and training toward a desired output is repeated in a recurring manner and typically requires a large number of iterations to reduce errors appearing at the primary outputs to an acceptable level.
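The repeated present-and-adjust cycle described above can be sketched for a single neuron in C. This is a toy illustration under several assumptions: simple clipping to 0.0 through 1.0 is used as the transfer function, the weight change is the learning rate times the error times the input level, and the names train_neuron and clip01 are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Clip an internal sum to the axon-level range 0.0 to 1.0. */
static double clip01(double sum)
{
    if (sum < 0.0) return 0.0;
    if (sum > 1.0) return 1.0;
    return sum;
}

/* Repeatedly present every pattern to one neuron and adjust its weights.
 * patterns holds n_patterns rows of n_inputs axon levels (row-major).
 * Returns the summed absolute error observed during the final epoch
 * (0.0 if epochs is not positive). */
double train_neuron(double *weights, size_t n_inputs,
                    const double *patterns, const double *desired,
                    size_t n_patterns, double eta, int epochs)
{
    assert(weights != NULL && patterns != NULL && desired != NULL);
    double epoch_error = 0.0;
    for (int e = 0; e < epochs; e++) {
        epoch_error = 0.0;
        for (size_t p = 0; p < n_patterns; p++) {
            const double *in = &patterns[p * n_inputs];

            /* Signal propagation phase: weighted sum, then squash. */
            double sum = 0.0;
            for (size_t i = 0; i < n_inputs; i++) {
                sum += in[i] * weights[i];
            }
            double actual = clip01(sum);

            /* Learning phase: error term, then a small weight change. */
            double delta = desired[p] - actual;
            epoch_error += fabs(delta);
            for (size_t i = 0; i < n_inputs; i++) {
                weights[i] += eta * delta * in[i];
            }
        }
    }
    return epoch_error;
}
```

With two patterns (1.0, 0.0) and (0.0, 1.0), desired outputs 1.0 and 0.0, both weights starting at 0.5, and a learning rate of 0.25, a single pass removes only part of the error, and many passes are needed before the outputs settle close to the desired values.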
Adjustments need to be made slowly because, as input patterns are trained, every weight adjustment will adversely affect the weight adjustments performed previously for all other patterns. The same set of weights holds both the detailed information that dictates the neuron's moment-by-moment responses and the generalized response information. Thus the two kinds of information will tend to interfere with each other as the set of training patterns is repeatedly presented. If the weights are adjusted too much on one training pattern, the changes to weights caused by the current training pattern will completely eliminate all the prior weight adjustments performed for one or more previously adjusted sets of inputs and outputs.
Continuous Adaptation and Learning is Very Difficult to Achieve Using Prior-Art Neurons.
Because the weight training is usually permanent, everything neurons learn while in training mode must remain in them indefinitely. Both long-term generalized knowledge, as well as short-term, moment-by-moment specifics for a given task, are all stored together. Even if weights with huge dynamic resolution are used, attempting to keep small moment-by-moment details in the same conceptual memory space as the more generalized information will eventually lead to loss of information. The generalized information will be lost to the details of the moment. Also, the ability to learn new details will be adversely affected by old details that are no longer needed, even if those details occurred many days, weeks, or even years earlier.
Ability to Train Neurons Based on Reward-Punishment Schemes is Poorly Supported
Most neural network learning algorithms in use today don't permit simple reward-punishment learning schemes to be used. Much work has been done to find ways to train neurons based on reward-and-punishment cues for a variety of reasons. One advantage is that such a training scheme would mimic much of what is known about how autonomous learning occurs in biological organisms, Levitan, Kaczmarek, “The Neuron, Cell And Molecular Biology”, 2002, Oxford University Press, ISBN: 0-19-514523-2. Learning based on noisy learning signals, such as reward-and-punishment cues from the environment, will also help greatly in producing systems that continuously adapt to their surroundings.
Weight Adjustments to Adapt Short-Term Responses could not be Explicitly Performed without Affecting Existing Learning
As stated, one of the primary disadvantages of conventional neural networks is that they usually must be taught to handle all possible situations ahead of time. That is, the effect of training is to adjust weights to values that are best for most patterns that may be encountered by the network. At the end of the training process, all weights are fixed to reflect the general desired responses to the set of training patterns.
On one hand, the weights must be trained in enough detail so that a detailed and correct response can be made to any novel (unplanned) set of stimuli encountered by the running neural network. On the other hand, the weights must not be trained so specifically that needed general information is lost. In other words, training a single set of weights to respond to specific details necessarily corrupts the neuron's ability to respond acceptably to many different broader classes of input.
Hypothetical Car-Driving ANN Example
Consider the construction of an artificial neural network (ANN) for driving a car as a hypothetical example of an application of neural networks. This is an important application where neural networks may eventually be able to help. A variety of methods of automatically and autonomously driving vehicles are beginning to be explored at this time; see “Autonomous Ground Vehicle Grand Challenges”, 2004-2006, DARPA, The Defense Advanced Research Projects Agency, http://www.darpa.mil/grandchallenge/overview.asp. Attempts to use neural networks in these endeavors have met with only limited success.
Applying a neural network to this task demonstrates some of the problems with current neural networks constructed with current neurons. Current neuron models allow ANN designers to produce a neural network that can be trained to drive a car in general, placing that general knowledge in long-term (actually permanent) memory. Such generalized, long-term driving lessons might include all the basics, such as steering, braking, acceleration (gas), clutch and gear-shifting, among other general driving knowledge.
If, after having trained such an ANN in the basics, the ANN and its car are placed into anything other than a very generic driving situation, it will not work. Whether it's for driving a car, or for any other context, this inability to change once trained, in order to adapt to new moment-by-moment situations, is an inherent limitation of current ANN technology.
In this hypothetical car-driving example, moment-by-moment adaptation needs might include city driving, highway driving, off-road mountain terrain, off-road beach terrain, and, finally, night driving in each of these previously experienced and learned situations. Here the general learning is stored in weights that are trained slowly but hold their learned information for very long periods. That is, general driving instructions are maintained as long-term memories. One problem with present-day neurons is that the short-term, detailed responses needed for driving variations must be permanently represented in the same set of weights used to store the general driving information. One notable disadvantage of this strategy is that the weights holding long-term responses become corrupted with values used to produce responses to short-term details.
Thus it can be seen that the structure of long- and short-term learning will often be very different. By attempting to maintain both of these types of learning within a single set of connection weights, prior-art neurons will have great difficulty learning short-term, detailed responses without adversely affecting the long-term learning that is represented in the same set of connection weights. The usual solution is to simply forbid continuous learning, shutting off learning once a given set of responses has been learned. In these cases, both short- and long-term responses will all be represented within the single permanent set of weights, requiring weight variables with considerable resolution and eliminating any ability of the trained neural network to adapt to new short-term details that were not included within the original training.
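The interference described above can be illustrated with a minimal sketch in C. The sketch is not taken from the program listings of Appendix I; it assumes a simple linear neuron with one set of weights trained by a conventional delta rule, and every name and value in it (N_INPUTS, neuron_output, train, general_response_drift, the two input patterns, the learning rate, and the step count) is hypothetical and chosen only for illustration. A "general" pattern is learned first, and a "detail" pattern that shares one input with it is then learned in the same weights.

```c
#include <assert.h>
#include <stddef.h>

#define N_INPUTS 3 /* hypothetical input count, chosen for illustration */

/* Weighted sum of the inputs: the response of a simple linear neuron. */
double neuron_output(const double w[N_INPUTS], const double x[N_INPUTS])
{
    double sum = 0.0;
    for (size_t i = 0; i < N_INPUTS; i++)
        sum += w[i] * x[i];
    return sum;
}

/* Conventional delta-rule training on a single pattern:
   w[i] += rate * (target - output) * x[i], repeated for `steps` passes. */
void train(double w[N_INPUTS], const double x[N_INPUTS],
           double target, double rate, int steps)
{
    assert(rate > 0.0 && steps >= 0);
    for (int s = 0; s < steps; s++) {
        double err = target - neuron_output(w, x);
        for (size_t i = 0; i < N_INPUTS; i++)
            w[i] += rate * err * x[i];
    }
}

/* Learn a "general" pattern, then a "detail" pattern in the SAME weights,
   and return how far the response to the general pattern has drifted. */
double general_response_drift(void)
{
    /* Assumed patterns: they overlap on the second input only. */
    const double general[N_INPUTS] = {1.0, 1.0, 0.0};
    const double detail[N_INPUTS]  = {0.0, 1.0, 1.0};
    double w[N_INPUTS] = {0.0, 0.0, 0.0};

    train(w, general, 1.0, 0.1, 200);          /* long-term lesson     */
    double before = neuron_output(w, general); /* close to the target  */

    train(w, detail, -1.0, 0.1, 200);          /* short-term detail    */
    double after = neuron_output(w, general);  /* general lesson moved */

    return before - after;
}
```

With these illustrative values, the response to the general pattern falls from about 1.0 to about 0.25 once the detail pattern has been learned, so general_response_drift() returns approximately 0.75. Nothing was presented to the neuron that contradicted the general pattern; the loss arises solely because both lessons share the second weight.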