1. Field
The disclosure relates to the field of artificial neural networks. Specifically, it relates to an improved learning method capable of producing neuron-level error values in non-output neurons. This permits networks to be constructed with arbitrarily complex signal feedback.
2. Prior-Art
Artificial neural networks (ANNs) are electronic circuits or software circuit simulations composed of some number of artificial neurons connected together. ANNs roughly attempt to mimic the functions of biological neurons (sometimes referred to as biological brain cells) and of interconnected collections of such biological brain cells. ANNs have considerable history in fields relating to automation, pattern recognition, artificial intelligence, and computation. Today they can be found in wide use in these and other fields.
In general, ANNs are systems in which some number of neurons, or neuron units (artificial brain cell simulations) are connected together. Input signals from outside of the ANN may also be connected to the inputs of some of the neurons making up the ANN. Also, some of the outputs of neurons within the ANN will generally serve as the desired output signals produced by the ANN.
ANNs differ from conventional logic circuits in that logic circuits respond with ONES or ZEROS based on static mappings between inputs and outputs. ANNs, on the other hand, can have multi-valued outputs, and change how they respond to input signals (input-to-output mappings) by altering the relative strengths of the connections between neurons. Logic circuits map (or transform) input signals to outputs based on static logic functions. In an ANN, the strength of each connection made to a neuron can be altered by a learning algorithm to produce a more desired response to a given set of inputs. A logic circuit will produce an output that is the equivalent of performing a Boolean function of its input signals. An AND gate, for example, will produce an output that represents the result of performing a Boolean AND operation on its inputs, e.g., a ONE output only when both of two inputs are ONES.
Neurons have input synapses which are roughly analogous to synapses in biological brain cells. A synapse is an input connection between the neuron and an axon (a signal-carrying nerve or means) that possesses elasticity. That is, it is a connection to the neuron in which the strength of the connection may be altered based on experiences. Artificial neurons contain connection weights within their input synapses whose values can be changed to alter the connection strength to the synapse. Such input synapses effectively modulate the strength of connections between neurons or between neurons and external signal sources. Neurons and networks of neurons can be “trained” to respond correctly to outside signals by changing the values of such connection weights. This effectively changes the strengths of those connections, which alters how a given neuron's output responds to a given set of inputs.
Neural networks consisting of neurons are typically trained ahead of time using a representative set of mappings between a set of expected inputs and the desired outputs the network should produce in response to each of those inputs. The neural network, once trained, is able to provide responses to novel input patterns that were not known at the time it was trained. It possesses this ability because it is able to generate a new output from the patterns in the original representative training set. This leads to one of the primary advantages of neural networks: they are particularly suited for situations where the exact inputs that will be present when the network runs cannot be known ahead of time, while it is being trained.
Neuron Structure—FIG. 1
The most important element within a conventional neural network is a neuron (sometimes called a processing unit, or neurode, to distinguish it from its biological counterpart). Multiple input values are conveyed to the neuron via its synapses. Its single output or output means, in turn, conveys the result of the neuron's processing of inputs. Its output is sometimes referred to as its axon. The signals or values connected to a neuron's inputs can be the outputs of other neurons, or they can originate from external sources, such as sensors, or databases. In digital neuron systems, signals that are used as inputs and outputs are represented numerically. They are usually positive numbers. In floating-point representations (number of decimal places not fixed) the number is usually between 0.0 and 1.0 (inclusive), e.g., 0.1374. Other representations are possible, such as integer values or (in the case of electronic circuits) voltage levels. The values that are supplied to a neuron's input will sometimes be referred to here as axon levels (AL), because they are the value levels which the neurons permit to be conveyed on input and output axons.
Neurons also use connection weights, most often simply referred to as weights here and in the prior art. The weights are used to modulate, or gate, the values connected to each neuron's inputs. In floating-point representation, the value on a given input is gated or modulated by the weight value by simply multiplying the input value by the weight value. The term ‘gate’ is sometimes used here as a more general term for a modulation effect that is performed with a simple multiplication in floating-point arithmetic. The results of modulating each input value by the weight value in all the synapses in the neuron are then summed to produce a preliminary output or internal sum of all the weighted inputs for the neuron. This preliminary sum (sometimes called an internal sum) is further passed through a transfer function in order to limit the final result of the neuron processes to a predetermined range (usually 0.0 to 1.0) permitted for axon levels. The result is then made available on the output of the neuron. It can be connected to the inputs of other neurons, or used by external processes, such as motors, display indicators, or databases.
FIG. 1 shows a schematic depiction of a typical prior-art neuron along with some of its primary components. Individual neurons such as the one depicted in FIG. 1 produce very fundamental functionality, which can be combined with other neurons to produce neural networks which are capable of functioning as adaptive systems for many applications such as classification, pattern matching, and robotics. The components making up the neuron are the neuron's body 100, an output axon 102, input synapses or connections 104, 104b, . . . 104n containing weights 106, 106b, . . . 106n holding weight values (labeled W1 through Wn in FIG. 1). The neuron body calculates an output value (X), which is conveyed on its axon 102. The neuron's axon can be used to connect the neuron's output value to other neurons in a neural network, or to external processes. Other axons can be connected to the neuron's input synapses 104, 104b, . . . 104n. Axons 108, 108b, . . . 108n connected to the neuron via its synapses can originate at other neurons in a neural network, or from external processes. In some neural networks, they can even be fed back from their own neuron's output, or output axon 102. In many of the most popular neural network configurations, however, no feedback is used. There is no typical number of inputs to a neuron. Depending upon the application, a neuron may have as few as one input, or a few, thousands, millions, or even more inputs.
Each weight value 106, 106b, . . . 106n is depicted as a box at the synapse between the incoming axon and the neuron body.
Modulating Input Value By Weight Value—FIG. 1
In the signal propagation phase, when processing and propagating input signals, signal values are supplied to the neuron's synapses 104, 104b, . . . 104n. Each is modulated by the synapse's respective weight values 106, 106b, . . . 106n. The synapse modulating means is depicted in the diagram as a box with an asterisk for each input 105, 105b, . . . 105n. The effect of this modulation process is to pass, or gate, a portion of the input value through the synapse, which is proportional to the value of the weight. In this way, the weight value modulates the connection strength of the synapse. The result is then summed with the other similarly processed input values 110. Using conventional floating-point math, the modulating function is performed by multiplying the signal value by the weight value. It is expressed simply as:
$$r_i = A_i W_i$$
In this formula, for each of the neuron's synapses ‘i’, ‘ri’ is the result of that synapse's modulating function, “Ai” is an axon level (AL), which is the value carried by the axon that is connected to the synapse, and “Wi” is the weight value for modulating the input signal at the synapse. In typical neural network configurations the weight may be negative or positive. A negative weight value produces a negative result (ri), which will reduce the sum, thereby acting as an inhibitory synapse. A positive weight value will produce a positive result (ri), which will contribute to an increase in the sum of all the results, thereby acting as an excitatory synapse.
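The modulation and summing steps described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `weighted_sum` and the sample values are assumptions made for the example and do not appear in FIG. 1.

```python
def weighted_sum(axon_levels, weights):
    """Modulate each input by its synapse weight (r_i = A_i * W_i), then sum the results."""
    if len(axon_levels) != len(weights):
        raise ValueError("each input needs exactly one weight")
    # r_i for every synapse i: the input value gated by that synapse's weight.
    results = [a * w for a, w in zip(axon_levels, weights)]
    return sum(results)


# Hypothetical neuron with one excitatory synapse (positive weight)
# and one inhibitory synapse (negative weight).
internal_sum = weighted_sum([0.5, 0.25], [1.0, -2.0])
```

In this example the inhibitory synapse contributes -0.5, which cancels the +0.5 contributed by the excitatory synapse.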
Weights are usually adjusted in a separately performed training procedure to bring the outputs produced in the signal propagation phase closer to a desired response for a given set of input signals. During the learning phase, the neuron is trained using an external learning algorithm 114. A set of input patterns is presented to the neuron or neural network being trained. For neurons at the output of the network, the external learning algorithm produces an error value 116 by comparing the neuron's output to a predetermined set of desired outputs (responses) for the pattern. The error value represents the difference between the desired output and the output produced by the neuron. The error value is used to train the weights within the neuron so that the next time the same input pattern is presented to it, the response will be a little closer to the desired response.
Output Functions—FIG. 1
Functionally, a typical neuron can be described at a high level of abstraction as a device that accepts multiple input values and processes them to produce a single representative output value. Generally, the output value produced is the sum of all the neuron's inputs, after they have each been multiplied by their respective synapse's weight values. The neuron's output signal value is then made available for connection to other neurons or processes through its output axon (also referred to as its output).
The value carried on axon 102 is sometimes referred to here as an axon level (AL). As mentioned, the single output value produced by a neuron is a weighted sum representation of the values that are connected to the neuron's input synapses 104, 104b . . . 104n through other axons 108, 108b . . . 108n. As the values connected to its inputs change, so will the neuron's single representative output value (denoted with an X in FIG. 1).
At a more practical level, the internally produced sum of a neuron's multiple weighted inputs 110 will often be restricted to a predetermined range before being output on its axon. Typically, axon levels will be restricted to positive values between 0.0 and 1.0. Floating-point arithmetic is typically used, though other representations, such as percentages, or integer representations are also acceptable. The process that restricts the internal sum of a neuron's weighted inputs is often referred to as a squashing function 112. It is used to restrict the values produced by neurons to a reasonable range. The neuron's output value X (its axon level) can be connected to other neurons where it may then be summed together with other axon levels. These sums can become arbitrarily large if left to propagate unchecked. It is essential, therefore, that the level at the output of each neuron be restricted, limited, or clipped in some way so that it remains in a workable range.
Selecting Transfer Function
There are a variety of squashing functions that can be used to limit the neuron's output level. Simply clipping the weighted sum of the input values to maximum and minimum values, for example, a range of 0.0 to 1.0, is one of the simplest methods. Here, any sums of weighted inputs that exceed 1.0 will be made 1.0, and any sums of weighted inputs that fall below 0.0 will be made 0.0 on the neuron's axon.
This simple clipping technique will work well as long as the levels produced by summing the weighted inputs typically stay below the level where they will be clipped. Once the internal sum exceeds the clipping maximum, differences in the input signals will not be reflected as differences in the neuron's output signal. That is, the output will be identical for all input values that cause the weighted sums to exceed the maximum axon level value. Since most weight-training algorithms assume and require that differences in inputs will be represented as differences at the neuron's resultant output value, this situation should be avoided.
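The loss of information caused by clipping can be seen in a short Python sketch. The function name `clip_transfer` and its default limits are assumptions chosen to match the 0.0 to 1.0 range used in the text.

```python
def clip_transfer(internal_sum, minimum=0.0, maximum=1.0):
    """Clip the weighted sum to the permitted axon-level range."""
    return max(minimum, min(maximum, internal_sum))


# Two different internal sums above the maximum give the same output,
# so the difference between them is no longer visible on the axon.
saturated_a = clip_transfer(1.5)
saturated_b = clip_transfer(3.0)

# A sum inside the permitted range passes through unchanged.
unclipped = clip_transfer(0.4)
```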
Sigmoid Squashing Function—FIG. 1
A more general transfer function called a sigmoid or squashing function is often defined in the event the neuron is employed in a network with hidden layers. Such a transfer function is expressed as:
$$f(x) = \frac{1}{1 + e^{-x}}$$
Here x is the result of the above described modulation and summing facilities, and e is the base of the natural logarithm (2.718281828 . . . ). This keeps the output above 0.0 and below 1.0, but it also maps weighted sums near zero to an output of approximately 0.5.
The output value returned by the transfer function is placed on the neuron's output axon 102 (FIG. 1). The output signal value X is an axon level, which is made available for connection to other neurons as well as to external processes on the neuron's axon, which is also referred to as its output means or simply its output.
Most prior-art neural network methods use a similar sigmoid squashing function as their transfer function. A sigmoid squashing function causes the output value to increase more slowly as its input approaches the maximum allowable level. As the maximum is approached, large increases in the internal sum will produce successively smaller increases in the resulting output value. Near the maximum side of the allowable range, this ensures that differences in input values will be represented as differences in output values on the neuron's axon (though the differences on the output will be much smaller). Its advantage over simpler schemes is that it provides at least a small amount of representative change in its output value as long as the variable has enough resolution to represent it.
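The diminishing response near the maximum can be illustrated with a minimal Python sketch of the sigmoid. The branch on the sign of x is an implementation choice made here to avoid floating-point overflow for large negative sums; it is not something the prior art is stated to require.

```python
import math


def sigmoid(x):
    """Sigmoid squashing function f(x) = 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Algebraically identical form that avoids overflow when x is very negative.
    z = math.exp(x)
    return z / (1.0 + z)


# Equal steps in the internal sum produce smaller and smaller output steps
# as the output approaches the maximum, but the outputs still differ.
low_step = sigmoid(2.0) - sigmoid(0.0)
high_step = sigmoid(6.0) - sigmoid(4.0)
```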
The sigmoid squashing function also has benefits for internal sum values near the minimum AL value, which is usually 0.0. In this case, relatively large changes at lower values will produce smaller changes in the output. This may be a benefit in some prior-art designs. On the other hand, it may be desirable to have large changes at lower values to help effect early learning. For this reason, prior-art neurons may sometimes bias the sigmoid function in order to speed learning at lower output levels.
The sigmoid function is computationally intensive, so simpler schemes, such as approximations based on a table lookup are sometimes used. This is especially true in applications where the computational costs of the sigmoid function will tend to outweigh its benefits.
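One possible form of such a table lookup is sketched below in Python. The table range (-8.0 to 8.0), the table size, and the nearest-entry lookup are all assumptions chosen for illustration; an actual design would select them according to its own accuracy and memory constraints.

```python
import math

# Hypothetical table parameters: 1025 entries covering sums from -8.0 to 8.0.
TABLE_MIN, TABLE_MAX, TABLE_SIZE = -8.0, 8.0, 1025
STEP = (TABLE_MAX - TABLE_MIN) / (TABLE_SIZE - 1)

# The exact sigmoid is evaluated once per entry, ahead of time.
SIGMOID_TABLE = [
    1.0 / (1.0 + math.exp(-(TABLE_MIN + i * STEP))) for i in range(TABLE_SIZE)
]


def sigmoid_lookup(x):
    """Approximate the sigmoid by returning the nearest precomputed table entry."""
    if x <= TABLE_MIN:
        return SIGMOID_TABLE[0]
    if x >= TABLE_MAX:
        return SIGMOID_TABLE[-1]
    index = int(round((x - TABLE_MIN) / STEP))
    return SIGMOID_TABLE[index]


approximate = sigmoid_lookup(0.3)
exact = 1.0 / (1.0 + math.exp(-0.3))
```

At run time the lookup needs only a subtraction, a division, and an index operation, at the cost of a small approximation error that depends on the table spacing.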
In some cases, where the designer is confident the neuron outputs will not become unreasonably large, the transfer function may be defined as simply passing the calculated internal sum to the neuron's output. In such a case, the transfer function would be expressed as:
$$f(x) = x$$
Here, the function f(x) simply returns the value of its parameter x as its result. This basically makes the neuron's internally calculated weighted sum of inputs the output value of the neuron.
Two Main Phases or Modes of Operation
As discussed above, there are generally at least two main phases, or modes of functional operation for a neuron and for neural networks made up of neurons. These are a signal-propagation mode and a weight-adjustment mode. In the signal-propagation mode, input stimuli, sometimes called signals or axon levels, are supplied to the neuron, and are processed to produce the single output signal for the neuron. The output of each neuron processed in this way is then applied as input to post-synaptic neurons, which process their input in the same way. This mode of operation is sometimes referred to as the execution phase or run-time mode of a neural network. The other general operational mode of a neural network is the learning mode, which is sometimes called the weight-training, or weight-adjusting mode. Usually, a neural network is fully trained initially to perform some task, and is then placed into service running exclusively in its signal propagation mode so that no further training occurs.
Learning Algorithms
A neuron will map a pattern of input stimulus or signals to a desired set of output responses for a given set of input patterns. A neuron “learns” to respond correctly to a given pattern of input values by having its weight values adjusted, or trained, by a learning algorithm (114 in FIG. 1), which is sometimes called a weight-adjustment means, a weight-adjustment method, or weight training method or means. When a neuron or neural network is having its weights adjusted by a learning algorithm, it is said to be in learning mode, or weight-training mode. A learning algorithm is sometimes referred to as a weight training algorithm or just a training algorithm because it is the set of functional methods that are used to “train” weights in the neurons of the neural network.
Producing Neuron-Level Error Values In Output Layer Neurons
During this process, the weight values are adjusted higher or lower to bring the neuron's output value X closer to a desired output. For training, the desired output is predetermined for the specific pattern of values that are present on the neuron's input synapses. The first step is to produce an error term δ 116 for the neuron k, from which proportional weight changes at the neuron's connection synapses can be calculated. For a given neuron k that is directly connected to the output of the network, the error term is simply the difference between the output produced by the neuron Xactual, and the output we desire Xdesired. It is expressed as:
$$\delta_k = X_k^{\text{desired}} - X_k^{\text{actual}}$$
The error term δ 116 for a given neuron is then used to adjust each individual weight value in the neuron in an effort to move the neuron's output closer to its ideal value. How these error terms are applied to adjustments of individual weight values will be discussed in more detail below.
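As a small illustration of the output-neuron error term, the following Python sketch computes the difference between the desired and actual outputs. The function name `output_error` and the sample values are assumptions made for the example.

```python
def output_error(desired, actual):
    """Error term for an output neuron k: desired output minus actual output."""
    return desired - actual


# The sign of the error tells the learning algorithm which way to move the output.
too_low = output_error(desired=1.0, actual=0.75)   # positive: output should rise
too_high = output_error(desired=0.0, actual=0.5)   # negative: output should fall
```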
Neural Networks With Hidden Layers
The method described above of obtaining an error value for training a neuron in a neural network breaks down when there is no direct output connection. That is, neurons in a neural network that connect only to other neurons will contribute to the network's output, but in ways that are difficult to compute. Such neurons are called hidden neurons because their outputs are hidden “behind” other neurons. Because they are usually configured in networks that use no feedback, they are almost always part of an entire layer of neurons that are hidden. For this reason, related groups of hidden neurons are generally referred to as hidden layers, or hidden slabs.
Back Propagation
Networks that do not permit a neuron's output signal to feed back to any previous or upstream neurons feeding the instant neuron are called feed-forward networks. The distinction is made in the prior art primarily because a family of gradient descent learning algorithms has been developed for feed-forward networks, which propagate error values back to hidden neurons. These algorithms are called back propagation, or back-error propagation, learning algorithms. The feed-forward neural networks they run on are often classified as back-propagation neural networks. While there are other types of networks, back-propagation networks have experienced considerable success. They have been widely used and are generally well-known.
Back propagation uses a special set of calculations to produce error values (116 in FIG. 1) for hidden layer neurons. The expression for calculating the error value at a given neuron j in a hidden layer is dependent on the error values that have been calculated at the subsequent (post-synaptic) neurons k to which neuron j is connected, along with the weight value W between the two neurons. The calculation is expressed as:
$$\delta_j = \left[ \sum_k \delta_k W_{jk} \right] \left( X_j (1 - X_j) \right)$$
Note that the output of the neuron for which the error value is being calculated Xj is used in the calculation as well. Here, it represents the result of the output transfer function, further modified to represent the derivative of the sigmoid.
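The hidden-neuron calculation can be sketched in Python as follows. The function name `hidden_error` and the sample values are assumptions for illustration; the arithmetic follows the expression given above.

```python
def hidden_error(x_j, downstream_errors, downstream_weights):
    """Error term for hidden neuron j.

    x_j                -- output of neuron j (result of its sigmoid transfer function)
    downstream_errors  -- error values delta_k of the post-synaptic neurons k
    downstream_weights -- weights W_jk on the connections from j to each k
    """
    # Sum of each post-synaptic error weighted by the connecting weight.
    back_sum = sum(d * w for d, w in zip(downstream_errors, downstream_weights))
    # X_j * (1 - X_j) is the derivative of the sigmoid expressed through its output.
    return back_sum * x_j * (1.0 - x_j)


# Hypothetical hidden neuron feeding two post-synaptic neurons.
delta_j = hidden_error(0.5, [0.5, 0.25], [1.0, 2.0])
```

Here the weighted error sum is 1.0 and the derivative factor is 0.25, so the hidden neuron's error value is 0.25.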
Adjusting Individual Weights
The error value δ calculated for a given neuron j using one of the above formulas is then incorporated in making the individual weight-adjustment calculations for the neuron j. There are a variety of ways the calculated neuron error values are used to adjust the neuron's weights. One example is given by the equation:
$$W_{ij} = W_{ij} + \eta \delta_j A_i$$
Here, i represents the pre-synaptic neuron or process that is connected to the neuron j whose weights are currently being adjusted. In this calculation Wij is the weight value between i and j to be adjusted by the calculation. The weight is adjusted by the neuron's error value δj which can be obtained using the back propagation method (for a hidden neuron) or by simply subtracting the actual output from the desired output (for an output neuron). The neuron's error value is further modulated by the learning rate η determined as part of the learning algorithm. The learning rate is generally used to slow the rate at which the weights are altered so as to reduce the amount each weight value adjustment will corrupt weight values that have already been trained on previously trained patterns.
Finally, the individual weight-adjustment value is also proportional to the output value Ai produced by the pre-synaptic neuron or process connected to (and modulated by) this weight (i.e., the output value of the neuron connecting to the target neuron). If the value on the output of the pre-synaptic neuron is small, the weight-adjustment will be small in the target neuron. This particular weight-adjustment method, based on these three factors, is sometimes referred to as the GDR, or generalized delta rule. It is identical to a previous weight-adjustment calculation that was simply called the delta rule.
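The three factors of this weight adjustment (learning rate, neuron error value, and pre-synaptic output) can be sketched in Python as shown below. The function name `adjust_weights` and the sample values are assumptions for illustration only.

```python
def adjust_weights(weights, inputs, delta_j, learning_rate):
    """Return new weights W_ij + eta * delta_j * A_i for every synapse i of neuron j."""
    return [w + learning_rate * delta_j * a for w, a in zip(weights, inputs)]


# Hypothetical neuron with two synapses. The second input is 0.0,
# so its weight receives no adjustment at all.
new_weights = adjust_weights(
    weights=[0.5, -0.5], inputs=[1.0, 0.0], delta_j=0.25, learning_rate=0.5
)
```

The first weight moves by 0.5 × 0.25 × 1.0 = 0.125, while the second weight is unchanged because its pre-synaptic value is zero.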
Neurons are Typically Used in Networks
As stated, a neuron is the primary component in a neural network. To perform a useful task, a typical neural network may be composed of a small handful, tens of thousands, or even millions of individual neurons connected together in a variety of ways and trained. Information is represented in a neural network according to the strengths of the connections between the individual neurons comprising the network. Connection strengths between neurons are represented by the weight values at each neuron's input synapses. Information is represented in these connection strengths between each neuron in a highly distributed way across all, or at least many, of the neurons and connections making up the neural network.
Typical Neural Network Structure—FIG. 2
FIG. 2 shows a simple feed-forward prior-art network structure of the kind typically used in neural networks employing the back-propagation learning algorithm. A typical example of an application for such a network would be a character recognition machine that accepts bitmaps of characters as inputs, and outputs distinct codes for each distinct character recognized. The network contains a set of input nodes 200a-200f, a hidden layer of neurons 201a-201c, and a set of output neurons 202a-202f. There may be more than one hidden layer, though in the network of FIG. 2 there is only one. Input layer nodes 200a-200f are traditionally represented as neurons in diagrams such as these, but they are usually limited to carrying only sets of input patterns into the network. The input node neurons are not normally trained and do not normally have inputs to sum. The neurons in the hidden and output layers, on the other hand, produce weighted sums of the signal values present on their inputs and have their weights adjusted during a network training phase.
The neural network is utilized by placing an input pattern on the input nodes and processing it at each of hidden layer neurons 201a-201c in signal propagation mode. This will produce an output for the pattern. The output of the hidden layer is in turn processed by the neurons in forward (post-synaptic) layers until an output pattern for the entire network is produced on the outputs of the output layer neurons 202a-202f. 
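A signal propagation pass through a network shaped like the one in FIG. 2 (six input nodes, three hidden neurons, six output neurons) might look like the following Python sketch. The random initial weights, the fixed seed, the sample input pattern, and the helper names are all assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weight_rows):
    """Each row holds one neuron's weights; return that layer's outputs."""
    return [
        sigmoid(sum(a * w for a, w in zip(inputs, row))) for row in weight_rows
    ]


rng = random.Random(0)  # fixed seed so the example is repeatable
hidden_weights = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(3)]
output_weights = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(6)]

# Input nodes only carry the pattern into the network; they do no summing.
pattern = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
hidden_out = layer_forward(pattern, hidden_weights)      # hidden layer
network_out = layer_forward(hidden_out, output_weights)  # output layer
```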
Network Training Phase
Once a given input pattern is presented and propagated through the network in this way, the training phase may begin. To train existing neural networks, sets of signals representing desired exemplary input patterns are usually successively applied to the primary inputs and allowed to propagate through the neural network (as discussed) to the output. This has been referred to here as the signal propagation, or execution phase. The differences between the actual and desired output values, determined by an external teacher, are then calculated to arrive at an error signal for each output neuron. The calculated error is then used to adjust each output neuron's synapse weights. Error values used by the hidden layer neurons are calculated using the back propagation formula that depends on the error values having been calculated for all the neurons to which the target neuron is connected. Just as in the output neurons, the calculated error value is then used as a factor in the calculation used to adjust each individual weight.
The process of presenting exemplary patterns and training toward a desired output is performed in a recurring manner and typically requires a large number of iterations through all the patterns to reduce errors appearing at the primary network outputs to an acceptable level.
Adjustments need to be made slowly because, as input patterns are trained, every weight-adjustment will adversely affect the weight value adjustments performed previously for all other patterns. This is primarily the purpose of the learning rate used in the weight training calculations discussed above.
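The overall training procedure described in this section can be sketched as a small, self-contained Python program. Everything specific here is an assumption chosen for illustration: the tiny network of two inputs, two hidden neurons, and one output, the constant 1.0 bias input added to each layer, the logical-OR training set, the learning rate of 0.5, and the 2000 passes through the patterns.

```python
import math
import random


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def forward(pattern, hidden_w, output_w):
    inputs = pattern + [1.0]  # constant bias input (an assumption of this sketch)
    hidden = [sigmoid(sum(a * w for a, w in zip(inputs, row))) for row in hidden_w]
    hidden_b = hidden + [1.0]  # bias input for the output neuron
    output = sigmoid(sum(a * w for a, w in zip(hidden_b, output_w)))
    return inputs, hidden, hidden_b, output


def total_error(patterns, hidden_w, output_w):
    return sum((t - forward(p, hidden_w, output_w)[3]) ** 2 for p, t in patterns)


def train(patterns, hidden_w, output_w, learning_rate=0.5, epochs=2000):
    for _ in range(epochs):
        for pattern, desired in patterns:
            inputs, hidden, hidden_b, output = forward(pattern, hidden_w, output_w)
            delta_out = desired - output  # output-neuron error term
            # Hidden errors are computed before the output weights are changed.
            delta_hidden = [
                delta_out * output_w[j] * hidden[j] * (1.0 - hidden[j])
                for j in range(len(hidden))
            ]
            for j, a in enumerate(hidden_b):
                output_w[j] += learning_rate * delta_out * a
            for j, row in enumerate(hidden_w):
                for i, a in enumerate(inputs):
                    row[i] += learning_rate * delta_hidden[j] * a


rng = random.Random(1)
hidden_w = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
output_w = [rng.uniform(-0.5, 0.5) for _ in range(3)]
patterns = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

error_before = total_error(patterns, hidden_w, output_w)
train(patterns, hidden_w, output_w)
error_after = total_error(patterns, hidden_w, output_w)
```

Each pass makes only a small adjustment per pattern, which is why many iterations through the whole training set are needed before the summed error at the output falls.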
Weights Encode Mappings Between Inputs and Desired Responses
Any given weight in a neuron or in a neural network can, and likely will, contribute to a multitude of different trained responses to different input combinations. This characteristic of neural networks is both a strength and a weakness. It is a strength because it allows the neural network to generalize and apply lessons previously learned when responding to a novel set of similar inputs. That is, if the network has learned a desired output for one set of inputs, and it is then presented with a set of inputs that are almost, but not quite, identical, it will produce an output that is conceptually similar to the output it learned in response to the first training set. This same generalization behavior may also be a weakness in situations where very different responses are required for two similar input patterns. In this case, a neural network will have trouble discerning between the two similar input patterns. It will want to generalize the response learned for one pattern to produce a similar response to the other pattern.
A neural network's use of the same set of weight values to encode multiple responses doesn't necessarily eliminate its ability to discern two very similar input vectors that require very different responses. It does make such discernment difficult though, requiring more and more resolution, or bit-width from the variables used to hold weight values. In this case the small number of inputs that differ between the two sets of inputs will be responsible for all the difference in the output values. In other words, the weights for the inputs that aren't common to both sets of inputs will be adjusted deeply to compensate for the values produced by the weight calculations performed on the common inputs. From this it can be seen that the ability of a neural network to discern between similar input sets is directly related to the resolution, or bit-width of the weight values.
Those who design and train the network may not specify or know how the neural network ends up representing the various responses to input patterns within its connection weights between neurons. In many instances, only representations of the trained input patterns to the network, and the desired output from the network for each of those input patterns, are known and presented by the trainer. How the network produces those desired responses, and how that information is represented internally by the connections within the neural network, is a product of many factors. Such factors include the initial neural network structure, the informational structure of the training data, the initial weight values, and the training algorithm and learning rate used. Other factors that may affect the ultimate representation in weight-values include small random changes made to weights as the network is trained, the order that the training set is presented to the network, and any imperfections in the training set that may be presented during training.
Back Propagation Networks Do Not Normally Permit Signal Feedback
It can be seen in the above calculation of the neuron-level error values for a hidden layer neuron that an error value δk must already have been calculated for each post-synaptic (forward) neuron k that the current neuron is connected to, prior to producing the current neuron's error value. If a neuron's output were fed back to its own inputs, even indirectly, its error value would depend on an error value that itself could not yet have been calculated. This is what restricts back propagation to feed-forward-only networks.
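The ordering requirement can be illustrated with a dependency sort. The Python sketch below uses the standard-library `graphlib` module (available in Python 3.9 and later); the neuron names and the helper `error_calculation_order` are hypothetical and serve only to show that a valid calculation order exists for a feed-forward network but not for one containing a feedback loop.

```python
from graphlib import CycleError, TopologicalSorter


def error_calculation_order(post_synaptic):
    """Order in which neuron error values can be calculated.

    post_synaptic maps each neuron to the neurons it feeds. A neuron's error
    needs the errors of all the neurons it feeds, so those must come first.
    """
    return list(TopologicalSorter(post_synaptic).static_order())


# Feed-forward case: hidden neuron "h" feeds output neurons "o1" and "o2".
feed_forward = {"h": ["o1", "o2"], "o1": [], "o2": []}
order = error_calculation_order(feed_forward)

# Feedback case: "a" feeds "b" and "b" feeds back to "a", so neither
# error value can be calculated before the other.
with_feedback = {"a": ["b"], "b": ["a"]}
try:
    error_calculation_order(with_feedback)
    feedback_has_order = True
except CycleError:
    feedback_has_order = False
```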
Studies of biological neural networks have demonstrated that numerous and diverse signal feedback paths and mechanisms exist in biological organisms (Levitan, Kaczmarek, “The Neuron: Cell and Molecular Biology”, 2002, Oxford University Press, ISBN: 0-19-514523-2). These mechanisms include direct signaling carried on afferent axons (those that bring signals from output neurons to input neurons) back through networks whose outputs, in turn, affect the efferent signal flows. Signal feedback in biological organisms also occurs through a variety of chemical signaling mechanisms carried directly through glial cells (those cells that support neurons in biological brains) and through the bloodstream from a variety of intra-organism sources. Finally, signal feedback mechanisms occur tacitly in biological organisms through the senses that carry afferent information about causal effects that efferent signals have had on outside world environments.
In biological organisms, the concept of internal and external may not represent absolute locations, but instead may allude to a continuum. Simplistically, afferent signaling may begin with senses of external world events caused by motor controls such as muscles. On the other hand, it may be caused by a chain of events that started within the brain and ended up as stimulation to the adrenal gland, which in turn produces an afferent chemical signal that causes the brain to retain more experiences. The latter case shows that external signals can be generated by a feedback loop that never leaves the organism. Such loops may occur entirely inside the brain, or may just get out to the point of generating scent signals via the organism's own sweat glands, which are then sensed and have an afferent effect on the network. Much further out, a very complex chain of external events may be affected by the brain and produce effects that are then sensed by the brain. In this way, the brain can produce effects on, and correct for, external world events. To summarize, feedback loops of signals originating in the brain and returning can remain inside the brain, go outside the brain but remain inside the organism, or include causal activities completely outside of the organism.
Back Propagation is Difficult to Use With Other Types of Learning, Such as Positive and Negative Reinforcement
Back propagation is an abstraction that encapsulates the detailed mechanisms used in biological networks to determine how connection strengths are altered by experience to better respond to the external environment. Because the back propagation algorithm folds the details of these underlying mechanisms into its reverse error propagation calculation, it doesn't normally allow those same mechanisms to be broken out and used in combination. In other words, the details of how biological neural networks actually produce changes in their connection strengths are completely incorporated into a single conceptual black box that is the back propagation algorithm.
The advantage of this level of abstraction is that back propagation fully encompasses and mimics the concepts of positive and negative reinforcement learning within its calculations, thus freeing the network designer from having to consider such details when designing a neural network. The disadvantage can, in some sense, be expressed by simply restating the advantage: the designer must accept back propagation's abstract interpretation of the underlying concepts and details that it encapsulates. There is little (if any) flexibility in the details of how such mechanisms can be employed by the neural network to effect changes in its connection weights.
Back Propagation Networks can be Retrofitted to be Less Susceptible to Feedback Restrictions
A variety of methods have been tried, with varying levels of success, to work around back propagation's inability to be used for neural networks employing feedback (e.g., Recurrent Neural Networks).
Back Propagation Through Time (BPTT) has been Used to Partially Mitigate Back Propagation's Limitation on Feedback
A simple means of applying standard back propagation in a neural network that would normally employ feedback (a recurrent neural network) is called “Back Propagation Through Time” or BPTT. It accommodates the use of the standard back propagation algorithm by unfolding the time sequence that would normally be present in a recurrent network in order to get around back propagation's restriction on feedback. The essence of BPTT is that it unfolds the discrete-time recurrent neural network into a multilayer feed-forward neural network (FFNN) each time a sequence is processed. In effect, the FFNN has a separate hidden layer for each “time step” in the sequence. It should be noted that the feed-forward-only restriction of back propagation has not been overcome in BPTT. Instead, a clever means of removing the feedback from a recurrent network has been implemented so that the back propagation learning algorithm can be used along with its inherent restriction.
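The unfolding described above can be illustrated with a minimal Python sketch. It is not taken from any particular BPTT implementation. It assumes a single recurrent neuron with a tanh transfer function, a squared-error measure on the final time step only, and the hypothetical names `unfold_forward`, `bptt_gradients`, `w_in`, and `w_rec`. The point is only that each time step becomes one layer of a feed-forward chain sharing the same weights, so that ordinary reverse error propagation can be run over the chain.

```python
import math

def unfold_forward(xs, w_in, w_rec, h0=0.0):
    """Run one recurrent neuron as a feed-forward chain, one layer per time step.

    Assumed model (illustrative only): h_t = tanh(w_in * x_t + w_rec * h_{t-1}).
    Returns [h_0, h_1, ..., h_T].
    """
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w_in * x + w_rec * hs[-1]))
    return hs

def loss(xs, target, w_in, w_rec, h0=0.0):
    """Squared error on the last layer of the unfolded chain (an assumption)."""
    return 0.5 * (unfold_forward(xs, w_in, w_rec, h0)[-1] - target) ** 2

def bptt_gradients(xs, target, w_in, w_rec, h0=0.0):
    """Propagate the error backward through the unfolded layers.

    Returns (dLoss/dw_in, dLoss/dw_rec). Because every layer shares the
    same two weights, the per-layer gradients are summed.
    """
    hs = unfold_forward(xs, w_in, w_rec, h0)
    delta = hs[-1] - target          # error at the final layer: actual minus desired
    g_in = 0.0
    g_rec = 0.0
    for t in range(len(xs), 0, -1):  # walk the layers from last to first
        d_pre = delta * (1.0 - hs[t] ** 2)  # back through tanh
        g_in += d_pre * xs[t - 1]
        g_rec += d_pre * hs[t - 1]
        delta = d_pre * w_rec        # error handed to the previous layer
    return g_in, g_rec

if __name__ == "__main__":
    sequence = [0.5, -0.3, 0.8]
    print(bptt_gradients(sequence, target=0.2, w_in=0.7, w_rec=-0.4))
```

Note that the backward loop never follows a feedback connection: once unfolded, the recurrent weight `w_rec` simply links layer t-1 to layer t, which is why the feed-forward-only restriction is sidestepped rather than removed.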
Real Time Back Propagation (RTBP) Permits Feedback but Only of Output Neurons
Real time back propagation, or RTBP, permits feedback from output layer neurons to output layer neurons only. As shown in the above calculations, each output neuron's error value is calculated directly by subtracting an expected (desired) output from the actual output of that neuron. Since there is no need to use the calculated error values from post-synaptic neurons in order to calculate these error values, this tightly restricted form of feedback for this one layer can be permitted in RTBP networks. There is still no way to produce the feedback of complex afferent signal flows seen in biological neural networks. Any feedback from the outputs of hidden neurons to themselves, or to neurons even further back, would break the learning algorithm. This is because, for all but the output neurons, the back propagation learning algorithm requires that error values be calculated for all post-synaptic neurons before the error value for the current neuron can be calculated.
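The ordering constraint described above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not part of any RTBP implementation: the function name `error_calculation_order` and the dictionary representation of the network are hypothetical, and the standard-library `graphlib` module (Python 3.9 or later) is used only to find an order in which error values could be calculated. Output neurons are treated as having no dependencies, since their error is obtained directly from the desired output; every other neuron depends on all of its post-synaptic neurons.

```python
from graphlib import TopologicalSorter, CycleError

def error_calculation_order(connections, output_neurons):
    """Return an order in which neuron error values can be calculated.

    connections    -- dict mapping each neuron to its post-synaptic neurons
                      (hypothetical representation, for illustration only)
    output_neurons -- set of neurons whose error is actual minus desired

    Returns a list of neurons, or None when a circular dependency among
    non-output neurons makes the calculation impossible.
    """
    dependencies = {}
    for neuron, targets in connections.items():
        if neuron in output_neurons:
            # Output error needs no post-synaptic error values, so
            # feedback between output neurons adds no dependency.
            dependencies[neuron] = set()
        else:
            # A hidden neuron's error needs every post-synaptic error first.
            dependencies[neuron] = set(targets)
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None

if __name__ == "__main__":
    outputs = {"o1", "o2"}
    # Feedback restricted to the output layer: an order exists.
    print(error_calculation_order(
        {"h1": ["o1", "o2"], "o1": ["o2"], "o2": ["o1"]}, outputs))
    # Feedback between hidden neurons: no order exists.
    print(error_calculation_order(
        {"h1": ["h2"], "h2": ["h1", "o1"], "o1": []}, outputs))
```

In the second network each hidden neuron is post-synaptic to the other, so neither error value can be calculated first, which is the failure the paragraph above describes.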
Alternatives to Back Propagation that Allow Feedback Will Usually Strictly Dictate Specific Network and Feedback Structures
Alternatives to back propagation neural networks have been conceived in an effort to produce networks with some rudimentary forms of feedback (Douglas Eck, University of Montreal, Recurrent Neural Networks—A Brief Overview, 1 Oct. 2007). These generally rely on very strictly prescribed feedback topologies and complex feedback functions to achieve networks with recurrent characteristics. While the recurrent character of the brain is abstractly emulated in these schemes, the ability to design neural networks with the complex and varied feedback structures observed in connected networks of biological neurons is poorly addressed or not addressed at all. Other forms of networks that permit strictly defined feedback limit the network to a single layer, or to no more than two layers.