The human brain is made up of a vast number of neurons connected to one another in a network. When we learn something, new connections form between the neurons or existing connections are modified.
Neural networks are massively parallel computing models of the human brain, consisting of many simple processors connected by adaptive weights. Neural networks are used in artificial intelligence-related applications. These parallel neural networks are often simulated on serial computers, with the processors simulated by program code and the connections modeled by data.
Most present day neural network models, such as the backpropagation model, are supervised neural network models. Supervised neural network models differ from conventional programs in that a programmer does not write algorithmic code to tell them how to process data. Instead, a user "trains" a neural network by presenting it with training data representing the desired input/output relationships.
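The supervised training described above can be sketched with a minimal backpropagation example. The network sizes, learning rate, and the XOR training set are illustrative assumptions, not taken from any particular prior art model; the point is only that the user supplies input/output pairs and the adaptive weights are adjusted to reproduce them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data supplied by the user: inputs X and desired outputs T (XOR).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Adaptive weights: 2 inputs -> 4 hidden units -> 1 output (sizes are arbitrary).
W1 = rng.normal(scale=0.5, size=(2, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))

def loss():
    H = sigmoid(X @ W1)
    Y = sigmoid(H @ W2)
    return float(np.mean((Y - T) ** 2))

initial = loss()
lr = 1.0
for _ in range(5000):
    H = sigmoid(X @ W1)          # forward pass through the hidden layer
    Y = sigmoid(H @ W2)          # forward pass through the output layer
    # Backpropagate the output error through each layer of weights.
    dY = (Y - T) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dY
    W1 -= lr * X.T @ dH
final = loss()
```

No algorithmic rule for computing XOR is ever written by the programmer; the input/output relationship is absorbed into the weights during training.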
Other neural network models are unsupervised neural network models. Unsupervised neural networks can extract statistically significant features from input data. They differ from supervised neural networks in that only input data is presented to the network during training.
Regardless of whether a neural network model is supervised or unsupervised, most models today learn by receiving immediate reinforcement each time they perform a function. While this immediate reinforcement approach works well with many types of basic neural network applications, it does not work well for more complex neural network applications, where it is not possible or meaningful to give immediate reinforcement each time a function is performed. For example, if it is desired to teach a robot to find a "goal state", such as a parts bin, the robot does not know whether it is heading in the right direction until it actually finds the parts bin. Attempts to provide immediate reinforcement after each movement have proven to be ineffective.
Publications by Barto et al have attempted to solve this problem by introducing the concept of delayed reinforcement. In this approach, the neural network controlling the robot would not receive any reinforcement until it found the parts bin. The Barto et al approach uses a temporal difference method that relies on gradient descent to adjust the connection weights gradually.
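The delayed-reinforcement idea can be illustrated with a minimal tabular TD(0) value-learning sketch. This is not Barto et al's exact connectionist formulation; the chain of states, the mostly-forward random policy, and the learning parameters are all illustrative assumptions. Reinforcement arrives only at the goal state, yet the temporal difference updates propagate it backward to earlier states.

```python
import random

random.seed(0)

N = 6                 # states 0..5 along a corridor; state 5 is the parts bin
V = [0.0] * N         # learned value estimate for each state
alpha, gamma = 0.1, 0.9

for _ in range(2000):             # many trials starting from state 0
    s = 0
    while s != N - 1:
        # Mostly-forward random walk (an illustrative policy, not a method).
        if random.random() < 0.8:
            s2 = min(s + 1, N - 1)
        else:
            s2 = max(s - 1, 0)
        # Reinforcement is delayed: reward only on reaching the parts bin.
        r = 1.0 if s2 == N - 1 else 0.0
        # TD(0) update: move V(s) toward r + gamma * V(s').
        V[s] += alpha * (r + gamma * V[s2] - V[s])
        s = s2
```

Although reward is given only at the goal, the learned values increase monotonically toward it, so earlier states come to "know" they are on a promising path.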
While the Barto et al approach represents a significant advancement over other neural network models, it is not without its shortcomings. For example, no gradient descent approach is guaranteed to find the best path to a goal state, since it can become "trapped" in a local minimum on an error surface. Therefore, the Barto et al neural network model can easily be tricked into concluding it has found the optimal path when in fact it has not. This is a serious shortcoming, since merely finding a "good" path is nowhere near as desirable as always finding the "best" path.
An analogy will help explain the Barto et al approach. Barto's neural network model is like riding a bicycle on a very hilly road--where the bicyclist's objective is to find the deepest valley on the road. When a bicyclist is in a valley, he may think he is in the deepest valley if he cannot see any valleys deeper than the one he is in, and be content to stay in that valley. Unknown to the bicyclist, however, is that there is a much deeper valley two hills away that he cannot see.
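The valley analogy can be reproduced numerically. Below is a minimal sketch, with an assumed one-dimensional "road profile" having a shallow valley near x = +1 and a deeper valley near x = -1; the function, starting point, and step size are all illustrative. Plain gradient descent started on the right-hand hill coasts into the nearer, shallower valley and stops there.

```python
# Hypothetical road profile: two valleys, with the +0.3*x tilt making the
# left valley (near x = -1) deeper than the right valley (near x = +1).
def f(x):
    return (x ** 2 - 1) ** 2 + 0.3 * x

def grad(x):
    return 4 * x * (x ** 2 - 1) + 0.3

x = 2.0        # the "bicyclist" starts on the right-hand hill
lr = 0.01      # step size
for _ in range(1000):
    x -= lr * grad(x)   # always ride downhill
```

The descent settles near x = +1 even though the valley near x = -1 is deeper: from inside the shallow valley, every local step leads uphill, so the rider never discovers the better valley two hills away.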
Another shortcoming of a neural network model using the Barto et al approach is that it is not given immediate feedback that it is on an unproductive path. If our robot, in its quest for the parts bin, finds itself visiting the same location more than once, it has wasted valuable time going around in circles. Barto et al's neural network model does nothing to stop this unproductive path unless a predetermined maximum number of steps is exceeded. Therefore, the robot can visit many locations more than once before the neural network model finally determines that it is lost and needs to start over. This is inefficient and wastes valuable time.
Barto et al and other prior art approaches have failed to adequately address the above problems. The prior art has largely been confined to theoretical and experimental applications that are unsuitable for commercial environments.