1. Field of the Invention
The present invention relates, in general, to adaptive critic systems and, more particularly, to adaptive critic systems in which a change in situation value is used as a network input.
2. Relevant Background
Artificial neural networks (ANNs) are known, and are described in general in U.S. Pat. No. 4,912,654 issued Mar. 27, 1990 to Wood (Neural networks learning method) and in U.S. Pat. No. 5,222,194 issued Jun. 22, 1993 to Nishimura (Neural Network With Modification Of Neuron Weights And Reaction Coefficient), both of which are incorporated herein by reference.
ANNs are systems used to learn mappings from input vectors, X, to output vectors, Y. In a static and limited environment, a developer provides a training set (a database) that consists of a representative set of cases with sensor inputs (X) and corresponding desired outputs (Y), such that the network can be trained to output the correct Y for each given input X. Such a network is limited to the developer's specification of correct outputs for each case, and therefore may not succeed in optimizing the outcomes for general users.
In the more general case, it is valuable or essential for the system to learn to output so as to optimize the expected value of a mathematical "Primary Value Function", usually a net present expected value of some function over time. It may also be essential to learn a sequence of actions to optimize the function, rather than being restricted to a single optimal output at each moment (e.g., a robot may have to move away from a nearby object having a local maximum value, in order to acquire an object having a larger, or global, maximum value). The preferred class of techniques meeting these requirements is adaptive critics, described in Miller, Sutton, and Werbos, Eds., Neural networks for control. Cambridge, Mass.: MIT Press (1990), and in Barto, A., Reinforcement learning and adaptive critic methods. In D. A. White and D. Sofge (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand (1992).
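The robot example above can be illustrated numerically. The following is a minimal sketch only: it assumes the "net present expected value" is computed as a discounted sum of rewards, and the discount factor of 0.9, the reward sequences, and the function name `net_present_value` are hypothetical choices made for illustration, not values taken from the cited references.

```python
from typing import Sequence


def net_present_value(rewards: Sequence[float], discount: float) -> float:
    """Discounted sum of a reward sequence: sum over t of discount**t * rewards[t]."""
    return sum((discount ** t) * r for t, r in enumerate(rewards))


# Hypothetical reward sequences for the two strategies in the robot example.
# Greedy: take the nearby object (local maximum) immediately.
greedy_rewards = [1.0, 0.0, 0.0]
# Patient: move away for two steps, then acquire the more valuable object.
patient_rewards = [0.0, 0.0, 5.0]

DISCOUNT = 0.9  # assumed discount factor

greedy_value = net_present_value(greedy_rewards, DISCOUNT)
patient_value = net_present_value(patient_rewards, DISCOUNT)
```

Under these assumed numbers the patient sequence has the larger discounted value, even though its first two outputs look worse at each individual moment.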
Connecting actual or simulated sensors and actual or simulated actuators to the inputs and outputs, respectively, of adaptive critics and related systems makes complete adaptive autonomous agents. These agents are a focus of some researchers in an area of robotics sometimes called "behavior-oriented artificial intelligence", as described in U.S. Pat. No. 5,124,918.
An advantage of these systems is that they are readily adapted to acting in real environments. With adaptive critics and related techniques, a training set may be constructed by the developer, collected from actual historical data, or created by putting the system into contact with the actual application environment.
While ANN techniques have several major advantages (they learn rather than requiring programming, they can accept many forms of inputs, and certain designs can perform mathematical optimization of a value function), there are classes of problems that have been considered difficult or impossible to solve. Examples of problems whose solution using a linear neural network is not obvious include the matching-to-sample (MTS) problem and the delayed MTS problem. In these problems the agent is required to respond so as to indicate whether or not two stimuli match each other (or correspond to each other in some other way). The stimuli can be of any kind, and may be separated in space or time. This problem is a general problem related to the more specific problem of implementing an exclusive-or (XOR) function using neural networks. The XOR problem is a prototypical case of a nonlinearly separable problem discussed at length by M. Minsky and S. Papert, in Perceptrons, Cambridge, Mass. (MIT Press, 1969). The Perceptrons reference argues that hidden nodes are required to solve the XOR problem. It is desirable to find an architecture and method for operating an ANN that can solve or simplify the solution to this class of problem as an alternative or adjunct to using networks with hidden nodes.
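The separability point can be checked directly on a small scale. The sketch below searches a grid of weights for a single threshold unit with two inputs and a bias and no hidden nodes. The grid range, the step size, and the names `threshold_unit` and `find_weights` are illustrative assumptions. The search finds weights for AND but none for XOR, consistent with XOR not being linearly separable.

```python
from itertools import product

# The four binary input cases for a two-input function.
CASES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def threshold_unit(w1, w2, bias, x1, x2):
    """Single linear threshold unit: fires (1) when the weighted sum is positive."""
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0


def find_weights(target):
    """Return the first (w1, w2, bias) on the grid reproducing target, else None."""
    grid = [i / 2 for i in range(-8, 9)]  # -4.0 to 4.0 in steps of 0.5 (assumed grid)
    for w1, w2, bias in product(grid, repeat=3):
        if all(threshold_unit(w1, w2, bias, a, b) == target(a, b) for a, b in CASES):
            return (w1, w2, bias)
    return None


and_solution = find_weights(lambda a, b: a & b)  # linearly separable
xor_solution = find_weights(lambda a, b: a ^ b)  # not linearly separable
```

A grid search is not a proof, but no choice of weights outside the grid would succeed either: XOR would require the bias to be non-positive, each weight plus the bias to be positive, and both weights plus the bias to be non-positive, which is contradictory.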
Briefly stated, the current invention involves creating a sensor input to an artificial neural network, where the input's activation is determined by the change in situation value. Alternatively, the activation may be determined by a preselected function of the change in situation value. As a result of this connection, the agent is responsive to prior increases and decreases in situation value. This connection enables or enhances an agent's ability to solve various problems.
In a particular implementation, an adaptive agent includes an artificial neural network having a plurality of input nodes for receiving input signals and a plurality of output nodes generating responses. A situation value unit receives input from a plurality of the responses and generates a situation value signal. A change sensor coupled to receive the situation value signal generates an output signal representing a change of the situation value signal from a prior time to a current time. A connection couples the change sensor output to one of the input nodes of the artificial neural network.
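The data flow of this implementation can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the network is taken to be a single linear layer, the situation value unit is assumed to be a weighted sum of the responses, the change sensor is assumed to report zero on the first time step, and all names (`ChangeSensor`, `responses`, `situation_value`, `step`) and numeric weights are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class ChangeSensor:
    """Outputs the change in situation value from the prior time to the current time."""
    previous: Optional[float] = None

    def update(self, value: float) -> float:
        # Assumption: with no prior value available, report zero change.
        delta = 0.0 if self.previous is None else value - self.previous
        self.previous = value
        return delta


def responses(weights: Sequence[Sequence[float]], inputs: Sequence[float]) -> List[float]:
    """Linear network: one weighted sum of the input nodes per output node."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def situation_value(resp: Sequence[float], value_weights: Sequence[float]) -> float:
    """Hypothetical situation value unit: weighted sum of the responses."""
    return sum(v * r for v, r in zip(value_weights, resp))


def step(weights, value_weights, sensor: ChangeSensor,
         external_inputs: Sequence[float], last_delta: float) -> Tuple[List[float], float, float]:
    """One time step: the prior change signal is fed in as an extra input node."""
    inputs = list(external_inputs) + [last_delta]  # connection from change sensor
    resp = responses(weights, inputs)
    value = situation_value(resp, value_weights)
    delta = sensor.update(value)
    return resp, value, delta


# Two output nodes; three input nodes (two external inputs plus the change input).
WEIGHTS = [[1.0, 0.0, 0.5],
           [0.0, 1.0, 0.0]]
VALUE_WEIGHTS = [1.0, 1.0]

sensor = ChangeSensor()
history = []
delta = 0.0
for external in ([1.0, 0.0], [1.0, 1.0], [1.0, 1.0]):
    resp, value, delta = step(WEIGHTS, VALUE_WEIGHTS, sensor, external, delta)
    history.append((resp, value, delta))
```

In the third step the external inputs are unchanged from the second, yet the first response differs because the increase in situation value observed at the second step arrives at the change input node. A preselected function of the change (for example, its sign) could be substituted inside `update` without altering the rest of the loop.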