1. Field of the Invention
The present invention relates in general to artificial intelligence systems and in particular to a new and useful system which builds upon artificial neural network designs and learning techniques with further processes to achieve verbal functions.
2. Relevant Background
Artificial neural networks (ANNs) are well known, and are described in general in U.S. Pat. No. 4,912,654 issued Mar. 27, 1990 to Wood (Neural networks learning method) and in U.S. Pat. No. 5,222,194 issued Jun. 22, 1993 to Nishimura (Neural network with modification of neuron weights and reaction coefficient), both of which are incorporated herein by reference.
ANNs are systems used to learn mappings from input vectors, X, to output vectors, Y. In a static and limited environment, a developer provides a training set (a database) that consists of a representative set of cases with sensor inputs (X) and corresponding desired outputs (Y), so that the network can be trained to output the correct Y for each given input X. Such a network is limited to the developer's specification of correct outputs for each case, and therefore may not succeed in optimizing outcomes for general users.
In the more general case, it is valuable or essential for the system to learn to generate outputs so as to optimize the expected value of a mathematical "Primary Value Function", usually a net present expected value of some function over time. It may also be essential to learn a sequence of actions to optimize the function, rather than being restricted to a single optimal output at each moment (e.g., a robot may have to move away from a nearby object having a local maximum value, in order to acquire an object having a larger, or global, maximum value). The preferred class of techniques meeting these requirements is adaptive critics, described in Miller, Sutton, and Werbos, Eds., Neural networks for control. Cambridge, Mass.: MIT Press (1990), and in Barto, A., Reinforcement learning and adaptive critic methods. In D. A. White and D. Sofge (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand (1992).
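As a rough illustration of the "net present expected value" idea in the paragraph above, the following sketch compares two action sequences for the robot example by their discounted value. The discount factor, the reward numbers, and the function name are illustrative assumptions and are not taken from the cited works.

```python
def discounted_value(rewards, gamma=0.9):
    """Net present value of a reward sequence: sum of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Hypothetical rewards per time step for the robot example.
stay_near_small_object = [1.0, 0.0, 0.0, 0.0]   # take the nearby object now
go_to_large_object = [-0.5, -0.5, 0.0, 10.0]    # pay a movement cost, then a larger payoff

greedy = discounted_value(stay_near_small_object)
patient = discounted_value(go_to_large_object)

# The sequence with the higher discounted value is preferred,
# even though its first steps look worse.
best = "go_to_large_object" if patient > greedy else "stay_near_small_object"
```

Under these assumed numbers, the sequence that first moves away from the nearby object has the higher value, which is why a method restricted to a single best output at each moment can fall short.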
Connecting actual or simulated sensors and actual or simulated actuators to the inputs and outputs, respectively, of adaptive critics and related systems makes complete adaptive autonomous agents. These agents are the focus of a line of robotics research sometimes called "behavior-oriented artificial intelligence", as described in U.S. Pat. No. 5,124,918 and in Brooks, 1990, and Maes, 1993-4.
The advantage of these systems is that they are by definition capable of acting in real environments. With adaptive critics and related techniques, a training set may be constructed by the developer, collected from actual historical data, or created by putting the system into contact with the actual application environment.
While ANN techniques have several major advantages (they learn rather than requiring programming, they can accept many forms of inputs, and certain designs can perform mathematical optimization of a value function) they can only learn from direct experience and not from verbal/symbolic/codified knowledge which comprises the large majority of available human knowledge.
Although ANNs have been used for manipulation of language, they have not been used for functional interaction with objects. See, for example, Davis (1992); Rumelhart and McClelland (1986) (ANN taught to output the past tense of verbs when given the present tense form); Elman (1992) (ANN taught to predict the next word in a sentence). The majority of this research attempts to assign a grammatical role to each word in a sentence. In this research, the values used in the training signals are provided by the trainer rather than being derivable from the functional value contributed by the verbal responses.
On the other hand, expert systems incorporate verbal knowledge, especially condition-action pairs or rules. However, the knowledge in most potential application domains for intelligent systems cannot be represented adequately by such rules. Moreover, traditional expert systems have no capability to learn from experience to improve performance. A further disadvantage of expert systems is the effort required to formulate the necessary rules. The overall architecture designs require so much processing that they have been far too slow to control realistic sensorimotor systems for robotics.
To reduce the burden of formulating the rules for expert systems, an approach typically called machine learning was developed. This approach consists basically of logical inference from data to produce rules. This is a very restricted form of learning as compared with the more general and powerful methods of ANNs.
While the potential value of combining the learning, representation, and optimization of ANNs with verbal capabilities such as those of expert systems and fuzzy logic is clear, prior attempts have achieved only very limited functionality.
Hybrid designs contain both expert system and ANN subsystems, so they are inherently complex, and have achieved only very limited results. See, for example, Caudill, M. (1991) Expert networks. Byte, 16(10), 108-116.
The present invention draws from theoretical analyses regarding the problems of functional language usage outlined in Verbal Behavior, by B. F. Skinner in 1957. The key assumption of Skinner's "radical behaviorist" theory is that verbal behavior is not fundamentally different from nonverbal behavior. Linguistics theorists in general and connectionist language researchers in particular have been aware of Skinner's theory since its publication, but have consistently and vehemently rejected it as being erroneous or not applicable (Chomsky, 1959; Harris, 1993; Pinker, 1995). The main criticisms are that the theory supposedly could not produce the very rapid learning of language which is seen with humans, that it could not account for the production of novel sequences of speech, and in general that the "simple" concepts of operant conditioning could not account for the enormous complexity of language. The authors of the seminal volumes on neural networks, including language research, (McClelland, Rumelhart, et al., 1986) explicitly reject the behavioral paradigm: "In this sense, our models must be seen as completely antithetical to the radical behaviorist program." (p. 121).
Certain ANN architectures, such as higher-order networks, have the potential to permit rules to be programmed directly into networks. See, for example, Hutchison, W. R. and Stephens, K. R., Integration of distributed and symbolic knowledge representations, Proceedings of the first international conference on neural networks, 2, 395-398, IEEE Press. This can be accomplished by connecting the condition part of the rule (as inputs) to the action part of the rule (as outputs). Most ANN architectures and algorithms are not compatible with such an approach.
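One possible reading of connecting a rule's condition part to its action part is sketched below, using a multiplicative (higher-order) term so that the unit fires only when every condition input is active. The function name, weight, and threshold are hypothetical, and this is not presented as the method of the cited paper.

```python
def make_rule_unit(condition_indices, weight=1.0, threshold=0.5):
    """Return a unit that fires when every input named in the condition is active."""
    def unit(inputs):
        product = 1.0
        for i in condition_indices:
            product *= inputs[i]  # higher-order (multiplicative) term over the conditions
        return 1 if weight * product > threshold else 0
    return unit

# Hypothetical rule: IF input 0 AND input 2 THEN action.
rule = make_rule_unit([0, 2])
fires = rule([1, 0, 1])    # both condition inputs present
silent = rule([1, 1, 0])   # one condition input missing
```

A network wired this way behaves according to the rule, but the rule exists only as a connection pattern and is not available to the network as a statement.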
The most common technique for training ANNs to follow rules has been to construct training sets whose mastery requires following the rules. The ANN may be allowed to make errors or it may be artificially forced to make the correct response (Lin, 1991; Whitehead, 1991). As with direct programming, the resulting system complies with, but does not explicitly follow, the rules. There are a number of major disadvantages to training compliance by examples:
a. Constructing the set of training examples is usually a significant additional effort beyond formulating the rule; it must be done for every rule.
b. It may be difficult or impossible to create a training set that contains the desired relationships while avoiding irrelevant relations.
c. It is especially difficult (even impossible in some networks) to train correct behavior where certain actions are almost always rewarded (e.g., crossing railroad tracks, investing in real estate in previously solid markets), but on rare occasions have catastrophic results.
d. Many relations are so remote in time or space, or so weak in probability that they will never be learned by direct experience of an individual (e.g., avoiding chemicals that cause cancer years later). If they are taught by overrepresenting them in the sample, the learning will be inappropriate for optimization.
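Items c and d can be made concrete with a small expected-value calculation. The probabilities and rewards below are invented for illustration only. They show an action that is rewarded almost every time yet has a negative expected value, and how likely it is that a finite training sample never contains the rare outcome.

```python
def expected_value(outcomes):
    """outcomes: list of (probability, reward) pairs."""
    return sum(p * r for p, r in outcomes)

# Hypothetical numbers: crossing the tracks saves a little time almost always,
# but very rarely ends in disaster.
crossing = [(0.9999, 1.0), (0.0001, -100000.0)]
waiting = [(1.0, 0.0)]

ev_crossing = expected_value(crossing)
ev_waiting = expected_value(waiting)

# Chance that 1000 independent training episodes contain no catastrophe at all.
p_never_seen = 0.9999 ** 1000
```

With these assumed figures, waiting has the higher expected value, yet a learner trained on 1000 sampled episodes would most likely never observe the outcome that makes it so.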
In both direct programming and training set techniques, the system complies with the given rules, but does not learn the rule as a verbal statement. Lack of explicit verbal content imposes a number of major disadvantages on such systems.
A "rule-compliant" network cannot adequately state what it knows. In certain types of networks the structure can be decoded, but a listing of the associations generally contains a large number of irrelevant relations. Another approach (Gallant, 1988, 1993) is to determine partial derivatives by testing the impact of manipulating an input on an output, but this is not practical for complex relations which are typical of real world problems. Systems that cannot state their knowledge cannot:
i. Explain or justify their actions.
ii. Teach another person or system.
iii. Learn from discussing their knowledge with other agents (human or machine).
This weakness is very serious in any case, but especially in view of the rapidly developing communications network in which computers are connected, where the ability to converse verbally with other agents opens up a vast potential not otherwise available.
An important process in human problem solving uses verbal behavior to transform a novel problem into a new problem or subproblems for which solutions are known (Donahoe and Palmer, 1994). For example, if the answer to the problem "23 times 117" is not immediately known, we "break down" the problem into subproblems for which we have answers (e.g., 3 times 7). Networks without explicit verbal behavior cannot do such problem solving. Even more demanding is "creative problem solving" where we may have to perform several tentative "verbal transformations" before even recognizing how to proceed.
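The decomposition described above can be sketched as follows: the only "known answers" are single-digit products, and a larger product is assembled from them by place value. The table and function names are hypothetical, and the sketch illustrates the decomposition only, not how an agent would learn it.

```python
# "Known answers": the single-digit multiplication table.
KNOWN = {(a, b): a * b for a in range(10) for b in range(10)}

def multiply_by_decomposition(x, y):
    """Multiply non-negative integers using only known single-digit products."""
    total = 0
    for i, dx in enumerate(reversed(str(x))):
        for j, dy in enumerate(reversed(str(y))):
            # Each subproblem is a table lookup; the place value restores the scale.
            total += KNOWN[(int(dx), int(dy))] * 10 ** (i + j)
    return total

result = multiply_by_decomposition(23, 117)
```

Here the subproblem "3 times 7" is one of six lookups whose scaled results sum to the answer.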
Current neural network methods are handicapped by their lack of verbal behavior, because the network is required to learn a complex task all at once rather than decomposing it. For example, Minsky and Papert (1969) asserted that linear nets cannot learn the exclusive OR problem. On the contrary, the Applicant has trained a linear network to perform this task perfectly, using verbal behavior in the same manner as many humans actually solve it. First the agent learns the "OR" problem more typical in the real world: when presented with the two input stimuli, the agent responds to any positive stimulus with a positive output on the main output. Then the agent is taught an additional verbal response: If both stimuli are positive, the agent emits, in addition to the positive main response, a response which functions like saying "both". After saying "both", in the next network cycle that verbal response is available as an additional input to itself, which suppresses the system's positive response and strengthens a negative response. In general terms, the verbal capability of the system enables it to reduce the effective dimensionality of the problem. Networks that can be taught these verbal responses can learn to solve many problems much faster.
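The two-cycle mechanism described above can be sketched with a single layer of linear threshold units whose "both" output is fed back as an extra input on the next cycle. The weights below are set by hand for illustration rather than learned, and the names are hypothetical. This is a sketch of the idea, not the Applicant's trained network.

```python
def step(z):
    return 1 if z > 0 else 0

# Hand-set weights (assumed for illustration); inputs are (x1, x2, both_feedback).
W_MAIN = (1.0, 1.0, -2.0)   # the fed-back "both" suppresses the positive response
B_MAIN = -0.5
W_BOTH = (1.0, 1.0, 0.0)    # fires only when both stimuli are positive
B_BOTH = -1.5

def linear_unit(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def respond(x1, x2, cycles=2):
    both = 0  # no verbal response has been emitted yet
    main = 0
    for _ in range(cycles):
        inputs = (x1, x2, both)
        main = linear_unit(W_MAIN, B_MAIN, inputs)
        both = linear_unit(W_BOTH, B_BOTH, inputs)  # available as input on the next cycle
    return main

xor_table = {(a, b): respond(a, b) for a in (0, 1) for b in (0, 1)}
```

On the first cycle the main unit computes OR. On the second cycle, the fed-back "both" response changes the answer only for the input where both stimuli are positive, so the final outputs match exclusive OR.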
As described above, networks can be taught or programmed to comply with rules, which is only one simple kind of input-output. However, such methods do not work for any other of the myriad kinds of relations in the world, such as: above, in, of, sister of, inside, subclass of, threatens, suggests, is the capital of, etc. ANN language research and knowledge-based systems that accommodate such relations have to explicitly program their processing: they cannot learn new relations from experience as can humans. This is a huge weakness.
Beyond being able to learn many kinds of relations is the challenge of deriving some value from the knowledge. Except for the trivial case of being able to repeat a relational statement, learning it will not be useful unless the agent has also learned how to combine the statement with other relational statements, and ultimately with actions. An agent must explicitly learn how to combine X greater than Y and Y=Z to conclude that X greater than Z; and that X greater than Y and Y less than Z does not lead to any conclusion about the relation of X and Z. This essential learning has also not been done with neural networks.
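The combination of relational statements discussed above can be written out as a small composition table over the relations greater than, less than, and equal. The table is written by hand here purely to show what has to be known. The text's point is that an agent would have to learn such combinations, and the names used are hypothetical.

```python
# Composition table: (relation of X to Y, relation of Y to Z) -> relation of X to Z.
# None marks pairs of statements that support no conclusion.
COMPOSE = {
    (">", ">"): ">", (">", "="): ">", ("=", ">"): ">",
    ("<", "<"): "<", ("<", "="): "<", ("=", "<"): "<",
    ("=", "="): "=",
    (">", "<"): None, ("<", ">"): None,
}

def infer(x_to_y, y_to_z):
    """Combine two relational statements into a conclusion about X and Z, if any."""
    return COMPOSE[(x_to_y, y_to_z)]

conclusion = infer(">", "=")      # X > Y and Y = Z
no_conclusion = infer(">", "<")   # X > Y and Y < Z
```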
Jameson (1993) has proved that certain kinds of problems cannot be solved without the use of models or representations of the world. Most neural network architectures have no model component and therefore cannot solve such problems. Those that do (e.g., White and Sofge, 1992) require that the model be specified to a significant (and often impossible) degree by the system developer. Verbal behavior permits a system to construct such models.
Obviously, some sources of information are more reliable than others, such that information should be differentially learned, and thereafter differentially relied upon. ANNs are programmed or trained to comply with all advice, or if differential strengths are used, they must be given by the developer rather than learned. If a new statement were then given from a known source, the system should be able to generalize regarding the reliability of the statement from the reliability of previous statements from that source; but existing methods would not handle that case. This capability should go beyond considering the source: Take Einstein's advice about physics but not about economics.
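A minimal way to picture learned, domain-specific reliability is a running count of how often advice from each source and domain was later confirmed. The class, its smoothing prior, and the counts below are assumptions made for illustration. This frequency count stands in for whatever learning mechanism a real system would use.

```python
from collections import defaultdict

class ReliabilityTracker:
    """Running estimate of how often advice from a (source, domain) pair held up."""

    def __init__(self, prior=0.5):
        self.prior = prior
        self.counts = defaultdict(lambda: [0, 0])  # key -> [confirmed, total]

    def record(self, source, domain, confirmed):
        entry = self.counts[(source, domain)]
        entry[0] += 1 if confirmed else 0
        entry[1] += 1

    def reliability(self, source, domain):
        confirmed, total = self.counts.get((source, domain), (0, 0))
        # Smooth toward the prior with one pseudo-observation.
        return (confirmed + self.prior) / (total + 1)

tracker = ReliabilityTracker()
for _ in range(9):
    tracker.record("Einstein", "physics", True)
for _ in range(2):
    tracker.record("Einstein", "economics", False)

physics = tracker.reliability("Einstein", "physics")
economics = tracker.reliability("Einstein", "economics")
unseen = tracker.reliability("unknown source", "physics")
```

Keying the estimate on the pair of source and domain, rather than on the source alone, is what lets the same source be trusted in one field and discounted in another.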
Apart from the differential reliability of statements, they have different degrees of value. It may be perfectly reliable that there are 743 cats in Chanute, Kans., but the value of this knowledge is so low that an agent should not waste resources learning it.
Briefly stated, the invention involves an autonomous adaptive agent which can learn verbal as well as nonverbal behavior. The primary object of the system is to optimize a primary value function over time through continuously learning how to behave in an environment (which may be physical or electronic). Inputs may include verbal advice or information from sources of varying reliability as well as direct or preprocessed environmental inputs. Desired agent behavior may include motor actions as well as verbal behavior. In addition to being a possible system output, verbal behavior may function "internally" to guide external actions. A principal novelty of the invention is an efficient "training" process by which the agent can be taught to utilize verbal advice and information along with environmental inputs. A further object of the system is to restate verbal statements it has learned when prompted. A further object of the system is to solve novel problems.
Advantages of the system in accordance with the present invention over prior art include:
1. The system can learn to use verbal advice and other verbal information without the need for constructing sets of training examples. This ability saves the developer a large amount of work and increases the likelihood of achieving desired results.
2. The system can learn to perform correct behavior even where certain actions are almost always rewarded, but on rare occasions have catastrophic results.
3. The system can learn relations that are so remote in time or space, or so weak in probability that they will never be learned by direct experience of an individual.
4. Unlike ANNs trained by examples, the system can meet the requirement of many applications to learn a constant series of new verbal inputs and use them immediately to perform dictated tasks correctly the first time.
5. The system can overcome the inherent tendency of most adaptive systems (including humans) to be drawn to smaller immediate consequences over larger delayed consequences.
6. The system combines talking and listening in the same device, rather than requiring separate language understanding and production systems.
7. The system can use verbal behavior to transform a novel problem into a new problem or subproblems for which solutions are known.
8. The system can automatically learn to learn more from, and depend more on, reliable sources of information, or even more specifically to discriminate by domain. Apart from the differential reliability of statements, it can differentially learn statements which have more value for action. Relative value can also be the basis for resolving conflicts between rules of differing importance.
9. The system can repeat the verbal knowledge it has learned.
10. When the Primary Value states are connected as inputs to the system, the system can learn to adjust its behavior continuously as a function of its current goals/needs/state so as to optimize its Primary Value Function over time, while also incorporating information about environmental opportunities and spatiotemporal distribution of Primary Values.