1. Field of the Invention
The present invention relates in general to artificial intelligence systems and in particular to a new and useful system which builds upon artificial neural network designs and learning techniques with further processes to achieve verbal functions.
2. Relevant Background
Artificial neural networks (ANNs) are well known, and are described in general in U.S. Pat. No. 4,912,654 issued Mar. 27, 1990 to Wood (Neural networks learning method) and in U.S. Pat. No. 5,222,194 issued Jun. 22, 1993 to Nishimura (Neural network with modification of neuron weights and reaction coefficient), both of which are incorporated herein by reference. ANNs are systems used to learn mappings from input vectors, X, to output vectors, Y. In a static and limited environment, a developer provides a training set (a database) consisting of a representative set of cases with sensor inputs (X) and corresponding desired outputs (Y), so that the network can be trained to output the correct Y for each given input X. Such a network, however, is limited to the developer's specification of the correct output for each case, and therefore may not succeed in optimizing outcomes for users in general.
In the more general case, it is valuable or essential for the system to learn to generate outputs so as to optimize the expected value of a mathematical “Primary Value Function”, usually a net present expected value of some function over time. It may also be essential to learn a sequence of actions to optimize the function, rather than being restricted to a single optimal output at each moment (e.g., a robot may have to move away from a nearby object having a local maximum value, in order to acquire an object having a larger, or global, maximum value). The preferred class of techniques meeting these requirements is adaptive critics, described in Miller, Sutton, and Werbos, Eds., Neural networks for control. Cambridge, Mass.: MIT Press (1990), and in Barto, A., Reinforcement learning and adaptive critic methods. In D. A. White & D. Sofge (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches. Van Nostrand (1992).
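As a hedged illustration only (the source gives no formula), one common way to write a net present expected value is as a discounted sum over time. The symbols J, U, x_t, u_t, and \gamma are assumptions introduced for this sketch: x_t is the state at time t, u_t the action taken, U the function being valued, and \gamma a discount factor.

```latex
J = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t}\, U(x_t, u_t)\right], \qquad 0 < \gamma < 1
```

Under a criterion of this form, a sequence of actions that passes up a nearby object of modest value can still yield a larger J if it leads to an object of greater value at a later time, which is the situation of the robot example above.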
Connecting actual or simulated sensors and actual or simulated actuators to the inputs and outputs, respectively, of adaptive critics and related systems makes complete adaptive autonomous agents. Such agents are the focus of a line of robotics research sometimes called “behavior-oriented artificial intelligence,” as described in U.S. Pat. No. 5,124,918 and in Brooks, 1990, and Maes, 1993-4.
The advantage of these systems is that they are, by definition, capable of acting in real environments. With adaptive critics and related techniques, a training set may be constructed by the developer, collected from actual historical data, or created by putting the system into contact with the actual application environment.
While ANN techniques have several major advantages (they learn rather than requiring programming, they can accept many forms of inputs, and certain designs can perform mathematical optimization of a value function), they can learn only from direct experience and not from verbal, symbolic, or codified knowledge, which makes up the large majority of available human knowledge.
Although ANNs have been used for manipulation of language, they have not been used for functional interaction with objects. See, for example, Davis (1992); Rumelhart and McClelland (1986) (ANN taught to output the past tense of verbs when given the present tense form); Elman (1992) (ANN taught to predict the next word in a sentence). The majority of this research attempts to assign a grammatical role to each word in a sentence. In this research, the values used in the training signals are provided by the trainer rather than being derivable from the functional value contributed by the verbal responses.
On the other hand, expert systems incorporate verbal knowledge, especially condition-action pairs or rules. However, the knowledge in most potential application domains for intelligent systems cannot be represented adequately by such rules. Moreover, traditional expert systems have no capability to learn from experience to improve performance. A further disadvantage of expert systems is the effort required to formulate the necessary rules. In addition, the overall architecture designs require so much processing that they have been far too slow to control realistic sensorimotor systems for robotics.
To reduce the burden of formulating the rules for expert systems, an approach typically called machine learning was developed. This approach consists basically of logical inference from data to produce rules. This is a very restricted form of learning as compared with the more general and powerful methods of ANNs.
While the potential value of combining the learning, representation, and optimization of ANNs with verbal capabilities such as those of expert systems and fuzzy logic is clear, prior attempts have achieved only very limited functionality.
Hybrid designs contain both expert system and ANN subsystems, so they are inherently complex, and have achieved only very limited results. See, for example, Caudill, M. (1991) Expert networks. Byte, 16(10), 108-116.
The present invention draws from the theoretical analysis of the problems of functional language usage outlined by B. F. Skinner in Verbal Behavior (1957). The key assumption of Skinner's “radical behaviorist” theory is that verbal behavior is not fundamentally different from nonverbal behavior. Linguistics theorists in general, and connectionist language researchers in particular, have been aware of Skinner's theory since its publication, but have consistently and vehemently rejected it as erroneous or inapplicable (Chomsky, 1959; Harris, 1993; Pinker, 1995). The main criticisms are that the theory supposedly could not produce the very rapid learning of language seen in humans, that it could not account for the production of novel sequences of speech, and in general that the “simple” concepts of operant conditioning could not account for the enormous complexity of language. The authors of the seminal volumes on neural networks, including language research (McClelland, Rumelhart, et al., 1986), explicitly reject the behavioral paradigm: “In this sense, our models must be seen as completely antithetical to the radical behaviorist program.” (p. 121).
Certain ANN architectures, such as higher-order networks, have the potential to permit rules to be programmed directly into networks. See, for example, Hutchison, W. R. & Stephens, K. R., Integration of distributed and symbolic knowledge representations, Proceedings of the first international conference on neural networks, 2, 395-398, IEEE Press. This can be accomplished by connecting the condition part of the rule (as inputs) to the action part of the rule (as outputs). Most ANN architectures and algorithms are not compatible with such an approach.
The most common technique for training ANNs to follow rules has been to construct training sets whose mastery requires following the rules. The ANN may be allowed to make errors or it may be artificially forced to make the correct response (Lin, 1991; Whitehead, 1991). As with direct programming, the resulting system complies with, but does not explicitly follow, the rules. There are a number of major disadvantages to training compliance by examples:
a. Constructing the set of training examples is usually a significant additional effort beyond formulating the rule; it must be done for every rule.
b. It may be difficult or impossible to create a training set that contains the desired relationships while avoiding irrelevant relations.
c. It is especially difficult—even impossible in some networks—to train correct behavior where certain actions are almost always rewarded (e.g., crossing railroad tracks, investing in real estate in previously solid markets), but on rare occasions have catastrophic results.
d. Many relations are so remote in time or space, or so weak in probability that they will never be learned by direct experience of an individual (e.g., avoiding chemicals that cause cancer years later). If they are taught by overrepresenting them in the sample, the learning will be inappropriate for optimization.
In both direct programming and training set techniques, the system complies with the given rules, but does not learn the rule as a verbal statement. Lack of explicit verbal content imposes a number of major disadvantages on such systems.
A “rule-compliant” network cannot adequately state what it knows. In certain types of networks the structure can be decoded, but a listing of the associations generally contains a large number of irrelevant relations. Another approach (Gallant, 1988, 1993) is to determine partial derivatives by testing the impact of manipulating an input on an output, but this is not practical for complex relations which are typical of real world problems. Systems that cannot state their knowledge cannot:
i. Explain or justify their actions.
ii. Teach another person or system.
iii. Learn from discussing their knowledge with other agents (human or machine).
This weakness is very serious in any case, but especially in view of the rapidly developing communications network in which computers are connected, where the ability to converse verbally with other agents opens up a vast potential not otherwise available.
An important process in human problem solving uses verbal behavior to transform a novel problem into a new problem or subproblems for which solutions are known (Donahoe & Palmer, 1994). For example, if the answer to the problem “23 times 117” is not immediately known, we “break down” the problem into subproblems for which we have answers (e.g., 3 times 7). Networks without explicit verbal behavior cannot do such problem solving. Even more demanding is “creative problem solving” where we may have to perform several tentative “verbal transformations” before even recognizing how to proceed.
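The decomposition described above can be sketched in code. This is a minimal illustration under stated assumptions, not the method of the invention: the names KNOWN, digits_with_place, and multiply_by_subproblems are hypothetical, and the "known answers" are assumed to be a single-digit multiplication table, with shifting by powers of ten treated as a trivial step.

```python
# Hypothetical table of already-known answers: single-digit products only.
KNOWN = {(a, b): a * b for a in range(10) for b in range(10)}


def digits_with_place(n):
    """Return (digit, place value) pairs, lowest place first.

    For example, 117 gives [(7, 1), (1, 10), (1, 100)].
    """
    pairs = []
    place = 1
    while n > 0:
        n, digit = divmod(n, 10)
        pairs.append((digit, place))
        place *= 10
    return pairs


def multiply_by_subproblems(x, y):
    """Multiply two positive integers using only known single-digit products.

    Each subproblem is a lookup in KNOWN; the result is then scaled by the
    place values (a shift by a power of ten) and accumulated.
    """
    total = 0
    steps = []
    for dx, px in digits_with_place(x):
        for dy, py in digits_with_place(y):
            partial = KNOWN[(dx, dy)] * px * py
            steps.append((dx, dy, px * py, partial))
            total += partial
    return total, steps


total, steps = multiply_by_subproblems(23, 117)
for dx, dy, scale, partial in steps:
    print(f"{dx} times {dy}, scaled by {scale}: {partial}")
print("sum of partial results:", total)
```

The first subproblem listed is "3 times 7", the one named in the text; the remaining five follow the same pattern, and their sum is the answer to the original problem.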
Current neural network methods are handicapped by their lack of verbal behavior, because the network is required to learn a complex task all at once rather than decomposing it. For example, Minsky and Papert (1969) asserted that linear nets cannot learn the exclusive OR problem. On the contrary, the Applicant has trained a linear network to perform this task perfectly, using verbal behavior in the same manner as many humans actually solve it. First the agent learns the “OR” problem more typical in the real world: when presented with the two input stimuli, the agent responds to any positive stimulus with a positive output on the main output. Then the agent is taught an additional verbal response: if both stimuli are positive, the agent emits, in addition to the positive main response, a response which functions like saying “both”. After the agent says “both”, that verbal response is available in the next network cycle as an additional input to the network, where it suppresses the system's positive response and strengthens a negative response. In general terms, the verbal capability of the system enables it to reduce the effective dimensionality of the problem. Networks that can be taught these verbal responses can learn to solve many problems much faster.
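The two-cycle procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the Applicant's trained network: the weights in W_MAIN and W_BOTH are hand-picked rather than learned, and the names step, cycle, and xor_with_verbal_step are hypothetical. Each output is a single linear threshold unit, and the only extra input is the unit's own “both” response from the previous cycle.

```python
def step(weighted_sum):
    """Linear threshold unit: respond (1) only if the weighted sum is positive."""
    return 1 if weighted_sum > 0 else 0


# Hand-picked illustrative weights (assumptions, not learned values).
# Main response: OR of the two stimuli, suppressed by a prior "both" response.
W_MAIN = {"x1": 1.0, "x2": 1.0, "both": -2.0, "bias": -0.5}
# "both" response: emitted only when both stimuli are positive.
W_BOTH = {"x1": 1.0, "x2": 1.0, "both": 0.0, "bias": -1.5}


def cycle(x1, x2, both_prev):
    """Run one network cycle; both_prev is the "both" response fed back as input."""
    inputs = {"x1": x1, "x2": x2, "both": both_prev, "bias": 1.0}
    main = step(sum(W_MAIN[k] * inputs[k] for k in inputs))
    both = step(sum(W_BOTH[k] * inputs[k] for k in inputs))
    return main, both


def xor_with_verbal_step(x1, x2):
    """Return (first-cycle main response, "both" response, second-cycle main response)."""
    main_first, both = cycle(x1, x2, 0)      # cycle 1: plain OR, plus "both" if applicable
    main_second, _ = cycle(x1, x2, both)     # cycle 2: "both" is now an additional input
    return main_first, both, main_second


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_with_verbal_step(a, b))
```

With these weights the first-cycle main response is the ordinary OR of the stimuli, and the second-cycle response differs from it only when “both” was emitted, which is the one case where exclusive OR and OR disagree.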
As described above, networks can be taught or programmed to comply with rules, which express only one simple kind of input-output relation. However, such methods do not work for any of the myriad other kinds of relations in the world, such as: above, in, of, sister of, inside, subclass of, threatens, suggests, is the capital of, etc. ANN language research and knowledge-based systems that accommodate such relations must have the processing of each relation explicitly programmed; they cannot learn new relations from experience as humans can. This is a major weakness.
Beyond being able to learn many kinds of relations is the challenge of deriving some value from the knowledge. Except for the trivial case of being able to repeat a relational statement, learning it will not be useful unless the agent has also learned how to combine the statement with other relational statements, and ultimately with actions. An agent must explicitly learn that X>Y and Y=Z combine to yield X>Z, and that X>Y and Y<Z do not lead to any conclusion about the relation of X and Z. This essential learning has also not been done with neural networks.
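The target behavior for combining ordering relations can be written out as a small table. This sketch is hand-coded, which is exactly what the text says should be learned rather than programmed; it is offered only to make the expected conclusions explicit. The names COMPOSE and combine are hypothetical.

```python
# Composition table for the relations ">", "=", "<" between X and Y, and Y and Z.
# Pairs that are absent from the table support no conclusion about X and Z.
COMPOSE = {
    (">", ">"): ">",
    (">", "="): ">",
    ("=", ">"): ">",
    ("=", "="): "=",
    ("=", "<"): "<",
    ("<", "="): "<",
    ("<", "<"): "<",
}


def combine(rel_xy, rel_yz):
    """Return the relation of X to Z implied by X-to-Y and Y-to-Z, or None if none follows."""
    return COMPOSE.get((rel_xy, rel_yz))


print(combine(">", "="))  # X>Y and Y=Z
print(combine(">", "<"))  # X>Y and Y<Z: no conclusion
```

The two calls correspond to the two cases in the text: the first yields ">", and the second yields None because X>Y and Y<Z leave the relation of X and Z undetermined.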
Jameson (1993) has proved that certain kinds of problems cannot be solved without the use of models or representations of the world. Most neural network architectures have no model component and therefore cannot solve such problems. Those that do (e.g., White & Sofge, 1992) require that the model be specified to a significant (and often impossible) degree by the system developer. Verbal behavior permits a system to construct such models.
Obviously, some sources of information are more reliable than others, such that information should be differentially learned, and thereafter differentially relied upon. ANNs are programmed or trained to comply with all advice, or if differential strengths are used, they must be given by the developer rather than learned. If a new statement were then given from a known source, the system should be able to generalize regarding the reliability of the statement from the reliability of previous statements from that source; but existing methods would not handle that case. This capability should go beyond considering the source: Take Einstein's advice about physics but not about economics.
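One way to picture reliability that is learned per source and per subject is a simple tally of how past statements turned out. This is a hypothetical sketch, not a technique taken from the source: the class ReliabilityTracker, its methods, and the add-one smoothing of the estimate are all assumptions made for illustration.

```python
from collections import defaultdict


class ReliabilityTracker:
    """Tally confirmed and total statements for each (source, topic) pair."""

    def __init__(self):
        # (source, topic) -> [number confirmed, number seen]
        self.counts = defaultdict(lambda: [0, 0])

    def record(self, source, topic, confirmed):
        """Record whether a past statement from this source on this topic held up."""
        entry = self.counts[(source, topic)]
        if confirmed:
            entry[0] += 1
        entry[1] += 1

    def estimate(self, source, topic):
        """Estimate reliability for a new statement, with add-one smoothing.

        An unseen (source, topic) pair gets the neutral value 0.5.
        """
        confirmed, total = self.counts.get((source, topic), (0, 0))
        return (confirmed + 1) / (total + 2)


tracker = ReliabilityTracker()
for _ in range(3):
    tracker.record("Einstein", "physics", True)
for _ in range(2):
    tracker.record("Einstein", "economics", False)

print(tracker.estimate("Einstein", "physics"))
print(tracker.estimate("Einstein", "economics"))
print(tracker.estimate("Einstein", "botany"))
```

Keying the tally on both source and topic lets the estimate for a new statement depend on more than the source alone, in the spirit of the Einstein example; a learned system would presumably also generalize across related topics, which this table does not attempt.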
Apart from the differential reliability of statements, they have different degrees of value. It may be perfectly reliable that there are 743 cats in Chanute, Kans., but the value of this knowledge is so low that an agent should not waste resources learning it.