The present invention relates in general to artificial intelligence systems and in particular to a new and useful device which combines artificial neural network ("ANN") learning techniques with fuzzy logic techniques.
Both neural network learning techniques and fuzzy logic techniques are known. In fact, prior combinations of the two techniques are known as well, for example from U.S. Pat. No. 5,179,624 issued Jan. 12, 1993 to Amano ("Speech recognition apparatus using neural network and fuzzy logic"), which is incorporated herein by reference.
Both techniques attempt to replicate or improve upon a human expert's ability to provide a response to a set of inputs. ANNs extract knowledge from empirical databases used as training sets, and fuzzy logic usually extracts rules from human experts.
In very brief summary, neural network techniques are based on observation of what an expert does in response to a set of inputs, while fuzzy logic techniques are based on eliciting what an expert says he will do in response to a set of inputs. Many authors, including Applicant, have recognized the potential value of combining the capabilities of the two techniques.
Applicant is the author of Chapters 3, 10 and 13 of D. White and D. Sofge, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Van Nostrand, 1992, ("HIC"), which was published no earlier than Sep. 1, 1992 and which contains disclosure of a number of novel inventions which will be summarized and claimed herein. The entirety of those chapters is incorporated herein by reference.
The invention described and claimed herein comprises an Elastic Fuzzy Logic ("ELF") System in which classical neural network learning techniques are combined with fuzzy logic techniques in order to accomplish artificial intelligence tasks such as pattern recognition, expert cloning and trajectory control. The ELF system may be implemented in a computer provided with multiplier means and storage means for storing a vector of weights to be used as multiplier factors in an apparatus for fuzzy control. The invention further comprises novel techniques and apparatus for adapting ELF Systems and other nonlinear differentiable systems and a novel gradient-based technique and apparatus for matching both predicted outputs and derivatives to actual outputs and derivatives of a system.
NEURAL NETWORKS
Artificial Neural Networks ("ANNs") are well known, and are described in general in U.S. Pat. No. 4,912,654 issued Mar. 27, 1990 to Wood ("Neural networks learning method") and in U.S. Pat. No. 5,222,194 issued Jun. 22, 1993 to Nishimura ("Neural network with modification of neuron weights and reaction coefficient"), both of which are incorporated herein by reference.
ANNs typically are used to learn static mappings from an "input vector," X, to a "target vector," Y. The first task is to provide a training set (a database) that consists of sensor inputs (X) and desired actions (y or u). The training set may, for example, be built by asking a human expert to perform the desired task and recording what the human sees (X) and what the human does (y). Once this training set is available, there are many neural network designs and learning rules (like basic backpropagation) that can learn the mapping from X to y. Given a training set made up of pairs of X and y, the network can "learn" the mapping by adjusting its weights so as to perform well on the training set. This kind of learning is called "supervised learning" or "supervised control". Advanced practitioners of supervised control no longer think of supervised control as a simple matter of mapping X(t), at time t, onto y(t). Instead, they use past information as well to predict y(t).
Broadly speaking, neural networks have been used in control applications:
1. As subsystems used for pattern recognition, diagnostics, sensor fusion, dynamic system identification, and the like;
2. As "clones" which learn to imitate human or artificial experts by copying what the expert does;
3. As "tracking" systems, which learn strategies of action which try to make an external environment adhere to a pre-selected reference model;
4. As systems for maximizing or minimizing a performance measure over time.
For true dynamic optimization problems, there are two methods of real use: (1) the backpropagation of utility (which may be combined with random search methods); (2) adaptive critics or approximate dynamic programming. The backpropagation of utility is easier and more exact, but it is less powerful and less able to handle noise.
Basic backpropagation is simply a unique implementation of least squares estimation. In basic backpropagation, one uses a special, efficient technique to calculate the derivatives of square error with respect to all the weights or parameters in an ANN; then, one adjusts the weights in proportion to these derivatives, iteratively, until the derivatives go to zero. The components of X and Y may be 1's and 0's, or they may be continuous variables in some finite range.
There are three versions of backpropagating utility: (1) backpropagating utility by backpropagation through time, which is highly efficient even for large problems but is not a true real-time learning method; (2) the forward perturbation method, which runs in real time but requires too much computing power as the size of the system grows; (3) the truncation method, which fails to account for essential dynamics, and is useful only in those simple tracking applications where the resulting loss in performance is acceptable. D. White and D. Sofge, Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Van Nostrand, 1992, ("HIC") describes these methods in detail and gives pseudocode for "main programs" which can be used to adapt any network or system for which the dual subroutine is known. The pseudocode for the ELF and F_ELF subroutines provided below may be incorporated into those main programs (though the F_X derivatives need to be added in some cases).
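The weight-adjustment loop of basic backpropagation described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the method of HIC: the single tanh unit, the learning rate, the number of steps, the training pairs and the names `mean_square_error` and `train_unit` are all hypothetical choices made for this example.

```python
import math

def mean_square_error(w, b, samples):
    """Mean square error of the unit y_hat = tanh(w*x + b) on the training set."""
    return sum((math.tanh(w * x + b) - y) ** 2 for x, y in samples) / len(samples)

def train_unit(samples, lr=0.1, steps=5000):
    """Adjust (w, b) in proportion to the derivatives of square error.

    lr and steps are assumed settings chosen for this sketch.
    """
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in samples:
            y_hat = math.tanh(w * x + b)
            # Chain rule: d(err^2)/d(net) = 2*err * d tanh(net)/d(net)
            d_net = 2.0 * (y_hat - y) * (1.0 - y_hat ** 2)
            grad_w += d_net * x   # derivative with respect to the weight
            grad_b += d_net       # derivative with respect to the bias
        w -= lr * grad_w / len(samples)
        b -= lr * grad_b / len(samples)
    return w, b

# Hypothetical training set: pairs (X, y) produced by a known "expert" mapping.
samples = [(x, math.tanh(2.0 * x - 1.0)) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
initial_error = mean_square_error(0.0, 0.0, samples)
w, b = train_unit(samples)
final_error = mean_square_error(w, b, samples)
print(initial_error, final_error)
```

In a multi-layer network the same chain-rule step would be repeated backwards through each layer; the single unit here only keeps the sketch short.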
Backpropagation cannot be used to adapt the weights in the more conventional, Boolean logic network. However, since fuzzy logic rules are differentiable, fuzzy logic and backpropagation are more compatible. Strictly speaking, it is not necessary that a function be everywhere differentiable to use backpropagation; it is enough that it be continuous and be differentiable almost everywhere. Still, one might expect better results from using backpropagation with modified fuzzy logics, which avoid rigid sharp corners like those of the minimization operator.
One widely used neural network (a multi-layer perceptron) includes a plurality of processing elements called neural units arranged in layers. Interconnections are made between units of successive layers. A network has an input layer, an output layer, and one or more "hidden" layers in between. The hidden layer is necessary to allow solutions of nonlinear problems. Each unit is capable of generating an output signal which is determined by the weighted sum of input signals it receives and a threshold specific to that unit. A unit is provided with inputs (either from outside the network or from other units) and uses these to compute a linear or non-linear output. The unit's output goes either to other units in subsequent layers or to outside the network. The input signals to each unit are weighted either positively or negatively, by factors derived in a learning process.
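As a minimal sketch of the layered structure just described, the following Python example wires three threshold units into a network with one hidden layer that computes exclusive-or, a nonlinear problem that a single unit cannot solve. The weights and thresholds are set by hand for illustration (they are not derived in a learning process), and the names `unit` and `xor_network` are hypothetical.

```python
def unit(inputs, weights, threshold):
    """One neural unit: weighted sum of its inputs compared against its threshold."""
    net = sum(w * s for w, s in zip(weights, inputs))
    return 1.0 if net > threshold else 0.0

def xor_network(x1, x2):
    """Input layer -> hidden layer of two units -> single output unit."""
    h_or = unit([x1, x2], [1.0, 1.0], 0.5)    # fires if either input is on
    h_and = unit([x1, x2], [1.0, 1.0], 1.5)   # fires only if both inputs are on
    # Positive weight from h_or, negative weight from h_and.
    return unit([h_or, h_and], [1.0, -1.0], 0.5)

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, xor_network(a, b))
```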
When the weight and threshold factors have been set to correct levels, a complex stimulus pattern at the input layer successively propagates between hidden layers, to result in an output pattern. The network is "taught" by feeding it a succession of input patterns and corresponding expected output patterns; the network "learns" by measuring the difference (at each output unit) between the expected output pattern and the pattern that it just produced. Having done this, the internal weights and thresholds are modified by a learning algorithm to provide an output pattern which more closely approximates the expected output pattern, while minimizing the error over the spectrum of input patterns. Neural network learning is an iterative process, involving multiple "lessons".
In contrast, some other approaches to artificial intelligence, such as expert systems, use a tree of decision rules to produce the desired outputs. These decision rules, and the tree that the set of rules constitutes, must be devised for the particular application. Expert systems are programmed, and generally cannot be trained easily. Because it is easier to construct examples than to devise rules, a neural network is simpler and faster to apply to new tasks than an expert system.
FUZZY CONTROL
Fuzzy logic or fuzzy control is also known and is described in general in U.S. Pat. No. 5,189,728 issued Feb. 23, 1993 to Yamakawa ("Rule generating and verifying apparatus for fuzzy control"), which is incorporated herein by reference.
In conventional fuzzy control, an expert provides a set of rules, expressed in words, and some information about what the words in the rules mean. Fuzzy control then is used to translate information from the words of an expert into a simple network with two hidden layers, as described in detail in Yasuhiko Dote, "Fuzzy and Neural Network Controllers", in Proceedings of the Second Workshop on Neural Networks, Society for Computer Simulation, 1991. Briefly, the expert knows about an input vector or sensor vector, X. He knows about a control vector u. He uses words ("semantic variables") from the set of words A_1 through A_m when describing X. He uses words from the set Y_1 through Y_n when describing u. He then provides a list of rules which dictate what actions to take, depending on X. A generic rule number i would take the form:
If A_{i1} and A_{i2} and . . . A_{in} then Y_{mi}    (1)
To make these rules meaningful, he specifies membership functions μ(X) and μ(u) which represent the degree to which the vectors X and u have the properties indicated by the words A_i and Y_j. Typically, a given word A_i appears in several different rules. This information from the expert is translated into a two-hidden-layer network as follows.
The set of input words across the entire system is put into an ordered list. The first word may be called A_1, the second A_2, and so on, up to the last word, A_n. The rules also form a list, from rule number 1 to rule number R. For each rule number j, one must look up each input word on the overall list of words A_i; thus if "B" is the second word in rule number j, then word B should appear as A_k on the overall list, for some value of k. One may define "i_{j,2}" as that value of k. More generally, one may define i_{j,n} as that value of k such that A_k matches the nth input word in rule number j. Using this notation, rule number j may be expressed as:
If A_{i_{j,1}} and A_{i_{j,2}} and . . . and A_{i_{j,n_j}} then do u′(j)    (2)
where n_j is the number of input words in rule number j, and where u′(j) refers to u′(D) for the verb D of rule number j.
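The bookkeeping behind the index i_{j,k} can be sketched in Python as follows. This is only an illustration of the lookup described above; the function name `build_word_index` and the example words are hypothetical, and indices are kept 1-based to match the text.

```python
def build_word_index(rules):
    """Return (words, index), where index[j][k] is the 1-based position on the
    overall word list of input word number k+1 in rule number j+1."""
    words = []   # overall ordered list A_1, A_2, ...
    index = []   # one row of word positions per rule
    for rule in rules:
        row = []
        for word in rule:
            if word not in words:
                words.append(word)
            row.append(words.index(word) + 1)
        index.append(row)
    return words, index

# Hypothetical rules: "if cold and dry then ..." and "if hot and dry then ..."
words, index = build_word_index([["cold", "dry"], ["hot", "dry"]])
print(words)
print(index)
```

The word "dry" appears in both rules but occupies a single position on the overall list, which is why a given word's membership value can be computed once and shared.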
The first hidden layer is the membership layer:
$$x_i = \mu_i(X), \qquad i = 1, \ldots, m \qquad (3)$$
The next hidden layer is the layer of rule-activation, which calculates the degree to which rule number j applies to situation X:
$$R_i = x_{i1} \cdot x_{i2} \cdot \ldots \cdot x_{im} \qquad (4)$$
The output layer is the simple "defuzzification" rule used in most practical applications, and described in Yasuhiko Dote, supra:
$$\underline{u} = R_\Sigma \sum_i R_i \, \underline{u}'(i) \qquad (5)$$
where
$$R_\Sigma = 1 \Big/ \sum_i R_i \qquad (6)$$
None of these equations contains any adjustable weights or parameters; therefore, there is no way to use the methods of neurocontrol on such a system directly.
Equations 3 through 6 can be expressed in pseudocode:
SUBROUTINE FUZZ(u,X);
REAL u(n), X(m), x(na), R(r), RSIGMA, uprime(n,r), running_product, running_sum;
REAL FUNCTION MU(i,X);
INTEGER j,k,l,nj(r), i(r,na);
/* First implement equation 3. Use k instead of i for computer.*/
FOR k=1 TO na;
  x(k)=MU(k,X);
end;
/* Next implement equation 4. nj(j) is the number of input words in rule j.*/
FOR j=1 TO r;
  running_product=1;
  FOR k=1 TO nj(j);
    running_product=running_product*x(i(j,k));
  end;
  R(j)=running_product;
end;
/* Next implement equation 6 */
running_sum=0;
FOR j=1 TO r;
  running_sum=running_sum+R(j);
end;
RSIGMA=1/running_sum;
/* Next implement equation 5*/
FOR k=1 TO n;
  running_sum=0;
  FOR j=1 TO r;
    running_sum=running_sum+R(j)*uprime(k,j);
  end;
  u(k)=running_sum*RSIGMA;
end;
end;
The subroutine above inputs the sensor array X and outputs the control array u. The arrays uprime and i and the function MU represent u′(j), i_{j,k} and the set of membership functions, respectively; they need to be generated in additional, supplementary computer code.
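For readers who wish to experiment, the subroutine can be approximated by the following runnable Python sketch. It is a hedged translation rather than part of the original disclosure: the function name `fuzz`, the 0-based indexing, the storage of `uprime` as one action vector per rule (instead of an n-by-r array), the zero-activation check, and the two example membership functions are all assumptions made for illustration.

```python
def fuzz(X, mu, rule_words, uprime):
    """Conventional fuzzy controller following equations 3 through 6.

    mu          list of membership functions, one per input word
    rule_words  rule_words[j] lists the (0-based) word indices used by rule j
    uprime      uprime[j] is the action vector u'(j) recommended by rule j
    """
    x = [m(X) for m in mu]                       # equation 3: membership layer
    R = []
    for words in rule_words:                     # equation 4: rule activation
        running_product = 1.0
        for k in words:
            running_product *= x[k]
        R.append(running_product)
    total = sum(R)
    if total == 0.0:                             # guard added for this sketch
        raise ValueError("no rule is active for this input")
    r_sigma = 1.0 / total                        # equation 6
    n = len(uprime[0])
    # equation 5: normalized, activation-weighted sum of the rule actions
    return [r_sigma * sum(R[j] * uprime[j][k] for j in range(len(R)))
            for k in range(n)]

def clamp01(v):
    """Limit a value to the range [0, 1]."""
    return max(0.0, min(1.0, v))

# Hypothetical one-sensor example: X[0] is a temperature, u[0] a heater setting.
mu = [lambda X: clamp01((25.0 - X[0]) / 10.0),   # word "cold"
      lambda X: clamp01((X[0] - 15.0) / 10.0)]   # word "hot"
rule_words = [[0], [1]]                          # rule 0: if cold; rule 1: if hot
uprime = [[1.0], [0.0]]                          # heater fully on / fully off
u = fuzz([20.0], mu, rule_words, uprime)
print(u)
```

At 20 degrees both words apply equally in this example, so the output falls midway between the two rule actions.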
In addition to adapting weights, the neural network literature also includes methods, described in detail in HIC, for adding and deleting connections in a network. Applied to fuzzy systems, these methods would translate into methods for changing rules by adding or deleting words, or even adding new rules. However, those methods generally assume the presence of adaptable weights.
Nevertheless, equations 3 through 6 can be differentiated, in most cases; therefore, it is still possible to backpropagate through the network, using the methods given in HIC. This makes it possible to use conventional fuzzy systems as part of a neurocontrol scheme; however, neurocontrol cannot be used to adapt the fuzzy part itself.
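As a worked illustration of this differentiation step, assume the product form of equation 4 and the normalized sum of equations 5 and 6, with rules indexed by j as in the pseudocode and with each word appearing at most once per rule. Under those assumptions, the derivatives needed to backpropagate through the rule-activation and output layers would be:

```latex
\begin{aligned}
\underline{u} &= R_\Sigma \sum_j R_j \,\underline{u}'(j),
  \qquad R_\Sigma = \frac{1}{\sum_j R_j},\\[4pt]
\frac{\partial \underline{u}}{\partial R_j}
  &= R_\Sigma \,\underline{u}'(j) - R_\Sigma^{2} \sum_l R_l \,\underline{u}'(l)
   = R_\Sigma \left( \underline{u}'(j) - \underline{u} \right),\\[4pt]
\frac{\partial R_j}{\partial x_k}
  &= \prod_{l \in W_j,\; l \neq k} x_l \quad \text{if } k \in W_j,
  \qquad \frac{\partial R_j}{\partial x_k} = 0 \quad \text{otherwise.}
\end{aligned}
```

Here W_j is a label introduced only for this sketch; it denotes the set of word indices i_{j,1}, . . . , i_{j,n_j} used by rule number j. Derivatives with respect to X then follow from the derivatives of the membership functions, where those exist.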
While useful, this technique has limitations. It does not work well for tasks which require that an expert develop a sense of dynamics over time, based on an understanding of phenomena which are not directly observed. A design which is based on static mapping from X(t) to u(t) cannot adequately capture the behavior of the human expert in that kind of application.
Furthermore, the most common version of adaptable fuzzy logic is based on putting parameters into the membership functions rather than the rules. This has two disadvantages.
First, changing the membership function changes the definition of the word A. Thus the system is no longer defining words in the same way as the expert. This could reduce the ability to explain to the expert what the adapted version of the controller is doing, or even what was changed in adaptation.
Second, changing the membership functions does not allow changing the rules themselves; thus the scope for adaptation is very limited.
PRIOR ATTEMPTS TO COMBINE NEURAL NETWORKS WITH FUZZY LOGIC
There are many ways to combine neural network techniques and fuzzy logic for control applications, described in detail in Paul Werbos, "Neurocontrol and Fuzzy Logic: Connections and Designs," International Journal on Approximate Reasoning, Vol. 6, No. 2, Feb. 1992, p. 185. For example, one can use fuzzy logic to provide an interface between the statements of human experts and a controller; neural network techniques can adapt that same controller to better reflect what the experts actually do or to improve performance beyond that of the human.
In the current literature, many people are using fuzzy logic as a kind of organizing framework, to help them subdivide a mapping from X to Y into simpler partial mappings. Each one of the simple mappings is associated with a fuzzy "rule" or "membership function." ANNs or neural network learning rules are used to actually learn all of these mappings. There are a large number of papers on this approach, reviewed by H. Takagi, "Fusion technology of fuzzy theory and neural networks," Proc. Fuzzy Logic and Neural Networks, Iizuka, Japan, 1990. However, since the ANNs only minimize error in learning the individual rules, there is no guarantee that they will minimize error in making the overall inference from X to Y. This approach also requires the availability of data in the training set for all of the intermediate variables (little R) used in the partial mappings.
A paper submitted to The Journal of Intelligent and Fuzzy Systems by Applicant ("Elastic Fuzzy Logic: A Better Fit With Neurocontrol"), and awaiting publication, shows how a modified form of fuzzy logic, elastic fuzzy logic, should make this hybrid approach much more powerful, allowing the full use of the many methods now available in neurocontrol. A copy of the paper is incorporated herein by reference and is attached as FIG. 4.
The basic idea is to use fuzzy logic as a kind of translation technology, to go back and forth between the words of a human expert and the equations of a controller, classifier, or other useful system. One can then use neural network methods to adapt that system, so as to improve performance.
Other researchers have proposed something like ELF, but without the γ_{ij} exponents. These exponents play a crucial role in adapting the content of each rule; therefore, they are crucial in providing more complete adaptability.
An advantage of ELF is the ability to explain the adapted controller back to the expert. The γ_{j,0} parameters can be reported back as the "strength" or "degree of validity" of each rule. The parameters γ_{j,k} can be described as the "importance" of each condition (input word) to the applicability of the rule. In fact, if the parameters γ_{j,k} are thought of as the "elasticities" used by economists, the whole apparatus used by economists to explain the idea of "elasticity" can be used here as well.
Another advantage of ELF is the possibility of adaptive adding and pruning of rules, and of words within rules. When γ parameters are near zero, the corresponding word or rule can be removed. This is really just a special case of the general procedure of pruning connections and neurons in neural networks, a well-established technique. Likewise, new connections or rules could be tested out safely, by inserting them with γ's initialized to zero, and made effective only as adaptation makes them different from zero. In summary, neural network techniques can be used with ELF nets to adapt the very structure of the controller.
Other authors have suggested putting weights into the membership functions, but this does not provide as much flexibility as one needs for true adaptation, in most applications. In most applications, one needs to find a way to modify the rules themselves. (Modifying the membership functions is sometimes desirable, but it is not the same as modifying the rules, because, for example, a given word usually appears in several rules; each rule needs to be modifiable independently.)
An object of the present invention is to provide a new and useful apparatus which can provide more powerful methods for artificial intelligence applications.
A further object of the invention is to provide a tool for artificial intelligence applications which allows weighting the importance of various factors without weighting the membership functions.
A further object of the invention is to provide a means which is a framework for communication between an expert and a computer model which retains a format and vocabulary readily understandable by a human expert.
A further object of the invention is to provide a means for providing the flexibility to introduce factors at the outset of analysis, without knowing whether they will turn out to be relevant or not, in a manner which permits deleting them without undue complication should they turn out to be non-relevant.
A further object of the invention is to provide an intuitive means for communicating to a human expert the importance which a computer model attaches to a particular rule.
These and other objects may be accomplished by means of a central processing unit incorporating dual subroutines. These and other objects may also be accomplished by introducing a weighting means of a multiplicative form, which may be conceptualized mathematically by replacing equation 4 above by:
$$R_i = \gamma_{i,0} \cdot x_{i1}^{\gamma_{i,1}} \cdot x_{i2}^{\gamma_{i,2}} \cdot \ldots \cdot x_{im}^{\gamma_{i,m}} \qquad (7)$$
and defining the weights in the network as the combined sets of parameters γ and vectors u′. This has the advantage of allowing the translation of the words of an expert into a network as before, simply by initializing all the γ parameters to one. A feature of the system is the resultant natural way to report the results. The modified u′ vectors can be reported out directly and reported in terms of their fit to the words Y_i. The γ coefficients can be described as "elasticities," as measures of the degree of importance of the semantic variable to the applicability of the rule. Elasticity coefficients have been widely used in economics, and can be understood very easily intuitively, by people with limited knowledge of mathematics. Thus, while elastic fuzzy logic makes it easy, as before, to translate back and forth between a human expert and a network, unlike the conventional logic, it also makes it possible to carry out truly major adaptations of the network using neural network methods. This kind of adaptation makes it easy as well to modify rules as part of the adaptation; for example, words with an elasticity near zero can be deleted from a rule, and new words can be added to a rule in a safe way by initializing their elasticity to zero.
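The effect of the elasticities can be seen in a short Python sketch of the rule activation of equation 7. The sketch is illustrative only: the function names `elf_activation` and `elf_gamma_gradients`, the 0-based word indices and the example membership values are assumptions, and the gradient formula assumes strictly positive membership values.

```python
import math

def elf_activation(x, words, gamma0, gammas):
    """Equation 7 for one rule: gamma0 times the product of x[k] ** gamma_k."""
    r = gamma0
    for k, g in zip(words, gammas):
        r *= x[k] ** g
    return r

def elf_gamma_gradients(x, words, gamma0, gammas):
    """Derivative of the activation with respect to each exponent:
    dR/dgamma_k = R * ln(x[k]).  Assumes every x[k] used is > 0."""
    r = elf_activation(x, words, gamma0, gammas)
    return [r * math.log(x[k]) for k in words]

x = [0.5, 0.8]          # hypothetical membership values for two input words
words = [0, 1]          # the rule uses both words

# All elasticities equal to one: reduces to the conventional product rule.
conventional = elf_activation(x, words, 1.0, [1.0, 1.0])
# Elasticity zero for the second word: that word no longer affects the rule.
second_word_off = elf_activation(x, words, 1.0, [1.0, 0.0])
gradients = elf_gamma_gradients(x, words, 1.0, [1.0, 1.0])
print(conventional, second_word_off, gradients)
```

Because the activation is differentiable in every γ, gradient-based methods such as backpropagation could adjust the elasticities directly, which is what distinguishes this form from the weight-free equations 3 through 6.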
The various features of novelty which characterize the invention are pointed out with particularity in the claims annexed to and forming a part of this disclosure. For a better understanding of the invention, its advantages and objects, reference is made to the accompanying drawings and descriptive matter in which a preferred embodiment of the invention is illustrated.