1. Field of the Invention
The present invention is directed to a neural network control system including, in one embodiment, a computer-implemented method and apparatus using a computer-readable medium to control a general-purpose computer to perform intelligent control.
2. Description of the Background
Science has been fascinated by the capabilities of the human mind, and many have hypothesized on the process by which mammalian brains (and human brains in particular) learn. When NSF first set up the Neuroengineering program in 1987, it was not motivated by any kind of desire to learn more about the brain for its own sake. The program was set up as an exercise in engineering, as an effort to develop more powerful information processing technology. The goal was to understand what is really required to achieve brain-like capabilities in solving real and difficult engineering problems, without imposing any constraints on the mathematics and designs except for some very general constraints related to computational feasibility. In a sense, this could be characterized as abstract, general mathematical theory; however, these designs have been subjected to very tough real-world empirical tests, in proving that they can effectively control high-speed aircraft, chemical plants, cars and so on: empirical tests which a lot of "models of learning" have never been confronted with.
More precisely, the Neuroengineering program began as an offshoot of the Lightwave Technology (LWT) program at NSF. LWT was and is one of the foremost programs in the U.S. supporting the most advanced research in optical technology. It furthers the development and use of advanced optical fibers, lasers, holography, optical interface technology, and so on, across a wide range of engineering applications: communication, sensing, computing, recording, etc. Years ago, several of the most advanced engineers in this field came to NSF and argued that this kind of technology could be used to generate computing systems far more powerful than conventional electronic computers.
The desktop computer has advanced remarkably over the computers of twenty years ago. It is called a "fourth generation" computer, and its key is its Central Processing Unit (CPU), the microchip inside which does all the real substantive computing, one instruction at a time. A decade or two ago, advanced researchers pursued a new kind of computer: the fifth generation computer, or "massively parallel processor" (MPP) or "supercomputer." The MPP may contain hundreds or thousands of CPU chips, all working in parallel, in one single box. In theory, this permits far more computing horsepower per dollar; however, it requires a new style of computer programming, different from the one-step-at-a-time FORTRAN or C programming that most people know how to use. The U.S. government has spent many millions of dollars trying to help people learn how to use the new style of computer programming needed to exploit the power of these machines.
In the late 1980's, optical engineering seemed to be a viable basis for developing a sixth generation of computing, as far beyond the MPP as the MPP is beyond the ordinary PC. Using lasers and holograms and such, it was believed that a thousand to a million times more computing horsepower per dollar could be produced compared to the best MPP. However, although skeptics agreed that optical computing might be able to increase computing horsepower as claimed, it would come at a price. Using holograms, huge throughput can be achieved, but the operations at each pixel of the hologram must be very simple. This requires replicating very simple operations performed over and over again in a stereotyped kind of way, and the program cannot be replaced or changed as easily as a FORTRAN program can.
Carver Mead, from CalTech, then pointed out that the human brain itself uses billions and billions of very simple units (like synapses or elements of a hologram), all working in parallel. But the human brain is not a niche machine. It seems to have a fairly general range of computing capability. Thus the human brain becomes an existence proof, to show that one can indeed develop a fairly general range of capabilities, using sixth generation computing hardware. The Neuroengineering program was set up to follow through on this existence proof, by developing the designs and programs to develop those capabilities. In developing these designs, advances in neuroscience are used, but they are coupled to basic principles of control theory, statistics and operations research.
However, sometimes terminology clouds advances in one area that are applicable in another area. Some computational neuroscientists have built very precise models that look like neural nets and use little circles and boxes representing differential equations, local processing and so on. Other people use artificial neural nets to accomplish technological goals. Further, other scientists, including psychologists, use yet another set of terminology. What is going on is that there are three different validation criteria. In computational neuroscience, people are asking, "Does it fit the circuit?" In connectionist cognitive science, they are asking, "Does it fit the behavior?" In neuroengineering, people are asking, "Does it work? Can it produce solutions to very challenging tasks?" But in actuality, whatever really goes on in the brain has to pass all three tests, not just one. Thus logic suggests a combination of all three validation criteria is needed.
Present models must go beyond the typical test of whether or not a model can produce an associative memory. The bottom line is that a new combination of mathematics is needed.
Most of the engineering applications of artificial neural nets today are applications of a very simple idea called supervised learning, shown in FIG. 2. Supervised learning is a very simple idea: some inputs (X), which are really independent variables, are plugged into a neural network, and a desired response or some target (Y) is output. Some weights in the network, similar to synapse strengths, are adapted in such a way that the actual outputs match the desired outputs, across some range of examples. If properly trained, good results are obtained in the future, when new data is applied to the network. These systems do have practical applications, but they do not explain all the functioning of the brain. To make things work in engineering a few components have to be added, above and beyond cognition. A robot that does not move is not a very useful robot. But even supervised learning by itself does have its uses.
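By way of illustration only, the weight adaptation described above can be sketched in a few lines of Python. The sketch assumes a single linear unit rather than a full neural network, and the learning rate, the number of passes, and the target relation y = 2x + 1 are all illustrative assumptions that do not come from the references cited herein.

```python
# Minimal sketch of supervised learning: adapt weights so that the actual
# outputs match the desired outputs across a set of examples.
# The linear model, learning rate, and data below are illustrative assumptions.

def train_linear(examples, rate=0.05, epochs=2000):
    """Fit y ~ w*x + b by gradient descent on the squared output error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in examples:
            error = (w * x + b) - y   # actual output minus desired output
            w -= rate * error * x     # move each weight against the error gradient
            b -= rate * error
    return w, b

# Inputs X and targets Y drawn from the assumed relation y = 2x + 1.
examples = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = train_linear(examples)

# Apply the trained model to a new input that was not among the examples.
prediction = w * 0.55 + b
```

After training, the two weights recover the assumed relation, so the prediction for the unseen input 0.55 is close to 2.1.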
For historical reasons, a majority of ANN applications today are based on the old McCulloch-Pitts model of the neuron, shown in FIG. 3. According to this model, the voltage in the cell membrane ("net") is just a weighted sum of the inputs to the cell. The purpose of learning is simply to adjust these weights or synapse strengths. The output of the cell is a simple function ("s") of the voltage, a function whose graph is S-shaped or "sigmoidal." (For example, most people now use the hyperbolic tangent function, tanh.) Those ANN applications which are not based on the McCulloch-Pitts neuron are usually based on neuron models which are even simpler, such as radial basis functions (Gaussians) or "CMAC" (as described in D. White and D. Sofge, eds., "Handbook of Intelligent Control," published by Van Nostrand, 1992; and W. T. Miller, R. Sutton and P. Werbos (eds), "Neural Networks for Control," published by MIT Press, 1990).
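The neuron model just described (a weighted sum followed by a sigmoidal function) may be sketched as follows. The function name and the particular input and weight values are hypothetical and serve only as an example; tanh is used for the function s, as mentioned above.

```python
import math

def neuron_output(inputs, weights):
    """Return s(net), where net is the weighted sum of the inputs and s is tanh."""
    net = sum(w * x for w, x in zip(weights, inputs))  # "voltage" of the cell
    return math.tanh(net)                              # S-shaped output function

# Hypothetical inputs and synapse strengths, chosen only for illustration.
out = neuron_output([1.0, -2.0, 0.5], [0.4, 0.1, -0.6])
```

Learning, in this picture, changes only the entries of the weight list; the form of the unit stays fixed.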
Although in most applications today, the McCulloch-Pitts neurons are linked together to form a "three-layered" structure, as shown in FIG. 4, where the first (bottom) layer is really just the set of inputs to the network, it is known that the brain is not so limited. But even this simple structure has a lot of value in engineering. Further, there are some other concepts that have arisen based on the study of neural networks: (1) all neural networks approximate "nice" functions, (2) a four-layer MLP can be used for limited tracking control, (3) as the number of inputs grows, the MLP does better, and (4) there is a speed versus generalization dilemma. In "Universal approximation bounds for superpositions of a sigmoidal function," IEEE Trans. Info. Theory 39(3) 930-945, 1993, A. R. Barron showed that a simple three-layered MLP can approximate any smooth function, in an efficient way. Most people in engineering today will say that is the end of the story: any smooth function can be approximated, so nothing else is needed. However, this structure is not powerful enough to do all jobs. A broader concept of reinforcement learning is needed.
Reinforcement learning has been a controversial idea in psychology. The reasons for this are very strange. Skinner, in his day, used to say that this idea is too anthropomorphic, that it ascribes too much intelligence to human beings and other animals. Nowadays many people are saying just the opposite: that it's not purely cognitive enough (because it has motivation in there) and that it's also too mechanistic. But in reality, it may be a good thing to pursue an idea which is halfway between these two extremes. In any case, the problem here for an engineer is straightforward. Assume there is a little person who has a bunch of levers (labeled u1 to un) to control. The set of n numbers forms a vector. Likewise, the person sees a bunch of light bulbs labeled X1 through Xm, representing sensory input. Finally, there is something that looks like a big thermometer which measures utility, U (not temperature). The problem to be solved is as follows: find a computer program or neural net design which can handle the job of the little person in this hypothetical. The little person starts out knowing nothing at all about the connection between the lights, the levers and the thermometer. He must somehow learn how these things work, enough to come up with a strategy that maximizes the utility function U over the long term future. This kind of reinforcement learning is not the same as self-gratification. Although the function U can be thought of as a measure of gratification, the problem here is more like a problem in delayed gratification. The essence of the problem is not just to maximize this in the next instant. The problem is to find a strategy over time to achieve whatever goals are built into this U; these could be very sophisticated goals.
Almost any planning or policy management problem can be put into this framework. An economist would say that this connection is very straightforward. If U is chosen to represent net profits, then the learning task here (to maximize profits over the long term) encompasses quite a lot. The hypothetical may not be a good higher order description of the brain, but it has been extremely productive as a good first order motivator of engineering research.
There are a few other aspects of reinforcement learning of some importance to understanding the brain. It turns out that a really powerful reinforcement learning system can't be built if there is only one simple neural net. Modules within modules within modules are needed, which is exciting, because that is also the way the brain is believed to work. This is not like the AI systems where you have an arbitrary kind of hierarchy. Instead, you have a lot of modules because there are a lot of pieces that need to do this kind of task effectively over time. Further, if a real engineering system is built that tries to learn how to do this maximization task over time, then in order to make it work, human-style control has to be added. For example, exploratory behavior appears necessary. Without exploratory behavior, the system is going to get stuck; and it will be a whole lot less than optimal. So there is a lot of behavior that people do which is exploratory. Exploratory behavior is often called irrational, but it appears useful if a human-like control system is to be built.
Another issue is that human beings sometimes get stuck in a rut. There are many names for the ruts that humans get stuck in. Humans get stuck in less than optimal patterns of behavior. Unfortunately, the same thing happens to ANNs as well. They get stuck in things called local minima. If there were a mathematical way to avoid local minima, in all situations, then it would be used. If there were a mathematical way or a circuit way to keep the human brain from getting stuck in a rut, nature would have implemented it too, but there isn't. It's just the nature of complex nonlinear systems that in the real world have a certain danger of falling into a local minimum, a rut. A certain amount of exploratory behavior reduces that danger.
The bottom line here is that nobody needs to worry about an engineer building a model so optimal that it is more optimal than the human brain could be. That's the last thing to worry about, even though reinforcement learning may still be a plausible first-order description of what the brain is doing, computationally.
A neurocontroller will be used hereinafter as a well defined mathematical system containing a neural network whose output is actions designed to achieve results over time. Whatever else is known about the brain as an information processing system, clearly its outputs are actions. And clearly the function of the brain as a whole system is to output actions.
For the brain as a computer, control is its function. To understand the components of a computer, one must understand how they contribute to the function of the whole system. In this case, the whole system is a neurocontroller. Therefore the mathematics required to understand the brain are in fact the mathematics of neurocontrol. Neurocontrol is a subset both of neuroengineering and of control theory, that is, the intersection of the two fields. The book, "Neural Networks for Control", discussed supra, came from a workshop back in 1990 and really was the start of this now organized field called neurocontrol. Later followed "Handbook of Intelligent Control," discussed supra, which is still the best place to go to find the core, fundamental mathematics, including all the equations. Also useful as an introduction is "The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting," by P. Werbos and published by Wiley, 1994. Basically, it includes tutorials in the back explaining what backpropagation is and what it really does. Backpropagation is a lot more general than the popularized stuff. The book can help explain the basis for designs which use backpropagation in a very sophisticated way. (Also, an abbreviated version of some of this material appears in the chapter on back propagation in P. Werbos, Backpropagation, in M. Arbib (ed) Handbook of Brain Theory and Neural Networks, MIT Press, 1995.)
Since 1992, there has been great progress in applying and extending these ideas. See E. Fiesler and R. Beale, eds, Handbook of Neural Computation, Oxford U. Press and IOP, 1996, for some of the developments in neurocontrol in general. See P. Werbos, Intelligent control: Recent progress towards more brain-like designs, Proc. IEEE, special issue, E. Gelenbe ed., 1996, for a current overview of the more brain-like designs (and of some typographic errors in "Handbook of Intelligent Control").
Neural networks have found three major uses: (1) copying an expert using supervised control, (2) following a path, setpoint, or reference model using direct inverse control or neural adaptive control, and (3) providing optimal control over time using backpropagation of utility (direct) or adaptive critics. Thus cloning, tracking and optimization make up the trilogy. Those are the kinds of capabilities that can be used in engineering.
Cloning means something like cloning a preexisting expert, but this is not what the brain does. There is some kind of learning in the brain based on imitating other people, but it's nothing like the simple cloning designs used in engineering. In fact, imitative behavior in human beings depends heavily on a lot of other more fundamental capabilities which need to be understood first.
Tracking is the most popular form of control in engineering today. In fact, many classical control engineers think that control means tracking, that they are the same thing. This is not true. But a narrowly trained control specialist thinks that control means tracking. An example of tracking is the operation of a thermostat. There is a desired temperature, and you want to control the furnace to make the real temperature in the room track the desired setpoint. (The "setpoint" is the desired value for the variable which you are trying to control.) Or you could have a robot arm, and a desired path that you want the arm to follow. You want to control the motors so as to make the arm fit (track) the desired path. A lot of engineering work goes into tracking. But the human brain as a whole is not a tracking machine. We don't have anyone telling us where our finger has to be every moment of the day. The essence of human intelligence and learning is that we decide where we want our finger to go. Thus tracking designs really do not make sense as a model of the brain.
FIG. 5 gives a simple-minded example of what is called direct inverse control, or direct tracking. The idea here is very simple: you want the robot hand to go to some point in space, defined by the coordinates x1 and x2. You have control over θ1 and θ2. You know that x1 and x2 are functions of θ1 and θ2. If the function happens to be invertible (and that's a big assumption!), then θ1 and θ2 are a function of x1 and x2. So what some robot people have done is as follows: they will take a robot, and flail the arm around a little bit. They will measure the x variables and the θ variables, and then they try to use simple supervised learning to learn the mapping from the x's to the θ's.
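The procedure just described can be sketched as follows, under several assumptions that are not taken from FIG. 5: a planar two-link arm with unit-length links, joint angles restricted so that the mapping is invertible, and a systematic sweep of the joints in place of random flailing. For brevity the sketch also substitutes a nearest-example lookup for the supervised learning network; it is a stand-in for the learned mapping, not a neural network.

```python
import math

def forward(theta1, theta2):
    """Hand position (x1, x2) of an assumed planar arm with two unit-length links."""
    x1 = math.cos(theta1) + math.cos(theta1 + theta2)
    x2 = math.sin(theta1) + math.sin(theta1 + theta2)
    return x1, x2

def collect_samples(steps=60):
    """Move the arm through a grid of joint angles and record (x, theta) pairs."""
    samples = []
    for i in range(steps):
        theta1 = (math.pi / 2) * i / (steps - 1)
        for j in range(steps):
            # Keep the elbow bent one way so the x -> theta mapping is single-valued.
            theta2 = 0.3 + (math.pi - 0.6) * j / (steps - 1)
            samples.append((forward(theta1, theta2), (theta1, theta2)))
    return samples

def inverse(samples, target):
    """Stand-in for the learned x -> theta mapping: return the nearest recorded example."""
    def dist2(sample):
        (x1, x2), _ = sample
        return (x1 - target[0]) ** 2 + (x2 - target[1]) ** 2
    return min(samples, key=dist2)[1]

samples = collect_samples()
target = forward(0.7, 1.2)                 # a reachable point chosen for illustration
theta1, theta2 = inverse(samples, target)  # joint angles proposed by the inverse map
reached = forward(theta1, theta2)
error = math.hypot(reached[0] - target[0], reached[1] - target[1])
```

The residual error depends on how densely the examples cover the workspace. Note that the restriction on the second joint angle is exactly the invertibility assumption discussed above; without it, two different arm postures reach the same point and the inverse mapping is no longer a function.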
This approach does work, up to a point. If you do it in the obvious way, you get errors of about 3%, too much for anybody to accept in real-world robotics. If you are sophisticated, you can get the error down a lot lower. There are a few robots out there that use this approach. But the approach has some real limitations. One limitation is this assumption that the function has to be invertible; among other things, this requires that the number of θ variables (degrees of freedom) has to be exactly the same as the number of x variables. The other thing is that there is no notion of minimizing pain or energy use. There have been lots of studies by people like Kawato and Uno, and also a lot of work by Mahoney from Cambridge University, who has done work on biomechanics. There is lots and lots of work showing that the human arm movement system does have some kind of optimization capability.
There are lots of degrees of freedom in the human arm, and nature does not throw them out. Nature tries to exploit them to minimize pain, collision damage, whatever. The point is that direct tracking models are simply not rich enough to explain even the lowest level of arm control.
An interesting aspect of this is that there are lots of papers still out there in the biology literature talking about learning the mapping from spatial coordinates to motor coordinates. What I am saying is that this is only a metaphor. It is not a workable system. Perhaps it is useful at times in descriptive analysis, but it would be totally misleading to incorporate it into any kind of model of learning.
In actuality, in neuroengineering, most people do not use direct inverse control, even when they are trying to solve very simple tracking problems. There is another approach called indirect adaptive control, where you try to solve a tracking problem by minimizing tracking error in the next time period. This myopic approach is now extremely popular in neuroengineering. But this approach tends to lead to instabilities in complex real-world situations (using either ANNs or classical nonneural designs). There are lots of theorems to prove that such designs are stable, but the theorems require a lot of conditions that are hard to satisfy.
Because of these instability problems, I don't think that indirect adaptive control is a plausible model of arm movement either. Furthermore, it still doesn't account for the work of Kawato and Mahoney and such, who show some kind of optimization capability over time. Therefore, I would claim that optimization over time is the right way to model even the lowest level of motor control.
If you look back at the list of uses for neural networks, you will see that there are two forms of optimization over time which have been used in practice for reasonably large-scale problems in neuroengineering. (There are also a few brute-force approaches used on much smaller-scale problems; these are obviously not relevant here.) One of them is a direct form of optimization based entirely on backpropagation. Direct optimization over time leads to a very stable, high-performance controller. It has been used a whole lot in classical engineering and in neuroengineering both. For example, I suspect that you will see it in ANNs in some Ford cars in a couple of years. Nevertheless, the kind of stuff that you can do in the brain is a little different from what you can do with microchips in a car. The direct form of optimization requires calculations which make no sense at all as a model of the brain. This leaves us with only one class of designs of real importance to neuroscience: a class of designs which has sometimes been called reinforcement learning, sometimes called adaptive critics, and sometimes called approximate dynamic programming (ADP). Actually, these three terms do have different histories and meanings; in a strict sense, the designs of real relevance are those which can be described either as adaptive critics or as ADP designs.
The kind of optimization over time that I believe must be present in the brain is a kind that I would call approximate dynamic programming (ADP). There is only one other kind of optimization over time that anybody uses (the direct approach), and that's not very brain-like. So this is the only thing we have left. But what is dynamic programming?
Dynamic programming is the classic control theory method for maximizing utility over time. Any control theorist will tell you that there is only one exact and efficient method for maximizing utility over time in a general problem and that is dynamic programming. FIG. 6 illustrates the basic idea of dynamic programming. The incoming arrows represent the two things that you have to give to dynamic programming before you can use it. First, you must give it the basic utility function U. In other words, you must tell it what you want to maximize over the long-term future. This is like a primary reinforcement signal, in psychology. Second, you have to feed it a stochastic model of your environment. And then it comes up with another function called a strategic utility function, J.
The basic theorem in dynamic programming is that this J function will always exist if you have a complete state model. Maximizing J in the short term will give you the strategy which maximizes U in the long term. Thus dynamic programming translates a difficult problem in planning or optimization over time into a much more straightforward problem in short term maximization.
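A small worked example may make the relationship between U and J concrete. The sketch below is purely illustrative: the five-state chain, the transition probabilities, the discount factor GAMMA, and the choice of successive approximation as the solution method are all assumptions introduced here. It takes a utility function U and a stochastic model as inputs, computes J, and then selects actions by maximizing J one step ahead.

```python
GAMMA = 0.9          # discount factor (an assumption of this sketch)
STATES = range(5)    # positions 0..4 along a line
ACTIONS = (-1, +1)   # step left or step right

def utility(state):
    """Basic utility U: only the rightmost state is rewarding."""
    return 1.0 if state == 4 else 0.0

def transitions(state, action):
    """Stochastic model of the environment: list of (probability, next_state)."""
    moved = min(max(state + action, 0), 4)
    return [(0.8, moved), (0.2, state)]   # the move fails 20% of the time

def expected_j(J, state, action):
    """Expected J of the next state, given the current state and action."""
    return sum(p * J[nxt] for p, nxt in transitions(state, action))

def solve(sweeps=500):
    """Compute J by repeatedly applying the dynamic programming recursion."""
    J = [0.0] * 5
    for _ in range(sweeps):
        J = [utility(s) + GAMMA * max(expected_j(J, s, a) for a in ACTIONS)
             for s in STATES]
    return J

J = solve()
# Short-term maximization of J yields the long-term strategy.
policy = [max(ACTIONS, key=lambda a: expected_j(J, s, a)) for s in STATES]
```

In this example U is zero everywhere except at the goal, yet J rises steadily toward the goal, so choosing the action with the highest expected J at each step moves the system toward the state where U is finally collected.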
If dynamic programming can solve any optimization problem over time, and account for all kinds of noise and random disturbance, then why don't we use it all the time? The real answer is very simple: it costs too much to implement in most practical applications. It requires too many calculations. To run dynamic programming on a large problem is too expensive. It just won't work. But there is a solution to that problem, called approximation.
In Approximate Dynamic Programming (ADP), we build a neural net or a model to approximate this function J. Thus instead of considering all possible functions J, we do what you do if you are an economist building a prediction model. You build a structure with some parameters in it and you try to adapt the parameters to make it work. You specify a model or a network with weights in it, and you try to adapt the weights to make this a good approximation to J. A neural network which does that is called a Critic network. And if it adapts over time, if it learns, we call it an adaptive critic. So right now in engineering we have almost three synonyms. Approximate dynamic programming, adaptive critics, and reinforcement learning: those are almost the same thing.
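The idea of adapting the parameters of a Critic from experience can be sketched as follows. This is a deliberately minimal illustration and not a design taken from the cited references: the Critic has one adjustable weight per state (the simplest possible parameterization, rather than a neural network), the environment is an assumed deterministic five-state chain traversed under a fixed policy, and the update is a temporal-difference style rule that nudges each estimate toward U plus the discounted estimate for the next state. GAMMA and RATE are assumed values.

```python
GAMMA = 0.9   # discount factor (assumption)
RATE = 0.1    # learning rate for the critic weights (assumption)
GOAL = 4      # last state of the chain; an episode ends there

def utility(state):
    """Basic utility U: received only at the goal state."""
    return 1.0 if state == GOAL else 0.0

def train_critic(episodes=2000):
    weights = [0.0] * (GOAL + 1)   # critic parameters: one estimate of J per state
    for _ in range(episodes):
        state = 0
        while True:
            done = state == GOAL
            next_value = 0.0 if done else weights[state + 1]
            # Target for J(state): utility now plus discounted estimate of what follows.
            target = utility(state) + GAMMA * next_value
            weights[state] += RATE * (target - weights[state])
            if done:
                break
            state += 1   # fixed policy: always step toward the goal
    return weights

critic = train_critic()
```

Unlike exact dynamic programming, the Critic never enumerates the model; it only sees the states actually visited and adjusts its weights a little after each step, which is the sense in which it adapts over time.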
Based on all of this logic, I would conjecture that the human brain itself must essentially be an adaptive critic system. At first glance, this may sound pretty weird. How could there be dynamic programming going on inside the brain? What would this idea mean in terms of folk psychology, our everyday experience of what it feels like to be human? A good model of the brain should fit with our personal experience of how the brain really works. That's part of the empirical data. We don't want to ignore it. So does this theory make sense in terms of folk psychology? I will argue that it does. I would like to give you a few examples of where this J versus U duality comes in, in different kinds of intelligent behavior.
Those of you who have followed artificial intelligence (AI) or chess playing probably are aware that in computer chess the basic goal, the U, is to win the game, and not to lose it. (This refers to computer chess, not real chess.) But there is a little heuristic they teach beginners. They teach you that a queen is worth 9 points, a castle (rook) is worth 5, and so on. You can compute this kind of score on every move. This score has nothing to do with the rules of the game. But people have learned that if you maximize your score in the short term, that's the way to win in the long term.
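The beginner's heuristic can be written down directly. In the sketch below, the values for the queen (9) and the castle, or rook (5), are those given above; the values for the bishop, knight and pawn are the usual beginner's values and are supplied here as an assumption, as are the single-letter piece codes and the example position.

```python
# Point values per piece. Q and R come from the text above; B, N and P are
# the conventional beginner's values, added here as an assumption.
PIECE_VALUES = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material_score(own_pieces, opponent_pieces):
    """Heuristic score: own material minus the opponent's material (kings excluded)."""
    own = sum(PIECE_VALUES[p] for p in own_pieces)
    opp = sum(PIECE_VALUES[p] for p in opponent_pieces)
    return own - opp

# Hypothetical position: queen, two rooks and three pawns against
# two rooks, a bishop, a knight and three pawns.
score = material_score("QRRPPP", "RRBNPPP")
```

The score plays the role of a crude J: it is not part of the rules (the U is simply winning), but it can be evaluated on every move and maximized in the short term.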
When you get to be a good chess player, you learn to make a more accurate evaluation of how well you are doing. For example, you learn to account for the value of controlling the center of the board, regardless of how many pieces you have. Studies suggest that the very best chess players are people who do really sophisticated stuff, a really high quality strategic analysis of how good their position is one move ahead. Those are the studies I've seen. So basically, this evaluation score is like a J function. It's a measure of how well you are doing.
In animal learning, U is like primary reinforcement, the inborn kind of stuff. It reminds me of the hypothalamus and the epithalamus. And J is like secondary reinforcement, the learned stuff, learned reinforcers. U is like pleasure or pain, an automatic kind of response, while J is like hope and fear. And in a way all of this fancy theory is just saying hey, I think hope and fear is hard-wired into the brain. We respond to hopes and fears from day one. Hopes and fears drive everything we do and learn.
It turns out that this model also has parallels in physics. In fact, the Bellman equation we use in dynamic programming is exactly what is called the Hamilton-Jacobi equation in physics. If you read Bryson and Ho, Applied Optimal Control, Ginn, 1969, they even call it the Hamilton-Jacobi-Bellman equation. In physics, they would say that the universe is maximizing a Lagrangian function instead of calling it a utility function; thus they use the letter L instead of the letter U, but it's the same equation. And it turns out that our J refers to something they call "action." And the things we call "forces" in physics turn out to be the gradient of the J function. (See F. Mandl, Introduction to Quantum Field Theory, published by Wiley, 1959; and V. G. Makhankov, Yu. P. Rybakov and V. I. Sanyuk, The Skyrme Model: Fundamentals, Methods, Applications, published by Springer-Verlag (800-777-4643), 1993.)
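For reference, one common discrete-time way of writing the Bellman equation relating J and U is given below. The notation is an assumption of this illustration rather than a formula quoted from the cited works: x(t) denotes the state, u(t) the vector of actions, \gamma a discount factor, and the expectation is taken over the stochastic model of the environment.

```latex
J\bigl(x(t)\bigr) \;=\; \max_{u(t)} \Bigl[\, U\bigl(x(t), u(t)\bigr)
  \;+\; \gamma \, \mathbb{E}\bigl[\, J\bigl(x(t+1)\bigr) \,\big|\, x(t), u(t) \,\bigr] \Bigr],
\qquad 0 < \gamma < 1 .
```

Read from left to right, the equation says that the strategic utility of the present state equals the best achievable sum of immediate utility and expected (discounted) strategic utility of the next state, which is why maximizing J one step ahead yields the strategy that maximizes U over the long term.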
It is an object of the present invention to address at least one deficiency in the intelligent control of external devices by using a new brain-like control system.