The present invention relates to machine learning and more particularly to training and optimization of autonomous agents embodied in a computer system.
Resource variables are known for use in measuring the use of available resources and in gauging the state of an autonomous agent. However, the art of using resource variables to train an autonomous agent, so that it becomes a self-learning machine, has not been well developed. It has long been recognized that modifying system parameters can change the behavior of a system so that it more closely approximates a desired behavior. The inventor has previously developed training processes making use of derivative variables to determine changes in system parameters. The inventor proposes the use of derivative variables of resource variables as a means of training the autonomous agent. The derivative variables of the resource variables, along with the error in the level of the resource variables, are the mechanism for directing the behavior of the autonomous agent.
A particularly related patent from the present inventor is U.S. Pat. No. 5,974,434 issued Oct. 26, 1999, entitled “Method and Apparatus for Automatically Tuning the Parameters of a Feedback Control System.” A patent application of the inventor expanding on the earlier work which is related to the present invention is co-pending U.S. patent application Ser. No. 09/692,950 filed Oct. 20, 2000, entitled “Auto-Tuning Method in a Multiple Input Multiple Output Control System”. This application also refers to other inventions of the present inventor which are peripherally related to the present invention. This patent and this patent application are incorporated herein by reference for all purposes. None of the prior work of the inventor, whether or not considered to be prior art, addresses or solves the problem of training an autonomous agent.
In a computer system functioning as an autonomous agent with a robotic controller, wherein the computer system need not be programmed to perform a specific task, values in an array of error variables are obtained by comparing an array of resource variables, used to indicate resource use and the state of the Autonomous Agent, with an array of corresponding desired levels for the resource variables. The values of the error variables are then used with the derivative variables of the resource variables to adjust the parameters in the Autonomous Agent so as to minimize the values of the error variables and, by so doing, train the system. Each individual variable in the array of error variables represents an error in the actual value of a particular resource variable as compared to its desired value. Derivative variables for the resource variables and for the parameters controlling behavior of the Autonomous Agent are created according to the invention for use in the training algorithm.
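By way of illustration only, the relationship between resource variables, error variables, derivative variables, and parameter adjustment may be sketched as follows. The function names, the learning rate, and the gradient-descent form of the update are assumptions made for this sketch; the description above states only that the error variables are used with the derivative variables to adjust the parameters so as to minimize the error.

```python
def error_variables(resources, desired):
    """Return one error per resource variable: actual level minus desired level."""
    return [actual - target for actual, target in zip(resources, desired)]


def adjust_parameters(params, errors, derivatives, rate=0.1):
    """Move each parameter so as to reduce the summed squared error.

    derivatives[i][j] is the derivative variable of resource i with
    respect to parameter j. The update rule and `rate` are assumptions
    of this sketch, not taken from the specification.
    """
    adjusted = []
    for j, param in enumerate(params):
        # Gradient of 0.5 * sum(error_i ** 2) with respect to parameter j.
        gradient = sum(errors[i] * derivatives[i][j] for i in range(len(errors)))
        adjusted.append(param - rate * gradient)
    return adjusted


def train_toy_agent(steps=50):
    """Toy case: a single resource whose level equals 2 * parameter."""
    params = [0.0]
    desired = [4.0]
    for _ in range(steps):
        resources = [2.0 * params[0]]   # hypothetical environment response
        derivatives = [[2.0]]           # d(resource) / d(parameter)
        errors = error_variables(resources, desired)
        params = adjust_parameters(params, errors, derivatives)
    return params
```

In this toy case the single parameter approaches 2.0, the value at which the resource variable equals its desired level of 4.0.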
This patent application discloses that the auto-tuning algorithm which was previously employed with both a linear system and a nonlinear system to train a process controller can be used to direct the self-learning behavior of an autonomous agent, thus providing a form of self-programming. The objective of the self-programming algorithm of the Autonomous Agent is to maintain the resource variables within nominal ranges. The auto-tuning algorithm used on the control system can thus also be used by the Autonomous Agent to direct its learning behavior and improve its performance.
A critical component of this system is a model of the external environment. This model of the external environment, or External System Model, is the template used for the generation of the derivative variables used for training. The External System Model is itself an adaptive component: its performance is evaluated, and its internal components and parameters are adjusted to improve that performance. The program of the Autonomous Agent can then act on its environment, experiment, and improve its own performance based upon an evaluation of the experimental results. A benefit of this technique is an increase in the adaptability of the Autonomous Agent.
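The dual role of the External System Model, as a source of derivative variables and as an adaptive component in its own right, may be illustrated by the following minimal sketch. The class name, the single-gain linear form of the model, and the adaptation rule are hypothetical and chosen only for brevity; the specification does not prescribe a particular model structure.

```python
class ExternalSystemModel:
    """Hypothetical one-gain model: predicted resource = gain * action."""

    def __init__(self, gain=0.0, rate=0.05):
        self.gain = gain    # internal parameter of the model
        self.rate = rate    # adaptation rate (assumed for this sketch)

    def predict(self, action):
        """Predict the resource level that results from an action."""
        return self.gain * action

    def derivative(self, action):
        """Derivative of the predicted resource with respect to the action."""
        return self.gain

    def adapt(self, action, observed):
        """Adjust the gain to reduce the prediction error; return that error."""
        error = self.predict(action) - observed
        self.gain -= self.rate * error * action
        return error


def fit_model(true_gain=3.0, action=2.0, steps=100):
    """Let the model learn from repeated experiments on a toy environment."""
    model = ExternalSystemModel()
    for _ in range(steps):
        observed = true_gain * action   # hypothetical environment response
        model.adapt(action, observed)
    return model
```

After the model has adapted, `derivative` supplies an estimate of how the resource variable responds to the agent's action, which is the kind of quantity the training algorithm requires.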
The invention will be better understood by reference to the following detailed description of the invention in connection with the accompanying drawings.