The invention pertains to systems and methods that incorporate machine learning and automatic adaptation to respond to changing environmental conditions. More particularly, the invention pertains to such systems and methods that incorporate genetic algorithms, learning classifier systems and agent technology in the realization of a complex adaptive system.
Information systems have grown to comprise such a large number of subsystems that direct centralized control is often very difficult, if not impossible. In actual use, the systems change, new situations develop in the system environment, various new components are added, and the relationships between existing components may change. This is typical of a real-world system, and such a system is called a complex adaptive system, or CAS.
Structured and object-oriented software techniques provide many useful paradigms for software development in a CAS. However, agent-based systems are superior in that system intelligence is contained in locally encapsulated software entities, or agents, and is thus portable and dedicated to given tasks. Individual agents can adapt without changing the rest of the system. An agent is a self-contained entity in a CAS. Each agent is able to accomplish given tasks within a software environment.
The use of multiple agents allows interaction and parallel execution of multiple tasks in diverse locations throughout the system. The intrinsic parallelism provides enhanced system robustness.
Because a CAS is continually changing, it is necessary to provide the agents with a form of learning, to allow them to adapt on-line to the changing environment. Using the principles of evolution, the agents can learn to accomplish their tasks, and continually work to improve their performance, and hence their fitness, to better survive in the current environment. Various solutions are built and tested, and the better solutions are kept and combined with other good solutions to attempt to continually improve the performance.
One known approach, the learning classifier system, was a machine-learning technology that attempted to learn optimum job performance in a given environment. It was a rule-based system containing multiple rules, and it could generate new rules as needed. It could also create rule chains. When a reward arrived only after several rules had fired in sequence, part of that reward was passed back to earlier rules in the chain, so that time-delayed rewards influenced the fitness of those earlier rules. It incorporated a genetic algorithm to create new rules and to evolve existing rules through crossover and mutation.
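The reward-passing scheme described above is commonly known as the bucket brigade. The Python sketch below shows one simplified form of it. Its assumptions do not come from this description: each rule in a chain pays a fixed fraction (`bid_fraction`, a hypothetical parameter) of its strength to the rule that fired just before it, and only the last rule receives the external reward.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    strength: float


def bucket_brigade(chain, reward, bid_fraction=0.1):
    """Apply one episode of bucket-brigade credit assignment.

    chain lists the rules in the order they fired. Each rule pays a bid
    (bid_fraction of its current strength) to the rule that fired just
    before it; only the last rule receives the external reward.
    """
    for i, rule in enumerate(chain):
        bid = bid_fraction * rule.strength
        rule.strength -= bid
        if i > 0:
            chain[i - 1].strength += bid  # pass part of the payment back up the chain
    chain[-1].strength += reward  # the environment rewards only the final action


chain = [Rule("detect", 10.0), Rule("decide", 10.0), Rule("act", 10.0)]
bucket_brigade(chain, reward=5.0)
print([r.strength for r in chain])  # -> [10.0, 10.0, 14.0]
```

Over repeated episodes, the reward paid to the last rule gradually flows back to the first rule in the chain, even though the environment never rewards that first rule directly.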
The rules contained in the known system consisted of fixed-length strings of binary digits, which required external interpretation to express their meaning. Because a wild card symbol could be used alongside the binary digits, hierarchies of rules could be evolved in which one rule applied in multiple situations. However, because the known system was constrained to operate on binary strings, it was difficult to add new information to it. The interpretation of each binary digit in the string would have to be modified to accommodate the new information, and the string length would change, all of which increased complexity.
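The following minimal Python sketch shows how such ternary conditions match a message. It assumes that conditions and messages are plain strings over `0`, `1` and the wild card `#`. The function name `matches` is illustrative only.

```python
def matches(condition: str, message: str) -> bool:
    """Return True if a ternary condition over '0', '1', '#' matches a binary message.

    '#' is the wild card: it matches either bit, so one condition can cover
    several situations.
    """
    if len(condition) != len(message):
        # Fixed-length encoding: every condition must be rewritten if messages grow.
        raise ValueError("condition and message must have the same length")
    return all(c == "#" or c == m for c, m in zip(condition, message))


print(matches("1#0", "110"), matches("1#0", "100"), matches("1#0", "111"))  # -> True True False
```

The length check makes the maintenance problem concrete. Suppose messages are lengthened to carry one new piece of information. Every existing condition is then invalid until it is rewritten.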
Genetic programming was developed to evolve solutions expressed in a more text-like form. Instead of using a binary string, genetic programming combines functions and terminals, which are represented by words, into an expression that is genetically evolved. The functions and terminals can be combined into a Lisp-style or tree expression. Conventional genetic programming, however, yields only a single solution from the evolutionary process. A learning classifier system, in contrast, must maintain multiple actions in its knowledge base.
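For illustration, the following Python sketch evaluates such a tree. The function set (`add`, `sub`, `mul`) and the terminal names are hypothetical. Nested tuples stand in for Lisp S-expressions.

```python
import operator

# Hypothetical function set; terminals are looked up by name at evaluation time.
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(expr, terminals):
    """Evaluate a Lisp-style expression tree.

    expr is either a terminal name (str) or a tuple
    (function_name, argument_1, argument_2, ...), where each argument is itself
    an expression.
    """
    if isinstance(expr, str):
        return terminals[expr]
    name, *args = expr
    return FUNCTIONS[name](*(evaluate(arg, terminals) for arg in args))


# (add x (mul x y)) written as nested tuples
tree = ("add", "x", ("mul", "x", "y"))
print(evaluate(tree, {"x": 2, "y": 3}))  # -> 8
```

A conventional genetic-programming run would evolve a population of such trees but report only the best one.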
Thus, there is a need to combine properties of learning classifier systems and genetic programming to create an adaptable system that automatically learns optimum behavior. Preferably such a system would use the terms and operations of the actual environment to make up its rules, while maintaining a diverse rule base.
Preferably, it would be unnecessary to code software manually based on learned information. Preferably, such systems would also adapt online and respond to environmental changes. Such adaptation would minimize downtime or sub-optimal functioning while awaiting new software.
It would also be preferable if manual software changes were unnecessary to modify system behavior or to respond to changes in the environment or in the system being controlled. In this regard, distributed processing, as opposed to centralized control, minimizes problems due to the loss of portions of the system resident at remote nodes. It would also be preferable if no human were needed to maintain and update the software manually. This avoids the ongoing need to understand the software, modify tests, and integrate the modified software.
A system which embodies this invention enables a plurality of intelligent agents to learn to accomplish tasks in their environment and to adapt to changes in the environment. The agents each have a substantially similar structure and incorporate a plurality of pre-stored rules.
Each rule has an associated fitness or success measure. This measure can be dynamically increased and decreased in response to effectiveness in carrying out a function.
Agents incorporate an internal message list. This list can be accessed by the rules. The messages can be allocated by, for example, an auction process.
Agents also have access to an external message list at each site. Messages are received and auctioned to the agent population. Funds are collected from successful bidders and distributed, at least in part, to the agent or agents which contributed the auctioned messages.
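The description does not fix the auction rules. The sketch below assumes a first-price sealed-bid auction, in which the highest bidder wins and pays its own bid. A hypothetical `payout_fraction` of each payment is split equally among the agents that contributed the message, and the remainder is simply withheld. Class and field names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    funds: float


@dataclass
class Message:
    content: str
    contributors: list = field(default_factory=list)  # agents that supplied the message


def auction(message, bids, payout_fraction=0.8):
    """Sell one message in a first-price sealed-bid auction.

    bids is a list of (agent, amount) pairs. Bids an agent cannot cover are
    ignored. The winner pays its bid; payout_fraction of the payment is split
    equally among the message's contributors. Returns the winner, or None if
    the message goes unacquired.
    """
    valid = [(agent, amount) for agent, amount in bids if 0 < amount <= agent.funds]
    if not valid:
        return None
    winner, price = max(valid, key=lambda bid: bid[1])
    winner.funds -= price
    if message.contributors:
        share = payout_fraction * price / len(message.contributors)
        for contributor in message.contributors:
            contributor.funds += share
    return winner


poster = Agent("poster", 0.0)
a, b = Agent("a", 10.0), Agent("b", 10.0)
winner = auction(Message("price query", [poster]), [(a, 3.0), (b, 5.0)])
print(winner.name, b.funds, poster.funds)  # -> b 5.0 4.0
```

Funds move from buyers to contributors. An agent whose messages are repeatedly bought therefore sees its funds, and hence its fitness measure, rise.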
In one embodiment, the intelligent agents exist in an artificial economy where information is bought and sold. The fitness of agents for a certain task is based upon virtual funds they have accumulated over time.
The funds a given agent possesses are related to how well it performed the jobs it has attempted. Rewards are provided on an ongoing basis as jobs progress, allowing an agent to learn which actions to take to accomplish each job.
Agents that perform the jobs correctly and efficiently are rewarded more than agents that do not. Agents that accumulate more funds, or other fitness indicia, are used to populate the environment through cloning and through evolutionary techniques such as simulated sexual reproduction. In this simulated reproduction, a new agent is created from two parent agents, which are selected according to their level of fitness.
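One common way to select parents according to their level of fitness is roulette-wheel (fitness-proportionate) selection. The Python sketch below uses it under three assumptions: an agent's funds serve directly as its selection weight, an agent is a list of rules, and a new agent starts with a hypothetical `initial_funds` allowance. It uses one-point crossover, which is only one of several possible crossover choices. At least two agents must have positive funds for selection to succeed.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    rules: list
    funds: float


def select_parents(population, rng):
    """Roulette-wheel selection of two distinct parents, weighted by funds."""
    indices = list(range(len(population)))
    first = rng.choices(indices, weights=[a.funds for a in population])[0]
    rest = [i for i in indices if i != first]
    second = rng.choices(rest, weights=[population[i].funds for i in rest])[0]
    return population[first], population[second]


def crossover(parent_a, parent_b, rng, initial_funds=1.0):
    """One-point crossover: a prefix of A's rules followed by a suffix of B's."""
    cut_a = rng.randint(1, len(parent_a.rules))      # keep at least one rule of A
    cut_b = rng.randint(0, len(parent_b.rules) - 1)  # keep at least one rule of B
    return Agent(parent_a.rules[:cut_a] + parent_b.rules[cut_b:], initial_funds)


def clone(parent, initial_funds=1.0):
    """Asexual reproduction: copy the parent's rules into a new agent."""
    return Agent(list(parent.rules), initial_funds)


rng = random.Random(0)
population = [Agent(["a1", "a2"], 8.0), Agent(["b1", "b2"], 4.0), Agent(["c1"], 0.0)]
mother, father = select_parents(population, rng)
child = crossover(mother, father, rng)
```

In this example the agent with no funds is never chosen as a parent, while the richer of the other two is chosen first about twice as often.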
In another embodiment, the environment can contain multiple sites where agents can execute their commands, communicate with other agents, and interface to the outside world. Each site is a location on a computer that supports the existence and operation of agents. The site contains message boards that are available for use by the agents. A site also serves as the address agents use when traversing the network.
Sites are interconnected, such that agents can be dispatched from one site, traverse the network, and reappear at another site. Sites may be located on the same computer, or distributed geographically to distant locations connected through a digital communications system. The conglomeration of all sites and agents that exist at those sites is called the environment.
Sites can contain an interface agent that provides a vehicle for agents to communicate with software systems that are external to the agent site. These external software systems may consist of databases, messaging/communication systems for human users, or other pre-existing software applications.
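The following minimal Python sketch models the site structure just described. It assumes that a site holds a set of resident agents, a message board, and links to directly connected sites. Multi-hop routing and interface agents are omitted, and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality, so sites can be stored in sets
class Site:
    name: str  # the site's address on the network
    agents: set = field(default_factory=set)
    message_board: list = field(default_factory=list)
    links: set = field(default_factory=set, repr=False)  # directly connected sites

    def connect(self, other):
        """Create a two-way link between this site and another."""
        self.links.add(other)
        other.links.add(self)

    def dispatch(self, agent, destination):
        """Move a resident agent to a directly connected site."""
        if destination not in self.links:
            raise ValueError(f"{destination.name} is not linked to {self.name}")
        self.agents.remove(agent)
        destination.agents.add(agent)


# The environment is the collection of all sites and the agents at them.
environment = {name: Site(name) for name in ("plant", "office", "depot")}
environment["plant"].connect(environment["office"])
environment["plant"].agents.add("monitor-agent")
environment["plant"].dispatch("monitor-agent", environment["office"])
```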
In another aspect, rules are formed not from binary strings but from strings of high-level computer language words that are sequenced under syntax rules and structured in the form of IF-THEN statements. Irrespective of the language used, the present rules have a two-part structure in which the antecedents must be satisfied before any consequent can result.
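To illustrate the two-part structure, the Python sketch below encodes a rule as lists of domain words. Antecedents are (predicate, argument) pairs, and consequents are action names. The vocabulary (`message_type_is`, `lookup_price`, and so on) is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical domain vocabulary: predicates test the state, actions act on it.
PREDICATES = {
    "message_type_is": lambda state, value: state.get("message_type") == value,
    "funds_at_least": lambda state, value: state.get("funds", 0) >= value,
}
ACTIONS = {
    "lookup_price": lambda state: f"price of {state['item']}",
    "post_reply": lambda state: "reply posted",
}


@dataclass
class Rule:
    antecedents: list  # (predicate_name, argument) pairs; all must hold
    consequents: list  # action names executed only if every antecedent holds

    def fire(self, state):
        """Return the consequents' results, or None if any antecedent fails."""
        if all(PREDICATES[name](state, arg) for name, arg in self.antecedents):
            return [ACTIONS[name](state) for name in self.consequents]
        return None


rule = Rule(antecedents=[("message_type_is", "price_query"), ("funds_at_least", 2)],
            consequents=["lookup_price", "post_reply"])
print(rule.fire({"message_type": "price_query", "funds": 5, "item": "widget"}))
# -> ['price of widget', 'reply posted']
print(rule.fire({"message_type": "price_query", "funds": 1, "item": "widget"}))
# -> None
```

The second call returns `None` because one antecedent fails, so no consequent is executed.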
In another embodiment, software is combined, using simulated genetic crossover and mutation operations, into the structure of a rule or rules. These rules can be combined into a population of rules, and the best-performing rules are combined to make new rules. The use of high-level rules allows new information to be inserted into the system easily. It also removes the need to interpret the bits of a binary string in order to execute the program.
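The Python sketch below shows crossover and mutation operating on rules made of domain words rather than bits. It assumes that a rule is a dictionary with an `if` list and a `then` list. The two parts are crossed separately, so condition words only ever replace condition words and the IF-THEN syntax stays valid. The vocabularies are hypothetical.

```python
import random

# Hypothetical vocabularies drawn from the application domain.
CONDITIONS = ["message_type_is(price_query)", "message_type_is(order)",
              "funds_at_least(2)", "site_is(local)"]
ACTIONS = ["lookup_price", "post_reply", "forward_to_neighbor", "place_order"]


def one_point(xs, ys, rng):
    """Join a non-empty prefix of xs to a non-empty suffix of ys."""
    return xs[:rng.randint(1, len(xs))] + ys[rng.randint(0, len(ys) - 1):]


def crossover(rule_a, rule_b, rng):
    """Cross the IF parts and THEN parts separately, keeping the IF-THEN syntax valid."""
    return {"if": one_point(rule_a["if"], rule_b["if"], rng),
            "then": one_point(rule_a["then"], rule_b["then"], rng)}


def mutate(rule, rng, rate=0.1):
    """Replace each word, with probability rate, by a random word of the same kind."""
    return {"if": [rng.choice(CONDITIONS) if rng.random() < rate else w for w in rule["if"]],
            "then": [rng.choice(ACTIONS) if rng.random() < rate else w for w in rule["then"]]}


rng = random.Random(1)
parent_a = {"if": ["message_type_is(price_query)", "funds_at_least(2)"],
            "then": ["lookup_price", "post_reply"]}
parent_b = {"if": ["site_is(local)"], "then": ["forward_to_neighbor"]}
child = mutate(crossover(parent_a, parent_b, rng), rng)
```

Adding a new function to the system only requires appending its name to the appropriate vocabulary. No bit layout has to change.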
In yet another aspect of the invention, a method is provided for covering messages that were posted on a message board but not acquired. New rules can be created that contain new information from the environment. In yet another embodiment, the genetic code can be modified by adding new functions to the antecedents and consequents of rules.
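Covering is a standard classifier-system operator. The Python sketch below is one simple reading of it, with four assumptions. Each unacquired message carries a `type` field. A covering rule's antecedent names that type. Its consequent is drawn at random from the known actions. New rules start with a hypothetical `initial_strength`.

```python
import random


def cover(unacquired_messages, actions, rng, initial_strength=10.0):
    """Create one new rule for each message that no agent acquired.

    The antecedent is built from the message itself, so information the rule
    base has never seen enters it directly; the consequent is a random guess
    from the known actions, to be tested and rewarded like any other rule.
    """
    return [{"if": [f"message_type_is({msg['type']})"],
             "then": [rng.choice(actions)],
             "strength": initial_strength}
            for msg in unacquired_messages]


rng = random.Random(2)
board = [{"type": "refund_request"}]  # posted, but no rule bid on it
new_rules = cover(board, ["post_reply"], rng)
print(new_rules)
# -> [{'if': ['message_type_is(refund_request)'], 'then': ['post_reply'], 'strength': 10.0}]
```

A message type never seen before thus enters the rule base as a new word, without any re-encoding.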
A FEHN array can be used to determine the structural and functional closeness between chromosomes. Hence, a new rule can be compared to an existing rule by comparing the FEHN arrays that represent their gene graphs. Crowding can then be used to select a less effective rule for replacement while maintaining diversity.
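This section does not define the FEHN array, so the Python sketch below substitutes a simple stand-in: the Jaccard overlap of the words in two rules, tagged by rule part. It shows one common form of crowding, in which a new rule replaces the most similar member of a pool of the weakest rules. `pool_size` is a hypothetical parameter.

```python
def similarity(rule_a, rule_b):
    """Jaccard overlap of the words in two rules, tagged by rule part.

    A simple stand-in for the FEHN comparison, which is not specified here.
    """
    words_a = {("if", w) for w in rule_a["if"]} | {("then", w) for w in rule_a["then"]}
    words_b = {("if", w) for w in rule_b["if"]} | {("then", w) for w in rule_b["then"]}
    return len(words_a & words_b) / len(words_a | words_b)


def crowding_replace(population, newcomer, pool_size=3):
    """Replace the weak rule most similar to the newcomer; return its index.

    Only the pool_size lowest-strength rules are candidates, so strong rules
    survive, and replacing a similar rule keeps dissimilar niches alive.
    """
    weakest = sorted(range(len(population)), key=lambda i: population[i]["strength"])[:pool_size]
    victim = max(weakest, key=lambda i: similarity(population[i], newcomer))
    population[victim] = newcomer
    return victim


population = [
    {"if": ["B"], "then": ["y"], "strength": 1.0},
    {"if": ["A"], "then": ["y"], "strength": 2.0},
    {"if": ["C"], "then": ["z"], "strength": 3.0},
    {"if": ["A"], "then": ["x"], "strength": 100.0},
]
newcomer = {"if": ["A"], "then": ["x"], "strength": 5.0}
print(crowding_replace(population, newcomer))  # -> 1
```

The weakest rule (index 0) survives because it shares nothing with the newcomer, which preserves diversity. The identical but strong rule at index 3 survives because it is outside the replacement pool.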
An apparatus which embodies the present invention automatically learns how to control a system and performs continual optimization of the control of that system or process. Exemplary controlled systems include machinery, electronics and software interfaces, as well as processes of all types. Control sequences and settings for the system operation are automatically defined.
In another aspect of the invention, starting from a set of software primitives, a program that implements certain desired functions in the particular application can be constructed, tested and maintained. The automatic learning capability of the present invention facilitates learning how to most appropriately control a respective system and how to change internally to respond to changes in the environment.
In yet another aspect of the invention, online adaptation makes it possible to adapt, over a period of time, to changes in the environment, including changes in the controlled system.
In software implemented embodiments, new software is automatically constructed to perform a required job or carry out a function using software function calls and appropriate data associated with the respective subject matter domain. New elements which are injected into the domain are automatically incorporated into the solution.
Another advantage of control systems which embody the present invention is that they are more recoverable and survivable, because the agents are distributed throughout the network and can travel between sites. If one site becomes isolated from the rest of the network, local agents at the isolated site can develop the capabilities needed to perform local processing. Agent capabilities that may have been lost when the site went down can then be automatically re-established by the learning system by growing new agent programs.
Other advantages grow out of self-maintainability. Control systems which embody the present invention continually monitor their performance in relation to a task and continually try to improve it. If new information becomes available, it can be automatically incorporated. If performance dips, the control system will try other avenues of operation in an attempt to increase performance.
The present system has the further advantage of performing continual optimization. Even after a satisfactory performance level has been reached, the system continues to try other approaches to accomplish a required job. As newer or better solutions become available, they are automatically implemented. Such improvements include reductions in cost, processing time, or network usage.
Another advantage of an embodiment of the present invention grows out of the fact that such embodiments perform genetic evolution using software syntax words instead of digital data bit strings. As a result, the present invention can be more easily applied to a given problem. An implementer does not need to translate information from the problem domain into coded bits. Instead, software functions and data that exist can be directly used. Maintenance is facilitated because there are no bit strings which have to be changed with associated coding or decoding. Additional information can be automatically added using an extensible, tree-based data structure.