The invention relates generally to data mining, and more particularly, to the use of evolutionary algorithms to extract useful rules or relationships from a data set for use in controlling systems.
In many environments, a large amount of data can be or has been collected which records experience over time within the environment. Despite the large quantities of such data, or perhaps because of them, deriving useful knowledge from such data stores can be a daunting task. The process of extracting patterns from such data sets is known as data mining. Many techniques have been applied to the problem, but the present discussion concerns a class of techniques known as genetic algorithms, and their superset, evolutionary algorithms.
The basic elements of an evolutionary algorithm are an environment, a model for a genotype (referred to herein as an “individual”), a fitness function, and a procreation function. An environment may be a model of any problem statement. An individual may be defined by a set of rules governing its behavior within the environment. A rule may be a list of conditions followed by an action to be performed in the environment. A fitness function may be defined by the degree to which an evolving rule set successfully negotiates the environment, and is thus used to evaluate the fitness of each individual in the environment. A procreation function generates new individuals by mixing rules from the fittest of the parent individuals. In each generation, a new population of individuals is created.
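By way of illustration only, the elements described above may be sketched as follows. The names `Rule`, `Individual`, `evaluate_fitness`, and `procreate`, the single-threshold form of a condition, the default action `"hold"`, and the choice of fitness as the fraction of samples answered with the desired action are all assumptions made for this sketch and are not drawn from any particular implementation.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed condition form: (feature name, "<" or ">", threshold).
Condition = Tuple[str, str, float]
Sample = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    """A list of conditions followed by an action."""
    conditions: Tuple[Condition, ...]
    action: str

    def fires(self, sample: Sample) -> bool:
        # The rule fires only when every one of its conditions holds.
        return all(
            (sample[name] < value) if op == "<" else (sample[name] > value)
            for name, op, value in self.conditions
        )


@dataclass
class Individual:
    """An individual is defined by the set of rules governing its behavior."""
    rules: List[Rule]

    def act(self, sample: Sample, default: str = "hold") -> str:
        # The first rule that fires decides the action (an assumed policy).
        for rule in self.rules:
            if rule.fires(sample):
                return rule.action
        return default


def evaluate_fitness(individual: Individual,
                     samples: List[Tuple[Sample, str]]) -> float:
    """Assumed fitness: fraction of samples answered with the desired action."""
    if not samples:
        return 0.0
    hits = sum(individual.act(s) == desired for s, desired in samples)
    return hits / len(samples)


def procreate(parent_a: Individual, parent_b: Individual,
              rng: random.Random) -> Individual:
    """Create a child by mixing rules drawn from both parents."""
    pool = parent_a.rules + parent_b.rules
    if not pool:
        return Individual(rules=[])
    size = max(1, len(pool) // 2)
    return Individual(rules=rng.sample(pool, size))
```

In this sketch the environment is represented only by the list of samples passed to `evaluate_fitness`; a fuller model of the environment would replace that list without changing the roles of the other elements.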
At the start of the evolutionary process, individuals constituting the initial population are created by putting together the building blocks, or alphabets, that form an individual. In genetic programming, the alphabets are a set of conditions and actions making up rules governing the behavior of the individual within the environment. Once a population is established, it is evaluated using the fitness function. Individuals with the highest fitness are then used to create the next generation in a process called procreation. Through procreation, rules of parent individuals are mixed, and sometimes mutated (i.e., a random change is made in a rule) to create a new rule set. This new rule set is then assigned to a child individual that will be a member of the new generation. In some incarnations, known as elitist methods, the fittest members of the previous generation, called elitists, are also preserved into the next generation.
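The generational process described above may be sketched, under stated assumptions, as follows. The alphabet contents (`CONDITIONS`, `ACTIONS`), the fixed number of rules per individual, the position-wise mixing of parent rules, the mutation rate, and the population parameters are hypothetical values chosen only to make the example self-contained; an individual is represented here simply as a list of (condition, action) pairs.

```python
import random

# Hypothetical alphabet: the building blocks from which rules are assembled.
CONDITIONS = [("temp", ">", 25.0), ("temp", "<", 15.0), ("humidity", ">", 0.7)]
ACTIONS = ["cool", "heat", "hold"]


def random_rule(rng):
    # A rule here is one condition paired with one action.
    return (rng.choice(CONDITIONS), rng.choice(ACTIONS))


def random_individual(rng, n_rules=3):
    return [random_rule(rng) for _ in range(n_rules)]


def act(individual, sample):
    # The first rule whose condition holds decides the action.
    for (name, op, value), action in individual:
        fired = sample[name] > value if op == ">" else sample[name] < value
        if fired:
            return action
    return "hold"


def fitness(individual, samples):
    # Assumed fitness: fraction of samples answered with the desired action.
    return sum(act(individual, s) == want for s, want in samples) / len(samples)


def procreate(parent_a, parent_b, rng, mutation_rate=0.1):
    # Mix: each rule position is inherited from one parent or the other.
    child = [rng.choice(pair) for pair in zip(parent_a, parent_b)]
    # Mutate: occasionally replace a rule with a freshly drawn one.
    return [random_rule(rng) if rng.random() < mutation_rate else rule
            for rule in child]


def evolve(samples, pop_size=20, generations=30, n_elite=2, n_parents=6, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda ind: fitness(ind, samples),
                        reverse=True)
        elite = ranked[:n_elite]      # elitists carried over unchanged
        parents = ranked[:n_parents]  # fittest individuals procreate
        children = [procreate(*rng.sample(parents, 2), rng)
                    for _ in range(pop_size - n_elite)]
        population = elite + children
    return max(population, key=lambda ind: fitness(ind, samples))
```

Because the elitists are copied unchanged into each new generation, the best fitness in the population cannot decrease from one generation to the next in this sketch. A non-elitist variant would set `n_elite` to zero and give up that guarantee.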
After testing and before harvesting, surviving individuals are conventionally subjected to validation against an out-of-sample (i.e., previously unseen) test set. This validation is intended to ensure sufficient generalization of the evolved individuals, which may have been overfitted to the testing samples.
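One possible form of such a validation step is sketched below. The acceptance criterion used here, namely that fitness on the unseen samples may fall below fitness on the testing samples by no more than `max_drop`, is an assumption made for illustration, as are the function names and the holdout fraction; conventional practice admits other criteria.

```python
import random


def split_samples(samples, holdout_fraction=0.25, seed=0):
    """Set aside a portion of the samples that evolution never sees."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


def validate(individuals, fitness, in_sample, out_of_sample, max_drop=0.1):
    """Keep individuals whose fitness holds up on previously unseen samples.

    `fitness` is any callable taking (individual, samples) and returning a
    number. An individual is kept when its out-of-sample fitness is no more
    than `max_drop` below its in-sample fitness (an assumed criterion).
    """
    kept = []
    for individual in individuals:
        seen = fitness(individual, in_sample)
        unseen = fitness(individual, out_of_sample)
        if seen - unseen <= max_drop:
            kept.append((individual, seen, unseen))
    return kept
```

An individual that scores well on the testing samples but poorly on the held-out samples is discarded by `validate`, which is the behavior expected of a rule set that has fitted the testing data rather than generalized from it.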