This invention relates generally to the field of set merging, and more specifically to a method of implementing a set merging function as an array of cells for use in a genetic algorithm machine.
Although evolutionary computing has roots as far back as the 1950s, genetic algorithms (hereinafter referred to by the initials GA) were introduced in 1975 by John Holland as a method for finding an optimum or near optimum solution to complicated problems. As noted by another researcher, Grefenstette, the GA is a useful method for finding optimum or near optimum solutions to the Traveling Salesman Problem, a classic and well-known computationally intractable problem.
With reference now to FIG. 1, there is illustrated therein a conceptual model of a genetic algorithm and how a solution to a problem evolves in processing the GA, generally designated by the reference numeral 100. As is understood in this art, in a genetic algorithm, an emulated chromosomal data structure is initially designed to represent a candidate or trial solution. A number of chromosomes of that data structure are then randomly generated and are registered in groups or populations of solutions. Parent chromosomes are selected from this population of generated chromosomes according to a given algorithm, e.g., selected chromosomes 105 and 110 in FIG. 1. Each generated chromosome is assigned a unique problem-specific fitness which may or may not differ from other chromosomes in the population, identifying the solution quality of the chromosome. The problem-specific fitness is expressed by a fitness value, as is known in the art. In a true evolutionary, survival of the fittest manner, particular chromosomes are selected from the population of chromosomes in proportion to their fitness values with more-fit chromosomes having a higher probability of being selected.
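The fitness-proportional selection described above can be sketched as a roulette wheel. This is an illustrative sketch only, not the selection circuit of any cited patent; the function name `select_parent` and its parameters are hypothetical, and fitness values are assumed to be non-negative numbers.

```python
import random

def select_parent(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness.

    Hypothetical helper: assumes non-negative fitness values with a positive sum.
    """
    total = sum(fitnesses)
    spin = rng.uniform(0, total)  # where the roulette wheel stops
    running = 0.0
    for chromosome, fitness in zip(population, fitnesses):
        running += fitness  # each chromosome owns a slice as wide as its fitness
        if spin < running:
            return chromosome
    return population[-1]  # guard against floating-point round-off


rng = random.Random(0)  # seeded only so the sketch is repeatable
parent_a = select_parent(["0110", "1011", "1110"], [1.0, 2.0, 5.0], rng)
parent_b = select_parent(["0110", "1011", "1110"], [1.0, 2.0, 5.0], rng)
```

Under these assumptions a chromosome with five times the fitness of another is, on average, selected five times as often, while less-fit chromosomes still retain a nonzero chance of being chosen.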
As further illustrated in FIG. 1, when a pair of parent chromosomes, e.g., chromosomes 105 and 110, is selected from the population, the parent chromosomes are combined using a probabilistically generated cutpoint, designated by the reference numeral 120. If no cutpoint is generated, one of the parent chromosomes is simply copied to provide the child chromosome. Thus, a child chromosome is created and output. The child chromosome, therefore, contains either portions of each parent or the whole of one parent, e.g., a child chromosome 125 contains portion 105A of parent chromosome 105 and portion 110B of parent chromosome 110, as illustrated in FIG. 1. The child chromosome may then be mutated in a controlled manner, preferably with a low probability. In the evolutionary example illustrated in FIG. 1, the mutation is performed through inversion of a bit 130 in the child chromosome 125, e.g., 0 to 1 or 1 to 0. The mutated child chromosome 125′ is then evaluated and assigned its fitness value. The evaluated child chromosome, along with its fitness value, is then stored as a member of the next generation in the population, perhaps replacing one or both of the associated parent chromosomes 105 and 110.
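The conventional crossover and mutation steps just described can be sketched on bit lists as follows. This is a minimal software illustration under stated assumptions: the names `crossover` and `mutate` are hypothetical, the first parent is arbitrarily the one copied when no cutpoint is generated, and the mutation rate is applied independently to each bit.

```python
import random

def crossover(parent1, parent2, cutpoint=None):
    """Single-cutpoint crossover on equal-length bit lists.

    Assumption: when no cutpoint is generated, parent1 is the parent copied.
    """
    if cutpoint is None:
        return list(parent1)
    # Left portion from parent1, right portion from parent2.
    return list(parent1[:cutpoint]) + list(parent2[cutpoint:])

def mutate(child, rate, rng):
    """Invert each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in child]


rng = random.Random(1)  # seeded only so the sketch is repeatable
child = crossover([1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0], cutpoint=2)
mutated_child = mutate(child, rate=0.05, rng=rng)
```

Note that this form of crossover freely duplicates or drops values, which is acceptable only when each bit is independent information.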
After repeated iteration of this evolutionary process, the general fitness of chromosomes in the population improves toward the optimal solution. Thus, a solution to the problem emerges in the population, and is acquired with highly-fit chromosomes concentrated in the population.
In the conventional approach, a GA is emulated by software and the algorithm used for computing the fitness of a GA-based candidate solution to the combinatorial problem is also emulated by software. Due to such a software-based emulation on conventional computers, however, the execution speed of the algorithm for finding an optimum solution to the combinatorial problem is extremely slow.
Thus, a major drawback of conventional machines is the slow execution speed of a GA when emulated by software on conventional general-purpose computers.
Hardware-based implementations of a GA have been proposed to offset this drawback, but with only limited success in execution speed. U.S. Pat. No. 5,970,487 to Shackleford, et al. solved some of the drawbacks and disadvantages of prior art techniques, particularly speed of operation, by the utilization of a hardware-based framework for accelerated use of genetic algorithms. The advantages and usages of the Shackleford et al. invention, Shackleford being the sole inventor in the instant application, are fully described in U.S. Pat. No. 5,970,487, which is incorporated by reference herein.
A common problem that is generally solved using a genetic algorithm is a combinatorial problem, also called a routing or ordering problem. A combinatorial problem is deemed to be a non-deterministic polynomial-time hard (NP-hard) problem, which is intractable to solve using brute-force computation, e.g., finding solutions to such problems may take longer than the life of the universe. Indeed, such difficult problems must be solved by other paradigms, e.g., the genetic algorithm approach. For example, a resource selection from among many resources, such as minimizing the hardware architecture of a logic circuit, is an NP-hard combinatorial problem that an applied form of a GA can solve efficiently.
An example of a combinatorial problem is the Traveling Salesman Problem (or TSP), as is known in the art, which can be used to model many combinatorial, routing and ordering problems. The TSP seeks to find the shortest route between n cities, and while any solution which contains all n cities once and only once is valid, some solutions are better than others. A solution to the problem describes the order of travel between cities, which determines the distance of the route traveled, so the order of travel between cities having the shortest route is the best solution. It should be understood that the TSP is an NP-hard combinatorial problem with n! potential solutions and (n−1)! unique solutions.
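To make concrete how the order of travel determines the distance of the route, the following sketch evaluates two orderings of the same cities. The coordinates are invented for illustration, the name `route_length` is hypothetical, and the route is assumed to be a closed tour that returns to its starting city, which the text above does not specify.

```python
import math

def route_length(order, coords):
    """Total Euclidean length of a closed tour visiting cities in `order`.

    Assumption: the salesman returns to the starting city at the end.
    """
    total = 0.0
    for i, city in enumerate(order):
        next_city = order[(i + 1) % len(order)]  # wrap around to the start
        total += math.dist(coords[city], coords[next_city])
    return total


# Four hypothetical cities on the corners of a unit square.
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
around_the_square = route_length([0, 1, 2, 3], coords)
crossing_diagonals = route_length([0, 2, 1, 3], coords)
```

Both orderings are valid because each visits every city exactly once, but the one that crosses the diagonals is longer and is therefore the worse solution.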
With reference now to FIG. 2, there is illustrated a series of examples of solutions to a Traveling Salesman Problem. In an 8-city problem, having a particular arrangement of cities, any route that includes all cities once and only once is valid. In the first solution of FIG. 2, designated by the reference numeral 210, one possible solution to the Traveling Salesman Problem is illustrated. However, it is apparent that solution 210 is not the best solution for the problem. The route depicted in solution 210 is clearly not the shortest possible route needed to cover all 8 cities. Another example, referenced by the numeral 220, depicts another possible solution to the Traveling Salesman Problem although, again, solution 220 is not the best solution. The solution illustrated by the example referenced by the numeral 230 depicts the best solution, which is readily apparent as the solution having the shortest distance and, thus, the best order.
Because of the large number of possible solutions to a Traveling Salesman Problem, e.g., a 32-city TSP has over 2.5×10^35 solutions, heuristic and non-deterministic solving methods must be used to solve this type of problem. The TSP can be solved through an optimal solution-finding approach that aims at attaining an optimal solution through a screening process of candidate or trial solutions created through a GA, based upon a fitness evaluation of the candidate solutions. In this approach, more-fit candidate solutions are selected and less-fit candidate solutions are screened out, so as to concentrate highly-fit solutions or chromosomes and in the end to reach an optimal or near optimal solution.
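The size of the search space quoted above can be checked directly. This short sketch uses the hypothetical name `tsp_orderings` and simply evaluates n! and (n−1)! for the 32-city case; it assumes nothing beyond the standard factorial.

```python
import math

def tsp_orderings(n):
    """Return (n!, (n-1)!): all orderings, and orderings with the start city fixed."""
    return math.factorial(n), math.factorial(n - 1)


total, fixed_start = tsp_orderings(32)
print(f"{total:.3e}")        # 32! in scientific notation
print(f"{fixed_start:.3e}")  # 31! in scientific notation
```

Even at a billion route evaluations per second, enumerating a space of this size is out of reach, which is why a screening approach such as a GA is used instead of exhaustive search.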
The Shackleford et al. invention achieves significant increase in execution speed in its hardware implementation. The hardware implementation of a GA machine, such as that set forth in Shackleford et al., requires fast hardware-based implementations of the various steps of a GA machine, the parent selection step, the crossover step, the mutation step, the evaluation step, and the survival step.
However, the Shackleford et al. invention, although configured to solve a great many difficult problems in an expeditious manner, is not optimized to solve a combinatorial problem of the type modeled by the TSP. In particular, the crossover step does not optimally combine two parent chromosomes consistent with the TSP. In the implementation described in the Shackleford et al. invention, each bit of every chromosome is information, and crossover consists of creating a child chromosome C by taking information directly from one parent chromosome P1 until a cutpoint is reached, then taking information from another parent chromosome P2 until another cutpoint is reached, and so on. The Shackleford et al. invention utilizes this form of crossover, which is valid in problems such as the set covering problem and the protein folding problem, as is known in the art.
A different implementation of crossover, however, is required when every part of every chromosome is unique information. When every part of each parent chromosome is unique information, for example in the TSP, a more complicated implementation is required. Crossover in this case consists of creating a child chromosome C from the first parent chromosome P1 until a cutpoint is reached, then further creating the child chromosome C from the second parent chromosome P2 where all unique information is passed on, and no information is repeated in the child chromosome.
With reference now to FIG. 3, there is illustrated an example of crossover as described hereinabove in relation to the Traveling Salesman Problem, generally designated by the reference numeral 300. As shown in FIG. 3, information from the first parent chromosome 310 is taken without modification to create the first part of the child chromosome 330. Ordering information from the second parent chromosome 320 is taken in order left-to-right, in a manner so as to complete the child chromosome 330 with no loss or duplication of information. A cutpoint, designated by the numeral 340, is shown to divide the parent chromosomes 310 and 320 into two parts. It should be apparent from this example that the first parent chromosome 310 in this crossover is dominant to the second parent chromosome 320 in that the ordering information of the first parent chromosome 310 is retained entirely in the child chromosome 330, while some modification of the ordering information of the second parent chromosome 320 may be necessary before the information is used in the child chromosome 330. It should also be apparent that the parent chromosomes 310 and 320, as well as the child chromosome 330, correspond directly to the series of examples of Traveling Salesman Problem solutions 210, 220, and 230 depicted in FIG. 2.
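A software sketch of this order-preserving crossover is given below, for illustration only. It is not the cell-array hardware of the invention; the name `ordered_crossover` is hypothetical, and the example chromosomes are invented rather than taken from FIG. 3.

```python
def ordered_crossover(parent1, parent2, cutpoint):
    """Order-preserving crossover in which parent1 is dominant.

    The child starts with parent1 unchanged up to the cutpoint, then takes the
    remaining values from parent2 in left-to-right order, skipping duplicates.
    """
    head = list(parent1[:cutpoint])
    used = set(head)  # values already present in the child
    tail = [city for city in parent2 if city not in used]
    return head + tail


# Hypothetical 8-city chromosomes; each value names one city.
parent1 = [0, 1, 2, 3, 4, 5, 6, 7]
parent2 = [7, 5, 3, 1, 6, 4, 2, 0]
child = ordered_crossover(parent1, parent2, cutpoint=3)
# child is [0, 1, 2, 7, 5, 3, 6, 4]
```

The first three values come from the first parent untouched, and the remaining five appear in the same relative order they have in the second parent, so no city is lost or repeated.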
With reference to the TSP as described hereinabove, then, valid solutions contain every city, and solutions containing duplications of cities or solutions missing cities are invalid. Therefore, child chromosomes created by combining two different parent chromosomes must contain one and only one value corresponding to each city.
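The validity condition can be stated as a simple check. In this sketch the name `is_valid_tour` is hypothetical and cities are assumed to be numbered 0 through n−1.

```python
def is_valid_tour(chromosome, n):
    """True if the chromosome contains each city 0..n-1 exactly once."""
    return sorted(chromosome) == list(range(n))


valid = is_valid_tour([2, 0, 3, 1], 4)      # every city appears once
duplicated = is_valid_tour([2, 0, 2, 1], 4) # city 2 repeated, city 3 missing
```

A conventional cutpoint crossover that copies the right-hand portion of the second parent verbatim can easily produce a chromosome like the second one, which is why it is unsuitable for this class of problem.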
Another illustration of this type of crossover deals with two randomly shuffled decks of cards. To create a third deck that retains ordering information of the two original decks, part of one deck can be taken and used to directly create the third deck. However, when taking a part of the second deck, it is necessary to first check the first part of the second deck for information not included in the first part of the first deck, and include it first. Then, there will be no loss of data. Also, once that information has been taken from the second deck and added to the third, information in the next part of the second deck will be added to the third deck, after it has been checked for duplications. In this way, all information is retained, including order, with no duplications, when the two decks are combined to create a third deck.
For use in a GA machine, the crossover step must be implemented quickly and accurately, combining the parent chromosomes with no loss of data and in a minimum amount of time.
There is, therefore, a present need to design a fast hardware-based implementation of a crossover function, that retains the order of the parent chromosomes with no loss or distortion of data, which is required for rapid evolution of solutions through a GA. What is needed is, accordingly, an invention that performs a crossover algorithm.
The present invention is directed to an iterative array of identical cells to implement a crossover function in a genetic algorithm. Each function cell receives two input values and two select values that determine which input value is outputted. Through creation of an array of these cells, two sets of information of any size can be rapidly and accurately merged to form one set composed of elements of both sets, according to precise guidelines. These guidelines are that no data is to be repeated and no data is to be lost, while retaining the order of the parent chromosomes used in crossover. In addition to the general usefulness of speed from hardware implementation, the system and methodology are particularly useful on a genetic algorithm machine.
Further scope of applicability of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.