This invention relates to optimization techniques for finding the best solution to a problem of the kind that has a number of possible solutions.
In one classic problem of this kind (see FIG. 1), called the traveling salesman problem (TSP), the goal is to find the shortest route or tour (based on some criterion such as total distance traveled or time spent) that passes once through every city (A, . . . , I) in a set of cities having predefined locations. When the number of cities in the set is large, the number of possible routes 10 (where each route is an ordered sequence of paths--e.g., 12, 14, . . . --running from city to city) is extremely large. The TSP is one of a class of problems called combinatorial optimization problems because the goal is to optimize some combination of elements (e.g., the ordering of the paths that make up a travel route in order to discover the shortest tour). The number of possible tours grows combinatorially (factorially, and hence faster than exponentially) with the number of cities. Thus, as the number of cities increases, it quickly becomes impractical to search exhaustively all possible tours, and consequently more selective search strategies must be used.
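The growth in the number of tours can be illustrated with a short sketch. It assumes a closed tour (returning to the starting city) and symmetric distances, neither of which is stated above; the function names and the four-city coordinates are hypothetical and serve only as an illustration.

```python
from itertools import permutations
from math import dist, factorial

def count_tours(n_cities):
    """Distinct closed tours through n cities, fixing the start city and
    ignoring the direction of travel (assumes symmetric distances)."""
    if n_cities < 3:
        return 1
    return factorial(n_cities - 1) // 2

def brute_force_tour(cities):
    """Exhaustively evaluate every closed tour that starts at the first city."""
    names = list(cities)
    start, rest = names[0], names[1:]
    best_order, best_length = None, float("inf")
    for perm in permutations(rest):
        order = (start,) + perm
        closed = order + (start,)  # return to the starting city
        length = sum(dist(cities[a], cities[b]) for a, b in zip(closed, closed[1:]))
        if length < best_length:
            best_order, best_length = order, length
    return best_order, best_length

# Hypothetical layout: four cities on the corners of a unit square.
square = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
order, length = brute_force_tour(square)
```

Under these assumptions the nine cities A through I already admit 8!/2 = 20,160 distinct tours, and each additional city multiplies the count by roughly the number of cities, which is why exhaustive search is practical only for very small sets.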
In another class of optimization problems, called function optimization problems, the objective is to find the best solution, i.e., the multivariate solution yielding either the minimum or maximum value of a function, f: R.sup.n .fwdarw.R. For complex functions, these problems are not susceptible to solution by conventional mathematical approaches. In FIG. 2, for example, the goal may be to find the minimum value 16, and solution P.sub.* corresponding to that value, of some function 18, y=f(P), of the single variable P. We restrict our subsequent discussions, without loss of generality, to minimizations, i.e., optimizations for which the solution at the global minimum is sought.
One group of approaches to solving function optimization problems, called the homotopy methods, involves iterative techniques that begin with some trial solution and move (based on information about the slope or derivative of the function in the locality of that trial solution) through a succession of subsequent trial solutions such that the evaluation at each trial solution is smaller (in the case of function minimization) than the evaluation at the previous trial solution. Thus, at each iteration, the local homotopy methods impose a constraint on the maximum value of the function that may be considered in the next step.
In these so-called local homotopy methods, the iterative process may only move "downhill" (i.e., continually seek smaller values of the function), so that when the process reaches a minimum value, it is likely to be only a local minimum and thus not the desired global optimum. In FIG. 2, for example, if the initial trial solution is at point 20, a local homotopy method would proceed in the direction of arrow 22 until it reached point 24, a local minimum, and stop without ever reaching the "global" minimum value at point 16. In an attempt to avoid becoming "stuck" in a local optimum, the homotopy method may be repeated at many different, randomly chosen starting trial solutions. The trajectories (sequences of trial solutions) produced by the homotopy method in the localities of these initial solutions are followed downhill until local optima are found; the best of these local optima is chosen as the estimate of the global optimum. To be confident that these local homotopy methods have discovered the global solution, the search space must be carefully sampled, which typically requires a great number of initial trials.
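The downhill-only behavior and the random-restart remedy can be sketched as follows. Plain fixed-step gradient descent stands in here for a local homotopy method, and the function f, the step size, and the search interval are all hypothetical; f is not the function 18 of FIG. 2.

```python
import random

def f(p):
    """Hypothetical one-variable function with one local and one global minimum."""
    return p**4 - 3 * p**2 + p

def df(p):
    """Derivative of f, i.e., the slope information a local method relies on."""
    return 4 * p**3 - 6 * p + 1

def descend(p, step=0.01, iterations=5000):
    """Follow the slope downhill from the trial solution p."""
    for _ in range(iterations):
        p -= step * df(p)
    return p

def multistart(n_starts, lo=-2.0, hi=2.0, seed=0):
    """Repeat the descent from randomly chosen starts and keep the best result."""
    rng = random.Random(seed)
    minima = [descend(rng.uniform(lo, hi)) for _ in range(n_starts)]
    return min(minima, key=f)

stuck = descend(1.5)          # settles in the local minimum near p = 1.13
best = multistart(n_starts=20)  # with enough starts, finds the one near p = -1.30
```

A single descent started at p = 1.5 never leaves the right-hand basin; only the repeated random starts reach the lower minimum, at the cost of many additional trials.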
Global homotopy methods have also been developed that permit the iterative solution to proceed "uphill" at times in an attempt to avoid getting stuck at a local minimum. In this way, some types of local optima may be avoided, but there is no guarantee that the global solution will necessarily be found. Consequently, it is well-known that global homotopy methods, like local ones, require a large number of initial trials to assure that the global solution will be found.
Thus, while homotopy methods are especially effective at reaching the optimum solution quickly if they begin at a trial solution that is near to the optimum, most homotopy methods are subject to failure if they begin outside the "basin of attraction" of the optimum solution (the region in which downhill motion will lead to that optimum). And, of course, homotopy methods require derivative information and thus cannot solve problems for which "uphill" and "downhill" directions have no meaning. This lack of derivative information is typical of combinatorial optimization problems; in particular, the TSP example, which we employ later to illustrate the invention, lacks this information and hence cannot be solved by homotopy methods.
Another category of problem solving techniques, applicable both to combinatorial and function optimization problems, is the Standard Genetic Algorithm Optimizer (SGAO). SGAOs solve problems by providing a representation for the possible trial solutions and then proceeding through successive iterations to reach an optimal solution.
In a SGAO, the parameters of the function being optimized are represented by a population of so-called chromosomes. Each chromosome may be, for example, a string of bits (0s and 1s in the memory of a computer) with all chromosomes in the population having the same number of bits. Returning to the simple example of FIG. 2, and referring also to FIG. 3, where only one parameter P is to be optimized, the representation scheme may require that each chromosome be a three-bit binary number whose value points to the index of one of the eight possible discrete values of P, e.g., the three-bit number 010 would point to P.sub.2. The number of possible discrete values of P is governed by the number of bits per chromosome. Note that, while we primarily discuss in this section a single parameter function optimization for simplicity's sake, all of these techniques are applicable to multivariate function optimizations. In fact, many of the advantages of the invention are actually amplified in the multidimensional cases since then the search space sizes increase roughly to the power of the number of variables in the problems. This makes the spaces much more difficult to search by the conventional algorithms. In the case of multiple parameters, a chromosome then becomes a string of genes, where each gene is a string of bits representing one of the parameters. All of the parameters are thereby represented as genes of bit strings along the chromosome. In other words, the structure of the chromosome is unchanged (it is still a string of bits) but its bits are divided into genes representing the various parameters of the problem. This allows the same operators and measurements to be applied to either whole chromosomes or separate genes without significantly altering them. Later, when we discuss gene measurements, we simply mean the same measurements as defined for the chromosome but applied to individual genes along the chromosome.
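The representation just described can be sketched in a few lines. The evenly spaced parameter grid between lo and hi, and the function names, are assumptions made for illustration; the text above specifies only that the chromosome value is an index into the discrete values of P.

```python
from typing import List

def gene_index(bits: str) -> int:
    """Interpret a gene (string of '0'/'1') as the index i of the value P_i."""
    return int(bits, 2)

def decode_gene(bits: str, lo: float, hi: float) -> float:
    """Map a gene to a parameter value on an evenly spaced grid from lo to hi
    (the even spacing is an assumption of this sketch)."""
    levels = 2 ** len(bits)
    return lo + gene_index(bits) * (hi - lo) / (levels - 1)

def split_genes(chromosome: str, gene_length: int) -> List[str]:
    """Divide a multi-parameter chromosome into equal-length genes."""
    return [chromosome[i:i + gene_length]
            for i in range(0, len(chromosome), gene_length)]

index = gene_index("010")            # the three-bit number 010 points to P_2
genes = split_genes("010111", 3)     # two parameters, three bits each
values = [decode_gene(g, 0.0, 7.0) for g in genes]
```

With n bits per gene there are 2**n discrete values per parameter, so a chromosome of k genes indexes (2**n)**k trial solutions, which reflects the growth of the search space with the number of variables noted above.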
In each iteration of the process, a SGAO explores the "chromosome space" by manipulating the values of at least some of the chromosomes (unlike the local homotopy methods which explore the "solution space" directly). The ultimate goal is to have all of the chromosomes of the population converge to the same value, namely the one corresponding to the P.sub.i at which the function is at a minimum (P.sub.2 in FIG. 2). Note that P.sub.2 is not the true global solution P.sub.* ; P.sub.* "falls between the cracks" of the discrete trial values P.sub.0 -P.sub.7, which often leads to difficulties for these algorithms.
There are two principal ways of manipulating the chromosomes during a given iteration (called a generation). One way, mutation, switches the value of each of a number of bits selected at random from among all the chromosomes of the population. In the other way, crossover, certain chromosomes are selected from the population for mating (to be explained below) with other selected chromosomes. Whether a given chromosome is selected for mating depends on the corresponding value of the function being optimized. For example, in FIG. 3, the first chromosome of the population may not be chosen for mating because it corresponds to a very high (i.e., poor) value of the function being optimized, while P.sub.3 would likely be chosen for mating because it has a good (i.e., low) value.
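Mutation and payoff-based selection might be sketched as follows. The text above does not say which selection rule a SGAO uses; tournament selection is swapped in here purely as one common choice, and the per-bit mutation rate and the cost function are likewise hypothetical.

```python
import random

def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )

def tournament_select(population, cost, k, rng):
    """Draw k chromosomes at random and return the one with the lowest cost
    (tournament selection is an assumption of this sketch)."""
    contenders = rng.sample(population, k)
    return min(contenders, key=cost)

rng = random.Random(0)
population = ["000", "011", "111"]

def cost(chromosome):
    """Hypothetical cost to be minimized: distance of the index from 3."""
    return abs(int(chromosome, 2) - 3)

parent = tournament_select(population, cost, k=len(population), rng=rng)
```

Because selection is driven by the function value, chromosomes with poor (high) values are rarely chosen for mating, whereas mutation is applied without regard to payoff.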
We now define and discuss the crossover operator that produces the mating results from two selected chromosome parents. It is the crossover operator that leads to the extraordinarily quick discovery of early approximate solutions (typically much faster than other algorithms including homotopy methods, Monte Carlo, and simulated annealing techniques). It is also this operator that is primarily responsible for the very slow late refinement of approximate solutions--a disadvantage that, as we shall see, the invention eliminates.
Referring to FIG. 4, in one possible example of mating, P.sub.3 and P.sub.4 are crossed over by combining, in one offspring 24, the highest-order bit of P.sub.3 with the lower order two bits of P.sub.4, and combining in a second offspring 26, the highest-order bit of P.sub.4 with the lower-order two bits of P.sub.3. Of the two offspring, P.sub.0 is the better; subsequently P.sub.4 may not be selected for crossover and may eventually be eliminated, while P.sub.0 may be selected for crossover and thus continue to contribute its chromosome bits to later generations. Note that the highest-order bit 0 of this retained chromosome is the "correct" highest-order bit of the optimum solution P.sub.2 ; crossover has the effect, in early iterations of the process, of propagating to later generations the highest-order bit of the optimum solution. In this manner, the crossover operation "finds" the higher-order bits early on and "remembers" them by storing them in the higher-order bit positions of the chromosomes.
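The mating of P.sub.3 and P.sub.4 described above corresponds to a one-point crossover after the highest-order bit. A minimal sketch follows; the function name and the `point` argument are our own labels, not terms taken from FIG. 4.

```python
def one_point_crossover(parent_a, parent_b, point):
    """Exchange the bits that follow position `point` between two parents."""
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

p3, p4 = "011", "100"                      # chromosomes pointing to P_3 and P_4
child_1, child_2 = one_point_crossover(p3, p4, point=1)
indices = (int(child_1, 2), int(child_2, 2))
```

The first offspring takes the highest-order bit of P.sub.3 and the two lower-order bits of P.sub.4, giving 000 (P.sub.0); the second takes the highest-order bit of P.sub.4 and the two lower-order bits of P.sub.3, giving 111 (P.sub.7).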
This need to "remember" the higher-order bits while continuing to search for the lower-order bit values leads to an inefficient search for those bits and ultimate breakdown of SGAOs treating complex problems. As we shall see, the invention, on the other hand, eliminates this need for the chromosomes to "remember" these bit values by simply extracting, when appropriate, this information from the chromosomes, conceptualizing it, and storing it within an adaptable translational mapping (to be described). This frees the chromosomes in conjunction with crossover to perform at maximum efficiency during their entire search, alleviating this disadvantage inherent in SGAOs.
Returning to the example, the searching performed by crossover is a binary search in that the highest-order bit of the chromosomes corresponds to a bifurcation of the function in FIG. 2 at the line 28; the next to the highest-order bit corresponds to bifurcations at the lines 30, 32 (and also at line 28); and the lowest order bit to bifurcations at the lines 34, 36, 38, 40 (and also at lines 28, 30, 32).
The search space represented by the chromosomes is multidimensional, e.g., three-dimensional in our case of three-bit chromosomes. One may define so-called hyperplanes within that multi-dimensional space such that, e.g., all of the chromosomes having a `1` as the highest-order bit lie on a first-order hyperplane (literally a two-dimensional plane in the example) while all of the chromosomes having a `1` as the highest order bit and a `1` as the lowest-order bit would lie on a second-order hyperplane (in this case a line). Thus it may be said that the mutation and crossover operations in successive iterations in effect endeavor to find the hyperplanes of the chromosome space that combine to form the representation of the optimal solution of the solution space. In other words, the hyperplanes are the building-blocks of the solution which the operators of the SGAO attempt to discover.
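Hyperplane membership can be made concrete with a small sketch. The `*` wildcard notation (a "schema") is the customary genetic-algorithm shorthand for a hyperplane and is an addition of ours; it does not appear in the text above.

```python
from itertools import product

def matches(chromosome, schema):
    """True if the chromosome lies on the hyperplane described by the schema,
    where '*' means that either bit value is allowed at that position."""
    return all(s == "*" or s == c for c, s in zip(chromosome, schema))

def order(schema):
    """Order of the hyperplane: the number of fixed bit positions."""
    return sum(s != "*" for s in schema)

def members(schema):
    """All chromosomes of the schema's length that lie on the hyperplane."""
    candidates = ("".join(bits) for bits in product("01", repeat=len(schema)))
    return [c for c in candidates if matches(c, schema)]

first_order = members("1**")   # highest-order bit fixed at 1
second_order = members("1*1")  # highest- and lowest-order bits fixed at 1
```

For three-bit chromosomes the first-order hyperplane contains four of the eight chromosomes (one face of the cube) and the second-order hyperplane contains two (one edge), consistent with the plane and the line mentioned above.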
Even near the end of the search, as the chromosome values are converging toward the solution, the SGAO continues to search the entire range of values of P.sub.i. This is the case simply because mutation and crossover can produce offspring that have any arbitrary bit values and therefore all possible offspring span the entire search space.
One effect stemming from the representational scheme by which the chromosome values represent the indices of P.sub.i is the so-called "Hamming cliff" effect. As an example, suppose the minimum value of a function occurs at the fourth parameter value (binary 011), but the SGAO has found its way to the fifth parameter value (binary 100), which is near to the optimum in terms of the functional value but far away in terms of Hamming distance (the Hamming distance between two chromosomes is the number of non-identical corresponding bit positions; e.g., 011 and 100 have the maximum Hamming distance of 3 while 011 and 001 have a Hamming distance of 1). For a SGAO to move from the fifth parameter value to the correct solution at the fourth value would require either three specific single mutations or a particular crossover and a simultaneous mutation. Either sequence of operations is extremely unlikely because the mutation and crossover operators are triggered probabilistically. Attempts to avoid Hamming cliffs by using a Gray coding scheme in place of the binary code render the crossover operator far less efficient in searching the chromosome space for possible solutions. Note that, in a SGAO, the representational scheme by which the value of a given chromosome is linked to a corresponding parameter value (e.g., through the binary code or through a Gray code) does not change from iteration to iteration during execution.
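The Hamming cliff in this example, and the effect of a Gray code on it, can be checked directly. The sketch assumes the standard reflected binary Gray code; the text above does not say which Gray code is meant.

```python
def hamming(a, b):
    """Number of corresponding bit positions in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def to_binary(index, n_bits):
    """Plain binary code for an index, padded to n_bits."""
    return format(index, "0{}b".format(n_bits))

def to_gray(index, n_bits):
    """Reflected binary Gray code for an index (an assumed choice of Gray code)."""
    return format(index ^ (index >> 1), "0{}b".format(n_bits))

# Indices 3 and 4 are the fourth and fifth parameter values.
cliff = hamming(to_binary(3, 3), to_binary(4, 3))   # 011 vs 100
gray_step = hamming(to_gray(3, 3), to_gray(4, 3))   # 010 vs 110
```

In the binary code the two neighboring parameter values differ in all three bits, whereas in the Gray code they differ in exactly one, which removes the cliff but also changes which parameter values share bits and therefore what crossover recombines.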
It is characteristic of a SGAO that it, in effect, searches the solution space (i.e., the range of possible solutions--values of the parameter upon which the function y depends in the case of FIG. 2) relatively quickly during early generations. So quickly, in fact, that SGAOs often become stuck in local optima. This effect, termed premature convergence, occurs because one individual chromosome from the randomly chosen initial population of chromosomes will almost always be "better" than the rest. The progeny of this super-individual quickly take over the population, driving out the offspring from the other, poorer, chromosomes from the initial population. The SGAO has become stuck at the solution represented by the super-individual, even though the super-individual does not represent the globally optimal solution. After premature convergence occurs, the SGAO is incapable of efficient further search of the solution space. In the case of FIG. 2, for example, a SGAO would likely converge on P.sub.3 as the solution even though the optimum value of the function y is at point 16 and P.sub.2 is the closest trial solution.
Because a SGAO is more efficient in earlier iterations, when the trial solution may not be near the optimum solution, while homotopy methods are more effective when the trial solution is close to the optimum, it has been proposed to switch from a SGAO to a homotopy method at some point in the process of solving a problem. This strategy, however, has difficulties because if the switch is made too early, the homotopy method will drive the trajectory of trial solutions to a local optimum, while if the switch is made too late, the increased efficiency of the homotopy method is lost.
In summary, referring to FIG. 5, in a SGAO 35, evaluations 29 of individual trial solutions 31 produce corresponding payoffs 30 that are used by the SGAO to control genetic algorithm (GA) operators 34 (selection, crossover, and mutation) which determine the next generation of chromosomes in the chromosome population 36. The new generation of chromosomes is converted by a fixed translation 38 into new trial solutions 32 for evaluation. The process repeats in the next generation. The structure of the chromosome space and the translation together make up a representational scheme 40 that is predefined and does not change from generation to generation. The SGAO is thus solely an evolutionary (i.e., Darwinian--random mutation with survival and mating of the fittest) technique in that information is passed only in the direction from the chromosome population to the trial solution population (and to the GA operators as payoffs of individual trial solutions). Although representations of the solution are stored in the bits of the chromosome population, the SGAO never alters the representational scheme.
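The loop summarized above can be sketched end to end. This is a minimal illustration under stated assumptions, not the SGAO of FIG. 5 itself: the table Y_VALUES, the tournament selection rule, the population size, and the mutation rate are all hypothetical, and the tracking of the best chromosome seen so far is bookkeeping added for the sketch.

```python
import random

# Hypothetical function values y_0..y_7 at P_0..P_7 (not read from FIG. 2).
Y_VALUES = [5.0, 3.0, 1.0, 1.5, 4.0, 6.0, 7.0, 9.0]

def translate(chromosome):
    """Fixed translation: bit string -> index i of the trial solution P_i."""
    return int(chromosome, 2)

def payoff(chromosome):
    """Evaluation of the trial solution; lower is better (minimization)."""
    return Y_VALUES[translate(chromosome)]

def select(population, rng, k=2):
    """Tournament selection (an assumed rule): best of k random chromosomes."""
    return min(rng.sample(population, k), key=payoff)

def crossover(a, b, rng):
    """One-point crossover at a randomly chosen interior position."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )

def run_sgao(pop_size=8, n_bits=3, generations=20, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = ["".join(rng.choice("01") for _ in range(n_bits))
                  for _ in range(pop_size)]
    initial_best = min(population, key=payoff)
    best = initial_best
    for _ in range(generations):
        next_population = []
        while len(next_population) < pop_size:
            parents = select(population, rng), select(population, rng)
            for child in crossover(parents[0], parents[1], rng):
                next_population.append(mutate(child, mutation_rate, rng))
        population = next_population[:pop_size]
        best = min(population + [best], key=payoff)  # bookkeeping only
    return initial_best, best

initial_best, best = run_sgao()
```

Note that `translate` is never modified inside the loop: payoffs influence only which chromosomes are selected, which is the one-directional flow of information described above.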
Referring again to FIG. 2, in the representational scheme for function y, each chromosome in the population always has three bits, and the translation between the eight possible values of the chromosome and the corresponding parameter values (illustrated by the markers labeled P.sub.0, . . . , P.sub.7) is always fixed. That is, a chromosome whose value is 000 is always translated to the parameter value P.sub.0 (and in turn to the corresponding functional value y.sub.0).
It is well known that the method of representing the trial solutions of the space of all possible solutions is most important to any particular algorithm's success in searching that space. For complex problems, there is usually no known best representation. But even beyond not knowing the best representation, we have discovered, and the invention takes advantage of the fact, that for iterative improvement algorithms the best representation changes as the trial solutions are discovered or refined. Whenever the user chooses a representation to employ with a traditional search algorithm, that choice has associated search biases that affect the performance and accuracy of the method and may even lead to its failure. There are numerous well-known (and even named), albeit subtle, problems stemming from the representational issues; several of these unfavorable characteristics are discussed below.