1. Field of the Invention
The present invention relates to genetic algorithms, and particularly to a computerized method of generating precedence-preserving crossover and mutation operations for genetic algorithms used for optimization and search problems in computer science.
2. Description of the Related Art
A genetic algorithm (GA) is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics. Genetic algorithms are a particular class of evolutionary algorithms (EA) that use techniques inspired by evolutionary biology, such as inheritance, mutation, selection, and crossover.
Genetic algorithms are implemented in a computer simulation in which a population of abstract representations (called chromosomes or the genotype of the genome) of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a population of randomly generated individuals and happens in generations. In each generation, the fitness of every individual in the population is evaluated, multiple individuals are stochastically selected from the current population (based on their fitness), and modified (recombined and possibly randomly mutated) to form a new population. The new population is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced, or a satisfactory fitness level has been reached for the population. If the algorithm has terminated due to a maximum number of generations, a satisfactory solution may or may not have been reached.
Genetic algorithms find application in bioinformatics, phylogenetics, computational science, engineering, economics, chemistry, manufacturing, mathematics, physics and other fields. A typical genetic algorithm requires a genetic representation of the solution domain and a fitness function to evaluate the solution domain.
A standard representation of the solution is as an array of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned due to their fixed size, which facilitates simple crossover operations. Variable length representations may also be used, but crossover implementation is more complex in this case. Tree-like representations are explored in genetic programming and graph-form representations are explored in evolutionary programming.
In genetic algorithms, crossover is a genetic operator used to vary the programming of a chromosome or chromosomes from one generation to the next. It is analogous to reproduction and biological crossover, upon which genetic algorithms are based. In a “one-point” crossover, a single crossover point on both parents' organism strings is selected. All data beyond that point in either organism string is swapped between the two parent organisms. The resulting organisms are the children.
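By way of illustration only, the one-point crossover described above may be sketched as follows. The function name `one_point_crossover` and its parameters are hypothetical and are chosen for this example; parents are assumed to be equal-length strings of at least two characters.

```python
import random


def one_point_crossover(parent_a, parent_b, point=None, rng=random):
    """Swap everything beyond a single cut point between two parents."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if point is None:
        # Pick a cut strictly inside the string so that each child
        # receives material from both parents.
        point = rng.randrange(1, len(parent_a))
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b
```

For example, with parents `"11111111"` and `"00000000"` and a cut point of 3, the children are `"11100000"` and `"00011111"`.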
Two-point crossover calls for two points to be selected on the parent organism strings. Everything between the two points is swapped between the parent organisms, rendering two child organisms. Another crossover variant, the “cut and splice” approach, results in a change in length of the children strings. The reason for this difference is that each parent string has a separate choice of crossover point. In the above schemes, the two parents are combined to produce two new offspring.
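The two variants described above may be sketched as follows, with the cut points passed in explicitly so that the effect on string length is easy to see. The function names are hypothetical and serve only this illustration.

```python
def two_point_crossover(parent_a, parent_b, i, j):
    """Swap the segment between positions i and j (i <= j) of two parents."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 0 <= i <= j <= len(parent_a):
        raise ValueError("cut points must satisfy 0 <= i <= j <= length")
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b


def cut_and_splice(parent_a, parent_b, cut_a, cut_b):
    """Each parent has its own cut point, so child lengths may differ."""
    child_a = parent_a[:cut_a] + parent_b[cut_b:]
    child_b = parent_b[:cut_b] + parent_a[cut_a:]
    return child_a, child_b
```

With parents `"AAAAAA"` and `"BBBBBB"`, two-point crossover at positions 2 and 4 gives `"AABBAA"` and `"BBAABB"`, whereas cut and splice with cuts at 2 and 5 gives `"AAB"` and `"BBBBBAAAA"`, i.e., children of lengths 3 and 9.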
In the “uniform crossover” scheme (UX), individual bits in the string are compared between two parents. The bits are swapped with a fixed probability, typically 0.5. In the half uniform crossover scheme (HUX), exactly half of the non-matching bits are swapped. Thus, first, the Hamming distance (i.e., the number of differing bits) is calculated. This number is divided by two. The resulting number is how many of the bits that do not match between the two parents will be swapped.
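A minimal sketch of the UX and HUX schemes described above is given below. The function names are hypothetical, parents are assumed to be equal-length sequences of bits, and, as an assumption of this sketch, an odd Hamming distance is rounded down when halved.

```python
import random


def uniform_crossover(parent_a, parent_b, swap_prob=0.5, rng=random):
    """Swap each position independently with probability swap_prob."""
    child_a, child_b = list(parent_a), list(parent_b)
    for k in range(len(child_a)):
        if rng.random() < swap_prob:
            child_a[k], child_b[k] = child_b[k], child_a[k]
    return child_a, child_b


def half_uniform_crossover(parent_a, parent_b, rng=random):
    """Swap exactly half (rounded down) of the non-matching positions."""
    child_a, child_b = list(parent_a), list(parent_b)
    # The number of differing positions is the Hamming distance.
    differing = [k for k in range(len(child_a)) if child_a[k] != child_b[k]]
    for k in rng.sample(differing, len(differing) // 2):
        child_a[k], child_b[k] = child_b[k], child_a[k]
    return child_a, child_b
```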
Depending on how the chromosome represents the solution, a direct swap may not be possible. One such case is when the chromosome is an ordered list, such as an ordered list of the cities to be travelled for the traveling salesman problem. A crossover point is selected on the parents. Since the chromosome is an ordered list, a direct swap would introduce duplicates and remove necessary candidates from the list. Instead, the chromosome up to the crossover point is retained for each parent. The information after the crossover point is ordered as it is ordered in the other parent. For example, if our two parents are ABCDEFGHI and IGAHFDBEC and our crossover point is after the fourth character, then the resulting children would be ABCDIGHFE and IGAHBCDEF.
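The ordered-list crossover described above may be sketched as follows. The function name is hypothetical, and both parents are assumed to be strings that are permutations of the same characters.

```python
def ordered_one_point_crossover(parent_a, parent_b, point):
    """Keep each parent's head; order the remaining genes as in the other parent."""
    head_a, head_b = parent_a[:point], parent_b[:point]
    # Genes missing from a head are appended in the order in which
    # they appear in the other parent, so no gene is duplicated or lost.
    tail_a = "".join(g for g in parent_b if g not in head_a)
    tail_b = "".join(g for g in parent_a if g not in head_b)
    return head_a + tail_a, head_b + tail_b


children = ordered_one_point_crossover("ABCDEFGHI", "IGAHFDBEC", 4)
# children == ("ABCDIGHFE", "IGAHBCDEF"), matching the example above
```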
The “fitness function” is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem dependent. For example, in the knapsack problem, one wants to maximize the total value of objects that can be put in a knapsack of some fixed capacity. A representation of a solution might be an array of bits, where each bit represents a different object, and the value of the bit (0 or 1) represents whether or not the object is in the knapsack. Not every such representation is valid, as the size of objects may exceed the capacity of the knapsack. The fitness of the solution is the sum of values of all objects in the knapsack if the representation is valid, or 0 otherwise. In some problems, it is hard or even impossible to define the fitness expression; in these cases, interactive genetic algorithms are used.
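The knapsack fitness function described above may be sketched as follows. The function name, the parameter names, and the sample values are illustrative assumptions only.

```python
def knapsack_fitness(bits, values, sizes, capacity):
    """Sum of selected values if the selection fits the knapsack, else 0."""
    total_size = sum(size for bit, size in zip(bits, sizes) if bit)
    if total_size > capacity:
        return 0  # invalid representation: the objects exceed the capacity
    return sum(value for bit, value in zip(bits, values) if bit)


values = [60, 100, 120]
sizes = [10, 20, 30]
fits = knapsack_fitness([0, 1, 1], values, sizes, capacity=50)      # 220
too_big = knapsack_fitness([1, 1, 1], values, sizes, capacity=50)   # 0
```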
Once the genetic representation and the fitness function are defined, GA proceeds to initialize a population of solutions randomly, then improve it through repetitive application of mutation, crossover, inversion and selection operators. Initially, many individual solutions are randomly generated to form an initial population. The population size depends on the nature of the problem, but typically contains several hundreds or thousands of possible solutions. Traditionally, the population is generated randomly, covering the entire range of possible solutions (i.e., the “search space”). Occasionally, the solutions may be “seeded” in areas where optimal solutions are likely to be found.
During each successive generation, a proportion of the existing population is selected to breed a new generation. Individual solutions are selected through a fitness-based process, where fitter solutions (as measured by a fitness function) are typically more likely to be selected. Certain selection methods rate the fitness of each solution and preferentially select the best solutions. Other methods rate only a random sample of the population, as rating every solution may be very time-consuming.
Most selection functions are stochastic and designed so that a small proportion of less fit solutions are selected. This helps keep the diversity of the population large, preventing premature convergence on poor solutions. Popular and well-studied selection methods include roulette wheel selection and tournament selection.
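The two selection methods named above may be sketched as follows. The function names are hypothetical; the roulette wheel version assumes non-negative fitness values, and the tournament size `k` is an illustrative parameter that must not exceed the population size.

```python
import random


def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.choice(population)  # no fitness signal: choose uniformly
    pick = rng.random() * total
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off


def tournament_select(population, fitnesses, k=3, rng=random):
    """Draw k distinct individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda idx: fitnesses[idx])
    return population[best]
```

In both sketches a less fit solution retains a nonzero chance of selection (unless its fitness is zero in the roulette wheel case, or the tournament spans the entire population), which is what preserves diversity.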
The next step is to generate a second generation population of solutions from those selected through genetic operators: crossover (also called recombination), and/or mutation. For each new solution to be produced, a pair of “parent” solutions is selected for breeding from the pool selected previously. By producing a “child” solution using the above methods of crossover and mutation, a new solution is created which typically shares many of the characteristics of its “parents”. New parents are selected for each child, and the process continues until a new population of solutions of appropriate size is generated. Although reproduction methods that are based on the use of two parents are more “biology inspired”, some research suggests that using more than two “parents” can produce better-quality chromosomes.
These processes ultimately result in the next generation population of chromosomes that is different from the initial generation. Generally, the average fitness will have increased by this procedure for the population, since only the best organisms from the first generation are selected for breeding, along with a small proportion of less fit solutions, for the reasons noted above.
This generational process is repeated until a termination condition has been reached. Common terminating conditions include: a solution is found that satisfies minimum criteria; a fixed number of generations is reached; an allocated budget (computation time/money) is reached; the highest ranking solution's fitness is reaching or has reached a plateau such that successive iterations no longer produce better results; manual inspection; or combinations of the above.
The process generally follows the steps of: choosing the initial population of individuals; evaluating the fitness of each individual in that population; and then repeating the following on each generation until termination: selecting the best-fit individuals for reproduction; breeding new individuals through crossover and mutation operations to give birth to offspring; evaluating the individual fitness of the new individuals; and replacing the least-fit individuals in the population with the new individuals.
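The steps above may be sketched as a compact loop. This is a minimal sketch under stated assumptions rather than a definitive implementation: the function name `run_ga`, its parameters, the fixed-generation termination condition, and the choice of keeping the fitter half of the population as parents are all illustrative.

```python
import random


def run_ga(fitness, n_bits, pop_size=20, generations=30,
           mutation_rate=0.02, seed=0):
    """Evolve bit strings; return the best individual and best-fitness history."""
    rng = random.Random(seed)
    # Choose the initial population at random.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):  # terminate after a fixed number of generations
        # Evaluate and rank; the fitter half is selected for reproduction.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]
        # Breed offspring through one-point crossover and bit-flip mutation.
        children = []
        while len(children) < pop_size - len(parents):
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)
            child = mother[:cut] + father[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        # The least-fit half is replaced by the new individuals.
        population = parents + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

Calling `run_ga(sum, 16)` maximizes the number of 1 bits in a 16-bit string. Because the parents survive into the next generation, the best fitness recorded in `history` never decreases.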
In genetic algorithms, mutation is a genetic operator used to maintain genetic diversity from one generation of a population of chromosomes to the next. It is analogous to biological mutation. An example of a mutation operator is a probability that an arbitrary bit in a genetic sequence will be changed from its original state. A common method of implementing the mutation operator involves generating a random variable for each bit in a sequence. This random variable tells whether or not a particular bit will be modified.
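The per-bit mutation described above may be sketched as follows. The function name and the `rate` parameter are hypothetical.

```python
import random


def bit_flip_mutation(bits, rate, rng=random):
    """Draw one random number per bit and flip the bit when it falls below rate."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]
```

A rate of 0 leaves the sequence unchanged and a rate of 1 flips every bit; practical rates are typically small.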
The purpose of mutation in GAs is to allow the algorithm to avoid local minima by preventing the population of chromosomes from becoming too similar to each other, which would slow or even stop evolution. This reasoning also explains the fact that most GA systems avoid taking only the fittest of the population in generating the next, but rather take a random (or semi-random) selection with a weighting toward those that are fitter.
As noted above, in GAs, potential solutions to a problem are represented as a population of chromosomes. Each chromosome in turn is composed of a string of values, each of which is referred to as a gene. The chromosomes evolve through successive generations. In order to exploit and explore potential solutions, offspring chromosomes are created by merging two parent chromosomes using a crossover operator or by modifying an existing chromosome using a mutation operator. There are many methods of crossover and mutation operators.
One crossover operator is the one-cut-point method, which randomly selects one cut-point on the parent chromosomes and exchanges the genes at one side of the cut-point between the two parent chromosomes to generate two offspring chromosomes. One mutation operator is the uniform mutation method, which alters one or more genes in the chromosome within a specified range, according to a predefined mutation rate. During each generation, the chromosomes are evaluated on their performance with respect to the fitness functions (objective functions). Chromosomes of high fitness have higher survival probabilities. The process continues until, after several generations, the chromosomes in the new generation are nearly identical or certain termination conditions are met. The final chromosomes hopefully represent the optimal or near-optimal solutions to a problem.
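The uniform mutation method described above may be sketched for integer-valued genes as follows. The function name, the per-gene `bounds` list, and the use of integer genes are assumptions made for illustration.

```python
import random


def uniform_mutation(genes, bounds, rate, rng=random):
    """With probability rate, redraw a gene uniformly within its (low, high) range."""
    mutated = list(genes)
    for k, (low, high) in enumerate(bounds):
        if rng.random() < rate:
            mutated[k] = rng.randint(low, high)  # both ends inclusive
    return mutated
```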
The method of gene coding in a chromosome hinges upon the particular problem at hand. The typical time/cost trade-off problem can be formulated as a numerical optimization problem in the GAs. In this particular problem, the values of the genes in a chromosome represent possible durations of the project activities. The one-cut-point crossover and uniform mutation operators can be used efficiently for the time/cost trade-off problems. Resource allocation problems represent a typical ordering problem, as the main concern is to determine the activities' priority to fulfill the constrained resources. Accordingly, the genes represent activities' identifications and a chromosome represents a possible order of activities. A chromosome structure can be such that an activity in a higher order, from left to right, has a higher priority of getting resources than the previous activities. However, there is a possibility that character duplication and/or omission occurs after implementing the crossover and mutation operators.
Likewise, the unlimited resource leveling problem can be translated into a normal numerical optimization problem using GA techniques. Resource leveling problems represent a typical scheduling problem with the objective of minimizing the fluctuation in resource usage. The genes represent activities' start times and a chromosome represents a possible project schedule. In contrast with ordering problems, scheduling problems feature specific precedence relationships among genes. Accordingly, the implementation of the one-cut-point crossover and uniform mutation operators for the leveling problem may cause violation of the precedence relationships in the offspring chromosomes. This problem entails checking the output chromosomes of the crossover and mutation operators and repairing the infeasible chromosomes. This check/repair process causes considerable computational inefficiency in the GA technique.
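The precedence violation described above can be demonstrated on a small example. The three-activity chain A, B, C (each two days long, B following A and C following B), the function names, and the finish-to-start interpretation of precedence are assumptions made for this sketch.

```python
def is_feasible(start, duration, predecessors):
    """True if every activity starts no earlier than all its predecessors finish."""
    return all(start[p] + duration[p] <= start[a]
               for a, preds in predecessors.items()
               for p in preds)


def one_cut_point(parent_a, parent_b, point):
    """Plain one-cut-point crossover on lists of start times."""
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


duration = [2, 2, 2]                 # activities A, B, C
predecessors = {1: [0], 2: [1]}      # B follows A, C follows B
early = [0, 2, 4]                    # feasible schedule
late = [3, 5, 7]                     # feasible schedule
child_1, child_2 = one_cut_point(early, late, 1)
# child_1 == [0, 5, 7] is feasible; child_2 == [3, 2, 4] starts B before A ends.
```

Both parents satisfy the precedence relationships, yet one of the two offspring does not, which is why a check/repair step is needed after a plain crossover.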
It would be desirable to generate and employ precedence-preserving crossover and mutation operators for chromosomes encoding activities' start times, so as to avoid the inefficiency that the basic GA technique incurs by detecting and repairing infeasible chromosomes each time these operators are performed.
A crucial challenge for construction contractors in running a sustained business is procuring adequate money in a timely manner to execute construction operations. Besides owners' payments, contractors often procure additional funding from external sources, including bank credit lines. Given that owners pay only after the work is accomplished, while retaining some amount of money, and that the cash contractors are allowed to withdraw from credit-line accounts is limited in amount, contractors often operate under cash-constrained conditions. Accordingly, it is strongly advocated that the best proactive operating strategy contractors can follow for effective financial planning is to schedule the construction activities based on cash availability.
Typically, an additional cost component for financing is associated with cash procurement through the banks' credit lines. Contractors normally deposit owners' progress payments into the credit-line accounts to continually reduce the outstanding debt and, consequently, the financing costs. As the cash flow in FIG. 3 indicates, contractors charge the expenses caused by labor, equipment, materials, subcontractors, and other indirect costs (Et) against, and deposit progress payments (Pt) into, the credit-line accounts. It can be reasonably assumed in practice that these transactions occur as of the cut-off times between periods.
Accordingly, the values of the outstanding debt F as of the cut-off times are determined. The financing costs Ît as of the cut-off times are determined by applying the prescribed interest rate to the outstanding debt. The summations of the values of the outstanding debt and the accumulated financing costs constitute the negative cumulative balance F̂t. The cumulative net balance values N̂t constitute the negative cumulative balances after depositing the progress payments. The cumulative net balance of all Et, Pt, and Ît transactions constitutes the profit G as of the end of the project.
Another concern of financing, and one more important than the incorporation of financing costs, is the credit-limit constraint imposed on the credit lines. The credit limit specifies the maximum value the negative cumulative balance is allowed to reach as of any cut-off time. Thus, finance-based scheduling incorporates financing costs into the project total cost and schedules activities such that the contractor's negative cumulative balance as of any cut-off time never exceeds the specified credit limit. The optimization techniques employed to devise finance-based schedules normally fulfill these two goals with the objective of maximizing the profit at the end of the project. This objective is directly conducive to the minimization of the indirect costs, through minimizing the project duration, and of the financing costs. In order to achieve this objective, a search technique based on artificial intelligence (i.e., the GAs technique) is used.
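The cash-flow quantities and the credit-limit test described above may be sketched as follows. This is a simplified model under stated assumptions, not the exact formulation of any particular finance-based scheduling method: debt is expressed as a positive number, interest for a period is charged on the full outstanding debt at the cut-off time, and the net balance of one period is carried into the next. The function names and the sample figures are hypothetical.

```python
def cash_flow(expenses, payments, rate):
    """Return per-period (debt, interest, balance, net) rows and the final profit."""
    rows = []
    carried = 0.0  # net balance brought forward from the previous period
    for expense, payment in zip(expenses, payments):
        debt = carried + expense                      # outstanding debt
        interest = rate * debt if debt > 0 else 0.0   # financing cost
        balance = debt + interest                     # negative cumulative balance
        net = balance - payment                       # after the progress payment
        rows.append((debt, interest, balance, net))
        carried = net
    profit = -carried  # a final net balance below zero is money in hand
    return rows, profit


def within_credit_limit(rows, credit_limit):
    """True if the negative cumulative balance never exceeds the credit limit."""
    return all(balance <= credit_limit for _, _, balance, _ in rows)


rows, profit = cash_flow([100, 100, 0], [0, 150, 100], rate=0.01)
# The balances peak at about 203.01 in the second period.
```

In this example a credit limit of 210 is satisfied while a limit of 200 is not, so a schedule producing these expenses would have to be shifted to be feasible under the tighter limit.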
Implementing the GAs technique in the problem of finance-based scheduling involves the steps of: devising a schedule extension scheme; setting the chromosome structure; defining the chromosome evaluation criterion; generating an initial population of chromosomes; employing crossover and mutation operators for offspring generation; and coding the procedure in a computer program.
Devising a schedule that is constrained with a specified credit limit entails extending the project duration. As such, the problem at hand necessitates devising a project extension scheme. The extension scheme for an exemplary 13-activity schedule (shown in FIG. 4) is illustrated in FIG. 5, which, in bar chart form, shows the activities' total floats extended by seven-day extension increments. The extended total floats of activities, portrayed in front of activities in FIG. 5, provide time spaces within which activities can be shifted without increasing the extended project duration of twenty days. For example, activity D in FIG. 5 can be shifted all the way to the end of its extended total float and still allow activity H, which depends on activity D, to finish before the end of its extended total float. Thus, the shift of activity D can be performed without causing further extension beyond the extended project duration.
The GAs technique, harnessed with extension schemes, devises schedules at constricted credit limits such that the negative cumulative balance values never exceed the specified credit limit, while minimizing the schedule extension. The extension schemes transform the process of seeking extended schedules that fulfill cash constraints from searching in boundless solution spaces to searching in well-defined and definite solution spaces.
The chromosome structure features a string of genes with the number of genes being the same as the number of activities in the CPM network. The gene values correspond to start times assigned to the activities. As such, each chromosome represents one possible schedule. FIG. 6 shows the chromosome structure for the initial schedule of the 13-activity extension scheme in FIG. 5.
To evaluate chromosomes, the evaluation criterion is set as the contractor's expected profit at the end of the project. Initially, the project cash flow parameters are determined, as outlined above with regard to general finance-based scheduling, to produce the negative cumulative balance values, project duration, and profit. These values are constants that correspond to the initial schedule with no constraints imposed on the credit limit. When the GAs technique is applied to devise a schedule at a specific credit limit and a chromosome is being evaluated, the start time values of that chromosome are assigned to the corresponding project activities to produce a new schedule with a new duration, profit, and negative cumulative balance values. Provided that the negative cumulative balance values are below the specified credit limit, the fitness of the chromosome associated with that schedule is then determined by the relative improvement it exhibits over the initial schedule, as indicated by the amount of profit.
The seven-day extension scheme shown in FIG. 5 is used to illustrate the generation of the chromosomes of the initial population. Activities A through M are shown (with primes indicating the activities of the generated chromosome). The steps of the algorithm are as follows: (1) identify activities with no predecessors (i.e., activities A, B, C, D); (2) pick up the first activity in the list of the activities identified in step 1 (i.e., activity A); (3) select a random start time of the activity in step 2 such that the activity ends before the end of the adjusted total float, for example, day 3 for activity A; (4) repeat steps 2 and 3 for all of the activities identified in step 1, such as starting activity B at day 5, activity C at day 4, and activity D at day 7; (5) identify all activities that depend exclusively on either all or some of the activities identified in step 1 (i.e., activities E, F, G, and H); (6) pick up the first activity in the list of the activities identified in step 5 (i.e., activity E); (7) select a random start time for the activity in step 6 allowing all preceding activities to finish, such as day 6 being the selected start time of activity E; (8) repeat steps 6 and 7 to randomly select start times for all activities identified in step 5, such as day 9 for activity F, day 6 for activity G, and day 10 for activity H; (9) repeat steps 5 through 8 for the activities last scheduled until all the project activities are scheduled (i.e., activities I and J are scheduled first, with activity I starting at day 9 and activity J starting at day 13); and (10) activities K, L, and M are scheduled with activity K starting at day 13, activity L starting at day 12, and activity M starting at day 17.
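The level-by-level generation of a random feasible chromosome described above may be sketched as follows. This is a minimal sketch under stated assumptions: the five-activity network, its durations, and the function name are hypothetical and are not the 13-activity example of FIG. 5, and the end of each activity's extended total float is assumed to be obtainable from a backward pass over the extended project duration.

```python
import random


def random_feasible_schedule(duration, predecessors, extended_duration, rng=random):
    """Assign random start times that respect precedence and the extended floats."""
    # Order activities level by level: first those with no predecessors,
    # then those whose predecessors have all been ordered, and so on.
    order, remaining = [], set(duration)
    while remaining:
        ready = sorted(a for a in remaining
                       if all(p in order for p in predecessors.get(a, [])))
        if not ready:
            raise ValueError("precedence relationships contain a cycle")
        order.extend(ready)
        remaining -= set(ready)
    # Backward pass: latest start allowed within the extended project duration.
    successors = {a: [] for a in duration}
    for a, preds in predecessors.items():
        for p in preds:
            successors[p].append(a)
    latest_start = {}
    for a in reversed(order):
        latest_finish = min((latest_start[s] for s in successors[a]),
                            default=extended_duration)
        latest_start[a] = latest_finish - duration[a]
    # Forward pass: draw each start after all predecessors have finished.
    start = {}
    for a in order:
        earliest = max((start[p] + duration[p] for p in predecessors.get(a, [])),
                       default=0)
        start[a] = rng.randint(earliest, latest_start[a])
    return start


duration = {"A": 2, "B": 3, "C": 2, "D": 4, "E": 1}
predecessors = {"C": ["A"], "D": ["A", "B"], "E": ["C", "D"]}
schedule = random_feasible_schedule(duration, predecessors, extended_duration=12)
```

Because each start time is drawn no later than the latest start found in the backward pass, every successor is left with a non-empty range to draw from, so no repair step is needed for the initial population. If the extended duration is shorter than the critical path, the range becomes empty and `randint` raises a `ValueError`.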
Thus, a method of generating precedence-preserving crossover and mutation operations for genetic algorithms solving the aforementioned problems is desired.