The process of natural selection provides a powerful tool for problem solving. Nature demonstrates this through the many biological entities that survive and evolve in varied environments. In nature, complex combinations of traits give particular biological populations the ability to adapt, survive, and reproduce in their environments. Equally impressive is the complex, relatively rapid, and robust adaptation, and the relatively good interim performance, that occurs among the population of individuals in nature in response to changes in the environment. Nature's methods for adapting biological populations to their environments, and for adapting these populations to successive changes in those environments (including survival and reproduction of the fittest), provide a useful model. This model can be used to develop methods to solve a wide variety of complex problems that are generally thought to require "intelligence" to solve.
In nature, a gene is the basic functional unit by which hereditary information is passed from parents to offspring. Genes appear at particular places (called gene loci) along the molecules of DNA (deoxyribonucleic acid). DNA is a long threadlike biological molecule that has the ability to carry hereditary information and the ability to serve as a model for the production of replicas of itself. All known life forms on this planet including bacteria, fungi, plants, animals, and humans are based on the DNA molecule.
Genetic coding of the DNA molecule consists of long strings (sequences) of four possible gene values that can appear at the various gene loci along the DNA molecule. For DNA, the four possible gene values refer to four bases named adenine, guanine, cytosine, and thymine (usually abbreviated as A, G, C, and T respectively). Thus, the genetic code in DNA consists of long strings such as CTCGACGGTCTC.
A chromosome consists of numerous gene loci with a specific value (called an "allele") at each gene locus. The chromosome set for a human being consists of 23 pairs of chromosomes. Together, the chromosomes provide the information and instructions necessary to construct and describe one individual human being and contain about 3,000,000,000 genes. These 3 billion genes constitute the so-called "genome" for one particular human being. The complete genomes of the approximately five billion living human beings together constitute the entire pool of genetic information for the human species. It is known that certain gene values occurring at certain places in certain chromosomes control certain traits of the individual, including traits such as eye color and susceptibility to a particular disease.
Organisms created from the DNA information spend their lives attempting to deal with their environment. Some organisms do better than others in grappling with or opposing their environment. In particular, some organisms survive to the age of reproduction and therefore pass on their genetic makeup to their offspring. In nature, the process of Darwinian natural selection causes organisms with traits that facilitate survival to the age of reproduction to pass on all or part of their genetic makeup to offspring. Over a period of time and many generations, the population as a whole evolves so that the chromosome strings in the individuals in the surviving population perpetuate traits that contribute to survival of the organism in its environment.
A genetic algorithm is a model of machine learning that derives its behavior from a metaphor of the process of evolution previously described. This is done by creating, within a machine, a population of individuals represented by chromosomes, in essence a set of character strings that are analogous to the base-four chromosomes of the DNA molecule. The individuals in the population then go through a process known as evolution.
It should be noted that evolution (in nature or anywhere else) is not a purposive or directed process. That is, there is no evidence to support the assertion that the goal of evolution is to produce mankind. Indeed, the process of nature seems to boil down to different individuals competing for resources in the environment. Some are better than others; those that are better are more likely to survive and propagate their genetic material.
In nature, the encoding of genetic information (genome) is done in a way that admits asexual reproduction (such as budding) which results in offspring that are genetically identical to the parent. Sexual reproduction allows the creation of genetically different offspring that are still of the same species. Genetic information may also be re-arranged by a process known as recombination. In its most simplified form, recombination can be described as two chromosomes exchanging pieces of genetic information with each other. A recombination operation may also be referred to as crossover because of the way that genetic material crosses over from one chromosome to another.
The selection of who gets to mate is a function of the fitness of the individual at competing for resources in its environment. Some genetic algorithms use a simple function of the fitness measure to select individuals probabilistically for further operations such as crossover. Other implementations use a model in which certain randomly selected individuals in a sub-group compete and the fittest is selected. This is called tournament selection and is the form of selection used in nature. The two processes that contribute most to evolution are crossover and fitness-based selection.
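The two selection schemes mentioned above can be sketched as follows. This is a minimal illustration only; the function names, the tournament size `k`, and the assumption that fitness values are non-negative are choices made for the example and are not taken from any cited work.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Fitness proportionate selection: the chance of picking an
    individual is proportional to its (non-negative) fitness."""
    total = sum(fitnesses)
    pick = rng.random() * total          # point on the "roulette wheel"
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick < running:
            return individual
    return population[-1]                # guard against round-off

def tournament_select(population, fitnesses, rng, k=3):
    """Tournament selection: draw k distinct individuals at random
    and return the fittest of that sub-group."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]
```

With `k` equal to the population size the tournament always returns the fittest individual; smaller values of `k` give weaker individuals a better chance of being selected.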
Mutation also plays a role in this process, though it is not the dominant role. Mutation occurs when genetic material is randomly altered.
Genetic algorithms are used in a number of different application areas. One example is multi-dimensional, multi-modal optimization problems, in which the character string of the chromosome can be used to encode the values of the different parameters being optimized. Such is the case in the present invention.
In practice the genetic model of computation is implemented by having arrays of bits or characters to represent the chromosomes. Simple bit manipulation operations allow the implementation of crossover, mutation, and other operations. Although a substantial amount of research has been performed on variable length strings and other structures, the majority of work with genetic algorithms is focused on fixed length character strings.
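As a minimal sketch of such bit manipulation, assume each chromosome is held in a Python integer of `length` bits. The function names, the single crossover point, and the per-bit mutation rate are assumptions made for the illustration.

```python
import random

def one_point_crossover(a, b, length, point):
    """Exchange the low `point` bits of two `length`-bit chromosomes."""
    low_mask = (1 << point) - 1                  # bits below the cut
    high_mask = ((1 << length) - 1) ^ low_mask   # bits at and above the cut
    child1 = (a & high_mask) | (b & low_mask)
    child2 = (b & high_mask) | (a & low_mask)
    return child1, child2

def mutate(chromosome, length, rate, rng):
    """Flip each bit independently with probability `rate`."""
    for bit in range(length):
        if rng.random() < rate:
            chromosome ^= 1 << bit
    return chromosome
```

For example, crossing `0b11110000` with `0b00001111` at point 4 yields `0b11111111` and `0b00000000`: each child keeps the high half of one parent and takes the low half of the other.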
When the genetic algorithm is implemented it is usually done in a manner that involves the following cycle: evaluate the fitness of all the individuals in the population; create a new population by performing operations such as fitness proportionate selection, crossover and mutation on individuals whose fitness has just been measured; discard the old population; and iterate using the new population.
One iteration of this loop is referred to as a generation. There is no theoretical reason for this as an implementation model. Indeed, we do not see this punctuated behavior in populations in nature as a whole, but it is a convenient implementation model.
The first generation of this process operates on a population of randomly generated individuals. From there on, the genetic operations, in concert with the fitness measure, operate to improve the population.
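The generational cycle described above can be sketched as follows. This is an illustrative toy and not the method of the present invention: the function name `evolve`, all parameter values, the list-of-bits representation, and the requirement that the fitness function return non-negative values are assumptions of the example.

```python
import random

def evolve(fitness, length=20, pop_size=30, generations=40,
           crossover_rate=0.9, mutation_rate=0.01, seed=0):
    """Toy generational genetic algorithm over fixed-length bit lists.
    `fitness` must return a non-negative number for a list of 0/1 values."""
    rng = random.Random(seed)
    # Generation 0: a population of randomly generated individuals.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Evaluate the fitness of every individual in the population.
        scores = [fitness(ind) for ind in population]
        # The small offset keeps the weights valid if every score is zero.
        weights = [s + 1e-9 for s in scores]
        new_population = []
        while len(new_population) < pop_size:
            # Fitness proportionate selection of two parents.
            p1, p2 = rng.choices(population, weights=weights, k=2)
            c1, c2 = p1[:], p2[:]
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, length)       # one-point crossover
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                for i in range(length):
                    if rng.random() < mutation_rate:
                        child[i] ^= 1                # mutation: flip the bit
                new_population.append(child)
        # Discard the old population and iterate using the new one.
        population = new_population[:pop_size]
        best = max(population + [best], key=fitness)
    return best
```

Calling `evolve(sum)` searches for the bit string with the most 1's, a common toy problem. The function returns the best individual seen in any generation, so its fitness never falls below that of the best member of the random initial population.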
Genetic algorithms are highly parallel algorithms that transform populations of individual mathematical objects (typically fixed length binary character strings) into new populations using operations patterned after 1) natural genetic operations such as sexual recombination (crossover) and 2) fitness proportionate selection (Darwinian survival of the fittest). Genetic algorithms begin with an initial population of individuals, as stated above, and then iteratively evaluate the individuals in the population for fitness with respect to the problem environment and perform genetic operations on various individuals in the population to produce a new population. Professor John H. Holland of the University of Michigan presented the pioneering formulation of genetic algorithms for fixed length binary character strings in Adaptation in Artificial and Natural Systems (1975). Holland established, among other things, that the genetic algorithm is a mathematically near optimal approach to adaptation in that it maximizes expected overall payoff when the adaptive process is viewed as a multi-armed slot machine problem requiring an optimal allocation of future trials given currently available information. Recent work in genetic algorithms and genetic classifier systems can be found in Proceedings of an International Conference on Genetic Algorithms and Their Applications, John J. Grefenstette (1985); Genetic Algorithms and Their Applications: Proceedings of the Second International Conference on Genetic Algorithms, John J. Grefenstette (1987); Genetic Algorithms in Search, Optimization, and Machine Learning, David E. Goldberg (1989); Genetic Algorithms and Simulated Annealing, Lawrence Davis (1987); and Proceedings of the Third International Conference on Genetic Algorithms, J. D. Schaffer (1989).
In Adaptation in Artificial and Natural Systems, Holland summarizes his research in genetic algorithms and presents an overall mathematical theory of adaptation for both natural and artificial systems. A key part of this book describes a genetic algorithm patterned after nature's method of biological adaptation. Additional information can be found in U.S. Pat. No. 4,697,242 to Holland and U.S. Pat. No. 4,881,178 to Holland, both incorporated herein by reference.
Empirical studies by various researchers have demonstrated the capabilities of such genetic algorithms in many diverse areas, including function optimization, operation of a gas pipeline, and several others reviewed in Goldberg.
In the chapter entitled "An Overview" contained in the 1987 collection Genetic Algorithms and Simulated Annealing, Lawrence Davis and Martha Steenstrup stated, "In all of Holland's work, and in the work of many of his students, chromosomes are bit streams, lists of 0's and 1's." In addition they continue, "Some researchers have explored the use of other representations often in connection with industrial algorithms. Examples of other representations included ordered list (for bin packing), embedded lists (for factory scheduling problems), variable element lists (for semiconductor layout), and the representations used by Glover and Grefenstette in this volume."
Some researchers have attempted to solve search and optimization problems using schemes patterned after evolution that employ mutation-plus-save-the-best strategies. The few results obtained from these efforts are highly specific to particular application domains and largely reflect the cleverness of implementation rather than the usefulness of a general technique for achieving adaptive increases in fitness in the population. It is important to note that mutation is not the primary means by which biological populations in nature improve their fitness, and it is not the primary means used in the present invention.
Since Holland's 1975 book, Holland and various colleagues have developed an application of conventional genetic algorithms called the genetic classifier system. The classifier system is a group of rules. Each rule consists of a condition part and an action part (i.e., an IF-THEN rule). Both the condition part and the action part of each rule are like the individuals in the conventional genetic algorithm in that they are strings of 0's and 1's of fixed length. In a classifier system, messages are received from the environment and invoke those rules whose condition part matches the incoming message. This match triggers the action part of the rule, which sends out a new message.
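The message-matching step can be illustrated with a minimal sketch. It assumes that rules are (condition, action) pairs of fixed-length bit strings and that a condition matches only an identical message; the rule set and the function name `post_messages` are hypothetical and serve only to illustrate the step.

```python
def post_messages(rules, message):
    """Return the new messages sent by every rule whose condition
    part matches the incoming message (exact match assumed)."""
    return [action for condition, action in rules if condition == message]

# Hypothetical rule set: each rule is (condition part, action part).
RULES = [("0101", "1100"), ("0101", "0011"), ("1111", "0000")]

new_messages = post_messages(RULES, "0101")   # both "0101" rules fire
```

A message that matches no condition produces an empty list, so no rule fires on that step.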
Classifier systems are described in the 1978 article Cognitive Systems Based On Adaptive Algorithms (John Holland and Judith S. Reitman). In classifier systems, credit is assigned to chains of individuals that are invoked, using a credit allocation scheme known as the "bucket brigade". The Holland process is a combination of a classifier system and a "bucket brigade" algorithm.
In U.S. Pat. No. 5,343,554 to John R. Koza, genetic algorithms, as previously described, are expanded into what is now known as genetic programming. One of the primary objectives of genetic programming is to remove the limitation of using fixed length binary strings to represent the population. Because genetic programming is not within the scope of the present invention, a brief description will suffice here.
In one embodiment of Koza, the apparatus and process initially create a population of entities which are evolved to automatically encode a set of data values into a procedure or function capable of approximating those data values. Thus, by using this embodiment, data such as video, audio, or images can be transformed into a function whose representation is cheaper to store and transmit than is the data itself.
The function generated using Koza's invention is an approximation of the original data. For the data types enumerated above, this approximation may be sufficient. However, for more exact data such as computer programs, computer data bases, and the like, an approximation to the original data will not suffice. It should be noted that in some circumstances Koza's embodiment may actually reproduce the original data; however, this is not guaranteed. Simply put, Koza's approach does not exhibit the property of reciprocity and therefore is a "noisy" approach, as that term is used in information theory.
A global computer system requires multilingual applications and platforms with a minimum of code complexity and memory requirements. However, multilingual requirements almost always equate to additional code complexity. As code complexity increases, development and maintenance costs follow. This nearly exponential cost increase may make some international projects infeasible. Adding to the costs, complex applications generally have higher memory consumption. Thus, running certain applications becomes impossible on cheaper platforms.
One aspect of the code complexity problem has been addressed by the Unicode Consortium with the development of a 2-byte character encoding standard that includes characters from all of the world's scripts as well as technical symbols in common use. These scripts include those used in countries such as Japan, China, Korea, Russia, Saudi Arabia, France, etc. Because it employs a fixed-width encoding, any Unicode Compliant application can be easily localized for different countries.
However, the problem of large memory requirements remains, especially when you consider that the Unicode standard has code space for 65,536 characters. To ensure backward compatibility, mappings between Unicode and the world's other standards must be provided. It is these mappings which result in the most overbearing memory requirements.
One such mapping relates to Shift-Jis, the most common character set standard in use in Japan. Because of Shift-Jis's popularity, it is imperative that Unicode Compliant platforms provide a Shift-Jis to Unicode mapping for backward compatibility. However, since Shift-Jis defines 7,037 characters (6,942 of which are 2-byte) spread over the range 32 to 60,068, and Unicode currently defines over 28,000 characters (all of which are 2-byte) over the range 0 to 65,534, such a mapping could require a considerable amount of memory. The problem, then, is to minimize the memory required for a Shift-Jis to Unicode mapping without sacrificing access time.
One method for providing a Shift-Jis to Unicode mapping is to use a simple array, where the index into the array is the Shift-Jis code for a particular character and the value at that position is the Unicode code for that character. This method would require:

60,036 codes × 2 bytes = 120,072 bytes
This is obviously not an optimal solution, but it clarifies the problem. To further explore the problem, consider a simple lookup table, where the first column of the table contains the Shift-Jis code for a particular character, and the second column contains the Unicode code for the character. This method would require:

7,037 characters × 2 bytes × 2 columns = 28,148 bytes
In addition, this method requires a search operation to find the required data. Using a binary search method, the worst case lookup would require about log₂(7,037) ≈ 12.8, that is, up to 13, comparisons.
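The two-column lookup table and its binary search can be sketched as follows. The three table entries are a small illustrative sample, not the full 7,037-entry mapping, and the function names are assumptions of the example. The worst-case probe count uses the standard bound for binary search, floor(log₂ n) + 1.

```python
from array import array
from bisect import bisect_left

# Illustrative sample only; a full table would hold all 7,037 pairs.
# Column 1: Shift-Jis codes in ascending order (2 bytes each).
SJIS_CODES = array("H", [0x8140, 0x82A0, 0x889F])
# Column 2: the Unicode code for the entry at the same position.
UNICODE_CODES = array("H", [0x3000, 0x3042, 0x4E9C])

def sjis_to_unicode(code):
    """Binary-search the Shift-Jis column; return None if unmapped."""
    i = bisect_left(SJIS_CODES, code)
    if i < len(SJIS_CODES) and SJIS_CODES[i] == code:
        return UNICODE_CODES[i]
    return None

def table_bytes(entries, columns=2, bytes_per_code=2):
    """Memory needed for a table of 2-byte codes."""
    return entries * columns * bytes_per_code

def worst_case_probes(entries):
    """Worst-case probes for binary search: floor(log2(n)) + 1."""
    return entries.bit_length()
```

For the full table, `table_bytes(7037)` gives 28,148 bytes and `worst_case_probes(7037)` gives 13, consistent with log₂(7,037) ≈ 12.8.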
A common method of reducing the size of a block of data is binary compression. Several algorithms exist, but each suffers from three significant problems:
1) When a chunk of data is decompressed, it must be stored somewhere. Therefore, some memory must be set aside for receiving uncompressed blocks;
2) The algorithms can be difficult to program and the code space required may offset the savings from compressing the data; and
3) Since data must be decompressed, access times are long.
Even without these problems, binary compression rarely achieves more than 50% compression. Assuming one started with the 28,148-byte lookup table approach, one would still need over 14 KB of space to hold the compressed data.