1. The Field of the Invention
The field of the invention is computer-implemented genetic algorithms. More specifically, the field is genetic algorithms useful for problem solving. The field spans the range of problems wherein a fit composition of functions may be found as a solution to the problem.
2. The Prior Art
The Natural Selection Process in Nature
The natural selection process provides a powerful tool for problem solving. This is shown by nature and its various examples of biological entities that survive and evolve in various environments. In nature, complex combinations of traits give particular biological populations the ability to adapt, survive, and reproduce in their environments. Equally impressive is the complex, relatively rapid, and robust adaptation and relatively good interim performance that occurs amongst a population of individuals in nature in response to changes in the environment. Nature's methods for adapting biological populations to their environment and to successive changes in that environment (including survival and reproduction of the fittest) provide a useful model. This model can be used to develop methods to solve a wide variety of complex problems which are generally thought to require "intelligence" to solve.
In nature, a gene is the basic functional unit by which hereditary information is passed from parents to offspring. Genes appear at particular places (called gene "loci") along molecules of deoxyribonucleic acid (DNA). DNA is a long thread-like biological molecule that has the ability to carry hereditary information and the ability to serve as a model for the production of replicas of itself. All known life forms on this planet (including bacteria, fungi, plants, animals, and humans) are based on the DNA molecule.
The so-called "genetic code" involving the DNA molecule consists of long strings (sequences) of 4 possible gene values that can appear at the various gene loci along the DNA molecule. For DNA, the 4 possible gene values refer to 4 "bases" named adenine, guanine, cytosine, and thymine (usually abbreviated as A, G, C, and T, respectively). Thus, the "genetic code" in DNA consists of long strings such as CTCGACGGT.
A chromosome consists of numerous gene loci with a specific gene value (called an "allele") at each gene locus. The chromosome set for a human being consists of 23 pairs of chromosomes. The chromosomes together provide the information and the instructions necessary to construct and to describe one individual human being and contain about 3,000,000,000 genes. These 3,000,000,000 genes constitute the so-called "genome" for one particular human being. Complete genomes of the approximately 5,000,000,000 living human beings together constitute the entire pool of genetic information for the human species. It is known that certain gene values occurring at certain places in certain chromosomes control certain traits of the individual, including traits such as eye color, susceptibility to particular diseases, etc.
When living cells reproduce, the genetic code in DNA is read. Sub-sequences consisting of 3 DNA bases are used to specify one of 20 amino acids. Large biological protein molecules are, in turn, made up of anywhere between 50 and 500 such amino acids. Thus, this genetic code is used to specify and control the building of new living cells from amino acids.
The organisms consisting of the living cells created in this manner spend their lives attempting to deal with their environment. Some organisms do better than others in grappling with (or opposing) their environment. In particular, some organisms survive to the age of reproduction and therefore pass on their genetic make-up (chromosome string) to their offspring. In nature, the process of Darwinian natural selection causes organisms with traits that facilitate survival to the age of reproduction to pass on all or part of their genetic make-up to offspring. Over a period of time and many generations, the population as a whole evolves so that the chromosome strings in the individuals in the surviving population perpetuate traits that contribute to survival of the organism in its environment.
Prior Art Genetic Algorithms
Genetic algorithms are highly parallel algorithms that transform populations of individual mathematical objects (typically fixed-length binary character strings) into new populations using operations patterned after (1) natural genetic operations such as sexual recombination (crossover) and (2) fitness proportionate reproduction (Darwinian survival of the fittest). Genetic algorithms begin with an initial population of individuals (typically randomly generated) and then iteratively (1) evaluate the individuals in the population for fitness with respect to the problem environment and (2) perform genetic operations on various individuals in the population to produce a new population. Professor John H. Holland of the University of Michigan presented the pioneering formulation of genetic algorithms for fixed-length binary character strings in Adaptation in Natural and Artificial Systems (1975). Holland established, among other things, that the genetic algorithm is a mathematically near optimal (minimax) approach to adaptation in that it maximizes expected overall average payoff when the adaptive process is viewed as a multi-armed slot machine problem requiring an optimal allocation of future trials given currently available information. Recent work in genetic algorithms and genetic classifier systems can be surveyed in Grefenstette (1985), Grefenstette (1987), Goldberg (1989), Davis (1987), and Schaffer (1989).
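The iterative cycle described above (evaluate fitness, then apply fitness proportionate reproduction and crossover to fixed-length binary strings) can be sketched as follows. This is a minimal illustration and not Holland's formulation: the fitness measure (counting 1's), the population size, the string length, and all function names are assumptions chosen for brevity.

```python
import random


def fitness(bits):
    # Hypothetical fitness measure: the number of 1's in the string.
    return sum(bits)


def select(population, rng):
    # Fitness proportionate selection: the chance of being picked is
    # proportional to fitness (a tiny offset avoids an all-zero weight list).
    weights = [fitness(ind) + 1e-9 for ind in population]
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(a, b, rng):
    # One-point crossover; both offspring keep the parents' fixed length.
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def run_ga(length=20, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    # Randomly generated initial population of fixed-length binary strings.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        new_population = []
        while len(new_population) < pop_size:
            child1, child2 = crossover(select(population, rng),
                                       select(population, rng), rng)
            new_population.extend([child1, child2])
        population = new_population[:pop_size]
    return population


final = run_ga()
best = max(final, key=fitness)
print(len(best), fitness(best))
```

Every individual in every generation has the same length, which is the property of conventional genetic algorithms that the remainder of this section examines.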
In Adaptation in Natural and Artificial Systems, Holland summarizes his research in genetic algorithms and presents an overall mathematical theory of adaptation for both natural and artificial systems. A key part of this book describes a "genetic algorithm" patterned after nature's methods for biological adaptation. However, a limitation of this work resides in using fixed length binary strings to represent the individuals in the population. U.S. Pat. No. 4,697,242 (Holland) and U.S. Pat. No. 4,881,178 (Holland) are examples of processes which use fixed length binary strings with a genetic algorithm.
Empirical studies by various researchers have demonstrated the capabilities of such genetic algorithms in many diverse areas, including function optimization (De Jong 1980), operation of a gas pipeline (Goldberg 1983), and many others reviewed in Goldberg (1989).
In the chapter entitled "An Overview" contained in the 1987 collection Genetic Algorithms and Simulated Annealing, Lawrence Davis and Martha Steenstrup stated, "In all of Holland's work, and in the work of many of his students, chromosomes are bit strings--lists of 0's and 1's." In addition, they continue, "Some researchers have explored the use of other representations, often in connection with industrial algorithms. Examples of other representations include ordered lists (for bin-packing), embedded lists (for factory scheduling problems), variable-element lists (for semiconductor layout), and the representations used by Glover and Grefenstette in this volume."
Some researchers have attempted to solve search and optimization problems using schemes patterned after evolution that employed mutation-plus-save-the-best strategies. Examples are Box (1957), Hicklin (1986), and the 1966 book by Fogel, Owens, and Walsh entitled Artificial Intelligence Through Simulated Evolution. The few results obtained from these efforts were highly specific to particular application domains and largely reflect the cleverness of implementation rather than its usefulness as a general technique for achieving adaptive increases in fitness in populations. It is important to note that mutation is not the primary means by which biological populations in nature improve their fitness and it is not the primary means used in the present invention.
Since Holland's 1975 book, Holland and various colleagues have developed a novel application of conventional genetic algorithms called a "genetic classifier system". A classifier system is a group of rules. Each rule consists of a condition part and an action part (i.e. an IF-THEN rule). Both the condition part and action part of each rule are like the individuals in the conventional genetic algorithm in that they are strings of 0's and 1's of fixed length. In a classifier system, messages (consisting of binary strings) are received from the environment and invoke those rules whose conditional part ("IF" part) matches the message (binary string) coming in. This invocation triggers the action part ("THEN" part) of the rule. The action part of a rule sends out a new message (binary string).
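The message-matching step just described can be illustrated with a short sketch. It assumes that a rule is a pair of fixed-length binary strings and that a condition matches only when it equals the incoming message exactly; the rule set, the message, and the function name are hypothetical.

```python
def fire_rules(rules, message):
    # Return the outgoing message (action part) of every rule whose
    # condition part ("IF" part) equals the incoming message.
    return [action for condition, action in rules if condition == message]


# Each rule is (condition, action); both are binary strings of length 4.
rules = [("0101", "1100"), ("0101", "0011"), ("1111", "0000")]
outgoing = fire_rules(rules, "0101")
print(outgoing)
```

Credit assignment among the invoked rules is outside the scope of this sketch.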
Classifier systems are described in the 1978 article "Cognitive Systems based on Adaptive Algorithms" (by Holland and Judith S. Reitman) published in Pattern-Directed Inference Systems, edited by D. A. Waterman and Frederick Hayes-Roth; and David E. Goldberg's 1983 dissertation entitled Computer-Aided Gas Pipeline Operation Using Genetic Algorithms and Rule Learning. In classifier systems, credit is assigned to chains of individual rules that are invoked using a credit allocation scheme known as the "bucket brigade". The Holland process is a combination of a classifier system and a "bucket brigade algorithm". A 1987 paper by Cory Fujiki and John Dickinson in Genetic Algorithms and Their Applications: Proceedings of the Second International Conference on Genetic Algorithms, (John J. Grefenstette, 1987) describes a computer program written in LISP for solving the Prisoner's Dilemma using binary strings of fixed length and IF-THEN classifier rules. In addition, Smith (1980, 1983) has placed IF-THEN rules in genetic strings in lieu of individual characters.
We call conventional genetic algorithms "linear" because they manipulate strings (sequences) of characters over a fixed alphabet (typically strings of binary digits 0 and 1). This is in contrast to the "non-linear" situation in which the objects being manipulated are hierarchical expressions consisting of a hierarchical arrangement of functions and terminals.
The reasons for limiting the conventional genetic algorithm to binary strings of fixed length appear in the literature. First, in his 1983 dissertation entitled Computer-Aided Gas Pipeline Operation Using Genetic Algorithms and Rule Learning, David E. Goldberg argues that any binary string of the common fixed length always has an interpretation (via a well-defined representation scheme) to the problem being solved. This might be called the property of being "well defined" and it is a desirable property.
Secondly, if each individual in the population consists of a binary string of fixed length, then the crossover operation will always produce another binary string of fixed length when applied to any two individuals in the population. This might be called a "closure" property and it is also a desirable property. Of course, binary strings of fixed length are not the only way of achieving these desirable properties of closure and being well-defined.
In Adaptation in Natural and Artificial Systems (1975, page 71), Holland argues in favor of strings consisting only of 0's and 1's (i.e. binary strings) in the conventional genetic algorithm on the basis that the number of strings in the search space that are searched automatically using what he calls the "implicit parallelism" of the conventional genetic algorithm is highest when the strings consist only of two possibilities. This point is true; however, it should not be the controlling consideration. For various reasons cited hereinafter, limiting the genetic algorithm to the one dimensional world of linear strings of fixed length (and, in particular, binary strings of fixed length) precludes solving many problems. The field of computer science is replete with other situations where it is highly unrealistic to assume that the size or shape of a problem is known in advance to the solver so that he can use this information to rigidly pre-specify the size and shape of his search in advance.
Using fixed length binary strings in conventional genetic algorithms limits their ability to solve many problems. The following two separate example problems illustrate additional limitations of conventional genetic algorithms.
First, suppose we want a computer to program itself to solve the problem of finding the point at which two intersecting straight lines intersect. The point of intersection of two straight lines is the pair of numbers that satisfy the two linear equations in two variables that represent the lines. Thus, the computer program we are seeking would use the coefficients of the two equations and various mathematical operators (such as multiplication, subtraction, etc.) to produce the desired answer. To make the problem of having a computer learning to program itself more realistic, it is best not to specify in advance the size or shape of the mathematical expression needed to solve the problem. It is also more realistic if the computer had access to various irrelevant inputs and extraneous mathematical operations that might confuse its search to find the solution to the problem.
There is no simple or convenient way to uniquely associate a binary string whose length is predetermined in advance with an arbitrary mathematical expression composed of specified mathematical operations (functions) and terminals. A binary string of length $n$ can only represent $2^n$ different things (no matter what the representation scheme). No matter how large an $n$ is pre-selected in advance, there are always additional mathematical expressions.
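The counting argument can be made concrete with a small calculation. The sketch below counts the syntactically distinct expressions of a given maximum depth, assuming (purely for illustration) a function set of four two-argument functions and a terminal set of two terminals; neither set is taken from the text above.

```python
def count_expressions(depth, n_functions=4, n_terminals=2):
    # Number of distinct expression trees of depth <= `depth`: a tree is
    # either a terminal, or a two-argument function applied to two
    # shallower trees.
    count = n_terminals
    for _ in range(depth):
        count = n_terminals + n_functions * count * count
    return count


for depth in range(5):
    print(depth, count_expressions(depth))
```

Under these assumptions the count already exceeds the number of binary strings of length 32 at depth 4, and it keeps growing with depth, so any pre-selected string length is eventually outrun.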
Before continuing, it should be emphasized that it is not necessary to represent things of infinite size. Rather, what should be avoided is arbitrarily pre-setting a limit on the size and shape of the things being represented (even though any particular thing will itself be finite in size). In most problems, the size and shape of the solution are not necessarily known in advance. The process of solving the problem should be free to develop proposed solutions without any pre-set limit on the size and shape of the solution.
Even if an arbitrary maximum length specified in advance were acceptable, the method for associating each arbitrary mathematical expression (for example: A*B+C-D*E*F) with a binary string would necessarily obscure the underlying mathematical operations involved. The highly complex method used by Gödel in 1931 in his proof of the Incompleteness Theorem is an example of such a method for making this kind of association. Thus, this first example problem highlights the need to be able to represent arbitrary mathematical expressions (involving various functions and terminals) whose length is not arbitrarily limited in advance (rather than merely strings of 0's and 1's of the same fixed length).
Let us now consider the problem of solving a system of two linear equations and also the problem of sequence induction.
It should be noted that if it is assumed that the two straight lines in this problem always intersect, the problem is entirely numerical. However, if the two lines might possibly be parallel, the answer from a computer program to this expanded version of the problem might appropriately be a symbolic response (e.g. "The Equations are inconsistent and the lines are parallel") rather than the numeric location of the point of intersection. This situation can be easily recognized by a computer program by checking to see if a certain computed value (the determinant) is zero. Thus, this expanded version of this first example problem highlights the need occasionally to accommodate symbolic processing and symbolic output from a computer program that normally produces a numeric output.
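A hand-written version of the kind of program being sought illustrates the mix of numeric and symbolic output. The sketch assumes each line is given as coefficients of a*x + b*y = c and uses Cramer's rule; the function name and the wording of the symbolic response are illustrative choices.

```python
def intersect(a1, b1, c1, a2, b2, c2):
    # Lines: a1*x + b1*y = c1 and a2*x + b2*y = c2.
    det = a1 * b2 - a2 * b1
    if det == 0:
        # Symbolic response: the lines are parallel (or coincide),
        # so there is no unique point of intersection.
        return "no unique point of intersection (determinant is zero)"
    # Cramer's rule for the numeric case.
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return (x, y)


print(intersect(1, 1, 3, 1, -1, 1))   # x + y = 3 and x - y = 1
print(intersect(1, 1, 1, 2, 2, 5))    # parallel lines
```

The same function therefore returns a pair of numbers in one case and a message in the other, which is the behavior the expanded problem calls for.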
Second, consider the problem of predicting the future elements of a sequence of numbers from a sampling of early numbers from the sequence. This problem is an example of induction. Induction is the logical process by which one observes specific examples of some process (e.g. "The sun has come up every morning so far during my life") and then "induces" a reasonable underlying rule for the process (e.g. "The sun always comes up in the morning"). In applying inductive reasoning, there is no proof that the result is correct. Nonetheless, the process of induction is very important and indeed lies at the heart of all learning.
In contrast, deduction is the logical process in which one starts with some given premises (or facts) and some deductive rules of inference and then reaches a logical conclusion by repeatedly applying the deductive rules to the original given premises or facts. The sequence of steps used in deduction to reach a conclusion is called the proof.
If one is given a sampling of a sequence of numbers such as 0, 2, 4, 6, 8, 10, 12, 14 it is not difficult to reasonably induce that the next number in the sequence is 16. The number 16 is a reasonable induction because each previous element of the sequence is 2 times the element's position in the sequence (counting the first element as position 0). Note, however, that even elements of this simple numerical sequence cannot be represented with strings whose length has been specified in advance.
More interesting sequences involve more complicated mathematical operations. For example, the 6th element of the sequence 2, 4, 8, 16, 32 can be expressed directly in mathematics as 2 raised to the 6th power (i.e. 64). This sequence can also be expressed in mathematics using a recursion--that is, by defining the 6th element in terms of previous element(s) in the sequence. In this case, the $m$-th element of the sequence is 2 times element $m-1$ of the sequence (that is, 2 times 32 is 64).
For some important mathematical sequences of integers, there is no known non-recursive expression for each element of the sequence, and the use of a recursion becomes a necessity, not merely an option. The well-known Fibonacci sequence 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 is constructed by adding the 2 previous elements of the sequence. For example, 8 is the sum of 3 and 5, and 13 is the sum of 5 and 8. In general, the $m$-th element of the Fibonacci sequence is the sum of element $m-1$ and element $m-2$ of the sequence (with the understanding that the first two elements of the sequence are a "default" value of 1).
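Both recursions described above can be written directly as recursive functions. The sketch counts positions from 1, and the function names are illustrative.

```python
def doubling(m):
    # m-th element of 2, 4, 8, 16, 32, ...: 2 times element m-1,
    # with the first element equal to 2.
    if m == 1:
        return 2
    return 2 * doubling(m - 1)


def fibonacci(m):
    # m-th element of the Fibonacci sequence: the sum of elements m-1
    # and m-2, with the first two elements defaulting to 1.
    if m <= 2:
        return 1
    return fibonacci(m - 1) + fibonacci(m - 2)


print(doubling(6))
print([fibonacci(m) for m in range(1, 11)])
```

Each definition refers to earlier elements of its own sequence, which is the kind of structure a representation must be able to express.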
Thus, the problem of sequence induction highlights the need to be able to represent recursions as well as arbitrary mathematical expressions (involving functions and terminals). It also re-emphasizes the need to be able to represent strings whose length has not been pre-specified in advance.
Many problems are best approached by developing hierarchies in which solutions to sub-problems are manipulated and assembled hierarchically into solutions to the original main problem. In fact, many mathematical problems are solved by first "decomposing" a larger problem into smaller sub-problems. Then, an attempt is made to solve each of the sub-problems. And, finally, the solutions to the sub-problems are assembled into a solution to the original problem. The problem of solving large numbers of equations with many variables and solving polynomial equations of high order are examples of problems where decomposition can be used. In some cases, there is a symmetry between this process of assembly and the solution to the individual sub-problems. That is, in this assembly process, the solutions to the sub-problems may be manipulated as if they themselves were merely the elements of a sub-problem.
Even when no symmetry is involved, a "hierarchy" develops when a problem is solved by decomposition. At the lowest level of the hierarchy, the sub-problem is solved. The hierarchy consists of combining the solutions of the sub-problem into the solution to the larger problem. Something similar is commonplace in computer programming in general. For example, sub-routines (or sub-procedures) are typically called by a main program. The main program is at the top of the hierarchy, typically organized to provide an overview of the solution to the whole problem. Each of the sub-routines called by the main program is found one level lower in the hierarchy. If one of the sub-routines itself happens to call upon another sub-routine, that second sub-routine is one level lower in the hierarchy than the sub-routine which called it. Complex social organizations (such as corporations and military organizations) are similarly organized into hierarchies. The ability to decompose problems into hierarchies of sub-problems is generally important for solving problems.
It should be noted, however, that the conventional genetic algorithm imposes at least five important limitations which restrict its usefulness in solving a broad range of problems.
First, the requirement that each individual in the population be a string of the same length arbitrarily limits consideration to only a pre-determined number of situations, cases, or states of the problem environment.
Secondly, the use of a binary string (a string of 0's and 1's) leads to a representation scheme involving an explosively large number of "different" solutions merely to handle consideration of only a few past populations. In contrast, if the representation scheme were not required to be rigidly structured in advance prior to the start of operation of the conventional genetic algorithm, a representation scheme involving only a relative handful of relevant possible histories might have evolved.
Thirdly, the individuals in the population are representational descriptions (codings) of a solution (as opposed to being actionable procedures which directly implement the solution). Any particular solution that one envisions and wants to include in the population must be first coded into a binary string of fixed length before it can be inserted into the population. Before any solution can be implemented, the binary string must be decoded into actionable instructions.
Fourthly, the binary strings of fixed length provide no hierarchical structure for potential solutions to the problem. The binary string is one dimensional. All items in the string operate at the same level.
Fifth, it is often true that conventional genetic algorithms are extremely efficient in searching large, complex, non-linear spaces to find an area that is especially good, but that other search techniques are better than conventional genetic algorithms in zeroing in on the final, precise, global optimum value in the search space. Thus, for some problems, it is common to use conventional genetic algorithms to quickly find the best neighborhood of the overall search space and then to switch to another search technique (such as simulated annealing or hill-climbing) to zero in on the precise global optimum value. This shortcoming of conventional genetic algorithms is, for many problems, the direct result of the fixed representation scheme selected at the beginning of the process. If the representation scheme were adaptive (i.e. not fixed), it could change its size and shape after getting into the right general neighborhood of the solution. It could then become more refined so that it would be capable of finding the precise global optimum solution to the problem.
Background on Genetic Programming Paradigm
Representation is a key issue in genetic algorithm work because genetic algorithms directly manipulate the coded representation of the problem and because the representation scheme can severely limit the window by which the system observes its world. Fixed length character strings present difficulties for some problems--particularly problems in artificial intelligence where the desired solution is hierarchical and where the size and shape of the solution are unknown in advance. The need for more powerful representations has been recognized for some time (De Jong 1985, De Jong 1987, De Jong 1988).
The structure of the individual mathematical objects that are manipulated by the genetic algorithm can be more complex than the fixed length character strings. Smith (1980, 1983) departed from the early fixed-length character strings by introducing variable length strings, including strings whose elements were if-then rules (rather than single characters). Holland's introduction of the classifier system (1986) continued the trend towards increasing the complexity of the structures undergoing adaptation. The classifier system is a cognitive architecture into which the genetic algorithm is embedded so as to allow adaptive modification of a population of string-based if-then rules (whose condition and action parts are fixed length binary strings).
In addition, we have recently shown that entire computer programs can be genetically bred to solve problems in a variety of different areas of artificial intelligence, machine learning, and symbolic processing (Koza 1989, 1990). In this recently developed "genetic programming" paradigm, the individuals in the population are compositions of terminals and functions appropriate to the particular problem domain. The set of terminals used typically includes inputs (sensors) appropriate to the problem domain and various constants. The set of functions used typically includes arithmetic operations, mathematical functions, conditional logical operations, and domain-specific functions. Each function in the function set must be well defined for any element in the range of every other function in the set which may appear as an argument to that function.
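An individual of this kind can be pictured as a nested composition that is evaluated recursively. The sketch below uses nested Python tuples in place of LISP S-expressions; the function set, the terminal names, and the choice to have division return 1.0 for a zero divisor (so that every function is defined for every argument it may receive) are assumptions made for illustration.

```python
import operator


def protected_div(a, b):
    # Assumption: return 1.0 on division by zero so that the function is
    # well defined for any value produced by the other functions.
    return a / b if b != 0 else 1.0


FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": protected_div,
}


def evaluate(expr, env):
    # A tuple is a function applied to argument expressions; a string is a
    # terminal looked up in the environment; anything else is a constant.
    if isinstance(expr, tuple):
        name, *args = expr
        return FUNCTIONS[name](*(evaluate(arg, env) for arg in args))
    if isinstance(expr, str):
        return env[expr]
    return expr


expr = ("+", ("*", "A", "B"), ("/", "C", 0))
print(evaluate(expr, {"A": 2, "B": 3, "C": 5}))
```

Because every function accepts any numeric argument, any composition built from this function set and these terminals can be evaluated without error.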
Often in writing computer programs, a portion of the programming code (e.g., a subroutine) is dedicated to defining a function, such that a particular calculation can be performed on various different combinations of arguments. For example, if an exponential function in the form of an approximation of the first five terms of the Taylor series is applied to a single variable x on one occasion in a computer program, a programmer could use the following code:
$$1.0 + X + 0.5X^{2} + 0.1667X^{3} + 0.04167X^{4},$$
or the equivalent code in whatever programming language is being utilized. However, if the same exponential function is applied to another variable y or a quantity, such as $3z^{2}$, later in the same program, the programmer would have to tediously reproduce the code for the specific variables. For instance, the code for variable y would be:
$$1.0 + Y + 0.5Y^{2} + 0.1667Y^{3} + 0.04167Y^{4}$$
and the code for the quantity $3z^{2}$ would be:
$$1.0 + 3Z^{2} + 0.5(3Z^{2})(3Z^{2}) + 0.1667(3Z^{2})(3Z^{2})(3Z^{2}) + 0.04167(3Z^{2})(3Z^{2})(3Z^{2})(3Z^{2})$$
In order to overcome this tedious process of writing separate code for each of the three situations, a programmer would want to be able to define a function in terms of a dummy variable (i.e., formal parameter) dv to accommodate all three uses of the exponential function, such as:
$$\text{Define Function } \mathrm{exp}(\mathit{dv}) = 1.0 + \mathit{dv} + 0.5\,\mathit{dv}^{2} + 0.1667\,\mathit{dv}^{3} + 0.04167\,\mathit{dv}^{4}$$
Once a function has been defined, it can be called an arbitrary number of times from an arbitrary number of different places in the program with different instantiations of its dummy variable (i.e., formal parameter), such as x, y and $3z^{2}$. Thus, the process of rewriting code can be avoided. Furthermore, defining functions enhances the understandability of a program because common calculations are highlighted.
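In a present-day language the same idea might look like the following sketch, in which the five-term approximation is defined once and then called with three different arguments. The function name exp_approx and the sample values of x, y, and z are illustrative.

```python
def exp_approx(dv):
    # First five terms of the Taylor series for the exponential function,
    # written once in terms of the formal parameter dv.
    return 1.0 + dv + 0.5 * dv**2 + 0.1667 * dv**3 + 0.04167 * dv**4


x, y, z = 0.5, 1.0, 0.2
print(exp_approx(x))         # applied to the variable x
print(exp_approx(y))         # applied to the variable y
print(exp_approx(3 * z**2))  # applied to the quantity 3z^2
```

The three calls replace the three separately written expressions, and the caller does not need to repeat the polynomial.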
Moreover, by defining and making multiple uses of a function, a problem can be decomposed into a hierarchy of which the defined function is a part. If one defined function is allowed to make use of another, previously defined function, the hierarchical decomposition is accentuated. Moreover, if one defined function is allowed to call on itself, either directly or indirectly, through a sequence of other functions, the hierarchical decomposition may be even more accentuated. As a problem increases in size and complexity, decomposition of a problem using a function definition becomes an increasingly important tool for solving problems.
What is needed is a way to apply some of the general principles of biological natural selection that are embodied in the conventional genetic algorithm (i.e. survival of the fittest and crossing over of parents' traits to offspring) to a greatly expanded class of problems. In particular, what is needed is a method for adaptively creating computer programs involving complicated combinations of mathematical functions and their arguments, recursions, symbolic processing, and complicated data structures with no advance limitations on the size, shape, or complexity of the programs, including the use of function definitions created for the particular problem domain. One object of the present invention is to provide a genetic process to provide solutions for an expanded class of problems. A further object of the present invention is to provide a genetic process without any predetermined limits on the size, shape, or complexity of the members of the subject population.
In solving problems with genetically bred computer programs using a population composed of terminals and functions appropriate to the particular problem domain, a search space is developed for solving the problem in conjunction with the computer programs. This search space is the hyperspace of all possible compositions of functions that can be recursively composed of the available functions and terminals. The symbolic expressions (S-expressions) of the LISP programming language are an especially convenient way to create and manipulate the compositions of functions and terminals described above. These S-expressions in LISP correspond directly to the "parse tree" that is internally created by most compilers.
The basic genetic operations for the genetic programming paradigm are fitness proportionate reproduction and crossover (recombination). Fitness proportionate reproduction is the basic engine of Darwinian reproduction and survival of the fittest and operates for the genetic programming paradigm in the same way as it does for conventional genetic algorithms. The crossover operation for the genetic programming paradigm is a sexual operation that operates on two parental programs (i.e. LISP S-expressions) and produces two offspring S-expressions using parts of each parent. In particular, the crossover operation creates new offspring S-expressions by exchanging sub-trees (i.e. sub-lists) between the two parents. Because entire sub-trees are swapped, this genetic crossover (recombination) operation produces syntactically and semantically valid LISP S-expressions as offspring regardless of which allowable point is selected in either parent.
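The sub-tree exchange described above can be sketched with nested Python lists standing in for LISP S-expressions. The sketch assumes that every node, including the root, is an allowable crossover point and that points are chosen uniformly at random; the helper names are hypothetical.

```python
import random


def subtree_paths(tree, path=()):
    # Yield the path (a tuple of child indices) to every node in the tree.
    yield path
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    # Return a copy of `tree` in which the node at `path` is `new`;
    # the original tree is left unmodified.
    if not path:
        return new
    copy = list(tree)
    copy[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return copy


def size(tree):
    # Number of nodes (functions and terminals) in the tree.
    if isinstance(tree, list):
        return 1 + sum(size(child) for child in tree[1:])
    return 1


def crossover(parent1, parent2, rng):
    # Pick one point in each parent and exchange the sub-trees rooted there.
    p1 = rng.choice(list(subtree_paths(parent1)))
    p2 = rng.choice(list(subtree_paths(parent2)))
    child1 = replace_subtree(parent1, p1, get_subtree(parent2, p2))
    child2 = replace_subtree(parent2, p2, get_subtree(parent1, p1))
    return child1, child2


mother = ["+", ["*", "A", "B"], "C"]
father = ["-", "D", ["*", "E", "F"]]
print(crossover(mother, father, random.Random(0)))
```

Unlike one-point crossover on fixed-length strings, the two offspring may differ in size and shape from both parents, while the total number of nodes across the pair is conserved.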
This genetic programming paradigm has been successfully applied (Koza 1989, 1990) to example problems in several different areas, including, but not limited to, (1) machine learning of functions (e.g. learning the Boolean 11-multiplexer function), (2) planning (e.g. developing a robotic action sequence that can stack an arbitrary initial configuration of blocks into a specified order), (3) automatic programming (e.g. discovering a computational procedure for solving pairs of linear equations, solving quadratic equations for complex roots, and discovering trigonometric identities), (4) sequence induction (e.g. inducing a recursive computational procedure for the Fibonacci and the Hofstadter sequences), (5) pattern recognition (e.g. translation-invariant recognition of a simple one-dimensional shape in a linear retina), (6) optimal control (e.g. centering a cart and balancing a broom on a moving cart in minimal time by applying a "bang bang" force to the cart), (7) symbolic "data to function" regression, symbolic "data to function" integration, and symbolic "data to function" differentiation, (8) symbolic solution to functional equations (including differential equations with initial conditions, integral equations, and general functional equations), (9) empirical discovery (e.g. rediscovering Kepler's Third Law, rediscovering the well-known econometric "exchange equation" MV=PQ from actual time series data for the money supply, the velocity of money, the price level, and the gross national product of an economy), and (10) simultaneous architectural design and training of neural networks.
Prior Art Function Definition
To Applicant's knowledge, automatic function definition has not previously been used in conjunction with the genetic programming paradigm.
Prior Art Data Encoding
To Applicant's knowledge, genetic algorithms have not been applied to data or image compression. Numerous methods of presentation of video image information on display devices are well-known in the art. One such method involves displaying image data on a red-green-blue (RGB) display monitor. In an RGB color system, a display may be controlled by presenting pieces of color information to drive circuitry which in turn produces three electrical signals which control the red, green and blue colors on the display.
Image data for an image display device, such as a video display or a printer, is typically organized into multiple lines (e.g., scanlines), with each line holding image data for a fixed number of "pixels" (picture elements). The image data stored for each pixel can range from a single bit for black and white images, to 8 bits for representing 256 colors, to even more bits for representing even more colors. The image data for a pixel is also often stored as an ordered set (vector) of three numerical values, each denoting the level of one color attribute (e.g., red, green, and blue). Where the number of bits representing a pixel is large, the amount of memory required to store all of the pixels corresponding to an image, and thus to store the image, is large. In order to reduce the amount of memory required to store an image, or the bandwidth required to transmit it, image (data) compression is typically employed.
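The dependence of storage on pixel depth can be made concrete with a short calculation. The following Python sketch is illustrative only: the function name image_bytes is hypothetical, the 640 by 480 pixel image size is an assumption chosen for the example, and any per-line padding used by a particular device is ignored.

```python
def image_bytes(width, height, bits_per_pixel):
    """Bytes needed to store an uncompressed image, ignoring any per-line padding."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8  # round up to a whole number of bytes

# Hypothetical 640 x 480 image, chosen only for illustration.
for bits in (1, 8, 24):
    print(bits, "bits per pixel:", image_bytes(640, 480, bits), "bytes")
```

Under these assumptions, the same image occupies 38,400 bytes at 1 bit per pixel, 307,200 bytes at 8 bits per pixel, and 921,600 bytes at 24 bits per pixel (three 8-bit color values).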
Various methods of data compression are known in the field. Known methods, including fractal data compression, often rely on the removal of redundant information or the encoding of information in a more compact representation. Fractal data compression and its application to image compression are discussed in "The Use of Fractal Theory in a Video Compression System" by Maaruf Ali et al. Another method of image compression is discussed by Karl Sims in "Artificial Evolution of Computer Graphics."
Sims (1991) creates complex visual structures, textures, and motions on a video monitor using a three-step process of random generation, personal selection, and mutation of LISP S-expressions. First, Sims randomly generates hundreds or thousands of LISP S-expressions and displays the resulting images on a video monitor. Second, Sims selects those that he finds to have interesting visual structures, textures, and motions. When an S-expression has an explicit time variable, the S-expression creates a video image which varies with time and thus presents motion. The images are presented on the video monitor of a computer workstation, and Sims communicates his selections to the computer by means of the interactive features of the computer. Third, Sims modifies interesting S-expressions by random mutation of the S-expression and, in some cases, by directed mutation. In directed mutation, Sims applies his own experience as to what specific mutations are likely to produce particular interesting or desired changes in the video image. By using this three-step process of random generation, selection, and mutation (random or directed), Sims discovers an impressive variety of different visual images. Programming comparable images from scratch would have been extremely difficult or impossible.
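The three-step process described above may be sketched, purely for illustration, as the following Python loop. The sketch is not Sims's implementation: the function set, the terminal set, the probabilities, and the names random_expression, mutate, and evolve are all hypothetical. Nested lists stand in for LISP S-expressions, only random (not directed) mutation is shown, and the human's interactive choice is represented by a caller-supplied select function that must return a non-empty list.

```python
import random

FUNCTIONS = {"+": 2, "*": 2, "sin": 1}  # hypothetical function set: name -> arity
TERMINALS = ["x", "y", "t"]             # hypothetical terminals; "t" is the time variable

def random_expression(rng, max_depth):
    """Step 1: build a random expression tree as a nested list."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    args = [random_expression(rng, max_depth - 1) for _ in range(FUNCTIONS[name])]
    return [name] + args

def mutate(expr, rng, max_depth=3):
    """Step 3: replace one randomly chosen sub-tree with a fresh random sub-tree."""
    if not isinstance(expr, list) or rng.random() < 0.25:
        return random_expression(rng, max_depth)
    i = rng.randrange(1, len(expr))  # pick one argument and descend into it
    return expr[:i] + [mutate(expr[i], rng, max_depth)] + expr[i + 1:]

def evolve(select, rng, population_size=100, rounds=5):
    """Repeat selection and mutation; select stands in for the human's choice."""
    population = [random_expression(rng, 4) for _ in range(population_size)]
    for _ in range(rounds):
        chosen = select(population)  # step 2: keep the expressions judged interesting
        population = [mutate(rng.choice(chosen), rng) for _ in range(population_size)]
    return population
```

In this sketch there is no fitness measure and no crossover; the only selective pressure comes from the choices returned by select.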
What is needed is a way to apply some of the general principles of biological natural selection that are embodied in the conventional genetic algorithm (i.e. survival of the fittest and crossing over of parents' traits to offspring) to a greatly expanded class of problems. In particular, what is needed is a method for adaptively creating computer programs that compress (encode) data so as to minimize computer storage, transmission costs, or some other significant metric. Such programs may involve complicated combinations of mathematical functions and their arguments, recursions, symbolic processing, and complicated data structures, with no advance limitations on the size, shape, or complexity of the programs. One object of the present invention is to provide a genetic process that provides solutions for an expanded class of problems. A further object of the present invention is to provide a genetic process without any predetermined limits on the size, shape, or complexity of the members of the subject population.