1. Field of the Invention
The present invention relates to a fault tolerant and combinatorial software environment. More specifically, the present invention relates to a system, method and medium for providing and facilitating a virtual machine and instruction format that facilitates systematic or arbitrary alteration and recombination of portions of code (e.g., programs) at run-time. Instructions that comprise a language that can be used with the present invention can be individualized, allowing for differentiation of each instruction based on, e.g., usage, origin, authorship, or any other user-defined tag. The overall instruction set can be extended to include any object or procedure that can be represented by software. The invention is envisioned for use in applications such as those relating to automatic program creation (e.g. genetic programming), artificial/directed evolution of digital designs (graphic, industrial, mechanical, architectural, engineering, etc.), online adaptive software systems, product and process reliability testing, and simulation of biological, non-linear, and probabilistic processes.
2. Related Art
Numerous computer-programming languages exist today. These include assembly languages, and higher-level languages such as BASIC, Visual Basic, C, C++, Java, etc. Although many computer systems/languages are increasingly compared to DNA and genetics, they do not include many desirable attributes of biological "computation" systems and therefore are deficient for implementing robust, evolvable, life-like computer programs. Specifically, in existing computer systems/languages:
Instructions must have proper syntax and execution context.
Instructions cannot be arbitrarily recombined to form new programs.
Superfluous "junk" instructions are not allowed within programs.
Translation of instructions is deterministic (i.e., it is neither influenced by environmental conditions at run-time nor probabilistic).
Although algorithms and programming methods have been specifically developed to emulate the evolutionary process (e.g., "Genetic Algorithms", "Genetic Programming"), these also fail to provide the facilities of natural systems, are often cumbersome to deploy, and are not inherently designed to facilitate evolution. The deficiencies of Genetic Programming as a general purpose platform for evolving solutions to problems and for providing a computational platform with the facilities of molecular biology include:
Programs must be syntactically correct to be bred or evaluated.
Standard features of computer languages, such as looping, conditional logic blocks, and subroutines, cannot be implemented in the customary fashion.
Automatically created programs cannot themselves create, inspect, test, and terminate other automatic programs.
The behavior (number of inputs and outputs) of a program must be known in advance of breeding and evaluation.
Programs can't be marked to identify sites for mutation or other genetic operations.
The genetic lineage of programs can't be determined by examining the programs themselves.
Instructions are not interchangeable or redirectable.
Implementations do not provide for multi-tasking, e.g. the simultaneous evaluation of two or more evolvable programs.
The environment and instruction set cannot be altered while programs are running.
Genetic programming, pioneered by John Koza, is a biologically inspired general-purpose search technique for discovering solutions to complex problems. Typically implemented in LISP or C/C++, Genetic Programming involves creating a population of syntactically correct programs (in LISP) or data structures (in C), then breeding successive generations of syntactically correct programs or data structures guided by the Darwinian principle of "survival of the fittest." Analogs of naturally occurring operations such as sexual recombination, mutation, and gene deletion are used to generate the programs. Utilizing this approach enables computers to develop solutions to a given problem without advance knowledge of the form of the solution. For a further description of Genetic Programming see Koza, John R., "Genetic Programming III: Darwinian Invention and Problem Solving" (1999).
FIG. 1 demonstrates the concept of breeding programs. In this example, an application is contemplated that creates a picture by invoking two programs (A and B), which create an ellipse and a rectangle respectively. The concepts of "cross-over" and "mutation" are employed to generate variations of the original picture.
Referring now to FIG. 1, several pictures are set forth where each picture consists of two components, an ellipse (A) and a rectangle (B). In the original representation 102, the application used coordinate data 50 50 150 150 to produce the ellipse, and 100 100 200 200 to produce the rectangle. Below original representation 102 are four exemplary representations depicting the concept of "cross-over." For example, in representation 104, it can be seen that the data elements after the first position have been exchanged. In other words, the four instructions following the first "50" in the ellipse program have been crossed over (i.e., swapped) with the four instructions following the first "100" in the rectangle program. The remaining three such representations (106, 108 and 110) have resulted from aspects of original representation 102 being crossed over in a similar fashion.
Representation 112 depicts the concept of mutation. Specifically, rather than simply moving around the same data provided by the original representation 102, at least some of the data is mutated (i.e., changed), thus yielding different values, as shown. Thus, from using such concepts as cross-over and mutation, numerous "possibilities" can be created, one or more of which can then be selected by a designer according to his or her personal taste or objectively tested according to a machine-based fitness function such as maximizing the axial symmetry of the picture.
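By way of illustration only, the cross-over and mutation operations described with reference to FIG. 1 can be sketched as follows. The sketch assumes each program is a list of four coordinate values; the function names `crossover` and `mutate`, the mutation rate, and the 0 to 250 value range are hypothetical choices made for this example and do not appear in FIG. 1.

```python
import random

def crossover(a, b, point):
    """Single-point cross-over: swap every element after index `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(values, rate, rng, low=0, high=250):
    """Replace each value by a random one with probability `rate` (assumed range)."""
    return [rng.randint(low, high) if rng.random() < rate else v for v in values]

# Coordinate data of original representation 102.
ellipse = [50, 50, 150, 150]
rectangle = [100, 100, 200, 200]

# Exchange the data elements after the first position.
child_ellipse, child_rectangle = crossover(ellipse, rectangle, 1)

# Mutation yields values that were not present in either parent.
mutant = mutate(ellipse, rate=0.5, rng=random.Random())
```

Here `child_ellipse` keeps the first value of the ellipse and takes the remaining values from the rectangle, and vice versa for `child_rectangle`.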
A fundamental problem in using genetic programming to evolve solutions to problems is that programs cannot be arbitrarily recombined without producing software with invalid syntax, nonsense logic, overflow errors, etc. Programs must have correct, balanced, "perfect" grammar to compile or even be interpreted. If a semicolon ";" is missing at the end of a line in a program written in C, or a "next" does not follow a "for" in a program written in BASIC, the software will not execute. For example, the code strip below returns the temperature in Fahrenheit given a Celsius value:
F=(9/5*C)+32
If a cut is made after the multiply sign and the second part of the piece is recombined in front of the first part, the new code will appear as:
F=C)+32(9/5*
If this new code were passed to a conventional computer for execution or a compiler for translation, the procedure would terminate at the second instruction ")", which has no meaning without a prior opening parenthesis. Termination is an unrecoverable error that stops the entire computing process in the middle of execution and typically requires human interaction in order to resume. Traditional computing environments are therefore unsuitable for automatically evolving solutions to problems or designing a robust life-like computing system.
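The failure described above can be reproduced in a conventional language. The following is a minimal sketch, assuming Python as the conventional environment; the helper names `recombine` and `is_valid` are hypothetical. It cuts the right-hand side of the expression after the multiply sign, places the second piece in front of the first, and asks the Python compiler whether each version is acceptable.

```python
def recombine(expr, cut_after):
    """Cut `expr` just after the first `cut_after` and put the tail in front."""
    i = expr.index(cut_after) + len(cut_after)
    return expr[i:] + expr[:i]

def is_valid(expr):
    """Return True if a conventional compiler accepts `expr` as an expression."""
    try:
        compile(expr, "<recombined>", "eval")
        return True
    except SyntaxError:
        return False

original = "(9/5*C)+32"
shuffled = recombine(original, "*")   # "C)+32(9/5*"

original_ok = is_valid(original)      # accepted
shuffled_ok = is_valid(shuffled)      # rejected at the unmatched ")"
```

The recombined strip is rejected before any of it is executed, which is the behavior that makes arbitrary recombination impractical in such environments.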
As noted above, existing virtual and silicon machines (computers) are not designed to process random code, and will halt execution on unhandled errors. In order to get around this problem, existing implementations of genetic programming impose rigid constraints on program breeding in order to ensure that programs will run. This restricts the space of potential programs to a sub-space of all grammatically correct or "legal" programs. This restriction may hinder the achievement of an optimal solution as all paths must be syntactically correct, thereby prohibiting "short cuts" or "trespasses" through the space of "illegal programs."
In addition to syntactical errors, the arbitrary recombination of programs frequently produces programs containing circular logic and infinite loops that cannot be readily terminated by existing genetic programming systems without human intervention.
Existing computer architectures do not allow for the inclusion of junk instructions which have the potential of later being executed (silent instructions). It has been estimated that as much as 97% of the human genome is comprised of junk DNA or "introns". Although the role of introns is not fully understood, no software system presently allows for their inclusion. (Some believe that they provide a pool of genetic material that can be recombined in reproduction to create innovative new genes.) Thus, current software languages prohibit inclusion of "junk" and invalid syntax, both highly desirable capabilities in evolving software if one desires to emulate the processes and capabilities of molecular biology.
As the arbitrary combination of elements in the genetic code of DNA produces useful new parts, these parts can be immediately deployed in the cell or used in reproduction. In the latter case, the new genetic units become part of the overall pool from which succeeding generations are created. However, this real-time extensibility is not a feature of existing software languages. Although it may be possible to detach and update object libraries that a program uses through operating system commands, existing computer languages do not provide functions to programmatically edit and update the elements of the libraries, i.e., prior art languages do not provide features for programs written in such languages to modify themselves at run-time.
The above-noted shortcoming limits the speed at which programs can be evolved, as the process must continually be terminated and then restarted at each generation. It also does not enable online adaptation or cooperative computing in which one personal computer is in production while another one is evolving improvements to the library. In general, it would further be desirable to be able to objectize the constituents of a software program (e.g., its key words, constants, operators, any user-defined functions, and combinations thereof) so that their methods can be executed upon being encountered and so that new constituents formed during the execution of the program (and those formed during the genetic algorithm process) can be objectized without terminating the program.
The existence of the deficiencies mentioned above has meant that certain other related deficiencies also exist. For example, prior art computer systems are highly deterministic. Once compiled, a program's behavior is set for all time. A more flexible system would allow the execution of code to be dependent on the (configurable) context of the program at run-time, just as the expression of a gene is dependent on the circumstances of the cell at the time of translation. Similarly, for modeling and theoretical purposes, random numbers are often used to, e.g., decide whether a binary state is on or off. Taking that concept a step further, it may be useful under certain circumstances for a specified constituent of a computer program (e.g., a "+" sign) to be randomly executed (i.e., sometimes the constituent is executed when it is encountered in a computer program, and sometimes it is not). Building in the ability to set and reset an instruction's probability of execution would also help facilitate learning and program self-modification, as broken code could be deactivated while good code could be executed with a 100% probability. Were such deactivation of portions of code allowed to occur in current software environments, the program might become syntactically incorrect, thus generating an error.
Another deficiency of the prior art is the lack of ability to dynamically switch between treating information as executable code and treating it as data. A remedy to this deficiency would find use in automatic program generation. For example, the ability to switch between data and program modes would provide greater flexibility in creating, inspecting, and testing programs, and passing programs to other programs. For example, in normal execution mode, the program "+-*/" would generate four successive math errors. However, if the constituents of the program were treated as data where each had a corresponding instruction number (e.g., 395, 396, 397, 398), the data could then be manipulated mathematically and utilized as a toolkit for synthesizing mathematical programs, or used as a template for searching programs.
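As a rough sketch of the data mode described above, the program "+-*/" can be converted to a list of instruction numbers, manipulated arithmetically, and converted back. The assignment of 395 through 398 to the four operators in this particular order is an assumption made for illustration, as are the names `INSTRUCTION_NUMBERS`, `as_data` and `as_code`.

```python
# Assumed numbering: the text gives 395-398 only as example values.
INSTRUCTION_NUMBERS = {"+": 395, "-": 396, "*": 397, "/": 398}
SYMBOLS = {number: symbol for symbol, number in INSTRUCTION_NUMBERS.items()}

def as_data(program):
    """Treat each constituent of the program as its instruction number."""
    return [INSTRUCTION_NUMBERS[symbol] for symbol in program]

def as_code(numbers):
    """Turn a list of instruction numbers back into a program."""
    return "".join(SYMBOLS[number] for number in numbers)

data = as_data("+-*/")

# Mathematical manipulation in data mode: advance each instruction number
# by one, wrapping around within the four operators.
rotated = [395 + (number - 395 + 1) % 4 for number in data]
new_program = as_code(rotated)

# Use as a search template: count arithmetic operators in another program.
operator_count = sum(1 for symbol in "a+b*c" if symbol in INSTRUCTION_NUMBERS)
```

In data mode nothing is executed, so no math errors arise even though the operators have no operands.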
Programming languages such as LISP have a limited ability to switch between data and executable code modes in that they can manipulate text-based items rather than processing them. However, LISP does not assign numerical designations to all of its instructions. Thus, it does not manipulate the items on a numerical level, and would be deficient in implementing the concepts mentioned above.
Present day microprocessors and virtual machines utilize a set of instructions that point to one precise action or object. No means of distinguishing between instances of a machine instruction is provided. For example, a particular instruction might point to the "add" operator, but there is no way of tagging one program's "add" operator so that it is distinguishable from another's. Such a capability would be useful in tracking instruction lineage in genetic programming. In examining the instructions of the offspring after breeding two programs, the determination of which instruction came from which parent would be highly desirable.
The ability to tag and thereby individualize instructions would have many uses. For example, the authorship of code segments could be marked and usage statistics kept. Mutation rates could be assigned to each individual instruction. Unutilized features of a program could be identified and spliced out, while highly used areas could be selected for breeding or identified for software pricing. The inability to individualize each instruction in current environments thus limits the possibility to analyze, study, research, modify, and price programs.
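The kind of tagging discussed above can be sketched as follows. This is an illustrative data structure only, not the instruction format of the invention; the class `TaggedInstruction`, the tag keys "origin" and "uses", and the opcode names are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class TaggedInstruction:
    opcode: str                                  # what the instruction does
    tags: dict = field(default_factory=dict)     # user-defined tags

def tag_program(opcodes, origin):
    """Wrap each opcode with an origin tag and a usage counter."""
    return [TaggedInstruction(op, {"origin": origin, "uses": 0}) for op in opcodes]

def copy_of(instruction):
    """Copy an instruction so offspring tags do not alias the parent's."""
    return TaggedInstruction(instruction.opcode, dict(instruction.tags))

parent_a = tag_program(["push", "add", "dup"], origin="A")
parent_b = tag_program(["push", "mul", "swap"], origin="B")

# Breed: first two instructions from parent A, the remainder from parent B.
child = [copy_of(i) for i in parent_a[:2] + parent_b[2:]]

# Lineage can now be read from the offspring itself.
lineage = [i.tags["origin"] for i in child]

# Usage statistics: record that the child's "add" was executed once.
child[1].tags["uses"] += 1
```

Two "push" instructions with the same opcode remain distinguishable by their tags, which is the property that conventional instruction sets lack.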
From the above, it can be appreciated that "prior art" computers (virtual and silicon) and software languages are deficient in emulating many of the characteristics that make up the evolutionary process and do not allow the deliberate or arbitrary recombination of software instructions. Existing computers terminate execution on unhandled errors and are therefore not fault tolerant. Code cannot be arbitrarily reassociated. Unused code (analogous to introns) cannot reside embedded in software programs as the code itself lacks the facility to determine which sections to execute or ignore. Software languages cannot be compiled or interpreted if syntax errors exist. Extant languages do not permit real-time extensibility, the ability to add new functionality without stopping execution of the program and recompiling. Existing machine instructions which make up software languages and are processed by the computer are uni-dimensional and do not provide for individualization (e.g., tagging). These weaknesses demanded the development of a new computer and software environment to overcome the stated shortcomings.
The present invention overcomes the deficiencies mentioned above by providing a fault tolerant software environment. In particular, embodiments of the present invention envision that various program components (e.g., portions of computer programs, applications, etc.) are objectized into entities represented by (and referred to herein as) "codons," as described herein. Further, it is thus envisioned that a computer program, comprising a plurality of codons, is executed codon by codon utilizing a virtual machine format. This is implemented such that, for example, executing a codon that would not have been expected in view of previously-executed codons (e.g., a codon requiring two inputs is executed without those inputs being available) does not, by itself, cause the program containing the codon to crash or terminate. Instead, the program will continue to run, albeit possibly without generating a meaningful result. This thus allows for improper syntax to occur, allowing for the inclusion of "junk" code.
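A minimal sketch of codon-by-codon, fault-tolerant execution is given below. It assumes a simple stack machine in which numbers are pushed and arithmetic codons consume two inputs; that design, the function name `run`, and the choice to count skipped codons are assumptions for illustration and are not taken from the virtual machine of the invention.

```python
from operator import add, sub, mul, truediv

BINARY = {"+": add, "-": sub, "*": mul, "/": truediv}

def run(codons):
    """Execute codons one at a time; never raise on a malformed program."""
    stack, skipped = [], 0
    for codon in codons:
        if isinstance(codon, (int, float)):
            stack.append(codon)                  # numeric codon: push it
        elif codon in BINARY and len(stack) >= 2:
            b, a = stack.pop(), stack.pop()
            try:
                stack.append(BINARY[codon](a, b))
            except ZeroDivisionError:
                stack.extend([a, b])             # restore inputs, carry on
                skipped += 1
        else:
            skipped += 1                         # missing inputs or junk codon
    return stack, skipped

# Celsius-to-Fahrenheit for C = 100, with three ill-placed codons mixed in.
result, skipped = run(["+", 100, 9, "*", "junk", 5, "/", 32, "+", "*"])
```

The leading "+", the unknown "junk" codon and the trailing "*" are passed over, and the remaining codons still yield the Fahrenheit value on the stack.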
The fault-tolerant aspects of the present invention enable, for example, combinatorial operations such as genetic programming. In the environment as envisioned herein, various portions of a computer program can be separated and spliced together, creating new and potentially different functional portions. As these portions are created, embodiments of the present invention contemplate that they can immediately be used in (and as part of) the program without concern for whether they are syntactically correct, and without having to terminate the execution of the program to re-compile it.
Embodiments of the present invention also contemplate the ability to probabilistically execute individual codons. That is, a codon can be selected to, for example, execute only a certain percentage of the time that it is encountered during execution of a program. The fault-tolerant aspects provided by embodiments of the present invention facilitate this, since the non-execution of a codon could otherwise lead to syntactical errors and adverse results. In general, probabilistic execution can allow for, for example, an enhanced capability to facilitate various types of modeling.
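Probabilistic execution of this kind can be sketched as follows, assuming each codon carries its own probability of execution and acts on a single accumulator. The tuple layout, the codon names, and the function `execute` are hypothetical and chosen only to keep the example short.

```python
import random

def execute(program, rng):
    """Run each (name, delta, probability) codon with its own probability."""
    accumulator, executed = 0, []
    for name, delta, probability in program:
        # rng.random() lies in [0.0, 1.0), so 1.0 always runs and 0.0 never does.
        if rng.random() < probability:
            accumulator += delta
            executed.append(name)
    return accumulator, executed

program = [
    ("good", 10, 1.0),      # trusted code: executed every time
    ("broken", -999, 0.0),  # deactivated code: never executed
    ("maybe", 1, 0.5),      # executed about half the time it is encountered
]

total, executed = execute(program, random.Random())
```

Because skipping a codon cannot break the program in a fault-tolerant environment, a probability of 0.0 serves as a way to deactivate code without removing it.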
In addition, embodiments of the present invention also provide the capability of being able to switch between treating information as executable code or as data. An advantage to being able to treat programs as data is, for example, that it allows for a greater ability to execute programs in a genetic programming environment. Furthermore, treating programs as data can also facilitate program self-analysis, self-modification and the ability for programs to readily create other programs. A silent mode, as contemplated by embodiments of the present invention, also facilitates these concepts by allowing specified codons not to be executed.
Also, embodiments of the present invention provide that the individual codons can be tagged so that additional information can be associated with them. This information can include any number of attributes, such as the number of times the codon was encountered, its origin, etc. Thus, in one sense, the individual codons become "read-writable," where information pertaining to them can be stored, and then read for subsequent purposes.
Further, embodiments of the present invention provide for stack tagging capabilities, which facilitate analysis of functional portions of a program.
The present invention also contemplates numerous additional aspects, which will be described further below.