1. Field of Invention
This invention is directed to methods and systems for building and executing complex software. In particular, this invention relates to self-optimizing software employing learning and planning techniques. More specifically, this invention relates to software that tolerates outside interference and supports dynamic reconfiguration.
2. Description of Related Art
Traditional software systems tend to be rigid and fragile, in the sense that such software systems will fail when faced with problems, failures, or environmental situations that were not anticipated by the designer. Part of the reason that software systems fail is that algorithmic procedures tend to be instruction-sequenced and need to explicitly account for all possible contingencies. That is, if the design has not anticipated and accounted for an alternative, an occurrence of that alternative results in the software system failing. Some high-level language features, such as exception handling mechanisms and dynamic dispatching of object-oriented language modules, are designed to address the problem of instruction sequencing.
These problems relate to three basic technology areas, including (i) system building procedures and dynamic system configuration, (ii) software fault tolerance, and (iii) formal system descriptions and programming paradigms.
One approach to system building procedures and dynamic system configuration is discussed in Nelson H. F. Beede, "The Design of G Make- and Extended Implementation of UNIX Make", Technical report, Center for Scientific Computing and Department of Mathematics, University of Utah, Salt Lake City, Utah, Feb. 5, 1990, herein incorporated by reference in its entirety. Beede discusses the UNIX "make" utility as using time-stamps associated with files to drive the execution of tools that generate new files. A tool's execution is controlled by the time-stamps of that tool's inputs and outputs. This mechanism does not allow for redundancy. That is, if a tool fails, then the computation, e.g., a build process, fails. The UNIX "make" utility does not, and because of the lack of redundancy need not, optimize its execution through planning. The "make" utility does not learn and does not allow iteration. Iteration in this instance refers to targets that directly or indirectly depend on themselves through successive iterations.
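The time-stamp rule described above can be illustrated with a minimal sketch. This is not the implementation of the "make" utility; the names needs_rebuild, build, and tool are hypothetical and are introduced only for illustration.

```python
import os

def needs_rebuild(target, sources):
    """Return True if the target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(s) > target_mtime for s in sources)

def build(target, sources, tool):
    """Run the (hypothetical) tool only when the time-stamps call for it.

    There is no alternative tool to fall back on: if tool() raises,
    the exception propagates and the whole build fails.
    """
    if needs_rebuild(target, sources):
        tool(target, sources)
        return True
    return False
```

In this sketch a failing tool ends the computation, which mirrors the lack of redundancy noted above.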
Another approach to system building procedures and dynamic system configuration is to achieve a reconfigurable system. For example, U.S. Pat. No. 5,634,058 to Allen et al. describes a method for dynamically loading software modules based on need. This method merely deals with the mechanics of loading modules, rather than deciding which of a set of alternative modules to load. The system described in U.S. Pat. No. 5,515,524 to Lynch et al. specifies and builds a software configuration based on structural descriptions, requirements, and constraints. The system is static in one sense because once constructed, the system's configuration cannot be altered while the system is running.
A more dynamic approach to software configuration is described by Kramer et al., "Dynamic Configuration for Distributed Systems", IEEE Transactions on Software Engineering, 11(4):424-436, 1985. This approach essentially provides the ability to modify and extend a system while it is running. A change in configuration is made explicitly by changing a configuration description, and is not made automatically based on the given problem, failure, and cost, e.g., run-time or quality, of the components. Accordingly, a configuration change does not occur "on the fly" as the system encounters problems, failures, and changing cost parameters of system components.
Another dynamic system reconfiguration is provided by Marzullo et al., "Tools for Distributed Application Management", Computer, 24(8):32-51, August, 1991. In contrast to Kramer's system, Marzullo describes a system called Meta, which changes its configuration while the system is running in response to the problems, failures, and changing cost parameters of system components that it encounters. Meta achieves this by using a separate monitoring process that observes and controls the execution of the actual program through sensors and actuators provided by its functional components. Meta therefore requires a monitoring program together with sensors and actuators, and its modification of control flow is not necessarily based on the time-ordering of changed data.
Software fault tolerance is also used to solve the problems of system failure and environmental situations that were not anticipated by the designer. The N-Version method disclosed in Eckhardt et al., "An Experimental Evaluation of Software Redundancy As a Strategy for Improving Reliability", IEEE Transactions on Software Engineering, 17(7), pages 692-702, 1991, produces highly reliable software systems by using multiple functional components that are independently developed for the same specification. The multiple functional components are executed concurrently and the system votes among the generated results. This method primarily deals with possible design and coding errors in independently developed components. The N-Version method uses only homogeneous redundancy. That is, all components are intended to perform the same function. Additionally, all alternatives are executed; there is no dynamic selection of the components. Nor do N-Version systems adapt or learn.
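The voting scheme described above can be illustrated with a minimal sketch. It is not taken from Eckhardt et al.; the function name n_version_execute is hypothetical, the versions are run sequentially rather than concurrently for simplicity, and results are assumed to be hashable so that they can be counted.

```python
from collections import Counter

def n_version_execute(versions, x):
    """Run every version on the same input and return the majority result."""
    # Every alternative is executed; nothing is selected or skipped dynamically.
    results = [version(x) for version in versions]
    winner, count = Counter(results).most_common(1)[0]
    # Require a strict majority so that a tie is reported rather than guessed.
    if count <= len(results) // 2:
        raise RuntimeError("no majority among the version results")
    return winner
```

With three versions of which one is faulty, the two agreeing results outvote the faulty one. With only two disagreeing versions the sketch raises an error instead of choosing.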
Other approaches to achieving software fault tolerance include various uses of recovery blocks, rollback, and re-execution. Such approaches are described in U.S. Pat. Nos. 5,530,802 and 5,440,726 to Fuchs et al. Generally, these methods make software more fault tolerant through the use of rollback and re-execution of failing software components. This will succeed if the fault is intermittent. If the fault is repeatable, Fuchs et al. disclose rearranging the input data in order to avoid the faulty execution path. Such a system may attempt to re-execute a failing sequence of functions.
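Rollback and re-execution can be illustrated with a minimal sketch. It is not the method of Fuchs et al.; the function name run_with_rollback, the dictionary-based state, and the max_attempts parameter are assumptions made only for illustration.

```python
import copy

def run_with_rollback(state, action, max_attempts=3):
    """Re-execute action on state, restoring a checkpoint after each failure."""
    checkpoint = copy.deepcopy(state)  # snapshot taken before the first attempt
    for _ in range(max_attempts):
        try:
            action(state)  # may partially modify state before failing
            return state
        except Exception:
            # Roll back: discard partial changes and restore the snapshot.
            state.clear()
            state.update(copy.deepcopy(checkpoint))
    raise RuntimeError("action failed on every attempt")
```

An intermittent fault is masked by a later attempt. A repeatable fault exhausts the attempts, which is why re-execution alone does not help in that case.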
A Petri net is a formalism for describing concurrent systems and processes, as described in Reisig, "Petri Nets", Springer, Berlin, 1982. Petri nets have two kinds of nodes, data location nodes and transition, or action, nodes. Petri nets use tokens to determine when a transition can be executed. A token indicates the availability of data. A transition is executed when tokens are available on all input storage locations. Scheduling of transitions is determined by the tokens. A transition will consume all input tokens and will produce tokens on its output storage locations. Petri nets are a descriptive formalism rather than a method of computation and do not adapt or learn.
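The token rule described above can be illustrated with a minimal sketch. It assumes an ordinary Petri net in which each firing takes one token from every input location and puts one token on every output location; the names enabled and fire and the dictionary representation are hypothetical.

```python
def enabled(marking, transition):
    """A transition is enabled when every input location holds a token."""
    return all(marking.get(place, 0) >= 1 for place in transition["inputs"])

def fire(marking, transition):
    """Return the new marking after firing an enabled transition."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place in transition["inputs"]:
        new_marking[place] -= 1  # consume one token per input location
    for place in transition["outputs"]:
        new_marking[place] = new_marking.get(place, 0) + 1  # produce a token
    return new_marking
```

For example, a transition with inputs p1 and p2 and output p3 can fire from a marking with one token on each of p1 and p2, after which only p3 holds a token and the transition is no longer enabled.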
An alternative programming paradigm is the dataflow program. Dataflow programs are described in Herath et al., "Parallel Algorithms and Architectures", pp. 25-36, Springer-Verlag, New York, N.Y., 1987. Execution of actions in dataflow programs is triggered by the availability or arrival of input data. The execution of a node in a dataflow program is not controlled by the availability of data on the outputs, as it is in the present invention. Dataflow programs do not use redundancy.
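Input-triggered execution can be illustrated with a minimal sketch. It is not drawn from Herath et al.; the function name run_dataflow and the representation of a node as a (function, input names) pair are assumptions made only for illustration.

```python
def run_dataflow(nodes, inputs):
    """Fire each node once, as soon as all of its input values are available.

    nodes maps a node name to (function, list of input names).
    Returns the computed values and the order in which nodes fired.
    """
    values = dict(inputs)
    pending = dict(nodes)
    fired = []
    progress = True
    while pending and progress:
        progress = False
        for name, (func, arg_names) in list(pending.items()):
            # Only the availability of inputs decides whether a node runs;
            # the state of its output is never consulted.
            if all(arg in values for arg in arg_names):
                values[name] = func(*(values[arg] for arg in arg_names))
                fired.append(name)
                del pending[name]
                progress = True
    return values, fired
```

The firing order follows the arrival of data rather than the order in which the nodes are listed, and each function has exactly one implementation, so there is no redundancy to fall back on.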