A. Field of the Invention
The present invention relates to a computer-based interface, methods and systems for graphic information storage and retrieval, visual modeling and dynamic simulation of complex systems. Such systems comprise, in general, sets of processes and their participants, and more specifically chemical processes, which result in complex networks of multidimensional pathways. The invention encapsulates both information and mathematical models within modular components, and integrates inferential control with quantitative and semiquantitative simulation methods, providing a variety of alternatives for dealing with complex dynamic systems.
B. Related Applications
This application is related to the patent application entitled "Computer-Based System, Methods and Graphical Interface for Information Storage, Modeling and Simulation of Complex Systems Organized in Discrete Compartments in Time and Space", invented by the same inventor and filed with this application in the United States Patent and Trademark Office. That application is incorporated herein by this reference.
C. Description of the Prior Art
1. Computer-Aided Physiological and Molecular Modeling and Artificial Intelligence in Molecular Biology
a) Most computer-aided physiological and molecular modeling approaches have resulted in computer models of physiological function that are numerical mathematical models relating the physiological variables through empirically determined parameters. Those models, which can become quite complex, aim at modeling the overall system.
b) Both molecular biology and medicine have been fields of previous activity in the application of artificial intelligence (AI). In molecular biology, although there were some early systems such as Molgen and Dendral, the activity has intensified recently as a consequence of the explosion in new technologies and the derived data, mostly related to the Human Genome Project and the handling of the large amounts of sequence data generated, relating to both DNA and proteins. There has also been an increased interest in computer methodologies for 3D structural models of molecular interactions. As an example, the topics covered in symposia such as the recent Second International Conference on Intelligent Systems for Molecular Biology, 1994, Stanford University, CA (see the reference to its Proceedings, which is included here by reference, for the current state of the art) cover areas as wide as machine learning, automatic generation of representations, inductive and deductive reasoning, case-based reasoning, computational linguistics applied to both text and DNA or protein sequences, constraint propagation, Bayesian inference, neural networks, qualitative modeling, expert systems, object-oriented databases, simulation, knowledge acquisition technologies, and knowledge base maintenance. In those and many other published papers, there are as many objectives and as many approaches as there are teams. The projects, however, are in most cases of academic interest, and in few cases has there been an attempt to develop finished products for commercial distribution. Here, only two projects will be mentioned that have some objectives in common with the system that is the object of this invention. Discussions of other previous approaches are also included in those references.
c) The Molgen group at Stanford University's Knowledge Systems Laboratory has for several years been studying scientific theory formation in the domain of molecular biology, as reported by Karp, P. D. and Friedland, P. (included here by reference). This project relates to the system object of this invention in that both "are concerned with biochemical systems containing populations of interacting molecules . . . in which the form of knowledge available . . . varies widely in precision from quantitative to qualitative", as those authors write. However, the domain of Molgen is a small subset of the domain of the system object of this invention. Within the Molgen group, the Ph.D. dissertation of P. D. Karp (included here by reference) focused on applying that research to the design of a computer-assisted reasoning and hypothesis formation system. In the process, the author "developed a qualitative biochemistry for representing theories of molecular biology", as he summarizes in an abstract with the same name in AI Magazine, Winter 1990, pp. 9-10. In particular, he developed three representation models to deal with the biochemical pathways related to the tryptophan operon in bacteria, called declarative device models, each having different capabilities and using different qualitative reasoning approaches. Only chapter 3, dealing with those models, has relevance in the context of this invention. The first model uses frames from IntelliCorp's KEE (a commercial product) to describe biological objects and KEE rules to describe chemical reactions between the objects. He recognizes that this model has serious limitations, because it is not able to represent much of the knowledge available to biologists.
The objective of the second model, which is an extension of the fixed state-variable network of deKleer and Brown, is to predict reaction rates in a given reaction network, incorporating a combination of quantitative and qualitative reasoning about state variables and their interdependencies. The drawbacks are that this model is not able to incorporate a description of the biological objects that participate in the reactions, and in addition it has no temporal reasoning capabilities, representing just a static description of the state variables and their relationships. The third model, called GENSIM and used for both prediction and hypothesis formation, is an extension of model 1, and is composed of three knowledge bases or taxonomical hierarchies of classes of a) biological objects that participate in the trp-operon gene-regulation system, b) descriptions of the biological reactions that can occur between those objects, and c) experiments with instances of those classes of objects. The GENSIM program predicts experimental outcomes by determining which reactions occur between the objects in one experiment; those reactions create new objects, which in turn cause new reactions. GENSIM is also used in conjunction with the HYPGENE hypothesis-formation program, which will not be further discussed here. Characteristics of the GENSIM program that may be relevant to, although different from, the system of this invention are:
chemical objects are homogeneous populations of molecules, objects can be decomposed into their component parts, and identical objects synthesized during a simulation are merged;
chemical processes are frames arranged in an inheritance class hierarchy, which represent reactions between those populations as probabilistic events with two subpopulations, one that participates in the reaction and one that does not;
restrictions are specified in the form of preconditions for chemical reactions to happen;
those Forbus-style processes can create objects and manipulate their properties, but cannot reason about quantitative state variables such as Quantities. In his words, "processes are described as KEE units . . . specify actions that will be taken if certain conditions hold", and in that sense are like production-rules;
temporal reasoning is not available, resulting again in static representations and simulating only behavior in very short time intervals.

1. interpret, represent and describe the attributes, characteristics and composition of chemical and biochemical entities, such as those participating in chemical, biochemical, cellular, physiological, patho-physiological or pharmacological reactions (hereinafter called chemical entities), as well as the parallel and serial sets of processes in which they interact and by which they are regulated;
2. compile such information and data and store them in a more compact knowledge form, to construct a library of basic and reusable building blocks;
3. interactively integrate sets of building blocks, using the basic paradigm of "Clone, Connect and Configure", into more complex knowledge structures, constructing dynamic models of the behavior of those chemical entities and processes;
4. interactively integrate those simple models into more complex models that result in multidimensional networks of interacting pathways, which in turn can be reused as building blocks for even more complex models;
5. programmatically integrate those models into multidimensional networks of interacting pathways, including branching and merging of pathways, cross-talk between pathways that share elements, and feed-back and feed-forward loops; and
6. dynamically simulate, optionally using the encapsulated absolute-valued or scaled-valued parameters and variables, the kinetic interactions represented in selected pathways, as defined within those more or less complex models.

1. An Attribute Table contains information and data that describe the characteristics of that object in many different ways. It contains a set of slots that vary for different classes of objects. Each slot may have a simple value of the types integer, float, symbol, text, or truth-value, or it may contain parameters, variables or other objects, each with its specific Attribute Table. The values of parameters and variables are computed by generic formulas that apply to all the instances of a class. The values of a particular instance of a variable may also be computed by a formula defined in its Attribute Table. Simulated formulas may contain algebraic, difference and differential equations.
2. A Menu with various options which upon selection perform specific tasks related to that particular object. The options available vary for different classes of objects and, within each class, depend on the operation mode.
3. An optional Subworkspace that contains a set of other objects and/or other bioObjects, with their corresponding sets of associated structures.
4. With this architecture, interactive tasks are defined within the Menu, and biological information and data are encapsulated in two formats, either in the form of a table or visually by icons.

1. taxonomic reasoning, such as that used in the classification of entities and processes;
2. numeric algorithms;
3. graphic and connectivity reasoning, such as that used when referring to iconic objects and their connections;
4. default reasoning, such as that provided by the default values of variables and parameters, which are based either on available knowledge, informed assumptions, or guesses, and which are used in the absence of overriding information;
5. data-driven reasoning, which is propagated forward;
6. goal-driven reasoning, which is propagated backward;
7. event-driven reasoning, which propagates when a precondition is satisfied;
8. time-driven reasoning, which propagates when the specified time interval elapses; and
9. constraint propagation throughout the network.

1. The functional structure of the system in consideration, which is constructed using experimentally obtained qualitative information, such as the identification of the chemical entities involved, the knowledge about their structural and functional characteristics, and about the relationships and qualitative interactions between those entities. This information is mapped into the knowledge structures represented by the different classes of bioObjects, and into the connections and relations that connect and relate these knowledge structures. The graphical environment merges data and information flow with sequential control, and the system supports both arithmetic and symbolic expressions.
2. The mathematical component is represented by a set of model differential and algebraic equations that define the system's variables and describe their behavior, together with the set of associated parameters that control the behavior of the variables and the system as a whole. The model can then be viewed as a set of embedded block diagram representations of the underlying equations that can be used to ask what-if questions, and for dynamic numerical simulation and prediction of the effects of perturbations on the system.
3. The combination of both components allows the use of the same model for both symbolic and numerical simulations and predictions, and to reason about the origin and reasons of the outcome.
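By way of illustration only, the use of a set of differential equations and associated parameters to ask what-if questions may be sketched in Python as follows. The sketch integrates a hypothetical two-step pathway A -> B -> C with first-order kinetics by fixed-step Euler integration. The species, the rate constants k1 and k2, the step size, and the function name simulate are assumptions chosen for this example and are not taken from the system described here.

```python
def simulate(k1, k2, a0=1.0, dt=0.001, t_end=10.0):
    """Euler integration of dA/dt = -k1*A, dB/dt = k1*A - k2*B, dC/dt = k2*B.

    All names and values are hypothetical; returns (A, B, C) at t_end.
    """
    a, b, c = a0, 0.0, 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        # Evaluate all rates from the current state before updating it.
        da = -k1 * a
        db = k1 * a - k2 * b
        dc = k2 * b
        a += da * dt
        b += db * dt
        c += dc * dt
    return a, b, c


# Baseline run, then a "what if" run in which the second step is slowed tenfold.
baseline = simulate(k1=1.0, k2=0.5)
perturbed = simulate(k1=1.0, k2=0.05)
```

Comparing the two results shows the intermediate B accumulating when the second step is slowed, which is the kind of perturbation effect such a model is meant to predict. A production simulator would likely use an adaptive or higher-order integrator in place of fixed-step Euler.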
2. Knowledge-Based and Model-Based System Shells
a) Several knowledge-based system shells have been used as tools allowing fast development of domain-specific applications. HERACLES, one of the first shells, was developed by generalizing and separating the domain-specific knowledge of NEOMYCIN from the underlying expert system methodology (see W. J. Clancey, "From GUIDON to NEOMYCIN and HERACLES in Twenty Short Lessons: ORN Final Report 1979-1985", in AI Magazine, August 1986, pp. 40-60).
b) A knowledge-based system interprets data using knowledge added to the system by a human domain expert. This knowledge base may contain diverse forms of knowledge, ranging between two extremes: a) shallow knowledge or heuristics, such as human experience and interpretations or rules of thumb; and b) deep knowledge about the system behavior and interactions. Systems based mainly on the first type of knowledge are in general referred to as knowledge-based expert systems, and their logic is represented in the form of production rules. In the more advanced real-time expert systems, inferencing techniques are usually data-driven using forward chaining, but can also employ backward chaining for goal-driven tasks and for gathering data. The inference engine searches for and executes relevant rules, which are kept separate from the inference engine; the representation is therefore intrinsically declarative.
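By way of illustration only, the separation of production rules from the inference engine, and the difference between forward and backward chaining, may be sketched in Python as follows. The rule contents, the fact names, and the function names forward_chain and backward_chain are hypothetical and do not reproduce any of the shells mentioned here.

```python
# Each rule is (set of premises, conclusion). The rules are plain data,
# kept separate from the engine functions below. All contents are hypothetical.
RULES = [
    ({"fever", "cough"}, "respiratory-infection"),
    ({"respiratory-infection", "positive-culture"}, "bacterial-infection"),
    ({"bacterial-infection"}, "consider-antibiotic"),
]


def forward_chain(facts, rules):
    """Data-driven: fire rules repeatedly until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal, facts, rules, _seen=frozenset()):
    """Goal-driven: try to prove the goal by recursing through rule premises."""
    if goal in facts:
        return True
    if goal in _seen:  # guard against cyclic rule sets
        return False
    seen = _seen | {goal}
    return any(
        conclusion == goal
        and all(backward_chain(p, facts, rules, seen) for p in premises)
        for premises, conclusion in rules
    )
```

Because the rules are data, new rules can be added without modifying either function, which is the sense in which such a representation is declarative.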
c) Model-based systems can be derived from empirical models based on regression of data or from first-principle relationships between the variables. When sufficient information to model a process--or part of it--is available, a more precise and compact system can be built.
d) Object-oriented expert systems allow a powerful knowledge representation of physical entities and conceptual entities. In those systems, data and behavior may be unified in the class hierarchy. Each class has a template that defines the types of attributes characteristic of that class and distinguishes it from other types of objects. Manipulation and retrieval of the values of the data structures may be performed through methods attached to an object's class. The structure of the object system, typically hierarchical, maintains associations among facts and relations between objects.
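By way of illustration only, a class hierarchy in which each class contributes a template of attributes, and in which values are retrieved through a method attached to the class, may be sketched in Python as follows. The class names Entity, Protein and Enzyme, the slot names, and the method name get are hypothetical.

```python
class Entity:
    """Root class; its template lists the slots every instance carries."""
    template = {"name": None}

    def __init__(self, **values):
        # Collect slot defaults from every class in the hierarchy, root first,
        # so that subclasses extend the templates of their superclasses.
        self.attributes = {}
        for cls in reversed(type(self).__mro__):
            self.attributes.update(vars(cls).get("template", {}))
        unknown = set(values) - set(self.attributes)
        if unknown:
            raise AttributeError(f"not in template: {sorted(unknown)}")
        self.attributes.update(values)

    def get(self, slot):
        """Method attached to the class for retrieving a slot value."""
        return self.attributes[slot]


class Protein(Entity):
    template = {"molecular_weight_kda": None}


class Enzyme(Protein):
    template = {"substrate": None}


enzyme = Enzyme(name="enzyme-1", substrate="substrate-1")
```

Here the Enzyme instance carries the slots of all three templates, while a Protein instance rejects the substrate slot, so the template is what distinguishes one class of objects from another.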
e) There are a number of commercially available shells and toolkits that facilitate the development of domain-specific knowledge-based applications. Of those, real-time expert-system shells offer capabilities for reasoning on the behavior of data over time. Each of the real-time object-oriented shells from various vendors offers its own set of advantages, follows a different approach, such as compiled versus interpreted, and offers a different level of graphic sophistication. The specific shell currently selected for the implementation of this invention is Gensym Corporation's G2 Version 3.0 system, which is designed for complex and large on-line applications where a large number of variables can be monitored concurrently. It is able to reason about time, to execute both time-triggered and event-triggered actions and invocations, and to combine heuristic and procedural reasoning. It also provides dynamic simulation, user-interface and database-interface capabilities, and other facilities that allow the knowledge engineer to concentrate on the representation and incorporation of domain-specific knowledge to create domain-specific applications.
f) G2 provides a built-in inference engine, a simulator, pre-built libraries of functions and actions, developer and user interfaces, and the management of their seamless interrelations. A built-in inspect facility permits users to search for, locate, and edit various types of knowledge. The text editor interactively guides the expert in entering and editing knowledge. Among the capabilities of G2's inference engine are: a) a focus mechanism with meta-knowledge determines which knowledge structures to invoke, and allows concurrent focus and asynchronous events; b) data structures are tagged with time-stamps and validity intervals, which are considered in all inferences and calculations, taking care of truth maintenance; and c) G2's intrinsic tasks are managed by the real-time scheduler. Task prioritization, asynchronous concurrent operations, and real-time task scheduling are therefore automatically provided by this shell. G2 also provides a graphic user interface builder, which may be used to create graphic user interfaces that are language independent and allow information to be displayed using colors, pictures and animation. Dynamic meters, graphics, and charts can be defined for interactive follow-up of the simulation. It also has debugger, inspect and describe facilities. The knowledge bases can be saved as separate modules in ASCII files. The graphic views are also saved in an ASCII format, and can be shared with networked remote CPUs or terminals equipped with X Windows server software.
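By way of illustration only, the general idea of tagging a value with a time-stamp and a validity interval, so that an expired value is treated as unknown during inference, may be sketched in Python as follows. This is not G2 syntax and does not reproduce G2's interfaces; the names TimedValue, current and rate_exceeds are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimedValue:
    value: float
    timestamp: float  # time at which the value was recorded, in seconds
    validity: float   # length of time the value remains valid, in seconds

    def current(self, now: float) -> Optional[float]:
        """Return the value if it is still valid at `now`, else None (unknown)."""
        if self.timestamp <= now <= self.timestamp + self.validity:
            return self.value
        return None


def rate_exceeds(reading: TimedValue, limit: float, now: float) -> Optional[bool]:
    """Three-valued check: None when the reading has expired."""
    v = reading.current(now)
    return None if v is None else v > limit


reading = TimedValue(value=5.0, timestamp=100.0, validity=10.0)
```

A conclusion drawn from the reading at time 105 remains supported, whereas at time 111 the same check yields no conclusion rather than a stale one, which is one simple way of keeping derived conclusions consistent with the age of their data.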