This invention relates generally to process execution and more particularly to determining causality for information stored during concurrent and distributed software process execution.
In application execution and analysis, tracing is a term having many similar but distinct meanings. Tracing implies a following of process execution. Often such tracing incorporates recording information relating to a process during execution. In essence, a process that executes and has information about it recorded is considered a traced process.
In the past, tracing of computer software application programs has been performed for two main purposes: debugging and optimisation. In debugging, the purpose of tracing is to trace back from an abnormal occurrence (a bug) to show a user the flow of execution that preceded the abnormal occurrence. This allows the user to identify an error in the executed program. Unfortunately, the commands executed immediately before an abnormality are often not the source of the error in execution. Because of this, much research is currently being conducted to better view trace related data in order to more easily identify potential sources of bugs.
Debuggers are well known in the art of computer programming and in hardware design. In commonly available debuggers, a user sets up a trace process to store a certain set of variables upon execution of a particular command while the program is in a particular state. Upon this state and command occurring, the variables are stored. A viewer is provided allowing the user to try to locate errors in the program that result in the bug. Usually, debuggers provide complex tracing tools which allow for execution of a program on a line by line basis and also allow for a variety of break commands and execution options. Some debuggers allow modification of parameters such as variable values or data during execution of the program. These tools facilitate error identification and location.
Unfortunately, using multiprocessor or networked systems, it is difficult to ensure that a system will function as desired and also, it is difficult to ascertain that a system is actually functioning as desired. Many large, multiprocessor systems appear to execute software programs flawlessly for extended periods of time before bugs are encountered. Tracing these bugs is very difficult because a cause of a bug may originate from any of a number of processors which may be geographically dispersed. Also, many of these bugs appear intermittently and are difficult to isolate. Using a debugger is difficult, if not impossible, because multiple debugging sessions must be established and coordinated.
In contrast, for optimisation it is important to know which commands are executed most often in order to optimise a software program. For example, when an application during normal execution executes a first subroutine once, a second subroutine twice, and a third subroutine seventy times, each subroutine requiring a similar time span for execution, optimising the subroutine which runs seventy times is clearly most important. In system optimisation, tracing is not actually performed except in so far as statistics of routine execution and execution times are maintained. These statistics are very important because they allow for a directed optimisation effort at points where the software executes slowest or where execution will benefit most. Statistics captured for program optimisation are often useful in determining execution bottlenecks and other unobvious problems. Examples of optimisation based modelling or tracing include systems described in the following references:
P. Dauphin, R. Hofmann, R. Klar, B. Mohr, A. Quick, M. Siegle, and F. Sotz. “ZM4/Simple: A general approach to performance measurement and evaluation of distributed systems.” In T. Casavant and M. Singhal, editors, Readings in Distributed Computing Systems, pages 286-309. IEEE Computer Society Press, Los Alamitos, Calif., 1994;
M. Heath and J. Etheridge. “Visualizing the performance of parallel programs.” IEEE Software, 8(5):29-39, September 1991;
C. Kilpatrick and K. Schwan. “ChaosMON: application-specific monitoring and display of performance information for parallel and distributed systems.” Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging, May 1991; and,
J. Yan. “Performance tuning with an automated instrumentation and monitoring system for multicomputers AIMS.” Proceedings of the Twenty-Seventh Hawaii International Conference on System Sciences, January 1994.
Software performance models of a design prior to product implementation reduce risk of performance-related failures. Performance models provide performance predictions under varying environmental conditions or design alternatives and these predictions are used to detect problems. To construct a model, a software description in the form of a design document or source code is analysed and translated into a model format. Examples of model formats are a simulation model, queuing network model, or a state-based model like a Petri-Net. The effort of model development makes it unattractive, so performance is usually addressed only in a final product. This has been termed the “fix-it-later” approach and the seriousness of the problems it creates is well documented.
Determining that a process is in fact executing as desired, or constructing a performance model for optimisation, requires an understanding of causality within a software application. Commonly, the only causal connection determined automatically is precedence. For example, in determining system statistics, it is easily recorded which subroutine was executed when. This results in knowledge of precedence when the entire process is executed on a single processor. However, given this knowledge, it is difficult to determine anything other than precedence.
Time and Causality
For concurrent or distributed software computations a common synchronised time reference is unavailable. A system operating on the earth and another system operating in space illustrate this problem. When the system on earth performs an activity and transmits a message to the system in space, an evident time delay occurs between message transmission and message reception. Once a system is in space, synchronising its time source precisely with that of an earth bound system is difficult. When the system in space is moving, such a synchronisation is unlikely. The same problem, though on a smaller scale, exists in earth bound networks. Each computer is bound to an independent time source and synchronisation of time sources is difficult. With advances in computer technology and processing speeds, these synchronisation difficulties are becoming no less significant than those experienced with space bound systems.
The lack of a common time reference, as well as other problems with observing a distributed system, have led to a notion of causality that is probability based. This “probabilistic causality” is a probability estimate of an event having occurred. Probabilistic causality uses a database of information (e.g., application structure, network configuration), a sophisticated data reduction algorithm (i.e., expert system), and trace records to make an educated guess at the source of problems in a complex system based on observable events. Although probabilistic causality is useful for network fault diagnosis, it should not be confused with the stricter definition of causality that is being espoused here, which is not probability based. Examples of probabilistic causality are found in U.S. Pat. Nos. 5,661,668 and 5,483,637.
In order to determine causality, it is beneficial to determine which events happened before which other events, described here as precedence causality. Precedence is a commonly known form of causality; for example, an executable instruction is not executed until a previous instruction is executed given no branching instructions. This precedence based causality is used heavily for debugging. Often, once an anomaly is discovered during execution, previous executed instructions are reviewed to determine a cause of the anomaly. For single processor systems, such an analysis is straightforward; however for network applications, time source synchronisation presents problems and therefore, precedence is not immediately evident.
Because of the above, when more than one computer is networked together, precedence is not determined through recording of time. Even when a synchronisation of clocks occurs via a communication link, a time delay caused by communication times exists between computers and the recorded times are inaccurate. The resulting clock times are not useful for determining precedence between instructions or activities executing on different processors.
In an attempt to overcome this problem, it has been proposed that a logical clock may be used to record time in the form of a partial ordering of recorded times. Several types of logical clocks are known for use in a classical model of a distributed system.
In the classical model of a distributed system, according to a survey paper by Schwarz and Mattern entitled “Detecting causal relationships in distributed computations: in search of the Holy Grail” (Distributed Computing, 7(3):149-174, 1994), a distributed system consists of N objects: P1 . . . PN. The objects interact solely by point-to-point message communication with finite but unpredictable delay; knowledge about structure of a communication network is not available; first in first out (FIFO) order of message delivery is not assumed; and a global clock, or perfectly synchronised clocks local to each process, are not available. Each object executes a local algorithm to determine its reaction to incoming messages. The occurrences of actions performed by the local algorithm, such as a local state change or sending a message, are called events. Events are recorded atomically. Concurrent and co-ordinated execution of all local algorithms composes a distributed computation.
A distributed computation is described by ordering events to agree with an order of execution. Let Ei denote a set of events occurring in object Pi in the form of a history of events, and let E = E1 ∪ E2 ∪ . . . ∪ EN denote a set of all events of the distributed computation. These event sets evolve dynamically as computation progresses. Since each Pi is strictly sequential, its sequence of events, Ei, is ordered by occurrence and written as Ei = {ei1, ei2, ei3, . . . }.
For the classical model, three event types are recorded: a send event, a receive event, and an internal event. A send event reflects the fact that a message was sent asynchronously. A receive event denotes the receipt of a message together with local state changes according to the contents of that message. Internal events reflect changes to local object states. This description does not account for conflicts or non-determinism since it is based on events that have actually occurred. The precedence relation is used as a basis for constructing logical clocks. According to the precedence relation, an event with a later logical time occurred after an event with an earlier logical time. Also, two events with the same logical times in an event set are concurrent, which indicates that they may have occurred in any order or simultaneously. Essentially, a concurrency relation indicates that a precedence relation cannot determine which of two events happened first.
Precedence Causality's Failure to Define Scenarios
The precedence relation does not identify when events are independent because it identifies all past events as being possible causes for the current event. This information can be useful but it is usually overwhelming and it must be analysed by hand to prune out precedence causal relationships. The context information that is most valuable for understanding the system behaviour is the scenario. A scenario is a “specific sequence of actions [events] that illustrates behaviours [for an application]. A scenario may be used to illustrate an interaction or the execution of a use case instance.” (note 1) The interaction is “a specification of how stimuli are sent between [object] instances to perform a specific task. The interaction is defined in the context of a collaboration.” (note 2)
Note 1: “OMG Unified Modeling Language (UML) Specification” Version 1.3, March 2000, which is the industry standard.
Note 2: “OMG Unified Modeling Language (UML) Specification” Version 1.3, March 2000, which is the industry standard.
An observed scenario is, informally, a set of objects which execute and interact together, recording events as they execute. The observed scenario is produced by ordering of the events to identify the objects' local interactions and their interactions with each other.
In a sequential application with only one stimulus the order of recorded events (i.e., the system behaviour) is one-to-one with the observed scenario. This is true of every static system where every execution of the application (i.e., scenario behaviour) corresponds to the exact ordering of the events in the system (i.e., system behaviour).
If there are dynamic aspects to the system structure or behaviour, then the one-to-one correspondence of scenario event ordering with the system behaviour is no longer true. The scenario structure cannot be recovered in this case because multiple scenarios are intermingled with each other in the system behaviour. The dynamic aspects involve: multiple simultaneous stimuli, concurrent thread execution, dynamic construction of software components, replication of software components, dynamic communication paths, message queuing, asynchronous message sends, etc. The following three canonical problems describe the problem of recovering and isolating observed scenario structure using precedence causality.
Canonical Problems of Precedence Causality
A fundamental limitation of the precedence causality approach is that it cannot identify scenarios because it cannot identify the end of a scenario, hereafter called “the problem of finding the scenario end”. This situation is illustrated in FIG. 1a where there are two scenarios. Each scenario consists of a hidden external event causing Object A to send a message to Object B with each object doing some internal processing (not shown for clarity). As shown in the figure there are two independent scenarios initiated by Object A but there is a network delay such that the second message send of Object A (event eA2) overtakes the first message it has sent (event eA1).
The scenario causal ordering is that the events of the first scenario are Scenario1 = {eA1 → eB2} and the events of the second scenario are Scenario2 = {eA2 → eB1}. Note that each scenario is properly identified and can be analysed independently of the other (e.g., comparing the actual behaviour against the intended behaviour of a sequence diagram).
The precedence ordering of the two scenarios is shown in FIG. 1a, including the transitive ordering components. The precedence ordering includes the additional event orderings {eA1 → eA2}, {eA1 → eB1}, {eA2 → eB2}, and {eB1 → eB2}. These extra event orderings would need to be filtered out before any analysis could be performed because it is not possible to identify the scenarios. It is possible to do the filtering manually for a small example but these additional relationships grow exponentially with the number of events recorded.
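The effect described for FIG. 1a can be illustrated with a minimal sketch. It assumes, as the scenario sets above imply, that eB1 is the receipt of the second message and eB2 the receipt of the first; the list names and the helper transitive_closure are hypothetical and introduced only for illustration. The sketch builds the precedence relation from the local orderings and the message orderings, closes it transitively, and subtracts the two scenario orderings to expose the orderings that would have to be filtered out.

```python
from itertools import product

# Assumed encoding of FIG. 1a (event names from the text, structure assumed).
local_order = [("eA1", "eA2"), ("eB1", "eB2")]  # order of events within each object
messages = [("eA1", "eB2"), ("eA2", "eB1")]     # send event -> matching receive event


def transitive_closure(pairs):
    """Return the transitive closure of a set of (earlier, later) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


precedence = transitive_closure(local_order + messages)
scenario_orderings = set(messages)               # the orderings an analyst wants
extra_orderings = sorted(precedence - scenario_orderings)

for earlier, later in extra_orderings:
    print(f"{earlier} -> {later}")
```

Even in this four-event example, the precedence relation holds more orderings that must be discarded than orderings that describe the two scenarios.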
A second fundamental limitation of the precedence ordering relation is that an event can only belong to one scenario but it is difficult to determine which scenario an event belongs to. This is hereafter called the “problem of scenario association.” It is illustrated by FIG. 1b. Are there one or two scenarios in FIG. 1b? There can be one scenario that consists of the events S1 = {e1, e2, e3, e4}, or the two scenarios S1 = {e1, e3} and S2 = {e2, e4}. This problem grows linearly with the number of interactions between objects.
A third limitation of precedence ordering is that events are recorded for a duration of time. Instead, monitoring should be triggered based on the scenario that is being executed. This is the problem of the scenario monitor trigger.
A fourth limitation of precedence ordering is that it is not communication protocol aware. The communication protocol that is used to send and receive information is important for analysis purposes but precedence causality does not capture any information related to it. This is a lack of communication protocol characterization.
A new type of causality, called scenario causality, is needed that overcomes these limitations.
Logical Clock Background
Discussions of implementation mechanics of logical clocks are presented in the following articles:
M. Ahuja, T. Carlson, A. Gahlot, and D. Shands. “Timestamping events for inferring ‘Affects’ relation and potential causality.” In Proceedings 11th International Conference on Distributed Computing Systems (COMPSAC 91), pages 274-281, Arlington, Tex., 1991;
B. Charron-Bost. “Concerning the size of logical clocks in distributed systems.” Information Processing Letters, 39:11-16, July 1991;
C. Diehl and C. Jard. “Interval approximations of message causality in distributed executions.” In Proceedings of the Symposium on Theoretical Aspects of Computer Science, pages 363-374. Springer-Verlag, February 1992;
C. Fidge. “Logical time in distributed computing systems.” IEEE Computer, pages 28-33, August 1991;
J. Fowler and W. Zwaenepoel. “Causal distributed breakpoints.” In Proceedings of 10th International Conference on Distributed Systems, pages 134-141, 1990;
L. Lamport. “Time, clocks, and the ordering of events in a distributed system.” CACM, 21(7):558-565, July 1978;
F. Mattern. “Time and global states of distributed systems.” In Proceedings International Workshop on Parallel and Distributed Algorithms, pages 215-226, Amsterdam, 1988. Bonas, France, North-Holland;
S. Meldal, S. Sankar, and J. Vera. “Exploiting locality in maintaining potential causality.” In Proceedings 10th Annual ACM Symposium on Principles of Distributed Computing, pages 231-239, Montreal, Canada, 1991;
M. Raynal and M. Singhal. “Logical time: Capturing causality in distributed systems.” Computer, 29(2):49-56, February 1996;
R. Schwarz and F. Mattern. “Detecting causal relationships in distributed computations: in search of the Holy Grail.” Distributed Computing, 7(3):149-174, 1994;
M. Singhal and A. Kshemkalyani. “An efficient implementation of vector clocks.” Information Processing Letters, 43:47-52, August 1992; and,
C. Valot. “Characterizing the accuracy of distributed timestamps.” In Proceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging, pages 43-52, May 1993.
The implementations described in the above references have several commonalities. Each event is assigned a time stamp from a logical clock, which is used to establish relative ordering of events. If a first event precedes a second event, then the time stamp of the first event is smaller than the time stamp of the second event. To generate the time stamp, every object maintains its own local logical clock that is advanced using a set of prescribed rules. An object's local clock represents its best approximation to a global logical clock. A time stamp is included with every message sent. A receiving object uses the included time stamp to update its local clock. Internal, send, and receive events advance an object's local clock.
Lamport, in the above noted reference, describes a logical clock wherein each object has a scalar local clock in the form of a counter that is incremented with each event. When a message is received that has a larger time stamp than the receiving object's current counter, the received time stamp replaces the current counter value. A total ordering of events can be constructed by appending an object's identifier to a time stamp value. In this way, within an object a first event precedes a second event when the first event has a time stamp that is less than that of the second event. Unfortunately, between objects, it is often difficult to assess an ordering since concurrent objects have their own local counter which may increment faster or slower than that of another object.
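A scalar logical clock of this kind might be sketched as follows. This is an illustrative sketch rather than the implementation of the cited reference: the class and method names are assumptions, and the receive rule uses the commonly stated form in which the counter becomes the larger of the local and received values and is then advanced for the receive event itself. Time stamps are (counter, object identifier) pairs so that tuple comparison yields a total order.

```python
class ScalarClock:
    """Hypothetical scalar (Lamport-style) logical clock for one object."""

    def __init__(self, object_id):
        self.object_id = object_id
        self.counter = 0

    def _stamp(self):
        # Appending the object identifier turns the partial order into a total order.
        return (self.counter, self.object_id)

    def internal_event(self):
        self.counter += 1
        return self._stamp()

    def send_event(self):
        self.counter += 1
        return self._stamp()  # this time stamp travels with the message

    def receive_event(self, message_stamp):
        received_counter, _sender = message_stamp
        # Adopt the larger value, then advance for the receive event itself.
        self.counter = max(self.counter, received_counter) + 1
        return self._stamp()


a, b = ScalarClock("A"), ScalarClock("B")
b_internal = b.internal_event()     # (1, "B"), unrelated to anything A does
sent = a.send_event()               # (1, "A")
received = b.receive_event(sent)    # (2, "B")

print(sent < received)    # the send is ordered before its receive
print(sent < b_internal)  # also ordered, although the two events are concurrent
```

The second comparison shows the weakness noted above: the total order places one concurrent event before the other, so a smaller scalar time stamp does not by itself show that one event could have influenced the other.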
In another logical clock implementation, each object maintains a vector of integers that constitutes its local clock. A time stamp consists of the entire vector and each message sent includes an entire vector. Precedence order of two events is determined by comparing two vector time stamps in a similar fashion to that described by Raynal and Singhal as well as Fidge in the above noted articles. Concurrency can be determined in both cases.
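The vector comparison can be sketched as below. The function names and the two-object example are assumptions made for illustration; the update and comparison rules are the standard vector clock rules, in which one time stamp precedes another when every entry is less than or equal and the vectors differ.

```python
def tick(clock, index):
    """Return a copy of the clock advanced for a local event of object `index`."""
    advanced = list(clock)
    advanced[index] += 1
    return advanced


def receive(local_clock, message_clock, index):
    """Merge a received vector (element-wise maximum), then advance the receiver."""
    merged = [max(l, m) for l, m in zip(local_clock, message_clock)]
    return tick(merged, index)


def happened_before(u, v):
    """True when time stamp u precedes time stamp v."""
    return all(x <= y for x, y in zip(u, v)) and u != v


def concurrent(u, v):
    """True when neither time stamp precedes the other."""
    return not happened_before(u, v) and not happened_before(v, u)


A, B = 0, 1                      # vector positions of two hypothetical objects
a_send = tick([0, 0], A)         # [1, 0], carried by the message
b_internal = tick([0, 0], B)     # [0, 1], independent of the send
b_receive = receive(b_internal, a_send, B)  # [1, 2]

print(happened_before(a_send, b_receive))  # True
print(concurrent(a_send, b_internal))      # True
```

Unlike the scalar clock, the comparison reports the send and the internal event of the other object as concurrent rather than forcing an order on them.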
A known implementation difficulty of a vector clock is the size and overhead of the time stamp. Characterising concurrency requires using vector time stamps of integers of at least size N when nothing is known about a computation except a number of objects, N. When N is large, the amount of time stamp data associated with each message and event becomes unacceptable.
There have been several approaches to reducing the overhead associated with vector time stamps. Singhal and Kshemkalyani, in the above noted reference, reduce communication bandwidth by sending vector clock entries that have changed from a message last sent to a receiver in place of an entire vector. Each object maintains two additional vectors to store information between interactions. However, communication channels must be FIFO. In this approach, post-execution analysis is needed to recover the precedence relation between different messages sent to a same receiver.
Fowler and Zwaenepoel, in an above noted reference, describe a direct-dependency technique reducing communication overhead by maintaining precedence relations for direct interactions. A transitive component of the precedence relation is constructed by post-execution analysis. This allows an object's local clock to be an event counter. Each object maintains information relating to objects with which it directly communicates. Each message carries with it a sending object's event counter value from when the message was sent. The information that is recorded for each communication event is a sending object, receiving object, and appropriate event counters.
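A rough sketch of the direct-dependency idea follows. It is not the algorithm of the cited reference: the class, the record layout (sender, sender counter, receiver, receiver counter), and the search function are assumptions chosen to show how a message need only carry a single counter while the transitive part of the precedence relation is recovered after execution.

```python
class CountingObject:
    """Hypothetical object whose local clock is a plain event counter."""

    def __init__(self, name, log):
        self.name = name
        self.counter = 0
        self.log = log  # shared list of recorded communication events

    def send(self):
        self.counter += 1
        return (self.name, self.counter)  # the message carries one counter only

    def receive(self, message):
        sender, sender_counter = message
        self.counter += 1
        # Record only the direct dependency between the send and the receive.
        self.log.append((sender, sender_counter, self.name, self.counter))


def precedes(log, earlier, later):
    """Post-execution search: does event `earlier` precede event `later`?"""
    if earlier == later:
        return False
    target_object, target_counter = later
    pending, seen = [earlier], set()
    while pending:
        obj, count = pending.pop()
        if obj == target_object and count <= target_counter:
            return True
        if (obj, count) in seen:
            continue
        seen.add((obj, count))
        for sender, sent_at, receiver, received_at in log:
            # Any send made at or after this event depends on it.
            if sender == obj and sent_at >= count:
                pending.append((receiver, received_at))
    return False


log = []
a, b, c = (CountingObject(n, log) for n in "ABC")
b.receive(a.send())   # A event 1 -> B event 1
c.receive(b.send())   # B event 2 -> C event 1

print(precedes(log, ("A", 1), ("C", 1)))  # True, found transitively through B
print(precedes(log, ("C", 1), ("A", 1)))  # False
```

The run-time cost per message is constant, and the cost of reconstructing transitive orderings is moved to the analysis step.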
Valot, in an above noted reference, suggests that there is a trade-off between memory requirements and time stamp accuracy for precedence relations. She describes a family of time stamps, which she calls k-vectors, that can be tailored for particular analysis. Instead of allocating a position in the vector to a single object, a subset of available objects are each assigned a single position in the vector. The size of the k-vector is a number of subsets chosen. The appropriate selection of vector clock subsets provides better time stamp accuracy for a given vector size. However, a priori knowledge of simultaneous concurrency during execution is required for optimal assignment of an object to a position in the k-vector. This method, therefore, is only applicable to certain cases and not to general implementation.
Other logical clocks such as those proposed by Meldal, et al. require specific conditions or additional a priori knowledge to result in a reduced size time stamp or approximate the precedence relation. Using knowledge of fixed communication links between objects, this method provides a precedence ordering between messages arriving at a same object. This approach is used to determine precedence relations between messages arriving at a same object with overhead dependent upon network topology.
Interval clocks have been disclosed to approximate the precedence relation with a constant time stamp size. Interval clocks provide better results than scalar clocks having a same overhead. By using a bit array vector value instead of a counter, precedence relations are established by post-execution analysis. If only blocking RPC style communication is used then interval clocks describe the precedence relations with no additional post-execution analysis.
All of these logical clocks and all prior research only dealt with precedence causality. A scenario based causality is needed.
Monitoring and Tracing a Process
Event records are produced by monitoring a process. There are two aspects to monitoring. There is a monitoring system comprising means for storing data relating to process execution, and monitoring instrumentation, which uses the monitoring system for recording of execution related information. The term monitor is used in its general sense to incorporate both these aspects.
An event record contains information about an application's activity and it consists of at least an event token and a time stamp. The time stamp is generated by a monitor and represents the acquisition time of the event record. The set of events is stored as an event trace.
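The minimal event record described above might be represented as in the following sketch. The field names, the Monitor class, and the injectable clock are assumptions made for illustration; a fixed sequence of clock values is used so that the example is repeatable.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    event_token: str    # identifies what happened
    time_stamp: float   # acquisition time assigned by the monitor


class Monitor:
    """Hypothetical monitor that stamps event records and keeps the event trace."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.event_trace = []

    def record(self, event_token):
        record = EventRecord(event_token, self._clock())
        self.event_trace.append(record)
        return record


# A fixed sequence of clock readings stands in for a real time source.
readings = iter([10.0, 10.5])
monitor = Monitor(clock=lambda: next(readings))
monitor.record("send:A->B")
monitor.record("receive:A->B")

for record in monitor.event_trace:
    print(record.time_stamp, record.event_token)
```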
A monitor collects information by at least one of sampling or tracing. Tracing consists of reporting all occurrences of an event within a certain interval of time. Tracing is synchronous with occurrence of events; it is performed when all occurrences of an event are known or when each occurrence of an event is followed by a certain action. With tracing, dynamic behaviour of a program is abstracted to a sequence of events. On the other hand, sampling is a collection of information upon request of the monitor. Optionally, sampling is asynchronous with the occurrence of an event; it is useful when an immediate reaction to an event is not necessary. Sampling allows only statistical statements about program behaviour. Profiling involves collecting execution counts or performing timing at the procedure, statement, or instruction level, using sampling or tracing.
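The difference between the two collection styles can be shown with a small sketch. The routine names, the sampling period of three steps, and the helper functions are all hypothetical. Tracing records every routine entry at the moment it occurs, while sampling records whatever routine happens to be active when the monitor asks.

```python
from collections import Counter


class Application:
    """Hypothetical application exposing the routine it is currently executing."""

    def __init__(self):
        self.current_routine = "idle"

    def enter(self, routine, on_event=None):
        self.current_routine = routine
        if on_event is not None:
            on_event(routine)  # tracing: synchronous with the event


trace = []     # filled by tracing, one entry per occurrence
samples = []   # filled by sampling, one entry per monitor request

app = Application()
schedule = ["parse", "parse", "solve", "parse", "solve", "solve"]
for step, routine in enumerate(schedule):
    app.enter(routine, on_event=trace.append)
    if step % 3 == 2:
        # sampling: the monitor asks for the current state every third step
        samples.append(app.current_routine)

print(Counter(trace))    # exact counts for every routine
print(Counter(samples))  # a statistical view that may miss routines entirely
```

In this run the trace contains all six entries, whereas the two samples both observe the same routine, which illustrates why sampling supports only statistical statements about program behaviour.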
Recorded information relating to events includes fields that record encapsulated data that follows a prescribed format. Some common approaches to specifying data to record are recording header data in the trace file to describe the fields; a self-describing trace format; an abstract information model based on entity-relationship descriptions; and a trace description language.
There is a large body of work in the prior art relating to monitoring of parallel programs but there is little research on monitoring distributed applications. There is an expectation in prior art literature that much of the parallel program monitoring research is applicable to a distributed application; however, it has been found that monitoring of distributed applications has a different set of requirements.
There are many different properties that a monitor may have. Several that have been identified in the literature are machine independence, using shadow processors, visualisation of performance metrics as they are gathered, pre-execution, automated instrumentation, instrumentation during execution, run-time enabling of event probes, event ordering by precision hardware time stamp, on-line program steering to control the program and monitoring overhead as it executes, and post-execution compensation for probe intrusion. Most of these monitoring systems sample and aggregate measurements using specified criteria, and then present the resulting metrics either visually for analysis or to an expert system for evaluation.
Discussions of implementation mechanics of monitors are presented in the following articles:
P. Dauphin, R. Hofmann, R. Klar, B. Mohr, A. Quick, M. Siegle, and F. Sotz. “ZM4/Simple: A general approach to performance measurement and evaluation of distributed systems.” In T. Casavant and M. Singhal, editors, Readings in Distributed Computing Systems, pages 286-309. IEEE Computer Society Press, Los Alamitos, Calif., 1994;
M. Heath and J. Etheridge. “Visualizing the performance of parallel programs.” IEEE Software, 8(5):29-39, September 1991;
M. J. Kaelbling and D. Ogle. “Minimizing monitoring costs: Choosing between tracing and sampling.” 23rd International Hawaii Conference on System Sciences, Volume 1:314-320, January 1990;
B. P. Miller, M. D. Callaghan, J. M. Cargille, J. K. Hollingsworth, R. B. Irvin, K. L. Karavanic, K. Kunchithapadam, and T. Newhall. “The Paradyn parallel performance measurement tool.” Computer, 28(11):37-46, November 1995;
D. M. Ogle, K. Schwan, and R. Snodgrass. “Application-dependent dynamic monitoring of distributed and parallel systems.” IEEE Transactions on Parallel and Distributed Systems, 4(7):762-778, July 1993;
P. H. Worley. “A new PICL trace file format.” Technical Report ORNL/TM-12125, Oak Ridge National Laboratory, September 1992; and,
J. Yan, S. Sarukkai, and P. Mehra. “Performance measurement, visualization and modeling of parallel and distributed programs using the AIMS toolkit.” Software Practice and Experience, 25(4):429-461, April 1995.
Though a tremendous amount of research and effort has been expended attempting to better monitor and analyse software execution, heretofore, no system exists for determining restricted forms of causality such as scenario causality. Scenario causality is a subset of precedence relationships and is indicative of a more direct causal link. Precedence, of course, is considered a requirement for scenario causality since current understandings of time indicate that it is unlikely that a later event can cause an earlier event to occur. It is desirable to determine forms of causality other than mere precedence of an application during execution. This would require solving the previously listed problems of “finding the scenario end,” “scenario association,” “the scenario monitor trigger,” and “characterization of communication protocol.” In so doing, causal connections detected are likely more significant and less numerous. It is also desirable to determine precedence for a multiprocessor or network based application during execution.
It is an object of the invention to provide a method of recording information relating to some events during execution of a process, and of determining scenario causality and precedence causality for some of the events.
It is an object of the invention to provide a method of recording information relating to some events during execution of a distributed software application, and of determining scenario causality and precedence causality for some of the events.
It is an object of the invention to provide a method of recording information relating to some events during execution of a process, and of analysing the recorded information for the purpose of determining aspects of process execution flow.
In accordance with the invention there is provided for a system wherein information is recorded relating to events occurring during execution of a process, a method of determining a plurality of the events that are causally connected by precedence causality or scenario causality. The method comprises the steps of:
(a) translating the recorded information relating to the events to first graph language statements wherein one or more events is translated to a statement;
(b) determining from the statements information relating to process execution flow wherein each statement comprises information relating to a predetermined process execution flow; and,
(c) based on the information relating to a predetermined process execution flow, determining, for each of a plurality of caused events, a plurality of events from the events that precede each event from the plurality of caused events and are each scenario causally or precedence causally connected to said event from the plurality of caused events.
In accordance with the invention there is provided a method of determining a plurality of the events that are scenario causally or precedence causally connected comprising the steps of:
during execution of an event,
recording process related information,
recording object related information, and
recording event related information;
using the process related information and the object related information for a plurality of events, translating the recorded information to a graph language substantially indicative of scenario and potential causal connections between events; and,
providing information based on the causal connections between events.
In accordance with the invention there is provided a method of determining a plurality of events that are scenario causally or precedence causally connected for use with recorded information relating to events occurring during execution of a process. The method comprises the steps of:
analysing the recorded information to determine a partial order of events from each of two relative perspectives; and,
combining the two partial orders of events to produce information relating to some forms of scenario and potential causality.
In accordance with the invention there is provided a method of determining a plurality of the events that are scenario causally or precedence causally connected comprising the steps of:
providing a process for execution;
instrumenting the process for monitoring of an execution of the process;
executing the instrumented process to produce a trace of the process execution;
transforming the trace of the process execution into a plurality of scenario graph language statements according to a plurality of predetermined rules to reverse engineer scenarios;
transforming the scenario graph language statements into a scenario event graph for analysis; and,
transforming the scenario event graph(s) into a domain specific model for analysis in another domain.