1. Technical Field of the Invention
The present invention relates to state driven machines and, in particular, to monitoring state driven machine operation for purposes of trouble-shooting faults, logging events, and post-processing event data for other system operation and maintenance purposes.
2. Description of Related Art
The processes implemented by a system are often designed and represented, in a manner well known to those skilled in the art, as state driven machines. In this context, a given process may implicate a number of potential states. At each state, the process performs one or more actions, tests or the like, with the result or results therefrom dictating the next state to which the process should transition. There may be more than one transition path into a state, as well as more than one transition path out of a state. Thus, for example, a state "A" may test whether a certain variable matches a predetermined value or whether a certain signal (or an external action) is received. If yes, the process transitions to state "B". Otherwise, the process transitions to state "C". Alternatively, no transition may occur. In each of state "B" and state "C", another action, test, or the like occurs, resulting perhaps in a transition to yet another state (which may include a transition back to state "A").
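By way of illustration only, the example of states "A", "B" and "C" may be sketched in code as follows. The names `State`, `next_state`, `run` and `expected` are hypothetical and chosen for this sketch, and the behavior of states "B" and "C" (each simply returning to state "A") is an assumption made to keep the example short.

```python
from enum import Enum


class State(Enum):
    A = "A"
    B = "B"
    C = "C"


def next_state(state, value, expected=1):
    """Return the state to which the process transitions from `state`."""
    if state is State.A:
        # State "A" tests whether a variable matches a predetermined value.
        return State.B if value == expected else State.C
    # Assumption for this sketch: states "B" and "C" each return to "A".
    return State.A


def run(inputs, start=State.A):
    """Feed each input to the machine and return the sequence of states visited."""
    path = [start]
    state = start
    for value in inputs:
        state = next_state(state, value)
        path.append(state)
    return path
```

For instance, `run([1, 0, 0])` starts in state "A", transitions to "B" on the matching value, returns to "A", and then transitions to "C" on the non-matching value.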
As these state driven machines become more and more complex, additional instances of faults arise. In some instances, these faults result in the hanging of the process. A hanging scenario refers to a fault where the process enters a certain state and the conditions requisite for executing within the state or exiting from the state are never met. For example, the process may enter a state and wait for a certain action that never arrives, even though that action is needed in order to perform a certain task or move on to another state. If blocks (comprising software units) for the process are state driven, a hanging scenario further refers to a fault where the process enters a certain block in a certain state and the conditions requisite for exiting from the block or its state are never met. When a hanging or other fault occurs, some recovery with continued operation may be possible, but it often becomes necessary to simply restart, reset or reinitialize the system.
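A hanging scenario of the kind just described may be illustrated, in a simplified and purely hypothetical form, as follows. The state names `WAIT_ACK` and `DONE`, the event name `ack`, and the functions `run` and `is_hung` are assumptions introduced for this sketch only.

```python
def run(events):
    """Feed events to a two-state process and return the state it ends in."""
    state = "WAIT_ACK"
    for event in events:
        if state == "WAIT_ACK" and event == "ack":
            state = "DONE"  # the condition requisite for exiting the state is met
        # Any other event leaves the process in its current state.
    return state


def is_hung(events):
    """Report whether the process is still waiting after all events are consumed."""
    return run(events) != "DONE"
```

If the `ack` event is never delivered, the process remains in `WAIT_ACK` indefinitely, and inspecting that final state alone does not reveal why the event was never sent.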
Once a fault occurs and is recognized, it becomes vitally important to the system operator that the cause of the fault be rapidly discovered and corrected. Sometimes the cause of the fault may be easily discovered in the process of the current state (or block) where the fault occurred. Other times, the fault may be caused at a certain state entered into, or at a transition path selected, at some point in the distant past (and oftentimes in a completely different block or a completely different process). The cause of the fault in such cases is not so easy to ascertain. This is because there may be hundreds of possible states, hundreds of possible paths, and associated actions, through which the process passed before the fault manifested itself. All pertinent possible combinations of states, paths and actions must then be examined in order to determine which is the cause. Unfortunately, current technology provides only a snap-shot view (picture) of state driven machine status at the time the fault arises, and this snap-shot view may not provide sufficient information concerning the history of process execution to enable the cause of the fault to be easily determined.
There is a need, then, for a mechanism to assist in the detection of the cause of a fault in a state driven machine. Preferably, this mechanism should provide sufficient historical information relating to process execution to enable the cause of the fault to be found. The procedure implementing the mechanism should further be capable of running in parallel with normal system operation or execution. Additionally, any captured historical information should be capable of being accessed or retrieved without stopping or staying normal execution of the process.
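To make the notion of historical information concrete, one possible form such a record could take is a bounded log of recent transitions. The following is a minimal sketch under stated assumptions and is not a description of the mechanism itself: the class `TransitionHistory`, its methods, and the `capacity` parameter are all hypothetical.

```python
from collections import deque


class TransitionHistory:
    """Bounded record of the most recent state transitions (illustrative only)."""

    def __init__(self, capacity=4):
        # A deque with maxlen discards the oldest entry once it is full.
        self._records = deque(maxlen=capacity)

    def record(self, source, event, target):
        """Store one transition as a (source state, event, target state) tuple."""
        self._records.append((source, event, target))

    def snapshot(self):
        """Return a copy of the history, leaving the live record untouched."""
        return list(self._records)
```

Because `snapshot` returns a copy, the history may be read while the process continues to record new transitions. In a multi-threaded system, a lock or similar safeguard around the copy would likely be needed as well.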