1. Field of the Invention
This invention relates generally to any moderately complex software system. It is especially applicable to large-scale distributed systems, where it is necessary to determine what logic was applied across multiple components in the system.
2. Background of Related Art
Distributed Emergency Call Systems in telecommunications are in general very complex computing systems. In the realm of emergency call systems in particular, uninterrupted service must be provided, with each “request” being correctly processed within a well-specified time interval. The latter requirement is often met by fallback logic that is invoked whenever one or more unexpected conditions occur. Typically, in Emergency Call Systems a request is a call instance placed by a person in distress. The terms “request”, “call instance” and “transaction” are used herein to refer to an atomic transaction through a complex computing system.
Distributed systems exist that employ various implementations of tracing and logging that allow an operator to trace or follow a request throughout the system, e.g., to collect statistics or to troubleshoot a particular problem. In many cases, however, only the exit criteria are collected, i.e., the end result of a request but not how the request reached that point.
Telephony systems generally use Call Detail Records (CDRs), error logs, and Simple Network Management Protocol (SNMP) traps to record what occurred on the system. Using conventional technology, an adjunct system usually has to gather all the desired data and make a best attempt at correlating and reconstructing what most likely occurred by inference from the data gathered.
The present inventors have recognized that existing technology for logic tracing of a complex call flow in a distributed system depends on gathering all applicable data in a timely manner, and on best attempts at correlating the data that is available. Unfortunately, unless complete forethought is given before a given call is made, it is usually problematic to correlate call flow data relating to that call afterwards.
FIG. 3 shows a distributed computing system consisting of four exemplary components, and use of conventional technology to collect a transaction history.
In particular, FIG. 3 shows a distributed computing system including (by way of example only) a front end component A 302, a core processing component B 304, a helper #1 component C 306, and a helper #2 component D 308. For fault tolerance purposes, any or all of the components 302, 304, 306, 308 may be multiple components themselves, as depicted by the shadowed boxes shown in FIG. 3.
To perform logic tracing of a call flow using conventional technology, the data is collected, post-processed, and correlated to obtain a view of what occurred for the transaction.
Using existing technologies, call detail records (CDRs) 303, error logs 307, 309, and Simple Network Management Protocol (SNMP) traps are generated at the various components 302, 304, 306, 308 using disparate technologies. In general, the various logging pieces, including the CDRs 303, are collected (preferably in a common format) and then stored in an appropriate transaction datastore 314.
In the example of FIG. 3, the identifier “Transaction XYZ” is used. As depicted in the reconstructed transaction flow 312, Component A 302 generates CDRs for Transaction XYZ.
In the example, Component B 304 uses SNMP traps, so Component B 304 generates SNMP traps for Transaction XYZ. In many cases, SNMP traps are generated only in the case of abnormal conditions.
The two helper Components C 306 and D 308 write transaction details into respective logs 307, 309 (in the given example), so Components C and D 306, 308 generate system log messages for Transaction XYZ. The location and format of each of the system logs 307, 309 are system dependent. Similar to an SNMP trap, a system log is in general used only to record abnormal conditions.
With traditional systems and call flow tracing technology, an operator must know what type of transaction recording method each component in the complex system uses. With such knowledge beforehand, the operator must then gain access to each recording method's “storage” and determine a way to correlate a particular transaction from end to end, based on information obtained from the appropriate storage for each component 302, 304, 306, 308. Most likely a “transaction identifier” is used to perform this task. Once all those pieces are in hand, a best attempt at the flow of a given transaction may be reconstructed, as depicted at 312.
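By way of illustration only, the conventional correlation task described above may be sketched as follows. The sketch assumes that each component's storage has already been retrieved, and that every record carries both a transaction identifier and a timestamp. The record layouts, field names, sample entries, and function names are hypothetical and are not taken from FIG. 3 or from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float
    component: str   # "A", "B", "C" or "D"
    source: str      # recording method: "CDR", "SNMP" or "LOG"
    detail: str


# Hypothetical contents of each component's storage. Every store uses
# its own layout, which the operator must know in advance.
CDR_STORE = [
    {"txn": "XYZ", "ts": 1.0, "event": "call received"},
    {"txn": "ABC", "ts": 1.5, "event": "call received"},
]
SNMP_TRAPS = [
    {"transaction_id": "XYZ", "time": 2.0, "trap": "helper timeout"},
]
LOG_C = ["3.0 txn=XYZ fallback route selected"]
LOG_D = ["2.5 txn=ABC lookup failed"]


def from_cdrs(records, txn_id):
    """Normalize Component A's CDRs for one transaction."""
    return [Event(r["ts"], "A", "CDR", r["event"])
            for r in records if r["txn"] == txn_id]


def from_traps(traps, txn_id):
    """Normalize Component B's SNMP traps for one transaction."""
    return [Event(t["time"], "B", "SNMP", t["trap"])
            for t in traps if t["transaction_id"] == txn_id]


def from_log(lines, component, txn_id):
    """Normalize a helper component's text log for one transaction."""
    events = []
    for line in lines:
        ts, tag, detail = line.split(" ", 2)
        if tag == f"txn={txn_id}":
            events.append(Event(float(ts), component, "LOG", detail))
    return events


def reconstruct(txn_id):
    """Best-attempt flow: merge every store, then order by timestamp."""
    events = (from_cdrs(CDR_STORE, txn_id)
              + from_traps(SNMP_TRAPS, txn_id)
              + from_log(LOG_C, "C", txn_id)
              + from_log(LOG_D, "D", txn_id))
    return sorted(events, key=lambda e: e.timestamp)


def silent_components(events, expected=("A", "B", "C", "D")):
    """Components that left no record for the transaction."""
    seen = {e.component for e in events}
    return [c for c in expected if c not in seen]


flow = reconstruct("XYZ")
for event in flow:
    print(event)
print("no records from:", silent_components(flow))
```

In this sketch Component D recorded nothing for Transaction XYZ, so the reconstructed flow cannot show whether Component D took part in the transaction at all.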
However, the inventors herein recognize that the quality of this reconstruction is a direct function of the amount (or lack thereof) of data logged by the system's respective components. Using conventional call flow logic tracing technology, once the relevant data has been gathered, it must be analyzed to reproduce the actual call. The resulting reproduction is in many cases, at best, only an approximation of the actual call flow.
There is a need for improved complex call flow tracing in a distributed call system.