Computer-based systems enable a wide variety of data processing tasks to be accomplished in a fast and efficient manner. From hand-held consumer products to geographically distributed storage area networks with multi-device data storage arrays, such systems continue to proliferate into all areas of society and commerce.
Software is provided to direct the operation of such systems. Software (including firmware) can take a number of forms such as application programs, operating systems, interface and controller routines, and maintenance and housekeeping modules.
Each time the software initiates a process, a number of additional processes, handshakes, links, calculations, and other events can be carried out by the various layers of software in order to service and complete the request. Generally, as the overall software system grows more complex, with additional layers of software applications and operating systems, distributed processing, and fault-tolerant redundancy, it becomes increasingly difficult to assess the root cause of errors and the extent to which undocumented errors occur while carrying out a given process. A high occurrence rate of a particular error, or a cascade of error events, can quickly overwhelm the system's ability to effectively track errors. Also, an unexpected error can easily go unrecognized and hence escape effective root cause resolution.
In some solutions these resultant events are logged for later analysis should an execution error be identified. Typically, however, the error is identified at some time after the execution step that caused it, making it painstakingly difficult, if not impossible, to trace back through the logged events to ascertain a root cause. What is needed are improved solutions providing real-time analysis of system execution errors. It is to these benefits and advantages that the embodiments of the present invention are directed.