1. Field
This disclosure generally relates to software development. More particularly, the disclosure relates to problem determination (“PD”) for handling software errors.
2. General Background
PD is an approach utilized by software developers to find bugs, i.e., errors, in software code. A Service Oriented Architecture (“SOA”) system can be composed of loosely coupled services. A current PD methodology utilized to find bugs in an SOA involves reading output generated by the SOA software. This output is typically provided in a trace file. Tracing is the process of utilizing one or more trace files to debug a software application.
Tracing is generally very useful in PD for a software product. In particular, tracing is helpful in resolving PD issues that occur at a customer site. When a problem occurs in a customer environment, software vendors typically do not have the opportunity to perform code level debugging for a variety of reasons. One reason is that there may be a communication barrier, e.g., a firewall, between the application installed at the customer site and the software vendor. Another reason is budgetary in that traveling to the customer site for each PD issue may be too expensive. Further, customer policy may prevent the vendor from performing code level debugging at the customer site. For example, the customer's data and production environment may be too sensitive to allow PD in its live system. Accordingly, the software vendor typically relies on gathering trace data from the customer. The trace data generally provides a record of the software product's execution logic. By reading the trace data, the software vendor attempts to determine the problem with the software product.
A set of trace data may be associated with an instance of a business software application. That business software application may potentially span multiple components and systems. For example, a first set of trace data may include all the trace records associated with a particular business transaction for a first person. Further, a second set of trace data may include all the trace records associated with a particular business transaction for a second person. Similar, if not the same, components may be utilized for both of these business transactions. Filtering through all of the trace data to find a complete set of related trace records is helpful in PD.
Java™ Specification Request (“JSR”) 47 is a Java™ standard for a logging Application Programming Interface (“API”). In particular, JSR 47 provides several pieces of information, e.g., TimeStamp, ThreadID, and Logger, to assist with correlation of trace data. Utilizing TimeStamp, a determination can be made as to when a trace was logged. Further, ThreadID identifies the thread on which the particular trace was logged. In addition, Logger allows for determining which component, e.g., subsystem of a product, is responsible for the trace. However, these three pieces of data are often insufficient for finding a set of related trace data associated with a specific business transaction.
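By way of illustration only, the three pieces of information described above can be read from a record produced by the java.util.logging package, which is the API that resulted from JSR 47. The following is a minimal sketch, not a definitive implementation. The class name Jsr47FieldsDemo, the helper logOnce, the logger name com.example.orders, and the message text are hypothetical and do not appear in the disclosure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class Jsr47FieldsDemo {

    /** Collects log records in memory so that their fields can be inspected. */
    static final class CapturingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            records.add(record);
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /** Logs one message through a (hypothetical) component logger and returns the captured record. */
    static LogRecord logOnce(String loggerName, String message) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setUseParentHandlers(false); // keep console output quiet for the demo
        logger.setLevel(Level.ALL);
        CapturingHandler handler = new CapturingHandler();
        logger.addHandler(handler);
        try {
            logger.fine(message);
        } finally {
            logger.removeHandler(handler);
        }
        return handler.records.get(0);
    }

    public static void main(String[] args) {
        LogRecord r = logOnce("com.example.orders", "order received");
        System.out.println("TimeStamp (ms): " + r.getMillis());     // when the trace was logged
        System.out.println("ThreadID:       " + r.getThreadID());   // thread that logged the trace
        System.out.println("Logger:         " + r.getLoggerName()); // component responsible for the trace
    }
}
```

None of the three values printed by this sketch identifies the business transaction to which the record belongs, which is the limitation noted above.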
In particular, when work for the related trace data is spread across multiple hardware devices, JSR 47 data is insufficient to correlate the related trace data. This insufficiency stems from the clocks on the different hardware devices not being synchronized. Accordingly, the TimeStamp for a first trace data record on one machine may be very different from the TimeStamp for a second trace data record on a different machine, even when the two records were logged at nearly the same time. Further, the ThreadIDs are likely to be different given that each device will likely assign a different ThreadID.
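The effect of unsynchronized clocks can be illustrated with a minimal sketch under stated assumptions. The TraceRecord class, the host names hostA and hostB, the thread IDs, and the five second clock offset are all hypothetical values chosen for illustration and are not taken from the disclosure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ClockSkewDemo {

    /** Hypothetical, simplified trace record: host, local timestamp, thread ID, message. */
    static final class TraceRecord {
        final String host;
        final long timestampMillis;
        final int threadId;
        final String message;

        TraceRecord(String host, long timestampMillis, int threadId, String message) {
            this.host = host;
            this.timestampMillis = timestampMillis;
            this.threadId = threadId;
            this.message = message;
        }
    }

    /** Two records of one transaction; hostB's clock is assumed to run 5 s behind hostA's. */
    static List<TraceRecord> sample() {
        long trueTime = 1_000_000L; // arbitrary reference time in milliseconds
        long skewB = -5_000L;       // assumed clock offset of hostB
        return Arrays.asList(
            new TraceRecord("hostA", trueTime, 17, "step 1: request accepted"),
            new TraceRecord("hostB", trueTime + 200 + skewB, 42, "step 2: request processed"));
    }

    /** Returns "host: message" strings ordered by each record's local timestamp. */
    static List<String> orderByTimestamp(List<TraceRecord> records) {
        List<TraceRecord> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparingLong((TraceRecord r) -> r.timestampMillis));
        List<String> result = new ArrayList<>();
        for (TraceRecord r : sorted) {
            result.add(r.host + ": " + r.message);
        }
        return result;
    }

    public static void main(String[] args) {
        // Step 2 actually happened 200 ms after step 1, yet it sorts first.
        for (String line : orderByTimestamp(sample())) {
            System.out.println(line);
        }
    }
}
```

In this sketch, ordering by TimeStamp places the second step before the first, and the two records also carry different ThreadIDs, so neither field reliably groups the records of the transaction.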
Further, JSR 47 data is not helpful in a situation where multiple inbound events are received over time to participate in the work. An example of an initial inbound event is a business transaction requesting an initial set of information from the customer, and an example of a subsequent inbound event is the business transaction requesting a subsequent set of information from the customer. In other words, JSR 47 may be helpful for one inbound event that initiates the work, but does not address how to correlate subsequent inbound events with the initial inbound event. The ThreadID will likely be different for a subsequent inbound event than for the initial inbound event. Even if an algorithm creates a unique identifier (“ID”) for each inbound event, JSR 47 does not provide a trace record that contains the correlation from the current ID to the ID into which it is being merged.
In addition, JSR 47 is also insufficient in an asynchronous environment. During each asynchronous step, a new thread can possibly be created and the work can be transferred to a different device. Therefore, the ThreadID of JSR 47 may not be sufficient to gather a complete set of trace data records for a business transaction in an asynchronous environment.
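As a minimal sketch of the asynchronous case, the following example runs two steps of one unit of work on two separate single-thread executors and records the thread ID observed in each step. The class name AsyncThreadIdDemo and the method runTwoSteps are hypothetical; the use of CompletableFuture is merely one possible way to model asynchronous steps and is not described in the disclosure.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncThreadIdDemo {

    /** Runs two asynchronous steps on separate executors and returns the thread ID seen in each. */
    static long[] runTwoSteps() {
        ExecutorService stepOnePool = Executors.newSingleThreadExecutor();
        ExecutorService stepTwoPool = Executors.newSingleThreadExecutor();
        try {
            return CompletableFuture
                // Step 1 of the work runs on the first executor's thread.
                .supplyAsync(() -> Thread.currentThread().getId(), stepOnePool)
                // Step 2 of the same work is handed off to the second executor's thread.
                .thenApplyAsync(
                    first -> new long[] { first, Thread.currentThread().getId() },
                    stepTwoPool)
                .join();
        } finally {
            stepOnePool.shutdown();
            stepTwoPool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] ids = runTwoSteps();
        System.out.println("step 1 thread ID: " + ids[0]);
        System.out.println("step 2 thread ID: " + ids[1]);
    }
}
```

Because each step runs on a different thread, a filter on a single ThreadID would capture only the trace records of one step, even though both steps belong to the same work.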
Accordingly, current approaches are not sufficient for correlating trace data in an SOA. When work for a business transaction is performed across multiple computers, multiple threads are utilized for the same business transaction. One thread may be created for a first portion of the business transaction on a particular system whereas a second thread may be created for a second portion of the business transaction on a different system. Current approaches are only helpful for PD when a business transaction is processed by a single thread on a single computer. These current approaches are deficient for PD in a multithreaded business transaction spanning multiple computers.