1. Field of the Invention
This invention relates to trace data and more particularly relates to retrieving trace data based on a query expression.
2. Description of the Related Art
Computer software generally includes a trace feature that may be used during development or during normal operation of a software application. The trace feature causes the software application to report various types of information regarding the inputs received, outputs generated, functions called, return codes received, and other highly detailed information known herein as trace data. Generally, trace data is analyzed by software engineers or programmers to facilitate resolving software bugs and/or inefficiencies in the software application. Typically, trace data can be produced at various levels of granularity. The lower the level of granularity, the more clues the trace data provides for tracking down software errors.
However, a low level of granularity also produces very large quantities of trace data. In certain software, trace data may be produced for each line of code executed. Typically, a trace entry is generated for each software event traced. Each trace entry is relatively small and provides information about the operation being performed, as well as context information such as inputs, outputs, and other state information.
Trace data is typically stored so that it can be analyzed after the software application has run and the software error has occurred. Because trace data is generally collected only during periods of high workload for the computer system and/or software application, the tracing operation should add minimal overhead to that workload. Consequently, the frequently generated trace entries are typically combined into larger groups of trace entries, referred to herein as trace records. Trace records often include a header that identifies the number of trace entries they contain, along with other context information such as the trace type and a timestamp. A trace record can be more than one hundred times larger than an individual trace entry. Storing the larger trace records requires less I/O than storing each trace entry individually.
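The grouping of trace entries into a trace record with a counting header can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the header layout (a 2-byte entry count, 2-byte trace type, and 8-byte timestamp), the fixed 32-byte entry size, and the names `pack_record`, `unpack_record`, and `TraceRecord` are all hypothetical and do not describe the IMS trace format or any other real format.

```python
import struct
from dataclasses import dataclass
from typing import List

# Hypothetical header layout, chosen only for illustration (not the IMS format):
# big-endian 2-byte entry count, 2-byte trace type, 8-byte timestamp.
HEADER = struct.Struct(">HHQ")
# Hypothetical fixed trace entry length in bytes.
ENTRY_SIZE = 32


@dataclass
class TraceRecord:
    trace_type: int
    timestamp: int
    entries: List[bytes]


def pack_record(trace_type: int, timestamp: int, entries: List[bytes]) -> bytes:
    """Combine many small trace entries into one trace record so that they can
    be written with a single I/O operation instead of one per entry."""
    for entry in entries:
        if len(entry) != ENTRY_SIZE:
            raise ValueError(f"trace entry must be {ENTRY_SIZE} bytes, got {len(entry)}")
    return HEADER.pack(len(entries), trace_type, timestamp) + b"".join(entries)


def unpack_record(data: bytes) -> TraceRecord:
    """Read the header, then use its entry count to split the record body
    back into individual trace entries."""
    count, trace_type, timestamp = HEADER.unpack_from(data, 0)
    body = data[HEADER.size:]
    if len(body) < count * ENTRY_SIZE:
        raise ValueError("record is shorter than its header claims")
    entries = [body[i * ENTRY_SIZE:(i + 1) * ENTRY_SIZE] for i in range(count)]
    return TraceRecord(trace_type, timestamp, entries)


if __name__ == "__main__":
    entries = [bytes([i]) * ENTRY_SIZE for i in range(100)]
    record = pack_record(trace_type=7, timestamp=1_700_000_000, entries=entries)
    print(len(record))  # 12-byte header + 100 * 32 bytes = 3212
    assert unpack_record(record).entries == entries
```

Because the header records how many entries follow, a tool that reads it can recover the entry boundaries; a tool that ignores it sees only one long byte string.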
Trace data can be collected during a single execution or over a period of time in order to identify more latent software bugs; in the latter case, the size of the trace data grows dramatically. Analyzing such large quantities of trace data has been difficult for programmers, particularly where the trace data is formatted and presented as text representing values such as hexadecimal numbers. The trace data may include few, if any, cues for the programmer, such as keywords. Given the complexity of modern software and the large quantities of trace data, the debugging task becomes the proverbial search for a needle in a haystack.
Storing trace records optimizes writing to the storage devices but makes review and analysis extremely difficult. In particular, currently available search utilities such as DFSERA10 and DFSERA70, provided with the Information Management System (IMS) from IBM of Armonk, N.Y., do not permit searching for a data value within individual trace entries. Instead, the whole trace record is treated as a continuous, unstructured record, and these conventional tools search it for any occurrence of the search string or data value. Consequently, conventional search tools find matching data values, also known as “hits,” at arbitrary locations within a trace record. Unfortunately, many of these hits cross boundaries between trace entries, cross boundaries within trace entries, or occur at the wrong location within a trace entry, such that the hits are coincidental and of no use to the programmer. Such hits are false positives.
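The difference between record-level and entry-level searching can be illustrated with a short Python sketch. It assumes a hypothetical layout of fixed 16-byte trace entries concatenated into a record body (header omitted), and the function names are hypothetical; it does not reproduce the behavior or interface of DFSERA10, DFSERA70, or any other IMS utility. The record-level function treats the body as one unstructured byte string, while the entry-level function examines each trace entry on its own and can optionally require the match to begin at a specific field offset.

```python
from typing import List, Optional

# Hypothetical fixed trace entry length in bytes (an assumption for illustration).
ENTRY_SIZE = 16


def record_level_hits(body: bytes, pattern: bytes) -> List[int]:
    """Record-level search: treat the whole record body as one unstructured
    byte string and return every offset at which the pattern occurs."""
    hits, start = [], 0
    while (i := body.find(pattern, start)) != -1:
        hits.append(i)
        start = i + 1  # allow overlapping occurrences
    return hits


def entry_level_hits(body: bytes, pattern: bytes,
                     field_offset: Optional[int] = None) -> List[int]:
    """Entry-level search: examine each trace entry separately so that no hit
    can span two entries. If field_offset is given, the pattern must start at
    that offset within the entry. Returns the indexes of matching entries only."""
    matches = []
    for idx in range(len(body) // ENTRY_SIZE):
        entry = body[idx * ENTRY_SIZE:(idx + 1) * ENTRY_SIZE]
        if field_offset is None:
            found = pattern in entry
        else:
            found = entry[field_offset:field_offset + len(pattern)] == pattern
        if found:
            matches.append(idx)
    return matches


def build_example_body() -> bytes:
    """Four zero-filled entries containing the bytes 0x12 0x34 in three places."""
    entries = [bytearray(ENTRY_SIZE) for _ in range(4)]
    entries[0][15] = 0x12            # last byte of entry 0 ...
    entries[1][0] = 0x34             # ... and first byte of entry 1: crosses a boundary
    entries[2][4:6] = b"\x12\x34"    # inside entry 2 at offset 4 (the field of interest)
    entries[3][10:12] = b"\x12\x34"  # inside entry 3, but at the wrong offset
    return b"".join(entries)


if __name__ == "__main__":
    body = build_example_body()
    pattern = b"\x12\x34"
    print(record_level_hits(body, pattern))                  # [15, 36, 58]
    print(entry_level_hits(body, pattern))                   # [2, 3]
    print(entry_level_hits(body, pattern, field_offset=4))   # [2]
```

In this example the record-level search reports three hits, only one of which lies in the field of interest; the hit at offset 15 spans two trace entries and is exactly the kind of false positive described above. The entry-level search also identifies just the matching entries rather than the entire trace record.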
False positives are particularly problematic in trace systems where the trace data is represented with a small alphabet and/or a simple grammar. For example, false positives occur more frequently in trace data composed of hexadecimal characters than in trace data composed of alphanumeric characters and/or words.
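As a rough, idealized illustration of why alphabet size matters, assume (hypothetically) that trace characters are uniformly random and independent. For a search string of k characters drawn from an alphabet of a symbols, the expected number of coincidental matches H in n characters of trace data, and a worked comparison for a hypothetical 4096-character record and a 4-character search string, are:

```latex
E[H] = (n - k + 1)\,a^{-k}

% hypothetical example: n = 4096, k = 4
E_{\text{hex}}   = \frac{4093}{16^{4}} = \frac{4093}{65536}   \approx 0.062
E_{\text{alnum}} = \frac{4093}{36^{4}} = \frac{4093}{1679616} \approx 0.0024

\frac{E_{\text{hex}}}{E_{\text{alnum}}} = \left(\frac{36}{16}\right)^{4} \approx 25.6
```

Here the alphanumeric case assumes 36 symbols (case-insensitive letters plus digits). Real trace data is not random, so these figures indicate only the direction and rough scale of the effect, not actual false-positive rates.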
In addition, conventional search tools retrieve and present every trace record that includes at least one hit. Typically, this means that a large number of non-matching trace entries, as many as one hundred twenty-two or more, are presented along with the one or two trace entries containing the hit. Storing, printing, displaying, and sifting through the non-matching trace entries together with the trace entries that actually contain hits can be tedious and labor intensive for programmers concentrating on tracking down a software problem. The non-matching trace entries make the results difficult to read and can interfere with the programmer's concentration. Furthermore, if the hit is a false positive, the processing of the entire trace record is wasted. In some instances, millions of lines of output are returned, most of which are extraneous.
From the foregoing discussion, it should be apparent that a need exists for an apparatus, system, and method for searching trace data at the trace entry and trace sub-entry levels in addition to the trace record level. The apparatus, system, and method should minimize false positives and reduce the size of search results, thereby easing storage requirements and reducing trace data analysis time.