This invention relates generally to processing within a computing environment, and more particularly to programmatic identification of errors, such as loops.
A major cause of system outages in certain types of computing systems is an authorized program entering a processing loop as a result of a software defect, which manifests externally as hung jobs or system outages. Unfortunately, software defects of this type are difficult to identify and correct. By way of review, the following approaches exist today.
In one approach, the instruction counter (for example, the program status word (PSW)) is sampled in system control blocks related to an executing unit of work over a specified time interval to suggest that a task may be looping, based on high CPU usage and little or no I/O activity. Activity that is consuming CPU resources or waiting for CPU resources is considered a suspicious symptom and is identified by the product as such. With this approach, it is not truly known whether the program in question is looping or simply taking a long time to execute.
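The sampling heuristic described above can be illustrated with a minimal sketch. It is not taken from any particular product: the `Sample` record, the `looks_suspicious` function, and the `cpu_threshold` and `max_io` parameters are all hypothetical names chosen for illustration, and the threshold values are assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    """One hypothetical observation of a unit of work (illustrative only)."""
    instruction_address: int  # address taken from the sampled instruction counter
    cpu_busy: bool            # True if the work was using or waiting for CPU
    io_count: int             # I/O operations observed since the previous sample


def looks_suspicious(samples: Sequence[Sample],
                     cpu_threshold: float = 0.9,
                     max_io: int = 0) -> bool:
    """Flag a unit of work that shows high CPU activity and little or no I/O.

    The thresholds are assumed values, not figures from the source text.
    """
    if not samples:
        return False
    cpu_fraction = sum(s.cpu_busy for s in samples) / len(samples)
    total_io = sum(s.io_count for s in samples)
    return cpu_fraction >= cpu_threshold and total_io <= max_io


# A tight loop revisits the same few addresses ...
looping = [Sample(0x1000 + (i % 4) * 4, True, 0) for i in range(20)]
# ... while a long computation keeps advancing, yet both look identical
# to a heuristic that considers only CPU and I/O activity.
long_running = [Sample(0x1000 + i * 4, True, 0) for i in range(20)]
io_bound = [Sample(0x2000, False, 3) for _ in range(20)]
```

In this sketch both `looping` and `long_running` are flagged, while `io_bound` is not, which reflects the limitation noted above: the heuristic cannot tell a true loop from a program that is simply taking a long time to execute.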
Explicit evidence of a looping unit of work may be identified by analysis of events in a system trace, which may also be referred to as a flight recorder trace, examined in, for example, a storage dump. This analysis is manual in nature, however, and can only be done by initiating a storage dump and then analyzing it minutes or hours later.
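As a rough illustration of the kind of evidence an analyst looks for in such a trace, the following sketch checks whether the most recent trace entries consist of one sequence of events repeated back to back. The source does not specify how the manual analysis is performed; the function name `repeating_tail_period`, its parameters, and the event labels in the usage note are assumptions made for this example.

```python
from typing import Hashable, Optional, Sequence


def repeating_tail_period(events: Sequence[Hashable],
                          min_repeats: int = 3,
                          max_period: Optional[int] = None) -> Optional[int]:
    """Return the shortest period p such that the last p * min_repeats
    entries of the trace are one p-entry sequence repeated min_repeats
    times, or None if no such period exists.

    This is an illustrative check, not a documented analysis procedure.
    """
    n = len(events)
    if max_period is None:
        max_period = n // min_repeats
    for period in range(1, max_period + 1):
        span = period * min_repeats
        if span > n:
            break
        tail = events[n - span:]
        # Every entry must equal the entry at the same offset in the first cycle.
        if all(tail[i] == tail[i % period] for i in range(span)):
            return period
    return None
```

For a hypothetical trace such as `["SVC", "IO", "A", "B", "C", "A", "B", "C", "A", "B", "C"]`, the function reports a period of 3, because the trace ends with the sequence A, B, C repeated three times; a trace with no repeated tail yields `None`.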
There are also a number of existing patents that discuss the creation and usage of memory traces to identify patterns based on similar addresses in different ranges using a variety of techniques (e.g., for network and Java profiling). For example, U.S. Pat. No. 6,347,383 discloses address space compression through loop detection and reduction, where loops are detected by determining control flow based on a memory address trace, looking for address references, and reducing the trace content based on those address references. U.S. Pat. No. 6,691,207 teaches implementing loop compression in a program counter. Here, an on-chip (i.e., hardware) logic analyzer receives program counter (address reference) data used to determine when software loops exist. U.S. Pat. No. 5,355,487 teaches a non-invasive trace-driven system and method for computer system profiling, including creation of a “trace hook” in a periodic clock routine of an operating system kernel to drive specific trace events related to process state changes, analogous to creating new trace entries in an existing trace. U.S. Pat. No. 5,805,863 discloses a memory pattern analysis tool for use in optimizing computer program code and, again, defines its loop analysis based on trace records with at least one memory address reference. U.S. Pat. No. 5,274,811 discloses a method for quickly acquiring and using very long traces of mixed system and user memory references. The algorithm defined in this patent is based on memory access patterns.