The present invention relates to diagnostic data capture in a computer environment upon determination of an invalid state. In particular it relates to a verification of the invalid state.
Multithreaded computing environments are capable of executing multiple threads of executing software at the same time. Such environments can involve one or more computer systems including multiple processors or single processors capable of executing multiple instructions contemporaneously.
Problem determination and resolution in such environments draws upon software and hardware tools to assist in diagnosis. Typically, there is the ability to record information about the flow of events through software code in the computing environment. For example, in IBM CICS products, a facility is provided known as “CICS Trace” (IBM and CICS are registered trademarks of International Business Machines Corporation in the United States, other countries, or both). It allows a chronological sequence of events to be captured as a thread of execution moves through different software programs in a computing environment.
Sometimes there is a need to capture problem determination diagnostics when a particular event (or sequence of events) has taken place, where the existing diagnostic data provided in a trace is insufficient to resolve a particular problem. Monitoring and diagnostic logic can be implemented in software to monitor the state of the computing environment and capture diagnostic data when the state is determined to indicate an unstable, ineffective or erroneous state of operation. Such states can be known as invalid states of execution. In the example of CICS, this can be achieved using a program known as “DFHTRAP” which can analyse the state of a computing environment at specific points in its execution and make decisions as to whether diagnostic data should be captured depending upon the state of the environment at those points in time.
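By way of illustration only, monitoring and diagnostic logic of this general kind can be sketched as follows. The sketch is not the interface of DFHTRAP or of any CICS facility; the names `Monitor`, `trace_point` and the `free_slots` field are hypothetical and are introduced solely for this example. It shows a check that is invoked at specific points in execution and that captures a snapshot of the state only when the state is judged invalid.

```python
from typing import Callable, Dict, List

State = Dict[str, int]


class Monitor:
    """Hypothetical monitoring logic: inspects state at named points in
    execution and captures diagnostic data only when the state is invalid."""

    def __init__(self, is_invalid: Callable[[State], bool]) -> None:
        self.is_invalid = is_invalid
        self.captured: List[dict] = []  # stands in for a dump or extended trace

    def trace_point(self, name: str, state: State) -> None:
        # Decide, at this point in execution, whether to capture diagnostics.
        if self.is_invalid(state):
            self.captured.append({"point": name, "snapshot": dict(state)})


# Assumed invalid condition for the example: a negative count of free slots.
monitor = Monitor(lambda state: state["free_slots"] < 0)
monitor.trace_point("after_allocate", {"free_slots": 2})   # valid, nothing captured
monitor.trace_point("after_release", {"free_slots": -1})   # invalid, captured
```

In this sketch the decision is made from a single observation of the state, which is reliable only if the state cannot change while it is being examined.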
In computing environments implemented to execute in a single-threaded manner, where only a single series of logical operations can execute at one time, the monitoring and diagnostic logic which analyses the state of the environment can be certain that the state is static at the time it is being analysed. This means that if an invalid state is detected it represents an instance of failure and so it is correct to capture diagnostic data for it.
However, in multithreaded computing environments such as CICS Transaction Server with multiple open task control blocks (TCBs), there is the potential for the state of the computing environment to change while the environment is being monitored. Such changes of state can take place because threads of execution other than a monitoring and diagnostic thread continue to execute and potentially change the state of the computing environment at the same time as the monitoring operation. This can lead to the environment appearing to be in an invalid state when in fact the apparent invalidity is merely the result of another thread changing the state of the environment at that same moment in time.
Operations that can be problematic in multithreaded computing environments include, for example: the addition of items to, or removal of items from, a linked list data structure; the updating of instance data; the incrementing or decrementing of counters (such as above or below thresholds); etc. Such operations do not ultimately result in an invalid state of the computing environment but can involve transitioning through a transient state that appears invalid if it is not understood in the context of the overall operation. For example, the addition of an item to a linked list data structure can, momentarily, result in a newly created list item containing uninitialized (and consequently invalid) memory references (pointers). Monitoring and diagnostic logic analysing such data could conclude that the state is invalid because of the invalid memory reference when in fact the state is merely transient: considered as part of the overall operation of adding a new linked list item, the operation will conclude with a valid list item entry with no invalid memory references. Similarly, the incrementing of a counter which causes the counter to exceed a predetermined threshold can, momentarily, result in a determination of an invalid state by monitoring and diagnostic logic since the threshold is exceeded. Again the state is merely transient: considered as part of the overall operation of incrementing a counter and checking for the breach of a threshold before resetting the counter, the operation will conclude with a valid state. Accordingly, false positive determinations of invalid state by monitoring and diagnostic logic can arise in multithreaded computing environments and can result in the unnecessary collection of diagnostic data.
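The linked list case described above can be sketched as follows. This is a minimal illustration under stated assumptions, not code taken from any product: the names `Node`, `LinkedList`, `append_stepwise`, `is_valid` and `observe_append` are hypothetical, and a sentinel object stands in for an uninitialised pointer. To keep the example deterministic, the interleaving that a second thread would cause is simulated by a generator that pauses after each intermediate step of the append so that a validity check can run at that moment.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional

# Stands in for whatever happens to be in freshly allocated storage.
UNINITIALISED = object()


@dataclass
class Node:
    value: int
    next: object = UNINITIALISED  # not yet a valid reference


class LinkedList:
    def __init__(self) -> None:
        self.head: Optional[Node] = None

    def append_stepwise(self, value: int) -> Iterator[str]:
        """Append a node, pausing after each intermediate step so that a
        monitoring check can be interleaved, as another thread might be."""
        node = Node(value)
        if self.head is None:
            self.head = node
        else:
            tail = self.head
            while tail.next is not None:
                tail = tail.next
            tail.next = node
        yield "linked"        # transient: node is reachable, node.next is not set
        node.next = None
        yield "initialised"   # operation complete: the list is valid again


def is_valid(lst: LinkedList) -> bool:
    """Monitoring check: every reachable node must hold a valid reference."""
    node = lst.head
    while node is not None:
        if node.next is UNINITIALISED:
            return False
        node = node.next
    return True


def observe_append(lst: LinkedList, value: int) -> List[bool]:
    """Run the validity check at each pause point of a single append."""
    return [is_valid(lst) for _ in lst.append_stepwise(value)]


lst = LinkedList()
observations = observe_append(lst, 1)
```

Here `observations` records an invalid result at the first pause point and a valid result at the second, although the append itself is correct. A monitor that captured diagnostic data on the first observation would have recorded a false positive.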
In a busy production environment, such as an online transaction processing environment like CICS, the capturing of unnecessary diagnostic data can result in a major degradation of performance. An obvious solution is to synchronise the computing environment during the monitoring and diagnostic operation, such as by forcing the environment to suspend all threads other than the monitoring and diagnostic thread while the monitoring and diagnostic logic executes. However, such synchronisation imposes an unacceptable performance bottleneck on the computing environment, since all other threads are suspended every time monitoring takes place.
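The synchronisation approach can be approximated, for illustration only, by a single lock that every thread must hold while it changes the shared state and that the monitoring logic holds while it examines that state. This is an assumed simplification of suspending all other threads; the names `Environment`, `worker_step`, `monitor_check`, `run` and the value of `THRESHOLD` are hypothetical. The sketch uses the counter example: a counter is incremented and then reset once it exceeds a threshold.

```python
import threading

THRESHOLD = 3  # assumed limit for the example


class Environment:
    def __init__(self) -> None:
        self.counter = 0
        self.lock = threading.Lock()  # one lock serialises workers and monitor

    def worker_step(self) -> None:
        # Increment, check and reset happen as one unit under the lock, so the
        # transient value above the threshold is never visible to the monitor.
        with self.lock:
            self.counter += 1
            if self.counter > THRESHOLD:
                self.counter = 0

    def monitor_check(self) -> bool:
        # While the monitor holds the lock, no worker can make progress.
        with self.lock:
            return self.counter <= THRESHOLD


def run(workers: int = 4, steps: int = 200, checks: int = 200) -> int:
    """Return the number of invalid observations made by the monitor."""
    env = Environment()

    def work() -> None:
        for _ in range(steps):
            env.worker_step()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    invalid_observations = sum(
        0 if env.monitor_check() else 1 for _ in range(checks)
    )
    for t in threads:
        t.join()
    return invalid_observations
```

With the lock in place the monitor observes no transient states, but every update by every worker is serialised behind the same lock and is blocked for the duration of each monitoring check, which is the bottleneck described above.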
Accordingly, it is presently not possible to capture diagnostic data for invalid states of a computing environment without also risking the capture of diagnostic data for valid transient changes in the environment's state, and so incurring the performance degradation that arises from the unnecessary collection of diagnostic data.