During the development of computer programs, it is frequently necessary to halt execution of a program in order to allow user intervention. This is most commonly useful when debugging the program for errors. It may also be useful for performance analysis, for switching on tracing, or for profiling.
One example of a program in connection with which such operations are used is CICS Transaction Server for z/OS™ (CICS TS), available from IBM Corporation (“CICS” and “z/OS” are trademarks or registered trademarks of International Business Machines Corporation), but the techniques apply generally to all multi-processing environments, with special emphasis on distributed systems. A multi-processing environment is one in which multiple activities occur on behalf of many users. Such an environment is typically found in computers that act as servers. CICS Transaction Server is one example of such a multi-processing environment; it supports application programs, written by users of CICS, which typically include newly written or pre-supplied CICS transactions. The current discussion is concerned solely with the debugging of these user-written application programs.
The term distributed refers to a collection of computers which are all linked together to form a distinct unit. In the case of CICS TS, multiple instances of CICS can run in multiple CICS TS regions executing within a mainframe running the z/OS operating system to form a distributed system. Several linked mainframe computers each running the IBM z/OS operating system can also participate in a distributed arrangement called a sysplex (also referred to in the case of CICS as a CICSPlex) in which the various components communicate through a coupling facility.
The term debugging means the act of stopping the execution of a computer program when a given set of circumstances occurs, together with the investigation of the execution environment of that program while it is stopped.
The places where execution is halted are called breakpoints. Breakpoints can, for example, cause execution to stop:
    at a specific place in execution (for example: at instruction 45 in program XX);
    whenever a generic event occurs (for example: variable Y contains 57);
    whenever a specific event occurs (for example: program YY is executed).
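The three kinds of breakpoint listed above can be sketched as predicates over the execution state. This is an illustrative sketch only: the names `ExecutionState`, `at_location`, `when_variable_equals`, `when_program_runs` and `should_halt` are hypothetical and are not taken from any debugging tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class ExecutionState:
    """Hypothetical snapshot of an executing program."""
    program: str
    instruction: int
    variables: Dict[str, int] = field(default_factory=dict)


# A breakpoint is modelled as a predicate over the execution state.
Breakpoint = Callable[[ExecutionState], bool]


def at_location(program: str, instruction: int) -> Breakpoint:
    """Stop at a specific place, e.g. instruction 45 in program XX."""
    return lambda s: s.program == program and s.instruction == instruction


def when_variable_equals(name: str, value: int) -> Breakpoint:
    """Stop on a generic event, e.g. variable Y contains 57."""
    return lambda s: s.variables.get(name) == value


def when_program_runs(program: str) -> Breakpoint:
    """Stop on a specific event, e.g. program YY is executed.

    Assumption: modelled as "any state inside that program".
    """
    return lambda s: s.program == program


def should_halt(state: ExecutionState, breakpoints: Iterable[Breakpoint]) -> bool:
    """Execution halts if any breakpoint matches the current state."""
    return any(bp(state) for bp in breakpoints)
```

Modelling every breakpoint as a predicate keeps the halting decision uniform: the stepping logic only needs to ask whether any breakpoint matches, whatever its kind.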
When a breakpoint is reached, program execution is halted and the user has the opportunity to examine the execution environment (for example: by inspecting the current settings of variables) and change it (for example: by altering the contents of a variable) before permitting execution to continue. This is called amending the state of the program execution instance. Execution continues until another breakpoint is encountered or execution terminates. Whilst execution of the program is halted at a breakpoint, the plurality of breakpoints can be manipulated (for example: by adding a new breakpoint), which alters the subsequent execution of the program (for example: by halting at additional locations).
When using debugging techniques in a multi-processing environment, the plurality of breakpoints applies not merely to a single user acting on a single program, but to all users running all programs. So, for example, any user can encounter a breakpoint which stops execution at, say, instruction 42 in program XX, because access to program XX is available to all users in the multi-processing environment. In turn, this means the plurality of breakpoints have to be available to all activities in the multi-processing environment. Consequently, the debugging information (which includes the breakpoint information) must be held in a repository which is shared and accessible to all activities within the multi-processing environment. This repository can be a simple file, an indexed file or a database: the crucial thing is that it is shared between all activities.
Breakpoints fall into several general types within the CICS TS environment:
    a breakpoint can be very specific: ‘Stop at instruction 42 in program PROGA when executed by a transaction called PEOH for user RAH’;
    alternatively, it can be generically specified: ‘Stop at instruction 42 in program PROGA when executed by a transaction called PEO* for any user’;
    or it can have wide applicability: ‘Stop at instruction 42 in program PROGA when executed by any transaction for any user’.
A generic definition uses wildcards to specify items. PEO* applies to any item whose first three characters are PEO and whose fourth (and last) character can be anything (as indicated by the *).
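The specific, generic and wide-applicability breakpoints above can be sketched with wildcard matching on the transaction and user fields. This is a minimal sketch, not the matching logic of CICS TS: the class `BreakpointScope` and its field names are hypothetical, and it uses Python's `fnmatch.fnmatchcase`, in which `*` matches any run of characters (including none), which is slightly looser than the single trailing character described in the text.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class BreakpointScope:
    """Hypothetical breakpoint definition; '*' means 'any' (assumption)."""
    program: str
    instruction: int
    transaction: str = "*"   # e.g. "PEOH", "PEO*", or "*" for any transaction
    user: str = "*"          # e.g. "RAH", or "*" for any user

    def applies_to(self, program: str, instruction: int,
                   transaction: str, user: str) -> bool:
        # Program and instruction are matched exactly; transaction and
        # user are matched against shell-style wildcard patterns.
        return (program == self.program
                and instruction == self.instruction
                and fnmatchcase(transaction, self.transaction)
                and fnmatchcase(user, self.user))


# The three examples from the text, expressed with the hypothetical class.
specific = BreakpointScope("PROGA", 42, transaction="PEOH", user="RAH")
generic = BreakpointScope("PROGA", 42, transaction="PEO*")
wide = BreakpointScope("PROGA", 42)
```

Defaulting both fields to `*` means the widest breakpoint needs no special case: it is simply a definition in which nothing has been narrowed.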
In the case of CICS TS, a separate program, known as the IBM Debug Tool, runs simultaneously when given CICS TS regions are in a “Debug On” state. The general interaction of this debugging tool with CICS TS is illustrated in FIG. 1.
With reference to FIG. 1 consider the relationship between the debugging tool (110) and the item undergoing analysis. In the illustrated environment of CICS TS, the item being debugged is a program (150) which is being executed under the ambit of a CICS transaction instance (140).
When the CICS transaction (140) is not being debugged:
    the user initiates the transaction (140) from a terminal (130);
    input (161) is sent to the program;
    execution proceeds (163) into the relevant program (150), which executes its instructions;
    the results (164) are returned to the user (130).
However, when the CICS region (101) is enabled for debugging, the input (161) is subject to an additional processing step (162), which determines whether or not the transaction instance itself (140) is to be debugged. If it is not, then the detection step (162) does not alter the aforementioned logical flow.
The debugging logic (110) consists of several logical (but not necessarily physical) components:
    an interface (111) with the user which controls the debugging activity;
    an instruction stepper (114) which physically executes a program being debugged;
    a breakpoint manager (112) which determines the breakpoints of interest to the instruction stepper (114);
    some control logic (113) for the environment, which in this preferred implementation is CICS TS.
Pieces of information relevant to the operation of the debugger (110) are held externally in a repository (120). This repository may be physically implemented in a number of sub-components, but these are logically managed in one group.
If the transaction is to be debugged, the additional processing step (162) returns a result which alters the execution of the program (150). Instead of execution proceeding directly (163), each instruction of the program (150) is executed under the control, and within the ambit, of the debugging logic (110). Consequently, each ‘real’ instruction in the program (150) is preceded by an instruction execution logical flow (171) and followed by the corresponding result flow (172) once that instruction has executed. These flows (171, 172) are associated with the debugging logic (110) and in particular with the instruction stepper sub-component (114).
In effect, the user does not send a single input (161) and receive a single output (164) when the transaction (140) is being debugged. Additional flows (173, 174) to the user are presented according to the debugging logic. In particular, flow (173) results from a breakpoint halting execution. In the time period between flows (173) and (174), the user (130) can inspect the state of the executing transaction and, more generally, modify (131) the breakpoint information. After these actions, the user continues execution (174) of the program (150). These interactions continue until the end of the program (150) is reached.
The present invention is concerned with novel processing during the period between flows (173) and (174). The detailed functioning of the IBM Debug Tool plays no part in the invention, but further details can be found in the publication “Introduction to the IBM Problem Determination Tools” (Ref SG246296), available from IBM Corporation.
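The flows described for FIG. 1 can be sketched as a simple stepping loop. This is an illustrative model only and does not reflect the internals of the IBM Debug Tool: the function `run_transaction`, its parameters, and the representation of a program as a list of callables are all assumptions made for the example.

```python
from typing import Callable, List, Set

Instruction = Callable[[dict], None]
HaltHandler = Callable[[int, dict, Set[int]], None]


def run_transaction(program: List[Instruction],
                    state: dict,
                    debug_this_transaction: bool,
                    breakpoints: Set[int],
                    on_halt: HaltHandler) -> dict:
    """Run a program either directly or under a hypothetical stepper."""
    if not debug_this_transaction:
        # Detection step (162) says "not debugged": direct execution (163).
        for instruction in program:
            instruction(state)
        return state  # results returned to the user (164)

    for index, instruction in enumerate(program):
        if index in breakpoints:
            # Flow (173): execution halts and control passes to the user,
            # who may inspect the state and modify the breakpoints (131).
            # Returning from on_halt corresponds to flow (174).
            on_halt(index, state, breakpoints)
        # Flows (171, 172): the stepper executes one instruction.
        instruction(state)
    return state
```

Because the breakpoint set is consulted afresh before each instruction, a breakpoint added by the user while halted takes effect for the remainder of the run.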
Particular problems associated with multi-processing aspects of debugging are illustrated diagrammatically in FIG. 2. In FIG. 2, (210) is the multi-processing environment, such as CICS Transaction Server for z/OS, which is running multiple activities (221,222,223). Each of these activities is initiated and owned by a specific user (231,232,233). Activity 1 (221) and activity 3 (223) are both executing program PROGA (251). In CICS TS terms, these activities (221,222,223) are instances of CICS transactions. Consequently, the act of debugging program PROGA (251) will affect the execution of both transactions 1 (221) and 3 (223) when a breakpoint (such as ‘Halt on instruction 56’) is encountered.
However, the transaction instance represented by activity 2 (222) is not being debugged, and so has no interest in the breakpoints (it does not access program PROGA (251)).
The plurality of breakpoints themselves (270) is located in a shared repository (260) which is accessible by all activities (221,222,223). For example, a breakpoint entry (271) could control the execution of program PROGA (251) by stopping execution whenever instruction 56 is encountered. Of course, if instruction 56 is not met (for example: a branch in program code avoids reaching instruction 56), execution will not halt.
In a multi-processing environment, the repository (260) has to be frequently accessed and the contents (270) read in order to garner the breakpoints (271) which determine whether or not execution is to be halted. This involves a considerable quantity of processing and has the major drawback that physically reading the repository (260) takes a long elapsed time compared to that spent actually executing the program. This performance impact is unacceptable. A more sophisticated (quicker) solution is required to satisfy response time criteria for the user.
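The cost described above can be made concrete with a small counting model. It assumes a naive design in which the shared repository is read before every instruction; the class `SharedRepository` and the function `execute` are hypothetical, and a real repository read would involve file or database I/O rather than a counter.

```python
from typing import List, Set, Tuple

BreakpointKey = Tuple[str, int]  # (program name, instruction number)


class SharedRepository:
    """Hypothetical stand-in for the shared repository (260)."""

    def __init__(self, breakpoints: Set[BreakpointKey]) -> None:
        self._breakpoints = set(breakpoints)
        self.reads = 0  # counts how often the repository is read

    def read_breakpoints(self) -> Set[BreakpointKey]:
        # In a real system this read is the slow, physical I/O step.
        self.reads += 1
        return set(self._breakpoints)


def execute(program: str, instruction_count: int,
            repository: SharedRepository) -> List[int]:
    """Return the instructions at which execution would halt."""
    halted_at = []
    for instruction in range(instruction_count):
        # Naive approach: consult the repository before every instruction.
        if (program, instruction) in repository.read_breakpoints():
            halted_at.append(instruction)
    return halted_at
```

In this model a 100-instruction program triggers 100 repository reads even if only one breakpoint applies, and a program with no breakpoints at all pays the same price.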
The great majority of prior art simply ignores this performance problem: users have to suffer greater elapsed time, leading to discontent and dissatisfaction.
Some prior art solutions involve maintaining local copies of the repository for each activity so that, at best, the performance penalty only occurs whilst the copy is being taken.
Both of these solutions are unacceptable. In the latter case, local copies have to be kept up to date with the accurate representation held in the repository. This technique is often called caching. This caching implies that processing has to observe when the repository (260) gets changed (entries (270) can be added, deleted or altered) and then has to send a notification to all activities (221,222,223) that the repository (260) has changed, so causing these activities to take actions to update their local copies. This has, obviously, the same performance overhead as for the initial copy, and so is unacceptable.
There is an additional performance overhead in implementing the change notification communication from the repository (260) to the activities (221,222,223). This functionality requires the existence of a Repository Manager to manage these notifications. This mechanism is often called Publish and Subscribe in prior art and the activity is known as Push technology. The additional processing, and additional functionality, associated with the provision of a Repository Manager is not required by the present invention, which embodies a more sophisticated, more elegant, and more efficient approach.
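The prior art caching arrangement criticised above, with a Repository Manager pushing change notifications to subscribed activities, can be sketched as follows. This is a hypothetical model for illustration: the classes `RepositoryManager` and `Activity` and their methods are invented names, and the counters merely stand in for the real cost of copying and messaging.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

BreakpointKey = Tuple[str, int]  # (program name, instruction number)
Subscriber = Callable[[FrozenSet[BreakpointKey]], None]


class RepositoryManager:
    """Hypothetical publish-and-subscribe manager for the repository."""

    def __init__(self) -> None:
        self._breakpoints: Set[BreakpointKey] = set()
        self._subscribers: List[Subscriber] = []
        self.notifications_sent = 0

    def subscribe(self, callback: Subscriber) -> None:
        self._subscribers.append(callback)
        callback(frozenset(self._breakpoints))  # initial local copy

    def add(self, breakpoint: BreakpointKey) -> None:
        self._breakpoints.add(breakpoint)
        self._publish()

    def _publish(self) -> None:
        # Every change is pushed to every activity, interested or not.
        snapshot = frozenset(self._breakpoints)
        for callback in self._subscribers:
            callback(snapshot)
            self.notifications_sent += 1


class Activity:
    """Hypothetical activity holding a cached copy of the breakpoints."""

    def __init__(self, name: str, manager: RepositoryManager) -> None:
        self.name = name
        self.local_copy: FrozenSet[BreakpointKey] = frozenset()
        self.refreshes = 0
        manager.subscribe(self._refresh)

    def _refresh(self, snapshot: FrozenSet[BreakpointKey]) -> None:
        self.local_copy = snapshot
        self.refreshes += 1
```

With three activities subscribed, a single added breakpoint causes three notifications and three cache refreshes, including one for an activity that is not being debugged.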
As mentioned above, in considering performance issues associated with debugging, the software (and hardware) involved in actually executing debugging operations will not be considered in detail. The primary area of focus is rather with the setting and control of breakpoints, as performance issues in the control of the breakpoints are a major concern. The act of controlling breakpoints in a wide network of computers (a sysplex) has significant implications on the activity of items not being debugged.
In itself, prior art mostly ignores performance issues for items physically being debugged. These issues are largely concerned with:
    setting up the debugging environment;
    controlling the debugging environment;
    debugging activities on executing items.
The setting up and controlling of the debugging environment are activities that have a system-level scope thus affecting all activities, not just those eligible for debugging activity.
The time spent debugging an execution instance is long compared with the actual time of program execution as the human activities involved in the debugging operation comprise most of the elapsed time spent during debugging. Prior art techniques ignore the performance and elapsed time issues involved in setting up and controlling the debugging environment.
Because prior art techniques suffer these penalties only in a localised environment, the effects are limited and are perceived by humans as part of the penalty of doing debugging. In the sysplex environment, the performance penalty has a far wider scope and so becomes unacceptable in a large scale environment.
The present invention addresses these problems and aims to provide novel techniques for limiting these performance issues in a large scale environment and for minimising the impact of debugging or similar operations upon activities that are not undergoing debugging activity.