Tracing routines in software and microcode have been provided in the prior art for the purpose of tracing through a program to assist in locating errors or bugs therein. Software monitoring routines and instructions (e.g. S/370 monitor call instruction) also are in the prior art for the purpose of storing hardware and software states sensed during the execution of programs on a system, in order to enable analysis of the results to measure system performance and enable tuning of the resource configuration of the system.
Tracing assists in the determination of system problems by providing a snapshot record in storage of certain types of states existing when a location in a program is reached.
The instruction support for software monitoring has some similarity to tracing instruction support. However, their functions are different: tracing serves software diagnostics, while monitoring serves system performance measurement. Software monitors universally have the problem of distorting the operations of the system they are measuring, because a monitor software routine or instruction interrupts the program it is monitoring and thereby competes with the program being measured for hardware resources in the system. For this reason, monitoring is most accurately done by special hardware, which is normally externally connected to a system but may be built into a system, such as by the invention disclosed and claimed in patent application Ser. No. 509,128, entitled "Internally Distributed Monitoring System", filed on the same day as the subject application and having the same assignee. Such distortion, by contrast, is unimportant to the diagnostic function of tracing.
Many types of tracing programs are currently available. However, the name "tracing program" covers diverse areas in the examination of computer programs. One tracing program may be strikingly different from another, and each may be usable for a different purpose. Some tracing programs observe only a particular type of situation, event, or kind of information. Many prior tracing programs are neither user-directed nor flexible; they cannot be tailored to a particular computer installation or to a particular program execution.
Some prior tracing programs were limited in their tracing function to branch points only, interrupt points only, predetermined sequences of instructions, or other prespecified events.
Some prior tracing programs operate at relatively slow rates; for example, when all traced data is stored, it is common for execution under such prior programs to take, on average, 100 times longer than untraced execution. Most prior tracing programs overlay their limited output areas, which they may use either for buffering to an output device or for holding the results in main storage for analysis.
A tracing program that totally controls all other programs on a computing system is disclosed and claimed in U.S. Pat. No. 3,707,724 to J. L. Dellheim entitled "Program Execution Tracing System Improvements".
Dual address space (DAS) tracing in S/370 is described on pages 4-11 through 4-15 of the IBM S/370 Principles of Operation, Form No. GA22-7000-8. That publication describes a tracing architecture that uses fixed-length entries written into one trace table in main storage, which is used by all CPUs in the system.
Also, special-purpose trace instructions have been used as assists to MVS/370 programs on S/370 systems. Each generates a fixed-length trace entry of 32 bytes in the system trace table. These special-purpose trace instructions are:
TRACE SVC INTERRUPTION
TRACE PROGRAM INTERRUPTION
TRACE INITIAL SRB DISPATCH
TRACE I/O INTERRUPTION
TRACE TASK DISPATCH
TRACE SVC RETURN
The fixed-length format limits tracing performance when more tracing data must be collected than fits in the fixed-length format. The tracing must then be supplemented by a tracing program to collect the additional data.
FIG. 7 illustrates the prior art S/370 tracing method, in which all CPUs 1 through N in a multiprocessor (MP) use the same system trace table. They all access the system trace table through an anchor word at absolute address 84 in main storage. Flag bit A in the anchor word at this location indicates whether or not system tracing is enabled. If it is enabled, a trace-table header address in the anchor word is used to access a trace-table header in main storage, which contains the current trace-entry address to be used and also contains the boundaries of the system trace table in starting and ending address fields. Significant overhead is incurred for each trace entry made, because the updated current trace-entry address must be compared both with the ending address and with the end of the 4 KB page frame currently being used, since another page frame must be allocated before the trace table can be continued beyond the current page frame.
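The anchor word, trace-table header, and per-entry comparisons described above can be sketched as follows. This is only an illustrative model, not the S/370 implementation: the class and function names are hypothetical, the ending address is assumed to be exclusive, and wrapping back to the starting address when the end is reached is an assumption that the text does not state.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096   # 4 KB page frame, as in the text
ENTRY_SIZE = 32    # fixed-length trace entry, as in the text


@dataclass
class TraceTableHeader:
    current: int   # address of the next trace entry to be written
    start: int     # starting address of the system trace table
    end: int       # ending address (assumed exclusive for this sketch)


@dataclass
class AnchorWord:
    enabled: bool              # models flag bit A
    header: TraceTableHeader   # stands in for the trace-table header address


def next_entry(anchor):
    """Return (entry_address, page_frame_exhausted), or None if tracing is off."""
    if not anchor.enabled:
        return None
    hdr = anchor.header
    addr = hdr.current
    updated = addr + ENTRY_SIZE
    # Per-entry comparison 1: did this entry fill the current 4 KB page frame?
    page_frame_exhausted = updated % PAGE_SIZE == 0
    # Per-entry comparison 2: has the ending address of the table been reached?
    if updated >= hdr.end:
        updated = hdr.start    # assumption: the table wraps to its start
    hdr.current = updated
    return addr, page_frame_exhausted
```

Both comparisons are executed for every entry, which is the per-entry overhead referred to above. The sketch ignores multiprocessor serialization of the current-entry address.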
In more detail, for the prior S/370, tracing for all CPUs 1 through N is enabled by bit A being set to 1. Traceable instructions executed by any of CPUs 1 through N then write fixed-length trace entries into the single system trace table in the following manner:
1. Any one or more of CPUs 1 through N executing a traceable instruction will access location 84 to read the trace-table header address. This causes no contention among simultaneously requesting CPUs, because only read requests are made.
2. Each CPU that is permitted to access the trace table must change the current-entry address to the next entry value for the next CPU permitted to make a trace entry. In order to maintain the integrity of the trace table, only one CPU at a time is allowed to read the current-entry address and then change its value. Thus, only one of the simultaneously contending CPUs is permitted to access this one-word address at a time to perform an interlocked update on the current-entry address. This is usually done with a compare and swap (CS) instruction, which serializes all concurrently requesting CPUs. Each requesting CPU that fails to gain access goes into a CS wait loop until it successfully executes its CS instruction, after which it reads and then changes the current-entry address. In addition, another very significant type of inter-CPU interference can arise when the successful CS read request occurs: the requested current-entry address may be found only in the store-in cache of another CPU which last changed that address value. Then, before the read request can be performed, the line containing that address must be cast out of the other CPU's cache and fetched into the requesting CPU's cache, which may be done either through main storage or cache-to-cache if that is available in the MP.
3. After a CPU whose CS succeeded reads the current-entry address, that address is unique to that CPU, because each next CPU to read the current-entry address will get a different address (incremented by the prior successful CPU). Hence, there is no interlock contention among the tracing CPUs in accessing the trace table itself to record their trace data. However, there is significant inter-CPU cache contention among the CPUs, because their cache-line sizes are much longer than the one-word current-entry address field.
Hence, both steps 2 and 3 have a high probability of resulting in cache interference with another CPU. The resulting MP performance degradation may be very significant.
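The interlocked update of step 2 and the resulting unique addresses of step 3 can be illustrated by the following sketch. It is a software model only: Python has no hardware CS instruction, so the `Word.compare_and_swap` method (a hypothetical name) models one with a lock, threads stand in for CPUs, and the end-of-table handling is omitted for brevity.

```python
import threading

ENTRY_SIZE = 32   # fixed-length trace entry, as in the text


class Word:
    """One storage word offering a compare-and-swap, modelled with a lock."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        # Store `new` only if the word still holds `expected`.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def reserve_entry(current_entry):
    """CS wait loop: retry until this caller advances the current-entry address."""
    while True:
        old = current_entry.load()
        if current_entry.compare_and_swap(old, old + ENTRY_SIZE):
            return old   # this address now belongs to the caller alone


def run(n_cpus, entries_per_cpu):
    """Let n_cpus threads reserve entries concurrently; return what each claimed."""
    current = Word(0)
    claimed = [[] for _ in range(n_cpus)]

    def cpu(i):
        for _ in range(entries_per_cpu):
            claimed[i].append(reserve_entry(current))

    threads = [threading.Thread(target=cpu, args=(i,)) for i in range(n_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed
```

Every address returned by `reserve_entry` is handed to exactly one caller, but all callers contend for the same word, which is the serialization point discussed above. The model does not reproduce the cache castout traffic.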
For step 3, although each CPU stores its trace entry into a different 32-byte location, inter-CPU interference results if the cache-line size is more than 32 bytes, because all or part of the required 32-byte entry may then be part of a line (e.g. 128 bytes) held in another CPU's cache. For example, in an IBM 3081 CPU the cache-line size is 128 bytes, which spans four trace table entries.
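The arithmetic behind this line sharing can be checked with a short sketch. The function names are hypothetical, and the trace table is assumed to begin on a cache-line boundary.

```python
ENTRY_SIZE = 32   # fixed-length trace entry, as in the text


def entries_per_line(line_size, entry_size=ENTRY_SIZE):
    """Number of whole trace entries that fit in one cache line."""
    return line_size // entry_size


def line_index(entry_address, line_size):
    """Index of the cache line that contains the given entry address."""
    return entry_address // line_size


def share_line(addr_a, addr_b, line_size):
    """True if two entry addresses fall within the same cache line."""
    return line_index(addr_a, line_size) == line_index(addr_b, line_size)
```

With a 128-byte line, `entries_per_line(128)` gives 4, so entries at addresses 0, 32, 64, and 96 share one line even when they are written by four different CPUs.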
Also, two fundamental architectural problems exist with the S/370 fixed-length trace entries, which are: (1) some entries cannot contain all of the available information required, and (2) some entries waste unused space. The variable-length entries provided by the subject invention permit each trace entry to include all available required information without waste.
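The two problems with fixed-length entries can be expressed as a simple calculation. This is an illustrative sketch only: the function name is hypothetical, and it assumes that all 32 bytes of an entry are available for trace data.

```python
ENTRY_SIZE = 32   # fixed-length trace entry, as in the text


def fixed_length_cost(payload_len, entry_size=ENTRY_SIZE):
    """Return (bytes_that_do_not_fit, bytes_wasted) for one fixed-length entry."""
    does_not_fit = max(0, payload_len - entry_size)   # problem (1)
    wasted = max(0, entry_size - payload_len)         # problem (2)
    return does_not_fit, wasted
```

For a hypothetical 48-byte payload, 16 bytes do not fit; for a hypothetical 20-byte payload, 12 bytes of the entry are unused. A variable-length entry sized to its payload would make both values zero.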