1. Technical Field
The present invention relates in general to an improved method and system for instruction trace reconstruction in a data processing system and in particular to an improved method and system for initial state determination for instruction trace reconstruction. Still more particularly, the present invention relates to an improved method and system for efficiently determining an initial state of cache memories within a processor system upon initiation of an instruction trace.
2. Description of the Related Art
Data processing systems in recent years have been improved on a continuous and highly accelerated basis. Whether the improvements to such data processing systems are software related or hardware related, it is important for a developer to have some expectation of the performance of a new system under construction before that system is actually completely developed. In order to make this evaluation of an expected new system or changes to a current system, various techniques are known. Many different approaches have been utilized within the computer industry in order to understand and characterize those parameters which can be utilized to predict the value associated with a proposed set of changes. One particular approach is a full software simulation of the entire system, including all of the system devices, the system software, and accesses to data stored within a direct access storage device. A complete system simulation approach requires an extremely significant investment in software and has the drawback that the time required to run such a simulation is extremely long. Another approach utilized by developers is an effort to develop accurate, "representative instruction traces" which permit the use of a simplified system model in order to predict the performance of the new system.
Performance projections for processors and memory subsystems are critically dependent upon a correct understanding of the workloads which are imposed on such systems. In order to accurately predict the performance of a proposed system and assist in selecting among the various design trade-offs, it is necessary to collect instruction streams (i.e., "traces") that statistically represent actual workloads. By utilizing traces which represent a fixed workload as input to a system model that allows variations on some hardware parameter, such as the number of processors, developers hope to be able to predict performance for that workload with a different number of processors.
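The idea of replaying one fixed trace through a model while varying a hardware parameter can be illustrated with a minimal sketch. The parameter varied here is the capacity of a small fully associative LRU cache rather than the number of processors; the function name `replay_trace`, the LRU policy, and the sample trace are assumptions made only for illustration.

```python
from collections import OrderedDict

def replay_trace(trace, capacity):
    """Replay a fixed address trace through a hypothetical LRU cache model
    and return the number of misses observed for the given capacity."""
    cache = OrderedDict()  # keys are line addresses, ordered oldest to newest
    misses = 0
    for address in trace:
        if address in cache:
            cache.move_to_end(address)     # hit: mark as most recently used
        else:
            misses += 1                    # miss: bring the line in
            cache[address] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses

# The same (invented) workload trace is used for every configuration.
fixed_trace = [0, 1, 2, 0, 1, 3, 0, 1, 2, 3]
misses_by_capacity = {cap: replay_trace(fixed_trace, cap) for cap in (2, 4)}
```

Because the trace is held constant, any difference between the entries of `misses_by_capacity` is attributable to the modeled hardware parameter alone.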
One known software approach to developing an instruction and address trace is the so-called "single-step" mechanism, where a single step interrupt handler is executed immediately before or after an instruction is executed. The interrupt handler may then decode the instruction and write the pertinent information regarding that instruction to a trace buffer. The trace buffer may be provided within system memory or may be in a special hardware buffer. The hardware buffer approach is often implemented by having the interrupt handler write the relevant information at a specific address on the processor bus which is then captured by a bus monitor looking for data at that address.
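The single-step mechanism described above can be sketched, under simplifying assumptions, with a toy interpreter. The three-instruction machine (`li`, `add`, `halt`), the `TraceBuffer` class, and its drop-oldest policy are hypothetical and stand in for a real processor, interrupt handler, and trace buffer.

```python
from dataclasses import dataclass, field

@dataclass
class TraceBuffer:
    """Hypothetical in-memory trace buffer with a fixed capacity."""
    capacity: int
    records: list = field(default_factory=list)

    def write(self, record):
        # Assumed policy: discard the oldest record when the buffer is full.
        if len(self.records) >= self.capacity:
            self.records.pop(0)
        self.records.append(record)

def single_step_handler(pc, instruction, buffer):
    """Decode one instruction and write the pertinent fields to the buffer."""
    opcode, *operands = instruction
    buffer.write({"pc": pc, "opcode": opcode, "operands": tuple(operands)})

def run_with_single_step(program, buffer):
    """Execute a toy program, invoking the handler before each instruction."""
    regs = {}
    pc = 0
    while pc < len(program):
        instruction = program[pc]
        single_step_handler(pc, instruction, buffer)  # the "single-step interrupt"
        opcode = instruction[0]
        if opcode == "li":        # load immediate: li rd, value
            regs[instruction[1]] = instruction[2]
        elif opcode == "add":     # add rd, ra, rb
            regs[instruction[1]] = regs[instruction[2]] + regs[instruction[3]]
        elif opcode == "halt":
            break
        pc += 1
    return regs

program = [("li", "r1", 2), ("li", "r2", 3), ("add", "r3", "r1", "r2"), ("halt",)]
buf = TraceBuffer(capacity=16)
regs = run_with_single_step(program, buf)
```

Every instruction incurs a handler call in this sketch, which hints at why the approach slows the system under test considerably.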
Another known variation is the execution of software in a simulation mode. The simulation mode works well on so-called RISC systems such as the RISC System/6000 machine running AIX or other suitable software for constructing application traces. Such an approach does have several drawbacks if it is utilized in an attempt to capture kernel traces as well. In a typical implementation that supports the capturing of kernel traces, the code is updated or "instrumented" to provide relevant information as part of the tracing process. When software approaches like these are utilized that include kernel activities, it is very important to provide some type of compensation to reflect the fact that the system timings have been perturbed. For example, there may be a much larger number of timer ticks executed than would normally be utilized and thus, the ratio of external interrupts to code being executed is similarly affected. Compensating for such timing changes may be fairly reasonable for benchmarks that are not utilizing many kernel services and/or external interrupts.
Instruction and address traces which are constructed via software instrumentation techniques can be very invasive and often severely affect the system under test. Traces produced in this manner are very time consuming to generate, but they provide the information required for fairly simple application (problem state) intensive benchmarks, where the kernel accesses are negligible. However, traces developed under conditions where the software is instrumented are not typically considered suitably representative to characterize extremely dynamic workloads which access kernel services, such as those found in On-Line Transaction Processing (OLTP) workloads. Full system simulation approaches avoid these problems, but require an extreme investment in both software and in the time the simulation requires to run.
One technique for providing traces utilizes the processor to externalize information about what is going on inside the processor via signals or pins which can be monitored from outside the processor. A simple instruction trace can be externalized in a very straightforward manner by simply putting out the actual instruction being executed on every processor cycle. An operand address trace can be externalized by putting out the operand address on such pins. Alternatively, by understanding the content of the processor's internal buffers, encoded information may be utilized to identify the operand addresses. That is, for example, signals can be utilized to identify a hit or a miss in the processor's cache or translation lookaside buffer (TLB). In the case of a hit, encoded information, such as an index into the internal buffer, can be utilized to capture and construct virtual address traces. In the event of a miss, more cycles are available to give the actual address (either real or virtual) of the operand of the instruction. Both of these approaches have the drawback of requiring many pins and, as a practical matter, may be difficult to support at full speed. The processor support required may be difficult to implement due to out-of-order execution and superscalar designs with multiple instructions being dispatched and completing on a single cycle. Capturing of the data is also difficult due to the increasing speeds of modern processors. In order to actually support this approach, the speed of the processor and/or the system may have to be reduced and the processor may have to run in a single instruction issue mode.
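The hit/miss encoding described above can be sketched as follows. The event format (`("hit", index)` and `("miss", index, address)`), the function name `reconstruct_addresses`, and the sample addresses are assumptions for illustration and do not correspond to any particular processor's pin interface.

```python
def reconstruct_addresses(events, num_entries, initial_contents=None):
    """Rebuild an operand address trace from externalized hit/miss events.

    A shadow copy of the processor's internal buffer is maintained outside
    the processor. A miss supplies the full address and the entry it fills;
    a hit supplies only the entry index, which is looked up in the shadow.
    """
    if initial_contents is not None:
        shadow = list(initial_contents)
    else:
        shadow = [None] * num_entries  # contents unknown when tracing starts
    trace = []
    for event in events:
        if event[0] == "miss":
            _, index, address = event
            shadow[index] = address    # the miss reveals the entry's new content
            trace.append(address)
        else:
            _, index = event
            trace.append(shadow[index])  # None when the entry was never observed
    return trace

# Invented event stream: the first hit refers to an entry filled before tracing began.
events = [("hit", 1), ("miss", 0, 0x1000), ("hit", 0), ("miss", 1, 0x2000), ("hit", 1)]
trace = reconstruct_addresses(events, num_entries=2)
```

In this sketch the first element of `trace` is `None`, because a hit on an entry that was filled before tracing started cannot be decoded unless the initial contents of the buffer are known.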
Thus, while it is well known that a representative instruction trace may be provided for a system under test by attempting the reconstruction of an actual instruction sequence utilizing instruction and address data monitored on the system bus, the problem of determining the initial state of the system upon initiation of such an instruction trace is non-trivial. One technique utilizes software to force the initial state of the processor to a known state by copying or invalidating the content of the registers, buffers and caches associated with the processor and then monitoring the bus as the registers, buffers and caches are refilled. While this technique permits the initial state of a system under test to be accurately determined, the size of the registers and busses utilized in modern systems results in a substantial delay while the initial state is being determined. Further, the flushing of a cache and the refilling of that cache during instruction processing can also adversely affect the system under test.
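The trade-off of the flush-and-monitor technique can be illustrated with a minimal sketch of a direct-mapped cache. The line size, the number of lines, the helper names, and the access pattern are all hypothetical values chosen for the example.

```python
LINE_SIZE = 32   # bytes per cache line (assumed)
NUM_LINES = 4    # lines in the toy direct-mapped cache (assumed)

def line_index(address):
    return (address // LINE_SIZE) % NUM_LINES

def line_tag(address):
    return address // (LINE_SIZE * NUM_LINES)

def run_accesses(cache, accesses):
    """Apply accesses to the cache in place and return the miss count.
    Each miss corresponds to a line refill that is visible on the bus."""
    misses = 0
    for address in accesses:
        index, tag = line_index(address), line_tag(address)
        if cache[index] != tag:
            misses += 1
            cache[index] = tag
    return misses

accesses = [0, 32, 64, 96, 0, 32]

# Undisturbed system: the working set is already resident in the cache.
warm_cache = [0, 0, 0, 0]
warm_misses = run_accesses(warm_cache, accesses)

# Traced system: the cache is invalidated first so that its state is known.
flushed_cache = [None] * NUM_LINES
flushed_misses = run_accesses(flushed_cache, accesses)
```

After the run both caches hold the same contents, so an external monitor that watched the refills knows the cache state exactly; the extra misses recorded in `flushed_misses` are the perturbation that the flush introduces into the system under test.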
Consequently, those having ordinary skill in the art will appreciate that a need exists for an improved method and system for determining the initial architected state of a processor during an instruction trace.