1. Field of the Invention
The present invention generally relates to evaluating the instruction stream in a data processing system in order to optimize the performance of the system. More specifically, the present invention is a tracing mechanism, implemented in system hardware, that allows for trace and profile characteristics of instructions executed by a central processing unit (CPU) to be generated for use by a system designer.
2. Description of Related Art
It is well known that performance projections for processors and memory subsystems are critically dependent on a correct understanding of the workloads imposed on such systems. One of the most important components of a system's workload is the instruction stream executed by the processor. In order to accurately predict the performance of proposed systems and assist in selecting among various design trade-offs, it is necessary to collect instruction streams (i.e. traces) that statistically represent actual workloads.
In order to be most useful, the traced instruction stream must include the effective operand (data) addresses as well as the effective instruction addresses that comprise the workload. This is particularly true in processor organizations involving multiple-issue instruction dispatch and/or hierarchical storage subsystems. In such organizations, data dependencies can severely degrade performance. It is therefore important that operand addresses be collected along with instruction addresses. If the developer knows where in the memory subsystem the data is being accessed, e.g. that loads and stores access data at addresses very close to one another, then the hardware can be designed to exploit performance characteristics such as locality. For example, locality allows the hardware system to take advantage of the fact that data is being accessed at neighboring addresses in the memory subsystem.
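By way of illustration only, the value of collecting operand addresses can be sketched as follows. This is a minimal example and not part of any cited tool: the names `TraceRecord` and `same_line_fraction`, the sample addresses, and the 64-byte line size are all assumptions chosen for the illustration. The sketch estimates spatial locality as the fraction of consecutive operand accesses that fall within the same assumed cache line, a quantity that cannot be computed from instruction addresses alone.

```python
from dataclasses import dataclass
from typing import List, Optional

LINE_SIZE = 64  # assumed cache line size in bytes (hypothetical value)


@dataclass
class TraceRecord:
    """One traced instruction (hypothetical record layout)."""
    instr_addr: int                     # effective instruction address
    operand_addr: Optional[int] = None  # effective operand address, None if no memory access


def same_line_fraction(trace: List[TraceRecord], line_size: int = LINE_SIZE) -> float:
    """Fraction of consecutive operand accesses that hit the same line."""
    addrs = [r.operand_addr for r in trace if r.operand_addr is not None]
    if len(addrs) < 2:
        return 0.0
    same = sum(1 for a, b in zip(addrs, addrs[1:]) if a // line_size == b // line_size)
    return same / (len(addrs) - 1)


# Made-up trace: three neighboring accesses followed by one distant access.
example_trace = [
    TraceRecord(0x1000, 0x8000),
    TraceRecord(0x1004, 0x8008),
    TraceRecord(0x1008),           # instruction with no operand access
    TraceRecord(0x100C, 0x8010),
    TraceRecord(0x1010, 0x9000),
]

if __name__ == "__main__":
    print(f"same-line fraction: {same_line_fraction(example_trace):.3f}")
```

A trace lacking the `operand_addr` field would leave this measure undefined, which is the reason operand addresses are collected along with instruction addresses.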
Conventional trace tools are generally in the form of software, or a hardware tracing device which is external to the CPU, both of which have significant drawbacks. Software tools, such as those described in European Patent Application 0 501 076 A2, are very slow because the instruction stream being traced must be preprocessed and then postprocessed to derive the actual information needed for the trace.
Those skilled in the art will understand that, due to the preprocessing and postprocessing, a software facility incurs a slowdown of 30 to 1 or more and thus perturbs execution paths. Clearly a slowdown of this magnitude will tend to change the apparent balance between CPU and I/O processing loads. The trace streams collected from a system in such an unbalanced state will thus be nonrepresentative and will adversely affect the predictions of performance models using them.
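The shift in apparent balance can be illustrated with simple arithmetic. The following sketch is an assumption-laden example only: the function name `cpu_fraction` is hypothetical, the workload is assumed to spend equal time on CPU processing and I/O when untraced, and tracing is assumed to slow only the CPU portion while leaving I/O time unchanged.

```python
def cpu_fraction(cpu_time: float, io_time: float, slowdown: float = 1.0) -> float:
    """Share of total time spent on CPU work when CPU time is stretched by `slowdown`.

    Simplifying assumption: I/O time is unaffected by tracing.
    """
    traced_cpu = cpu_time * slowdown
    return traced_cpu / (traced_cpu + io_time)


# Hypothetical workload: 1 unit of CPU time and 1 unit of I/O time.
untraced = cpu_fraction(1.0, 1.0)               # balanced: 0.5
traced = cpu_fraction(1.0, 1.0, slowdown=30.0)  # 30 / 31, roughly 0.97

if __name__ == "__main__":
    print(f"untraced CPU share: {untraced:.2f}")
    print(f"traced CPU share:   {traced:.2f}")
```

Under these assumptions a workload that is evenly balanced when untraced appears almost entirely CPU-bound while being traced, so a model built from such a trace would misjudge the relative importance of I/O.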
Other conventional trace tools, namely external hardware tracing devices such as that shown in European Patent Application 0 525 672 A2, are also slow, since the trace device is connected to the processor by the system bus, which typically runs slower than the processor. Similarly, U.S. Pat. No. 4,611,281 shows an apparatus for analyzing microprocessor operations wherein a separate test system is connected to the processor being analyzed.
However, while it is possible to construct system hardware, external to the CPU, that will collect instruction streams, it is not the most useful or flexible method. This is due to the cost of providing the data paths that supply the information, the external media required to capture the information, and the difficulty entailed in selecting when to trace (i.e., controlling the tracing) that usually accompanies such external hardware tracing.
Further, external hardware trace facilities usually slow down the traced system. This stems from the fact that the performance of virtually all processor designs is highly dependent on hierarchical storage subsystems. Hierarchical storage subsystems tend to render the instruction/data streams inaccessible to external observation. Correcting this problem typically involves disabling or crippling some portion of the storage hierarchy (e.g. disabling the first level of caching so that the instruction/data fetches are visible). This approach will cause hardware tracing to degrade CPU performance.
Another prior art external hardware trace approach involves broadcasting branch-taken, interrupt, and operand addresses in an observable manner (e.g. by "stealing" cycles from a system bus). In this way, changes in instruction stream flow can be observed. But, in multiple dispatch machine organizations, the frequency of taken branches, loads, and stores can be high enough to consume substantial system bus bandwidth, potentially impacting performance. Aggravating this problem is the trend towards external bus rates that are a fraction of the processor internal speed.
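The bandwidth concern can be made concrete with a rough estimate. The sketch below is illustrative only; every parameter value (instructions per cycle, fraction of instructions that generate a broadcast, bus cycles per broadcast, and bus-to-core clock ratio) is a hypothetical assumption rather than a figure from any cited system, and the function name `bus_utilization` is likewise invented for this example.

```python
def bus_utilization(ipc: float,
                    event_fraction: float,
                    bus_cycles_per_event: float,
                    bus_to_core_ratio: float) -> float:
    """Fraction of bus cycles needed to broadcast trace addresses.

    ipc                  -- instructions completed per core cycle
    event_fraction       -- share of instructions that are taken branches, loads, or stores
    bus_cycles_per_event -- bus cycles stolen per broadcast address
    bus_to_core_ratio    -- bus clock rate divided by core clock rate
    A result above 1.0 means the bus cannot keep up with the trace traffic.
    """
    events_per_core_cycle = ipc * event_fraction
    bus_cycles_needed = events_per_core_cycle * bus_cycles_per_event
    return bus_cycles_needed / bus_to_core_ratio


# Hypothetical dual-issue core with a bus running at half the core clock.
demand = bus_utilization(ipc=2.0, event_fraction=0.4,
                         bus_cycles_per_event=1.0, bus_to_core_ratio=0.5)

if __name__ == "__main__":
    print(f"bus utilization for tracing alone: {demand:.0%}")
```

With these assumed values the trace traffic alone would demand more bus cycles than exist, and lowering `bus_to_core_ratio` makes the shortfall worse, consistent with the trend toward external buses running at a fraction of the internal processor speed.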
A problem unique to external hardware tracing devices is that the required additional instrumentation tends to be nonportable and costly, typically requiring that a system be dedicated to tracing. This can be an obstacle to collecting useful traces since the code or system configuration of desirable workloads may be prohibitively difficult or expensive to install on the dedicated trace system.
U.S. Pat. No. 5,146,586 shows a tracer memory, directly connected to an instruction register, that will concurrently store instructions as they are provided to the execution unit. However, in this case, the actual address from which the data was retrieved (for load instructions) or to which the data is being stored (for store instructions) will not be known. The actual location at which the data was accessed will only be known after the instruction has executed.
It can be seen that a tracing tool that combines the simplicity of hardware and the flexibility of software to give real-time results would be advantageous. Further, a tracing system that records the actual storage location where the data was accessed would also be an advantage.