1. Field of the Invention
This invention relates to instrumentation and monitoring in digital computers. More specifically, this invention relates to instrumentation and monitoring of processors that can execute instructions in an out-of-sequence fashion.
2. Related Art
Instrumentation in large processors conventionally includes the collection of information associated with an executed CPU instruction stream. The data collected is used to identify the significant instruction stream bottlenecks so that the program data structures or the instruction stream itself can be tuned to the cache and machine structures. Instrumentation data can be used, for example, to identify and fix performance problems in computer operating systems.
An example of processor instrumentation (also conventionally referred to as a monitor or monitoring system) can be seen in U.S. Pat. No. 4,590,550 to Eilert et al. (the Eilert patent), which is assigned to the same assignee as the present invention. The Eilert patent discloses an internally distributed hardware/software monitor for a data processing system. The monitor of the Eilert patent collects hardware signals in a plurality of instrumentation table units (ITUs) distributed within various hardware entities in the system. The collected hardware signals are related to software-controlled trace entries made in a trace table. The monitor of the Eilert patent uses a time sampling method whereby machine signals are recorded in synchronism with a time-driven (periodic) sampling pulse.
The time-driven sampling pulse of the Eilert patent is well suited to monitoring machine signals that occur frequently or periodically. Some machine signals, however, are neither frequent nor periodic. This is particularly true of machine signals that are indicative of system events, which can occur infrequently and at irregular intervals. Since the periodic sampling pulse of the Eilert patent may not occur during the event of interest, occurrences of the event can be missed and/or superfluous data can be recorded.
An improvement to the monitor of the Eilert patent is disclosed in U.S. Pat. No. 4,821,178 to Levin et al. (the Levin patent), which is assigned to the same assignee as the present invention. In the monitoring system of the Levin patent, event-driven sampling is provided as an alternative instrumentation mode for operation within the general ITU structure disclosed in the Eilert patent. Event-driven sampling provides a sampling pulse only when a selected event occurs, which enables the monitoring of machine signals based on irregularly occurring events.
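A toy model can show how periodic sampling and event-driven sampling differ. This is a minimal sketch and does not reproduce the ITU hardware of either patent. It assumes a hypothetical list `machine_signals` in which each entry records whether the event of interest was asserted during one machine cycle. The sampling period is arbitrary.

```python
def time_driven_samples(machine_signals, period):
    """Periodic sampling: record the signal on every `period`-th cycle only."""
    return [(cycle, machine_signals[cycle])
            for cycle in range(0, len(machine_signals), period)]


def event_driven_samples(machine_signals):
    """Event-driven sampling: a sample is taken only when the event occurs."""
    return [(cycle, True)
            for cycle, asserted in enumerate(machine_signals) if asserted]


if __name__ == "__main__":
    # Arbitrary example data: the event is asserted on cycles 3 and 17 of 20.
    signals = [cycle in (3, 17) for cycle in range(20)]
    print(time_driven_samples(signals, period=5))
    # [(0, False), (5, False), (10, False), (15, False)]
    print(event_driven_samples(signals))
    # [(3, True), (17, True)]
```

With a period of 5, the periodic sampler misses both events and still records four empty samples. The event-driven sampler records exactly the two occurrences.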
While the instrumentation units of the Levin and Eilert patents are well suited to the task of monitoring most conventional CPUs, monitoring the execution of instructions in out-of-sequence processors is problematic. In the instrumentation of the Eilert and Levin patents, monitored machine signals can be directly read out of hardware latches and placed into an instrumentation array. In processors that execute instructions sequentially, this technique is appropriate since a natural correspondence (related to time of execution for example) can be maintained between the data stored in the array and a completed instruction of interest. In processors where instructions are executed out-of-sequence, the correspondence between completed instructions and generated machine signals is more difficult to ascertain.
The problems associated with monitoring machine signals in out-of-sequence CPUs will be more apparent from a brief overview of conventional out-of-sequence instruction processing. In out-of-sequence instruction processing, the machine (i.e., the CPU or Central Processor) decodes each of a series of instructions in pipelined fashion and then starts executing them. Often, a succeeding instruction will be executed before a preceding instruction. As each instruction finishes execution, it enters a queue where it is completed in sequence, even though it may have been executed out of sequence. Because the machine fetches, decodes and executes instructions before the preceding instructions have completed, some of the fetched, decoded and executed instructions may be thrown away. This can happen because of the outcome of a preceding instruction (for example, a taken branch) or because of an interrupt in the instruction stream. In addition, by the time an instruction completes, the machine has already overwritten the information identifying what type of instruction was executed.
Out-of-sequence processing presents a problem to instrumentation users because conventional instrumentation is typically not provided with the means to maintain a correspondence between completed instructions and generated machine signals in such an environment. Instrumentation users are generally interested in machine signals associated with a completed instruction. In out-of-sequence processing, however, a number of instructions typically do not complete even though their execution generates machine signals and may generate system events. Thus, conventional instrumentation may collect a significant amount of data related to instructions which never complete. Further, since the CPU does not maintain a natural correspondence between completed instructions and the machine signals that they generate, merely capturing monitored signals during execution will not provide an instrumentation user with sufficient information to make many significant performance judgements.
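The mismatch can be illustrated with a toy model. The model makes a hypothetical assumption that each executed instruction carries three things:

- an identifying tag,
- a flag for one machine signal, such as a cache miss,
- a flag recording whether the instruction eventually completes.

A conventional monitor that captures signals as they are raised during execution cannot see the completion flag at capture time. Its log therefore includes events from instructions that never complete. The second function computes, with hindsight, the subset a user would actually want. The record type and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ExecutedInstruction:
    """Hypothetical record of one instruction, listed in execution order."""
    tag: int           # identifies the instruction
    cache_miss: bool   # machine signal raised while it executed
    completes: bool    # known only later; False if its results are discarded


def naive_capture(executed):
    """Conventional capture: log every cache-miss signal when it is raised."""
    return [insn.tag for insn in executed if insn.cache_miss]


def misses_of_completed(executed):
    """What the user wants; requires completion status the monitor lacks."""
    return [insn.tag for insn in executed
            if insn.cache_miss and insn.completes]


if __name__ == "__main__":
    # Execution (not logical) order; tags 2 and 4 are later thrown away.
    executed = [
        ExecutedInstruction(tag=2, cache_miss=True, completes=False),
        ExecutedInstruction(tag=1, cache_miss=False, completes=True),
        ExecutedInstruction(tag=3, cache_miss=True, completes=True),
        ExecutedInstruction(tag=4, cache_miss=True, completes=False),
    ]
    print(naive_capture(executed))        # [2, 3, 4]
    print(misses_of_completed(executed))  # [3]
```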
The out-of-sequence processing environment will be better understood by reference to FIG. 1. In the machine of FIG. 1, instructions forming a computer program are stored in a system memory 102. In order to accomplish execution, each instruction is fetched (at block 104), in logical order, in accord with a memory address provided by the CPU. By "logical order" it is meant that the instructions are fetched, from memory, in the order in which the programmer intended them to complete.
After an instruction is fetched, it is decoded (at block 106) based on an Op Code embedded in the instruction. The Op Code identifies what type of instruction has been encountered (e.g., branch, load register, store, etc.). Once decoded, the Op Code information is no longer needed by the CPU in Op Code format. Machine-level commands are generated from the Op Code, and the Op Code information is then overwritten by the next instruction. Only a machine-level set of instructions remains. The CPU does not maintain correspondence between the executed machine-level instructions and the original Op Code.
At decode time, a sequential instruction identity number (IID) is assigned to each instruction (at block 106). The IIDs are assigned on a rotating basis. In other words, the series of IIDs assigned will run, for example, from 1 through 32. The first instruction fetched is assigned IID 1. The next instruction fetched is assigned IID 2. The third instruction fetched is assigned IID 3. The 32nd instruction fetched is assigned IID 32. The 33rd instruction fetched is assigned IID 1 again, and so on.
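The rotating IID assignment described above amounts to modular arithmetic on the fetch count. This is a minimal sketch that uses the example range of 1 through 32 from the text. An actual machine's IID range may differ, and the function names are hypothetical.

```python
from itertools import count, islice

# Example range from the text; a real machine's IID width may differ.
IID_RANGE = 32


def iid_for(fetch_number, iid_range=IID_RANGE):
    """IID for the fetch_number-th fetched instruction (1-based), wrapping."""
    return (fetch_number - 1) % iid_range + 1


def iid_stream(iid_range=IID_RANGE):
    """Yield the rotating IIDs 1, 2, ..., iid_range, 1, 2, ... in fetch order."""
    for fetch_number in count(1):
        yield iid_for(fetch_number, iid_range)


if __name__ == "__main__":
    # Fetches 32 through 35 receive IIDs 32, 1, 2, 3.
    print(list(islice(iid_stream(), 31, 35)))  # [32, 1, 2, 3]
```

Because the mapping wraps, an IID identifies an instruction only within a window of 32 consecutive fetches. For example, fetches 1 and 33 both receive IID 1.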
After being assigned (i.e., tagged with) an IID, the instruction is executed (at block 112). Each instruction is sent to an execution element. There are multiple execution elements 114-122 in each CPU. The execution elements operate in parallel, each processing instructions independently of the others. Instructions waiting to execute are queued in an execution element queue until an execution element completes execution of its previous instruction.
Different instructions will often take a different number of machine cycles to execute. As a consequence of differences in execution time, the execution elements often finish execution of the instructions in an order other than that in which they were fetched. This is referred to as out-of-sequence execution.
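A small scheduling model can reproduce the effect of differing execution times. The sketch rests on three hypothetical assumptions, since the text does not specify the dispatch policy:

- each instruction's latency in cycles is known in advance;
- instructions are dispatched in fetch order to whichever execution element becomes free first;
- each element executes one instruction at a time.

```python
import heapq


def finish_order(instructions, num_elements):
    """Return IIDs in the order they finish execution.

    `instructions` is a list of (iid, latency_in_cycles) in fetch order.
    Each instruction goes to the execution element that becomes free first.
    """
    # Min-heap of (cycle at which the element becomes free, element index).
    free_at = [(0, element) for element in range(num_elements)]
    heapq.heapify(free_at)
    finished = []
    for iid, latency in instructions:
        start, element = heapq.heappop(free_at)
        done = start + latency
        finished.append((done, iid))
        heapq.heappush(free_at, (done, element))
    # Sort by finish cycle; ties are broken by IID.
    finished.sort()
    return [iid for _, iid in finished]


if __name__ == "__main__":
    # IID 1 is long-running; IIDs 2-4 are short.
    print(finish_order([(1, 5), (2, 1), (3, 2), (4, 1)], num_elements=2))
    # [2, 3, 4, 1]
```

In this example, IIDs 2 through 4 finish execution before IID 1 even though IID 1 was fetched first. This is the out-of-sequence execution described above.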
The fact that an instruction has finished execution does not ensure that the results of its execution will be valid. For example, a branch-on-condition instruction could be fetched before a store-in-register instruction which followed in memory 102. The store-in-register would be sent to a first execution element (e.g. block 114), while the branch-on-condition would be sent to a second execution element (e.g. block 116).
In the above example, the store-in-register would finish execution before the branch. The CPU, however, would not yet have determined whether the branch conditions were met, because the branch would still be executing. If the branch were actually taken, the store-in-register results would never be used (i.e., they would be invalid), because the program counter would jump to another part of the program as a result of the branch.
The point at which it is determined that the results of execution are valid is referred to as "completion". The "completion" or "non-completion" of instructions is determined by the completion logic 126. As each instruction finishes execution, its results are stored in store buffer 124, and the execution element that executed it informs the completion logic 126. The completion logic 126 keeps track of the last instruction to complete and of the subsequently fetched instructions that have finished execution but have not yet completed. When the completion logic determines that the results of execution are, in fact, valid, it indicates that the instruction has completed by asserting an IID N complete signal (where N is an IID number).
In the above example, upon being informed by an execution element that the branch was taken, the completion logic 126 would mark as invalid the store buffer locations holding the results of the store-in-register, and processing of the instruction stream would continue. If the branch were not taken, the completion logic 126 would update the memory 102 or the CPU internal registers with the contents of the store buffer 124 for the completed instruction. It would also signal the other processing elements that the instruction had completed.
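The behavior of completion logic 126 in the branch example can be sketched as an in-order completion queue. This is a hypothetical model, not the circuit of FIG. 1, and it makes the following simplifications:

- The class and field names are illustrative.
- A fetch sequence number stands in for the IID, which sidesteps IID wraparound.
- The `result` field stands in for the store buffer 124 entry; writing results back to memory or registers is not modeled.
- Only a taken branch triggers invalidation; interrupts are not modeled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class InFlight:
    """Hypothetical record of an instruction that has been fetched and decoded."""
    seq: int                   # logical order; stands in for the IID
    is_branch: bool = False
    branch_taken: bool = False
    finished: bool = False
    result: object = None      # stands in for the store buffer entry


class CompletionLogic:
    """Toy model: instructions finish in any order but complete in logical order."""

    def __init__(self, instructions):
        self.pending = deque(instructions)  # oldest (next to complete) first
        self.completed = []  # seq numbers for which "IID N complete" is signalled
        self.discarded = []  # seq numbers whose buffered results are marked invalid

    def finish(self, seq, result=None):
        """Called when an execution element reports that `seq` finished."""
        for insn in self.pending:
            if insn.seq == seq:
                insn.finished = True
                insn.result = result
                break
        self._complete_in_order()

    def _complete_in_order(self):
        # Only the oldest pending instruction may complete; a finished younger
        # instruction waits behind an unfinished older one.
        while self.pending and self.pending[0].finished:
            insn = self.pending.popleft()
            self.completed.append(insn.seq)
            if insn.is_branch and insn.branch_taken:
                # Younger instructions were on the wrong path: invalidate them.
                self.discarded.extend(i.seq for i in self.pending)
                self.pending.clear()


if __name__ == "__main__":
    # Branch-on-condition (seq 1) precedes store-in-register (seq 2).
    logic = CompletionLogic([InFlight(1, is_branch=True, branch_taken=True),
                             InFlight(2)])
    logic.finish(2, result="register update")  # store finishes first; must wait
    print(logic.completed, logic.discarded)    # [] []
    logic.finish(1)                            # branch finishes and was taken
    print(logic.completed, logic.discarded)    # [1] [2]
```

In the not-taken case, the same model completes both instructions in order, giving `[1, 2]` with nothing discarded.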
Out-of-sequence instruction execution presents problems to instrumentation users, who are interested in the completed instruction stream. Due to out-of-sequence processing, however, the information they need (e.g., the original Op Code, cache miss status, and other system event data) is often no longer resident in the machine by the time an instruction of interest completes. Further, conventional instrumentation is typically not suited to maintaining a correspondence between a completed instruction and the machine signals it generates. The user is therefore unable to tie cache misses and other system events to the completed instructions that caused them.