1. Field of the Invention
The present invention generally relates to computer systems, and more specifically to a method of simulating microprocessor operation for verification purposes, particularly operation of the instruction fetch unit within a microprocessor.
2. Description of the Related Art
Microprocessors are used for a wide variety of electronics applications. High-performance computer systems typically use multiple microprocessors to carry out the various program instructions embodied in computer programs such as software applications and operating systems. A conventional microprocessor design is illustrated in FIG. 1. Processor 10 is generally a single integrated circuit superscalar microprocessor, and includes various execution units, registers, buffers, memories, and other functional units which are all formed by integrated circuitry. Processor 10 operates according to reduced instruction set computing (RISC) techniques, and is coupled to a system or fabric bus 12 via a bus interface unit (BIU) 14 within processor 10. BIU 14 controls the transfer of information between processor 10 and other devices coupled to system bus 12, such as a main memory or a second-level (L2) cache memory, by participating in bus arbitration. Processor 10, system bus 12, and the other devices coupled to system bus 12 together form a host data processing system.
BIU 14 is connected to an instruction cache 16 and to a data cache 18 within processor 10. High-speed caches, such as instruction cache 16 and data cache 18, enable processor 10 to achieve relatively fast access times to a subset of data or instructions previously transferred from main memory to the caches, thus improving the speed of operation of the host data processing system. Instruction cache 16 is further coupled to a fetcher 20, which fetches instructions for execution from instruction cache 16 during each cycle. Fetcher 20 temporarily stores sequential instructions within an instruction queue 21 for execution by other execution circuitry within processor 10. From instruction queue 21, instructions pass sequentially through decode unit 22, where they are translated into simpler operational codes (iops) and numerous control signals used by the downstream units. After being decoded, instructions are processed by dispatch unit 23, which gathers them into groups suitable for simultaneous processing and dispatches them to issue unit 42. Instruction cache 16, fetcher 20, instruction queue 21, decode unit 22 and dispatch unit 23 are collectively referred to as an instruction fetch unit 24.
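For illustration only, the flow through instruction fetch unit 24 (fetcher 20, instruction queue 21, decode unit 22, dispatch unit 23) can be sketched as a small behavioral model. This is a minimal sketch under stated assumptions, not the design of processor 10: the class and method names, the fetch width of 4, the dispatch group size of 3, the fixed 32-bit instruction words, and the PowerPC-style 6-bit primary opcode kept by "decode" are all hypothetical choices made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Iop:
    """Decoded internal operation (hypothetical, simplified representation)."""
    address: int
    opcode: int


class InstructionFetchUnit:
    """Toy model of fetcher -> instruction queue -> decode unit -> dispatch unit."""

    FETCH_WIDTH = 4  # instructions fetched per cycle (assumed value)
    GROUP_SIZE = 3   # maximum instructions per dispatch group (assumed value)

    def __init__(self, icache):
        self.icache = icache   # instruction cache: address -> 32-bit instruction word
        self.queue = deque()   # instruction queue of (address, word) pairs
        self.pc = 0            # next sequential fetch address

    def fetch(self):
        """Fetcher: copy up to FETCH_WIDTH sequential instructions into the queue."""
        for _ in range(self.FETCH_WIDTH):
            if self.pc not in self.icache:
                break  # a cache miss would be serviced via the BIU; not modeled here
            self.queue.append((self.pc, self.icache[self.pc]))
            self.pc += 4

    @staticmethod
    def decode(address, word):
        """Decode unit: keep only the assumed 6-bit primary opcode field."""
        return Iop(address, word >> 26)

    def dispatch(self):
        """Dispatch unit: gather up to GROUP_SIZE decoded instructions into one group."""
        group = []
        while self.queue and len(group) < self.GROUP_SIZE:
            group.append(self.decode(*self.queue.popleft()))
        return group


icache = {addr: (addr // 4) << 26 for addr in range(0, 32, 4)}
ifu = InstructionFetchUnit(icache)
ifu.fetch()
print([(i.address, i.opcode) for i in ifu.dispatch()])  # [(0, 0), (4, 1), (8, 2)]
```

Because the queue decouples fetch width from group size, one instruction is left in instruction queue 21 after the first dispatch and is grouped with newly fetched instructions on the next cycle.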
The execution circuitry of processor 10 has multiple execution units for executing sequential instructions, including one or more fixed-point units (FXUs) 26, load-store units (LSUs) 28, floating-point units (FPUs) 30, and branch processing units (BPUs) 32. Each of these execution units 26, 28, 30, and 32 executes one or more instructions of a particular type during each processor cycle. For example, FXUs 26 perform fixed-point mathematical and logical operations such as addition, subtraction, shifts, rotates, and XORing, utilizing source operands received from specified general purpose registers (GPRs) or GPR rename buffers. Following the execution of a fixed-point instruction, FXUs 26 output the data results of the instruction to the GPR rename buffers, which provide temporary storage for the result data until the instruction is completed by transferring the result data from the GPR rename buffers to one or more of the GPRs. FPUs 30 perform single- and double-precision floating-point arithmetic and logical operations, such as floating-point multiplication and division, on source operands received from floating-point registers (FPRs) or FPR rename buffers. FPUs 30 output data resulting from the execution of floating-point instructions to selected FPR rename buffers, which temporarily store the result data until the instructions are completed by transferring the result data from the FPR rename buffers to selected FPRs. LSUs 28 execute floating-point and fixed-point instructions which either load data from memory (i.e., either data cache 18 or main memory) into selected GPRs or FPRs, or which store data from a selected one of the GPRs, GPR rename buffers, FPRs, or FPR rename buffers to system memory. BPUs 32 execute condition code manipulation instructions and branch instructions.
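The role of the GPR rename buffers described above can be illustrated with a short behavioral sketch: an FXU result is parked in a rename buffer keyed by instruction, and only the completion step copies it into the architected GPR. The class name, the 32-entry register file, the 64-bit register width, and the choice to read sources only from the architected GPRs (a real design may also forward from rename buffers) are assumptions made for this example.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit register width


class FixedPointRegisters:
    """Hypothetical model of GPRs backed by rename buffers."""

    def __init__(self, num_gprs=32):
        self.gprs = [0] * num_gprs   # architected general purpose registers
        self.rename_buffers = {}     # instruction id -> (target GPR, pending result)

    def execute_add(self, iid, rt, ra, rb):
        """FXU: compute GPR[ra] + GPR[rb] and hold the result in a rename buffer."""
        result = (self.gprs[ra] + self.gprs[rb]) & MASK64  # wrap to register width
        self.rename_buffers[iid] = (rt, result)

    def complete(self, iid):
        """Completion: transfer the result from the rename buffer to its target GPR."""
        rt, result = self.rename_buffers.pop(iid)
        self.gprs[rt] = result


regs = FixedPointRegisters()
regs.gprs[1], regs.gprs[2] = 5, 7
regs.execute_add(iid=0, rt=3, ra=1, rb=2)
print(regs.gprs[3], regs.rename_buffers)  # 0 {0: (3, 12)}
regs.complete(0)
print(regs.gprs[3])                       # 12
```

Until completion, the architected GPR still holds its old value, which is what allows results of instructions executed out of order to be discarded or committed later.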
Processor 10 employs both pipelining and out-of-order execution of instructions to further improve the performance of its superscalar architecture, although it may alternatively use in-order program execution. For out-of-order processing, instructions can be executed by FXUs 26, LSUs 28, FPUs 30, and BPUs 32 in any order as long as data dependencies are observed. In addition, each instruction handled by FXUs 26, LSUs 28, FPUs 30, and BPUs 32 is processed at a sequence of pipeline stages, in particular five distinct stages: fetch, decode/dispatch, execute, finish, and completion.
During the fetch stage, fetcher 20 retrieves one or more instructions associated with one or more memory addresses from instruction cache 16. Sequential instructions fetched from instruction cache 16 are stored by fetcher 20 within instruction queue 21. The instructions are then processed by decode unit 22 and formed into groups by dispatch unit 23. Issue unit 42 then issues one or more instructions to execution units 26, 28, 30, and 32. Upon dispatch, instructions are also stored within the multiple-slot completion buffer of a completion unit 44 to await completion. Processor 10 tracks the program order of the dispatched instructions during out-of-order execution utilizing unique instruction identifiers.
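How a multiple-slot completion buffer can use unique instruction identifiers to restore program order may be sketched as follows. This is a minimal sketch, assuming monotonically increasing integer identifiers assigned at dispatch and a retirement rule that completes the oldest entries only once they have finished; the class name, method names, and the default of 16 slots are hypothetical and not taken from the description of completion unit 44.

```python
from collections import OrderedDict


class CompletionBuffer:
    """Hypothetical multi-slot completion buffer that retires in program order."""

    def __init__(self, slots=16):
        self.slots = slots            # number of buffer slots (assumed value)
        self.entries = OrderedDict()  # iid -> finished flag, kept in dispatch order
        self.next_iid = 0             # unique, monotonically increasing identifiers

    def dispatch(self):
        """Allocate a slot and a unique identifier for a newly dispatched instruction."""
        if len(self.entries) >= self.slots:
            raise RuntimeError("completion buffer full: dispatch must stall")
        iid = self.next_iid
        self.next_iid += 1
        self.entries[iid] = False
        return iid

    def finish(self, iid):
        """Mark an instruction as finished; execution units may do this in any order."""
        if iid not in self.entries:
            raise KeyError(iid)
        self.entries[iid] = True

    def complete(self):
        """Retire the oldest entries, stopping at the first one not yet finished."""
        retired = []
        while self.entries:
            oldest, finished = next(iter(self.entries.items()))
            if not finished:
                break
            self.entries.popitem(last=False)
            retired.append(oldest)
        return retired


cb = CompletionBuffer(slots=4)
a, b, c = cb.dispatch(), cb.dispatch(), cb.dispatch()
cb.finish(c)
print(cb.complete())  # [] because instruction a has not finished
cb.finish(a)
cb.finish(b)
print(cb.complete())  # [0, 1, 2]
```

Instruction c finishes first but cannot complete ahead of a and b, which mirrors how out-of-order execution is reconciled with in-order completion.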
It can be seen from the foregoing description that the flow of instructions through a state-of-the-art microprocessor is particularly complicated, and timing is critical. It is accordingly incumbent upon the designer to be able to verify proper operation of a new microprocessor design, especially the instruction fetch unit (IFU) 24. Functional verification of IFUs is conventionally accomplished by running computer simulations in which program instructions are fetched from other devices outside of the simulated processor, or from the internal caches within the IFU model, and delivered to the other portions of the simulated processor for execution. The instructions fetched may be part of a special software program written for testing purposes, or may be generated by the verification environment; see, e.g., U.S. Pat. No. 6,212,493.
With specific regard to functional verification of the IFU, there is a different focus compared to the other components of the processor. For significant portions of the IFU, the actual instructions being processed are irrelevant or at most secondary. They are merely pieces of binary data which need to be delivered to the rest of the CPU as requested. Much more important than the instructions themselves are the addresses by which they are retrieved and processed. The instruction addresses control which instructions are fetched, where they are stored in any resident caches, and whether duplications or conflicts exist between different execution threads or storage locations.
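The point that addresses, rather than instruction contents, determine where instructions reside and whether conflicts arise can be made concrete with the usual tag/index/offset split of an instruction address. The sketch below assumes a 32 KB, 4-way set-associative instruction cache with 128-byte lines (hence 64 sets); these geometry values and the function names are illustrative assumptions, not parameters of instruction cache 16.

```python
LINE_BYTES = 128  # assumed cache line size
NUM_SETS = 64     # assumed: 32 KB / (4 ways * 128-byte lines)
OFFSET_BITS = LINE_BYTES.bit_length() - 1  # 7
SET_BITS = NUM_SETS.bit_length() - 1       # 6


def decompose(addr):
    """Split an instruction address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    return tag, index, offset


def classify(addr_a, addr_b):
    """Classify how two fetch addresses (e.g. from two threads) interact in the cache."""
    tag_a, idx_a, _ = decompose(addr_a)
    tag_b, idx_b, _ = decompose(addr_b)
    if idx_a != idx_b:
        return "independent"
    return "shared line" if tag_a == tag_b else "set conflict"


print(classify(0x1000, 0x1040))  # shared line: same 128-byte line
print(classify(0x1000, 0x3000))  # set conflict: same set, different tag
print(classify(0x1000, 0x1080))  # independent: adjacent line, different set
```

A verification scenario that stresses duplicate lines or set conflicts therefore has to choose the addresses themselves; the instruction words stored at those addresses do not affect the outcome.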
Unfortunately, the prior art lacks an effective method of precisely controlling the addresses to be handled by the IFU at any given point in the simulation. Randomly generated instruction address sequences do not allow for the creation of specific simulation scenarios which may be of interest to the designer. The '493 patent provides some improvement by collecting profile data such as addresses and program counter contents, but this approach still requires multiple passes of the simulation. It would, therefore, be desirable to devise an improved method for simulation of an instruction fetch unit which could allow dynamic control of the instruction addresses as the simulation progresses. It would be further advantageous if the method could force a specially selected instruction address to be fetched during the next IFU cycle.
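As a purely illustrative sketch of the kind of control described as desirable, and not of any particular claimed method, a simulation-side address source could produce sequential fetch addresses by default while allowing a test to force a selected address onto the next IFU cycle at any point during a single simulation pass. The class and method names, the 4-byte sequential step, and the rule that fetching resumes sequentially after a forced address are all assumptions made for this example.

```python
from collections import deque


class DirectedAddressSource:
    """Hypothetical source of instruction fetch addresses for an IFU simulation."""

    def __init__(self, start, step=4):
        self.next_sequential = start  # address used when nothing is forced
        self.step = step              # assumed fixed instruction size in bytes
        self.forced = deque()         # addresses requested by the test, in order

    def force_next(self, addr):
        """Request that addr be fetched on the next IFU cycle."""
        self.forced.append(addr)

    def next_fetch_address(self):
        """Return the address for the current IFU cycle; forced addresses win."""
        addr = self.forced.popleft() if self.forced else self.next_sequential
        self.next_sequential = addr + self.step  # continue sequentially from here
        return addr


src = DirectedAddressSource(0x100)
print(hex(src.next_fetch_address()))  # 0x100
src.force_next(0x2000)                # decided mid-run, no second pass needed
print(hex(src.next_fetch_address()))  # 0x2000
print(hex(src.next_fetch_address()))  # 0x2004
```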