This invention relates generally to data processing and in particular to data processor testing.
The process of designing a data processor typically includes testing for design flaws at various stages of development. Such testing often involves running one or more test executables through a processor simulation system during a processor simulation stage of development, or through an actual processor in semiconductor form after a fabrication stage. In general, these test executables attempt to stress particular circuits and features of the processor.
A superscalar processor is a processor that is capable of executing multiple instructions simultaneously. Such processors typically include an execution stage having multiple execution units (execution circuits), each of which can execute an instruction independently of other execution units. Designers typically test superscalar processors using test executables created from source code having few or no instruction dependencies, or source code having weak instruction dependencies.
An instruction dependency (also referred to as a data hazard) exists when two instructions attempt to access the same register. The strongest type of instruction dependency is a read-after-write (RAW) dependency in which an initial instruction writes a result to a register and a subsequent instruction reads the result from that register. The subsequent instruction must wait until the initial instruction completes writing the result before it can read the result. The weakest type of instruction dependency is a read-after-read (RAR) dependency which involves two instructions attempting to read from the same register. Other types of instruction dependencies include write-after-read (WAR) and write-after-write (WAW) dependencies.
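The four dependency types described above can be illustrated with a minimal Python sketch. The `Instr` tuple and the `classify_dependencies` function are hypothetical names introduced only for this illustration; they are not part of any described tool.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction model: registers read and register written."""
    op: str
    srcs: Tuple[str, ...]  # source registers (read)
    dst: str               # destination register (written)


def classify_dependencies(first: Instr, second: Instr) -> Set[str]:
    """Return the dependency types between `first` and a later `second`."""
    deps = set()
    if first.dst in second.srcs:
        deps.add("RAW")  # second reads what first writes
    if second.dst in first.srcs:
        deps.add("WAR")  # second writes what first reads
    if first.dst == second.dst:
        deps.add("WAW")  # both write the same register
    if set(first.srcs) & set(second.srcs):
        deps.add("RAR")  # both read the same register
    return deps


producer = Instr("ADD", ("R01", "R02"), "R03")
consumer = Instr("SUB", ("R03", "R04"), "R05")
unrelated = Instr("SUB", ("R08", "R09"), "R10")

print(sorted(classify_dependencies(producer, consumer)))
print(sorted(classify_dependencies(producer, unrelated)))
```

Here the consumer reads R03 after the producer writes it, so the pair is classified as RAW, while the unrelated instruction shares no registers with the producer.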
Instruction streams with weak instruction dependencies or no instruction dependencies stress the multiple execution capabilities of superscalar processors since there is little or no need to delay the instructions of such streams. Accordingly, instructions generally can execute as soon as an execution unit becomes available.
Stream #1, as shown below, includes no instruction dependencies, and stresses the multiple issue feature of superscalar processors.
Instruction 1 adds the contents of source register R01 to source register R02, and stores the result in destination register R03. Instruction 2 subtracts the contents of R04 from R05, and stores the result in R06. Instruction 3 adds the contents of R07 to register R08, and stores the result in R09. Instruction 4 subtracts the contents of R10 from R11, and stores the result in R12. Since none of the instructions access the same registers, there are no instruction dependencies. Accordingly, subsequent instructions do not need to be delayed while earlier instructions complete, and instructions may issue as soon as execution units become available to execute them. As a result, the execution units of the superscalar processor are consistently kept busy. For these reasons, designers of superscalar processors often create large executables similar to Stream #1, and use such executables to test the superscalar capabilities of their processor designs.
Another type of processor is called an out-of-order processor. An out-of-order processor is a processor that obtains instructions in a program order, and that is capable of executing instructions in an order that is different from the program order (i.e., capable of executing instructions out-of-order). Such processors typically include an issue queue that queues the instructions obtained in program order, and that is capable of issuing instructions out-of-order when instruction dependencies require that the processor delay issuance of instructions next in line. Designers typically test out-of-order processors using a test executable created from source code having a large number of instructions with strong dependencies.
Stream #2 includes instructions with strong dependencies, and stresses the out-of-order issue feature of out-of-order processors.
Instruction 1 adds the contents of source register R01 to source register R02, and stores the result in destination register R03. Instruction 2 subtracts the contents of R03 from R04, and stores the result in R05. Instruction 3 adds the contents of R03 to R06, and stores the result in R07. Instruction 4 subtracts the contents of R08 from R09, and stores the result in R10. Since Instruction 1 stores its result in R03 and each of Instructions 2 and 3 reads from R03, Instructions 2 and 3 have instruction dependencies on Instruction 1. Accordingly, Instructions 2 and 3 cannot issue until Instruction 1 stores its result. In contrast, Instruction 4 can issue at any time relative to Instructions 1, 2 or 3 since Instruction 4 does not access any registers that are accessed by the other instructions. Accordingly, an out-of-order processor may issue Instruction 1, and subsequently issue Instruction 4 prior to issuing Instructions 2 and 3. For these reasons, designers of out-of-order processors often create large executables from instruction streams similar to Stream #2 to cause instructions to issue out-of-order, and then use such executables to stress the out-of-order capabilities of their processor designs.
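The contrast between the two streams can be checked mechanically. The following Python sketch encodes Stream #1 and Stream #2 exactly as described in the text, using a hypothetical tuple layout of (operation, source registers, destination register), and lists the earlier instructions whose results each instruction reads. As a simplification, it reports every earlier writer of a source register rather than only the most recent one.

```python
def raw_dependencies(stream):
    """Map each 1-based instruction number to the earlier instructions it reads from."""
    deps = {}
    for j, (_, srcs_j, _) in enumerate(stream, start=1):
        deps[j] = [
            i
            for i, (_, _, dst_i) in enumerate(stream[: j - 1], start=1)
            if dst_i in srcs_j
        ]
    return deps


# Stream #1 as described: no two instructions share a register.
stream_1 = [
    ("ADD", ("R01", "R02"), "R03"),
    ("SUB", ("R04", "R05"), "R06"),
    ("ADD", ("R07", "R08"), "R09"),
    ("SUB", ("R10", "R11"), "R12"),
]

# Stream #2 as described: Instructions 2 and 3 read R03, written by Instruction 1.
stream_2 = [
    ("ADD", ("R01", "R02"), "R03"),
    ("SUB", ("R03", "R04"), "R05"),
    ("ADD", ("R03", "R06"), "R07"),
    ("SUB", ("R08", "R09"), "R10"),
]

print(raw_dependencies(stream_1))
print(raw_dependencies(stream_2))
```

For Stream #2, only Instructions 2 and 3 report a dependency (on Instruction 1), which is why Instruction 4 is free to issue ahead of them.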
Some processors include both superscalar and out-of-order features. The superscalar feature of such a processor can be tested by running a test executable having instructions without dependencies similar to that of Stream #1 (shown above). Additionally, the out-of-order feature can be tested by running another test executable having instructions with dependencies similar to that of Stream #2 (shown above).
Stream #1, shown above, may stress a processor's superscalar capabilities, but does not stress the processor's out-of-order capabilities simultaneously. Similarly, Stream #2, shown above, may stress a processor's out-of-order capabilities, but does not stress the processor's superscalar capabilities simultaneously. Unfortunately, many design problems in complex processors will only be discovered when multiple processor features are stressed simultaneously.
A stream suitable for testing a processor's superscalar capabilities with few or no dependencies (e.g., Stream #1 above) can be modified by introducing strong instruction dependencies, e.g., read-after-write (RAW) dependencies. However, increasing the number of RAW instruction dependencies reduces the number of independent instructions (instructions without dependencies) within the stream. That is, the modification may improve the stream's opportunity to cause out-of-order execution, but the resulting stream may no longer be able to consistently stress the superscalar structures of the processor. Accordingly, some execution units may become idle and the throughput of the processor will decrease.
An embodiment of the invention is directed to a technique that can produce, in a computer, a test executable that can simultaneously test the superscalar and out-of-order capabilities of a processor. The technique involves forming multiple instruction streams, dividing the multiple instruction streams into portions, and generating a combined instruction stream having the portions interleaved. The technique further involves creating a test executable from the combined instruction stream.
Formation of multiple instruction streams preferably involves constructing the multiple instruction streams such that the multiple instruction streams access different groups of registers. Each instruction stream can provide instructions with strong dependencies for testing the out-of-order capabilities of the processor. Additionally, the instructions within any particular stream are independent of the instructions of the other streams such that multiple execution units of the processor can be consistently kept busy.
Construction of the multiple instruction streams may involve operating a code generator such that the code generator provides each of the multiple instruction streams. Alternatively, such construction may involve operating a code generator such that the code generator provides a particular instruction stream, and forming other instruction streams according to the particular instruction stream.
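One way to form other instruction streams according to a particular instruction stream is to copy that stream and renumber its registers so that each copy uses its own register group. The text does not specify how the other streams are derived, so the renumbering approach, the textual instruction syntax, and the `shift_registers` helper in the following Python sketch are all assumptions made for illustration.

```python
import re


def shift_registers(stream, offset):
    """Return a copy of `stream` with every register number increased by `offset`."""
    def bump(match):
        return f"R{int(match.group(1)) + offset:02d}"

    return [re.sub(r"R(\d+)", bump, line) for line in stream]


# A particular stream with strong (RAW) dependencies, written in a
# hypothetical "OP src, src, dst" syntax; it uses registers R01 to R10.
base_stream = [
    "ADD R01, R02, R03",
    "SUB R03, R04, R05",
    "ADD R03, R06, R07",
    "SUB R08, R09, R10",
]

# Three streams, each confined to its own group of ten registers.
streams = [shift_registers(base_stream, 10 * k) for k in range(3)]

for stream in streams:
    print(stream)
```

Each derived stream keeps the internal dependencies of the original, while no register is shared between streams, so instructions from different streams remain independent of each other.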
To divide the streams into portions and generate a combined instruction stream having the stream portions, the technique may involve interleaving the portions within the combined instruction stream such that the portions alternate in a round-robin manner. Alternatively, the technique may involve interleaving the portions within the combined instruction stream such that the portions alternate in a pseudo random manner. Interleaving in a pseudo random manner may introduce nuances within the instruction stream that uncover design flaws that would otherwise be undetected.
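Both interleaving orders can be sketched in a few lines of Python. The function names, the fixed portion size, and the use of a seeded generator are assumptions made for illustration; the text does not prescribe a portion size or a particular pseudo random source. In both cases the portions of any one stream keep their original relative order, so the dependencies within that stream are preserved.

```python
import random


def split_into_portions(stream, size):
    """Divide a stream into consecutive portions of at most `size` instructions."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def interleave_round_robin(streams, size):
    """Take one portion from each stream in turn until all are consumed."""
    queues = [split_into_portions(s, size) for s in streams]
    combined = []
    while any(queues):
        for queue in queues:
            if queue:
                combined.extend(queue.pop(0))
    return combined


def interleave_pseudo_random(streams, size, seed):
    """Take the next portion from a pseudo randomly chosen non-empty stream."""
    rng = random.Random(seed)  # seeded so that a test can be reproduced
    queues = [split_into_portions(s, size) for s in streams]
    combined = []
    while any(queues):
        queue = rng.choice([q for q in queues if q])
        combined.extend(queue.pop(0))
    return combined


stream_a = ["a1", "a2", "a3", "a4"]
stream_b = ["b1", "b2", "b3", "b4"]

print(interleave_round_robin([stream_a, stream_b], size=2))
print(interleave_pseudo_random([stream_a, stream_b], size=2, seed=1))
```

Seeding the generator is a design choice for this sketch: it keeps the pseudo random ordering reproducible, so a failing test executable can be regenerated.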
Additional nuances within the instruction stream can be introduced in other ways, as well. In particular, the technique may further involve, prior to creating the test executable, including conflict instructions (e.g., instructions that cause conflicts) within the combined instruction stream. For example, LOAD instructions that cause cache misses may be included within the instruction stream to purposefully stall instructions with dependencies within the instruction stream. The LOAD instructions would more fully stress the processor's out-of-order capabilities by adding delays to particular instructions depending on the LOAD instructions.
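The inclusion of conflict instructions can be sketched as a pass over the combined instruction stream. In the following Python sketch, the `add_conflict_loads` function, the tuple layout, the insertion interval, and the address stride are all hypothetical. Whether a given LOAD actually misses in the cache depends on the cache geometry of the processor under test, which this sketch does not model.

```python
def add_conflict_loads(stream, every, base_addr=0x10000, stride=0x1000):
    """Insert a LOAD before every `every`-th instruction.

    Each LOAD writes the first source register of the instruction that
    follows it, so that instruction must wait for the LOAD to complete.
    """
    result = []
    addr = base_addr
    for n, (op, srcs, dst) in enumerate(stream, start=1):
        if n % every == 0 and srcs:
            # Widely spaced addresses are an assumption intended to make
            # cache misses more likely; they do not guarantee a miss.
            result.append(("LOAD", (f"[{addr:#x}]",), srcs[0]))
            addr += stride
        result.append((op, srcs, dst))
    return result


combined_stream = [
    ("ADD", ("R01", "R02"), "R03"),
    ("SUB", ("R03", "R04"), "R05"),
    ("ADD", ("R03", "R06"), "R07"),
    ("SUB", ("R08", "R09"), "R10"),
]

for instruction in add_conflict_loads(combined_stream, every=2):
    print(instruction)
```

With an interval of two, a LOAD into R03 is placed ahead of the second instruction and a LOAD into R08 ahead of the fourth, so each of those instructions now depends on a LOAD.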
Furthermore, the formation of the multiple instruction streams may involve constructing the multiple instruction streams such that the multiple instruction streams communicate with each other. In particular, the multiple instruction streams can be formed such that they access common registers. Additionally, the multiple instruction streams can be formed such that they share common memory spaces. The sharing of common registers or memory spaces enhances the breadth of the processor test by also testing interstream communication aspects of the processor.
Another embodiment of the invention is directed to a simulation system for testing a simulated processor. The system includes an input that receives a test executable created from a combined instruction stream having interleaved portions of multiple instruction streams. The system further includes a processor simulator, coupled to the input, that runs the test executable to generate processor results. Additionally, the system includes a reference model, coupled to the input, that runs the test executable to generate reference results. Furthermore, the system includes a compare module, coupled to the processor simulator and the reference model, that compares the processor results and the reference results to determine whether the simulated processor operates correctly. The system simultaneously stresses the superscalar and out-of-order capabilities of the processor simulator such that design flaws can be detected and corrected prior to fabrication of the actual processor.
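The data flow of such a simulation system can be sketched in Python as follows. Every name here is hypothetical: `reference_model` is a simple in-order interpreter, `stale_read_simulator` is a deliberately flawed stand-in for a processor simulator that ignores RAW dependencies, and `compare` stands in for the compare module. An actual processor simulator would model the processor design in far greater detail.

```python
def reference_model(program, regs):
    """Execute instructions strictly in program order and return final registers."""
    regs = dict(regs)
    for op, a, b, dst in program:
        # ADD stores a + b; SUB subtracts a from b, matching the text's
        # "subtracts the contents of R03 from R04".
        regs[dst] = regs[a] + regs[b] if op == "ADD" else regs[b] - regs[a]
    return regs


def stale_read_simulator(program, regs):
    """Flawed stand-in: every instruction reads the initial register values."""
    initial = dict(regs)
    out = dict(regs)
    for op, a, b, dst in program:
        out[dst] = initial[a] + initial[b] if op == "ADD" else initial[b] - initial[a]
    return out


def compare(processor_results, reference_results):
    """Return {register: (processor value, reference value)} for each mismatch."""
    return {
        reg: (processor_results.get(reg), expected)
        for reg, expected in reference_results.items()
        if processor_results.get(reg) != expected
    }


def run_test(program, regs, processor_simulator):
    """Run the same program on both models and compare the results."""
    return compare(processor_simulator(program, regs), reference_model(program, regs))


# Stream #2 from the text, with arbitrary initial register values.
program = [
    ("ADD", "R01", "R02", "R03"),
    ("SUB", "R03", "R04", "R05"),
    ("ADD", "R03", "R06", "R07"),
    ("SUB", "R08", "R09", "R10"),
]
initial_regs = {f"R{n:02d}": n * n for n in range(1, 11)}

print(run_test(program, initial_regs, stale_read_simulator))
```

In this sketch the mismatches appear in R05 and R07, the destinations of the two instructions that read R03, which is the kind of dependency-handling flaw the comparison is intended to expose.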