The present invention relates generally to improvements in the simulation and emulation of multi-stage pipelined processors. In particular, the present invention describes advantageous methods and apparatus for eliminating a large quantity of redundant information during the simulation process. Eliminating this information in turn reduces the number of saving and copying steps required when simulating or emulating the behavior of multi-parallel-stage VLIW array processors, such as the Manifold array (ManArray) processor.
In the development process for the design of a new processor, a simulator of the processor is typically created for test and verification purposes prior to implementing the design in an implementation-level hardware description language. Since the simulator is run on an existing processor to emulate the design of the new processor, it must necessarily emulate sequentially the large number of internal operations which will ultimately be done in parallel on the proposed processor. Such a simulator consequently runs considerably more slowly than the proposed processor it is intended to simulate, particularly if the architecture of the proposed processor is innately highly parallel, as is the case with processors using very long instruction word (VLIW) concepts and array processing mechanisms. At times, it is also desirable to emulate operation of one system with another for a variety of purposes.
In most modern computers, the execution of a single instruction is performed in a number of stages, such as the following, presented by way of example:
fetch: reads the next instruction from memory,
decode: interprets the instruction bit pattern to determine what operation is to be done,
execute: does the operation, and
post/conditional return: stores results for later use.
Generally speaking, in a pipelined computer, instructions pass through these stages in the order shown in such a way that all of the stages may be in use simultaneously, each performing tasks associated with different instructions. Implicit in this mechanism is the assumption that all of the stages can operate independently in a given cycle. For example, the process of fetching an instruction in a given cycle can have no effect during that cycle on the decoding of the instruction fetched in the previous cycle.
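The overlap of stages described above can be illustrated with a minimal sketch. It assumes an idealized pipeline with no stalls, in which the instruction issued in cycle k occupies the stage at depth d during cycle k + d. The names `STAGES` and `occupancy` are hypothetical, and "post" is shorthand for the post/conditional return stage.

```python
# Stage names in instruction-flow order; "post" abbreviates post/conditional return.
STAGES = ["fetch", "decode", "execute", "post"]

def occupancy(cycle, num_instructions):
    """Return {stage: instruction index or None} for one cycle.

    Assumes an ideal pipeline with no stalls: the instruction fetched in
    cycle k is decoded in cycle k + 1, executed in k + 2, and posted in k + 3.
    """
    result = {}
    for depth, stage in enumerate(STAGES):
        instr = cycle - depth
        result[stage] = instr if 0 <= instr < num_instructions else None
    return result

# Print which instruction each stage is working on, cycle by cycle.
for cycle in range(6):
    print(cycle, occupancy(cycle, 3))
```

Once the pipeline is full, every stage holds a different instruction in the same cycle, which is the independence that the text relies on.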
Also generally speaking, a simulator program typically emulates the hardware in a similar fashion, i.e., in the order shown. To do this, the simulation process must compute and temporarily store within each emulated pipeline stage the potentially large amount of information needed by the subsequent stage. For example, in a VLIW architecture, the decode stage must compute the next-cycle controls for a potentially large number of execution-stage units: arithmetic logic units (ALUs), multiply accumulate units (MAUs), and the like. Storing two copies of this information, one for the current cycle and one for the next cycle, uses a significant amount of memory, and copying the information from stage to stage takes significant time, slowing the simulation. The present invention offers a way to significantly reduce both the memory and time requirements while achieving additional advantages as described in further detail below.
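The conventional approach described above might be sketched as follows. This is an illustrative toy, not the simulator of any actual processor: the instruction format (`"op dst a b"`), the helper names, and the two-operation instruction set are all assumptions made for the example. The stages run in instruction-flow order, so each stage must read a current-cycle record while writing a separate next-cycle record, and the next-cycle record is copied over at the end of every cycle.

```python
import copy

def fetch(program, pc):
    # Hypothetical fetch: return the instruction text, or None past the end.
    return program[pc] if pc < len(program) else None

def decode(instr):
    # Hypothetical decode: split "op dst a b" into control fields.
    if instr is None:
        return None
    op, dst, a, b = instr.split()
    return {"op": op, "dst": dst, "a": int(a), "b": int(b)}

def execute(ctrl):
    # Hypothetical execute: only "add" and "mul" exist in this toy.
    if ctrl is None:
        return None
    value = ctrl["a"] + ctrl["b"] if ctrl["op"] == "add" else ctrl["a"] * ctrl["b"]
    return (ctrl["dst"], value)

def simulate_in_order(program):
    """Run stages in fetch, decode, execute, post order with two records."""
    regs = {}
    cur = {"fetched": None, "decoded": None, "result": None}
    copies = 0
    # Three extra cycles let the last instruction drain through the pipeline.
    for pc in range(len(program) + 3):
        nxt = {}
        nxt["fetched"] = fetch(program, pc)        # fetch writes next-cycle data
        nxt["decoded"] = decode(cur["fetched"])    # decode reads current-cycle data
        nxt["result"] = execute(cur["decoded"])    # execute reads current-cycle data
        if cur["result"] is not None:              # post stores the result
            dst, value = cur["result"]
            regs[dst] = value
        cur = copy.deepcopy(nxt)                   # per-cycle copy of all stage data
        copies += 1
    return regs, copies

regs, copies = simulate_in_order(["add r0 1 2", "mul r1 3 4"])
print(regs, copies)
```

In this sketch the whole stage record is duplicated and copied once per simulated cycle; in a VLIW model with many execution units, that record would be correspondingly larger.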
The present invention significantly reduces the amount of computer memory needed to simulate the behavior of a multi-parallel-stage pipelined processor, and significantly increases the performance of the simulation process, by eliminating or substantially reducing the storing and copying of redundant information. These results are achieved by reordering the chronological sequence of execution of software models of the various pipeline stages with respect to the actual instruction-flow sequence implemented by the processor hardware. The invention takes advantage of the independence of the stages and independence of the execution units within a cycle to make the results computed by a previous stage directly available to its subsequent stage without the use of transient data space or data copying. This technique can be used for the simulation and hardware emulation of existing sequential processors, new processor designs, or custom hardware to accurately and efficiently model the behavior of the processor/hardware, such as a multi-parallel-stage pipelined processor.
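One way to picture such a reordering is sketched below. The text does not specify the order at this point, so the sketch assumes one plausible choice: the stage models are evaluated last stage first within each simulated cycle. The function `run_reordered`, the use of `None` as a pipeline bubble, and the arithmetic stages in the usage example are all hypothetical.

```python
def run_reordered(stages, inputs):
    """Simulate a linear pipeline, evaluating the stages last-to-first.

    stages: list of one-argument functions in instruction-flow order.
    inputs: items entering the first stage, one per cycle (None is reserved
            here as a bubble, so real inputs must not be None).
    """
    n = len(stages)
    # slots[i] holds the output of stage i until stage i + 1 consumes it.
    # There is a single copy: no separate current-cycle and next-cycle records.
    slots = [None] * (n - 1)
    outputs = []
    # Append n - 1 bubbles so the final item drains through every stage.
    feed = list(inputs) + [None] * (n - 1)
    for item in feed:
        for i in range(n - 1, -1, -1):
            data = slots[i - 1] if i > 0 else item
            out = stages[i](data) if data is not None else None
            if i == n - 1:
                if out is not None:
                    outputs.append(out)
            else:
                # Stage i + 1 has already read slots[i] earlier in this cycle,
                # so overwriting it in place is safe.
                slots[i] = out
    return outputs

# Usage: three toy stages applied to three items.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_reordered(stages, [1, 2, 3]))
```

Because each consumer runs before its producer within a cycle, the producer can write its result directly into the location the consumer will read in the next cycle, which is why this sketch needs neither a second record nor a per-cycle copy.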
These and other features, aspects and advantages of the invention will be apparent to those skilled in the art from the following detailed description taken together with the accompanying drawings.