The present invention relates to digital signal processors with instruction pipelines and more particularly to digital signal processors having an improved method of operating an instruction pipeline which issues store addresses separately from the associated store data and a memory interface which implements the method.
A digital signal processor (DSP) is a specialized microprocessor which can process signal streams with complex mathematical formulas in real time. A DSP is typically ten to fifty times more powerful than typical microprocessor devices such as the microprocessor cores used in ASICs. To achieve high operating speed, DSPs normally use tightly coupled memory, which they can write to and read from very quickly. Applicant's ZSP DSP has an open architecture which allows it to be easily programmed and used for numerous applications.
One of the operating features which allows DSPs to operate at high speed is the use of an instruction pipeline. FIG. 1 is an illustration of an eight-stage instruction pipeline which has been used in the ZSP DSP. Only the functions relating to loading (reading) data from memory, and storing (writing) data to memory, are included in FIG. 1. In this illustration, addresses for both loading and storing data are generated in the fourth stage. A load operation is completed by issuing the load address in stage five and receiving the data from memory in stage six. While store addresses are also generated in stage four, the data to be stored is not available until the eighth stage of the pipeline. This is because the data to be stored is often generated in mathematical operations which are performed in stage seven. As a result, the store address cannot be issued until the eighth stage, at which time both the store address and data are issued. This delay tends to slow DSP operating speed.
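The stage timing described above can be sketched as a small model. This is an illustration only, not part of the ZSP design: it assumes that an instruction advances one stage per clock cycle with no stalls, and the names `stage_cycle` and `store_address_idle_cycles` are hypothetical.

```python
# Stage numbers taken from the FIG. 1 description.
ADDRESS_GEN_STAGE = 4   # load and store addresses are generated
LOAD_ISSUE_STAGE = 5    # load address is issued to memory
LOAD_DATA_STAGE = 6     # load data is received from memory
EXECUTE_STAGE = 7       # mathematical operations produce store data
STORE_ISSUE_STAGE = 8   # store address and store data are issued together


def stage_cycle(entry_cycle, stage):
    """Cycle in which an instruction occupies `stage`.

    Assumes (hypothetically) that the instruction enters stage 1 at
    `entry_cycle` and advances one stage per cycle without stalling.
    """
    if not 1 <= stage <= STORE_ISSUE_STAGE:
        raise ValueError("stage must be between 1 and 8")
    return entry_cycle + stage - 1


def store_address_idle_cycles():
    """Cycles a generated store address waits before it is issued."""
    return STORE_ISSUE_STAGE - ADDRESS_GEN_STAGE
```

Under these assumptions, a load entering the pipeline at cycle 0 issues its address at cycle 4, while a store entering at the same cycle does not issue until cycle 7, so the store address sits unused for four cycles after it is generated.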
To maintain coherency, load and store requests must be issued in the order in which they are generated. When a load request is generated after a store request, the load request must wait until the store request is completed. In the FIG. 1 pipeline, this means that the load must be delayed, i.e. must wait in the fourth stage, until the store request has reached the eighth stage and been issued. This can delay the load request by several clock cycles and slows overall performance of the DSP.
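The ordering penalty can be estimated with a simplified calculation. The sketch below is hypothetical: it assumes one stage per clock cycle, that the store enters the pipeline at cycle 0, and that a waiting load may issue in the same cycle in which the store issues. The function name `load_stall_cycles` and the parameter `gap` are introduced here for illustration.

```python
LOAD_ISSUE_STAGE = 5    # stage in which an unstalled load issues its address
STORE_ISSUE_STAGE = 8   # stage in which a store issues address and data


def load_stall_cycles(gap):
    """Cycles a load is held back by an earlier store.

    `gap` is the number of cycles by which the load enters the pipeline
    after the store (assumed to be at least 1). The load is assumed to be
    allowed to issue in the same cycle as the store, but not before it.
    """
    if gap < 1:
        raise ValueError("the load must enter after the store")
    store_issue = STORE_ISSUE_STAGE - 1            # store enters at cycle 0
    unstalled_load_issue = gap + LOAD_ISSUE_STAGE - 1
    return max(0, store_issue - unstalled_load_issue)
```

In this model a load that directly follows the store (`gap` of 1) is held for two cycles, and the penalty disappears only once the load trails the store by three or more cycles. A real pipeline may behave differently, for example if the load must wait until the cycle after the store issues.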
The memory subsystem which controls read and write operations of a memory may have two common address ports. This speeds DSP operation by allowing the DSP to issue two transaction requests simultaneously. The fact that the FIG. 1 pipeline must delay load requests while waiting for a store request to complete may create further problems with the dual address ports. While a load request is waiting in the fourth stage for the store request to be completed, another load request may occur. When the store reaches the eighth stage, three requests are then ready to issue: the store and the two loads. With only two address ports, only one load request can be issued with the store request. The other load must wait another clock cycle before it can be issued. This may also slow the performance of the DSP.
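The port contention described above can be illustrated with a short scheduling sketch. It is a simplified model under stated assumptions, not the behavior of any particular memory subsystem: requests are assumed to issue strictly in program order, all listed requests are assumed to be ready at `start_cycle`, and the name `schedule_issue` and the request labels are hypothetical.

```python
def schedule_issue(ready_requests, start_cycle, ports=2):
    """Assign an issue cycle to each ready request.

    `ready_requests` lists request names in program order. At most
    `ports` requests are issued per cycle, beginning at `start_cycle`.
    """
    if ports < 1:
        raise ValueError("at least one address port is required")
    schedule = {}
    for index, name in enumerate(ready_requests):
        # Every group of `ports` consecutive requests shares one cycle.
        schedule[name] = start_cycle + index // ports
    return schedule


# The store reaches stage eight at cycle 7 with two loads waiting behind it.
issue_cycles = schedule_issue(["store", "load_a", "load_b"], start_cycle=7)
```

Here `issue_cycles` maps `store` and `load_a` to cycle 7 and `load_b` to cycle 8, which is the extra cycle of delay that arises when three requests compete for two address ports.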