The present invention pertains to the field of integrated circuits. More specifically, the present invention pertains to a system and method for optimizing memory exchanges between a digital signal processor and an external memory.
Digital integrated circuits (e.g., processors, specifically digital signal processors) used in computer systems are increasingly powerful, and the rate at which they process data continues to get faster. To maximize the functionality and performance of the computer system, it is imperative that the supply of data to the processor keep up with, to the extent possible, the rate at which the data are required by the application being executed by the processor.
A digital signal processor (DSP) system of the prior art is the OAK™ DSP core licensed from DSP Semi Conductor by VLSI Technology, Inc. In the OAK digital signal processor system, the DSP core includes a digital signal processor and internal memory (that is, memory that is on-core). The internal memory, by virtue of being located on the DSP core, is directly accessible by the DSP and thus able to transfer data very quickly to the DSP. Hence, data contained in the on-core memory are readily available to the DSP; therefore, by using the data from internal memory, the application can be optimally run at the speed of the processor. However, the internal memory is relatively small and limited in size by the on-core space that is available. In the OAK DSP core, for example, there is typically a total of 4K of on-core memory which is configured as two separate memories of 2K each. This amount of memory is not sufficient to hold the large quantities of data that are typically acquired and require processing.
In the prior art, the shortcoming with regard to on-core memory is addressed by supplementing the internal memory with external, or off-core, memory. The external memory is not limited by space considerations, and thus is capable of providing the space needed to store larger quantities of data. However, data stored in external memory need to be retrieved from there and delivered to the DSP core in order to be processed, and the processed data may need to be subsequently returned to external memory. Thus, the performance of the DSP system is limited by the speed at which data can be transferred over the data bus from the external memory to the DSP core, and likewise from the DSP core to external memory.
In the prior art, each transfer of data from external memory to internal memory, or from internal memory to external memory, takes at least two (2) clock cycles. Thus, in general, it takes at least 2N clock cycles to transfer N units (e.g., blocks or tables) of data. It is desirable to reduce the number of clock cycles required to transfer a given amount of data, so that data are transferred more quickly and overall system performance is improved.
In addition, the prior art is problematic because the size of the instruction set (e.g., the code size) increases the size of the memory required to hold it, and thus also increases the overall size of the DSP system. Thus it is also desirable to reduce the size of the instruction set.
Accordingly, what is needed is a method and/or system that addresses the limitation placed on DSP system performance by the need to transfer data from off-core memory to on-core memory and by the rate at which those data are transferred over the data bus. What is further needed is a system and/or method that addresses the above need and utilizes an efficient instruction set. The present invention provides a novel solution to the above needs.
These and other objects and advantages of the present invention will become obvious to those of ordinary skill in the art after having read the following detailed description of the preferred embodiments which are illustrated in the various drawing figures.
The present invention provides a system and method that addresses the limitation on digital signal processor (DSP) system performance by reducing the number of clock cycles required to transfer data between internal and external memory. The present invention also reduces the size of the instruction set, thereby reducing the size of the memory and thus also reducing the overall size of the DSP system.
The present invention pertains to a system for transferring data in a single clock cycle between a digital signal processor (DSP) core and a memory unit, and method of same. The system includes the memory unit, a plurality of buses coupled to the memory unit, and the DSP core coupled to the plurality of buses. The system also includes a data transfer element coupled between the memory unit and the DSP core, where the data transfer element is adapted to transfer the data between the memory unit and the DSP core in a single clock cycle. The present invention functions by pipelining the data from the memory unit to the DSP core in a single clock cycle after the pipeline has been primed.
In one embodiment, the memory unit is external to the DSP core. In this embodiment, the data transfer element is a coprocessor including a plurality of latch devices coupled between the DSP core and the external memory unit via a plurality of data buses, respectively. The latch devices provide intermediate registers in the coprocessor for storing the data being transferred between the DSP core and the external memory unit. Data are transferred into the coprocessor during a first clock cycle and out of the coprocessor in a second clock cycle immediately following the first clock cycle.
In the present embodiment, a first set of data are transferred from one memory unit (e.g., from either the internal memory unit of the DSP core or from the external memory unit, depending on whether the transaction is a write transaction or a read transaction) into the coprocessor during the first clock cycle and out of the coprocessor to the other memory unit (e.g., to either the external memory unit or the internal memory unit of the DSP core, again depending on whether the transaction is a write transaction or a read transaction) in the second clock cycle occurring immediately after the first clock cycle. Data subsequent to the first set are likewise transferred from one memory unit to the coprocessor during each consecutive clock cycle occurring immediately after the first clock cycle, and from the coprocessor to the other memory unit during each consecutive clock cycle occurring immediately after the second clock cycle. Thus, data are pipelined out of one memory unit and into the other each clock cycle after the pipeline is primed.
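The pipelining described above may be illustrated by the following minimal software sketch, which is not the disclosed hardware. It assumes a single one-word latch, and the names pipelined_transfer and unpipelined_cycles are hypothetical names introduced only for this illustration. In each simulated clock cycle, the word latched during the previous cycle is delivered to the destination memory while the next word is latched from the source memory.

```python
def pipelined_transfer(source):
    """Move each word of `source` to a new list through a one-word latch.

    Illustrative model only: one loop iteration stands for one clock cycle.
    Returns the destination contents and the number of cycles used.
    """
    dest = []
    latch = None          # intermediate register in the coprocessor
    latch_valid = False   # True once the latch holds a word not yet delivered
    index = 0
    cycles = 0
    while index < len(source) or latch_valid:
        # Output stage: the word latched in the previous cycle leaves the coprocessor.
        if latch_valid:
            dest.append(latch)
            latch_valid = False
        # Input stage: the next word enters the coprocessor in the same cycle.
        if index < len(source):
            latch = source[index]
            latch_valid = True
            index += 1
        cycles += 1
    return dest, cycles


def unpipelined_cycles(n, cycles_per_transfer=2):
    """Cycle count when every transfer costs a fixed number of cycles.

    The default of two cycles is the prior-art minimum stated in the background.
    """
    return n * cycles_per_transfer
```

Under these assumptions, N words take N + 1 simulated cycles: one cycle to prime the pipeline, then one word delivered per cycle. This compares with at least 2N cycles when each transfer takes two cycles.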
In the present embodiment, an address bus is coupled between the DSP core and the external memory unit, and an address modification and decode mechanism is coupled to the address bus. In this embodiment, the address modification and decode mechanism is an offset register, wherein an offset value is specified and applied in order to map a first address in one memory unit to a second address in the other memory unit (e.g., an address in the internal memory of the DSP core is mapped to an address in the external memory, and vice versa).
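One way such an offset-based mapping might behave is sketched below. The sketch assumes that the offset is applied by simple addition and that addresses are 16 bits wide; neither detail is stated above. The class name OffsetRegister and its methods are hypothetical and serve only as an illustration.

```python
class OffsetRegister:
    """Illustrative address mapper: external address = internal address + offset."""

    def __init__(self, offset, address_bits=16):
        # address_bits is an assumed bus width used to wrap results.
        self.mask = (1 << address_bits) - 1
        self.offset = offset & self.mask

    def to_external(self, internal_address):
        """Map an internal (on-core) address to an external address."""
        return (internal_address + self.offset) & self.mask

    def to_internal(self, external_address):
        """Map an external address back to the internal address."""
        return (external_address - self.offset) & self.mask
```

For example, with an offset value of 0x4000, internal address 0x0010 would map to external address 0x4010, and the reverse mapping would recover 0x0010.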