1. Field of the Invention
The present invention relates to digital signal processors. More specifically, the present invention relates to digital signal processing using highly parallel, highly pipelined processing techniques.
2. Description of the Related Art
Digital Signal Processors (DSPs) are generally used for real time processing of digital signals. A digital signal is typically a series of numbers, or digital values, used to represent a corresponding analog signal. DSPs are used in a wide variety of applications including audio systems such as compact disk players, and wireless communication systems such as cellular telephones.
A DSP is often considered to be a specialized form of microprocessor. Like a microprocessor, a DSP is typically implemented on a silicon-based semiconductor integrated circuit. Additionally, as with microprocessors, the computing power of DSPs is enhanced by using reduced instruction set computing (RISC) techniques. RISC computing techniques include using a smaller number of like-sized instructions to control the operation of the DSP, with each instruction executed in the same amount of time. The use of RISC computing techniques increases the rate at which instructions are performed, or the clock rate, as well as the amount of instruction pipelining within the DSP. This increases the overall computing power of the DSP.
Configuring a DSP using RISC computing techniques also creates undesirable characteristics. In particular, RISC-based DSPs execute a greater number of instructions to perform a given task. Executing additional instructions increases the power consumption of the DSP, even though the time to execute those instructions decreases due to the higher clock speed of a RISC-based DSP. Additionally, using a greater number of instructions increases the size of the on-chip instruction memory within the DSP. Memory structures occupy a substantial portion of the circuit area within a DSP (often more than 50% of the total), which increases the size and cost of the DSP. Thus, RISC-based DSPs are less than ideal for low-cost, low-power applications such as digital cellular telephony or other types of battery-operated wireless communication systems.
FIG. 1 is a highly simplified block diagram of a digital signal processor configured in accordance with the prior art. Arithmetic logic unit (ALU) 16 is coupled to ALU register bank 17, and multiply accumulate (MAC) circuit 26 is coupled to MAC register bank 27. Data bus 20 couples MAC register bank 27, ALU register bank 17 and (on-chip) data memory 10. Instruction bus 22 couples (on-chip) instruction memory 12, MAC register bank 27 and ALU register bank 17. Instruction decode 18 is coupled to MAC 26 and ALU 16, and in some prior art systems instruction decode 18 is coupled directly to instruction memory 12. Data memory 10 is also coupled to data interface 11, and instruction memory 12 is also coupled to instruction interface 13. Data interface 11 and instruction interface 13 exchange data and instructions with off-chip memory 6.
During operation, the instructions in instruction memory 12 are decoded by instruction decode 18. In response, instruction decode 18 generates internal control signals that are applied to ALU 16 and MAC 26. The control signals typically cause data to be exchanged between ALU register bank 17 and data memory 10 or instruction memory 12. Similarly, the control signals cause data to be exchanged between MAC register bank 27 and instruction memory 12 or data memory 10. Additionally, the control signals cause ALU 16 and MAC 26 to perform various operations in response to, and on, the data stored in ALU register bank 17 and MAC register bank 27, respectively.
In an exemplary operation, instruction memory 12 may contain coefficient data for use by ALU 16 and MAC 26, and data memory 10 may contain data to be processed (signal data). The coefficient data may be used to implement a frequency filter with the DSP, which is a common practice. As the filtering is performed, both the signal data from data memory 10 and the coefficient data from instruction memory 12 are read into MAC register bank 27. Additional instruction data within instruction memory 12 is also applied to instruction decode 18, either through instruction bus 22 or through a direct connection. The additional instruction data specifies the operation to be performed by MAC 26. The results generated by MAC 26 are typically written back into data memory 10.
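The filtering example above reduces to a sequence of multiply-accumulate operations. The following Python sketch is illustrative only: the function name `fir_filter`, the direct-form filter structure, and treating samples before the start of the signal as zero are assumptions, and the code ignores the fixed-point arithmetic a real DSP would use. Each inner-loop step stands for one operation of MAC 26, combining a coefficient (held in instruction memory 12 in this example) with a signal sample (held in data memory 10).

```python
def fir_filter(signal, coeffs):
    """Hypothetical direct-form FIR filter built from multiply-accumulate steps.

    signal: input samples (the "signal data" of data memory 10)
    coeffs: filter taps (the "coefficient data" of instruction memory 12)
    Samples before the start of the signal are assumed to be zero.
    """
    output = []
    for n in range(len(signal)):
        acc = 0  # plays the role of the MAC accumulator
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * signal[n - k]  # one multiply-accumulate operation
        output.append(acc)  # result "written back" to data memory
    return output
```

For example, `fir_filter([1, 2, 3, 4], [1, 1])` returns `[1, 3, 5, 7]`, each output being the sum of the current and previous samples. Every output sample needs one coefficient fetch and one signal-sample fetch per tap, which is why the bus traffic discussed next grows with the filter length.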
This prior art processing results in many inefficiencies. These include, e.g., bus, or access, contention for instruction memory 12, which must supply instruction data to both MAC register bank 27 and instruction decode 18, as well as bus, or access, contention for data memory 10, which must both read out signal data and write in output data. Additionally, in many instances, further processing of the output data must be performed by ALU 16. This further aggravates access to data memory 10, and therefore contention for data bus 20, because the output data must be written from MAC register bank 27 into data memory 10 and then read out to ALU register bank 17. These read and write operations are performed over data bus 20 and therefore consume additional bus cycles. Such inefficiencies reduce the processing performance of the DSP.
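A rough way to see this contention is to count bus transfers for the filtering example. The sketch below is a hypothetical model, not a description of any particular DSP: it assumes one transfer per bus cycle, one instruction word fetched over instruction bus 22 for each MAC operation (rather than over a direct connection), and a single read of each output into ALU register bank 17 when post-processing is needed. The names `BusCycles` and `prior_art_bus_cycles` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class BusCycles:
    data_bus: int = 0         # transfers on data bus 20
    instruction_bus: int = 0  # transfers on instruction bus 22


def prior_art_bus_cycles(num_outputs, num_taps, alu_post_processing):
    """Hypothetical transfer count for a FIR filter on the FIG. 1 architecture.

    Assumes one transfer per bus cycle and one instruction word per MAC
    operation fetched over instruction bus 22.
    """
    cycles = BusCycles()
    for _ in range(num_outputs):
        # Signal samples: data memory 10 -> MAC register bank 27.
        cycles.data_bus += num_taps
        # Coefficients: instruction memory 12 -> MAC register bank 27, plus
        # (assumed) one instruction word per MAC for instruction decode 18.
        cycles.instruction_bus += 2 * num_taps
        # Filter output: MAC register bank 27 -> data memory 10.
        cycles.data_bus += 1
        if alu_post_processing:
            # Output re-read: data memory 10 -> ALU register bank 17.
            cycles.data_bus += 1
    return cycles
```

Under these assumptions, `prior_art_bus_cycles(100, 8, True)` reports 1,000 data bus transfers, 100 more than without ALU post-processing, because each output crosses data bus 20 twice. The instruction bus count (1,600) reflects coefficients and instruction words competing for the same path.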
The present invention seeks to improve the performance and usefulness of a DSP by addressing the problems and inefficiencies listed above, as well as by providing other features and improvements described throughout the application.