This invention is related in general to digital processing architectures and more specifically to the use of an adaptable data path using register files to efficiently implement digital signal processing operations.
Digital Signal Processing (DSP) calculations require many iterations of fast multiply-accumulate and other operations. Typically, the actual operations are accomplished by “functional units” such as multipliers, adders, accumulators, shifters, etc. The functional units obtain values, or operands, from a fast main memory such as Random Access Memory (RAM). The DSP system can be included within a chip that resides in a device such as a consumer electronic device, computer, etc.
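The multiply-accumulate pattern mentioned above can be illustrated with a minimal sketch. The function name `multiply_accumulate` and the operand values are hypothetical and serve only as an illustration. The sketch shows how a multiplier and an accumulator are applied repeatedly to operands that would, in a real DSP system, be fetched from memory.

```python
def multiply_accumulate(samples, coefficients):
    """Return the sum of samples[i] * coefficients[i] using a single accumulator.

    Hypothetical helper for illustration only; each loop iteration stands in
    for one multiply-accumulate operation performed by functional units.
    """
    if len(samples) != len(coefficients):
        raise ValueError("operand lists must have the same length")
    accumulator = 0
    for sample, coefficient in zip(samples, coefficients):
        product = sample * coefficient  # role of a multiplier functional unit
        accumulator += product          # role of an accumulator functional unit
    return accumulator


# Illustrative operands (assumed values, not taken from any real application).
result = multiply_accumulate([1, 2, 3, 4], [4, 3, 2, 1])
```

Each iteration needs two operands, which is why the cost of routing values from memory to the functional units matters.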
The design of a DSP chip can be targeted for specific DSP applications. For example, in a cellular telephone, a DSP chip may be optimized for Time-Division Multiple Access (TDMA) processing. A Voice-over-Internet Protocol (VoIP) application may require vocoding operations, and so on. It is desirable for a chip manufacturer to provide a single chip design that can be adapted to different DSP applications. Such a chip is often described as an adaptable, or configurable, design.
One aspect of an adaptable design for a DSP chip includes allowing flexible and configurable routing between the different functional units, memory and other components such as registers, input/output and other resources on the chip. A traditional approach to providing flexible routing uses a data bus. Such an approach is shown in FIG. 1.
In FIG. 1, memory bus 10 interfaces with a memory (not shown) to provide values from the memory to processing components such as functional unit blocks 30, 32 and 34. Values from memory bus 10 are selected and routed through memory bus interface 20 to data path bus 36. The functional unit blocks are able to obtain values from data path bus 36 by using traditional bus arbitration logic (e.g., address lines, bus busy, etc.). Within a block, such as functional unit block 30 of FIG. 1, there may be many different components, such as a bank of multipliers, to which the data from data path bus 36 can be transferred. In this manner, any arbitrary value from memory can be provided to any functional unit block, and to components within blocks of functional units.
Values can also be provided between functional unit blocks by using the data path bus. Another resource is register file 60, provided on data path bus 36 by register file interface 50. Register file 60 includes a bank of fast registers, or fast RAM. Register file interface 50 allows values from data path bus 36 to be exchanged with the register file. Typically, the contents of any register or memory location within register file 60 can be placed on data path bus 36 within the same amount of time (e.g., a single cycle). One way to do this is to provide an address to a location in the register file, either on the data path bus itself or by using a separate set of address lines. This approach is very flexible in that any value in a component of a functional unit block can be transferred to any location within the register file and vice versa.
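The bus-based arrangement of FIG. 1 can be modeled, in a greatly simplified way, as follows. All class names, the register file size, and the one-transfer-per-cycle rule are assumptions made for illustration; the sketch omits arbitration logic and is not a description of any actual circuit.

```python
class RegisterFile:
    """Randomly addressable storage: any location is reachable by address."""

    def __init__(self, size):
        self._cells = [0] * size

    def read(self, address):
        return self._cells[address]

    def write(self, address, value):
        self._cells[address] = value


class DataPathBus:
    """A single shared bus; assumed here to carry one value per cycle."""

    def __init__(self):
        self.cycles = 0

    def transfer(self, value):
        self.cycles += 1  # every transfer occupies the shared bus for a cycle
        return value


class FunctionalUnitBlock:
    """Stand-in for a block of functional units holding one operand."""

    def __init__(self, name):
        self.name = name
        self.operand = 0


bus = DataPathBus()
register_file = RegisterFile(size=16)  # assumed size
block_30 = FunctionalUnitBlock("functional unit block 30")
block_32 = FunctionalUnitBlock("functional unit block 32")

block_30.operand = 7
# Block 30 -> register file location 5, over the shared bus.
register_file.write(5, bus.transfer(block_30.operand))
# Register file location 5 -> block 32, again over the shared bus.
block_32.operand = bus.transfer(register_file.read(5))
```

In this model every exchange, whatever its source and destination, passes through the same `transfer` call, which reflects both the flexibility of the bus and the fact that it is a shared resource.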
However, a drawback of the approach of FIG. 1 is that such a design is rather expensive to create, is slow, and does not scale well. A bus approach requires considerable overhead in control circuitry and arbitration logic. This takes up real estate on the silicon chip and increases power consumption. The use of a large, randomly addressable register file is also quite costly and requires inclusion of tens of thousands of additional transistors. The use of such complicated logic often requires bus cycle times to be slower to accommodate all of the switching activity. Finally, such an approach does not scale well since, e.g., adding more and more functional unit blocks will require additional addressing capability that may mean more lines and logic. Additional register file space may also be required. The data path bus would also need to be routed to connect to the added components. Each functional unit block also requires the bus control and arbitration circuitry.
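The addressing part of the scaling concern can be made concrete with a small calculation. The sketch below assumes plain binary addressing, in which selecting one of N locations needs at least ceil(log2(N)) address lines. The function name `address_lines_needed` and the location counts are hypothetical, and the calculation ignores control and arbitration signals.

```python
def address_lines_needed(location_count):
    """Minimum binary address lines needed to select one of location_count
    locations, assuming plain binary addressing (an illustrative assumption)."""
    if location_count < 1:
        raise ValueError("location_count must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using integers only.
    return (location_count - 1).bit_length()


# Assumed example sizes: growing the number of addressable locations
# increases the number of address lines that must be routed.
lines_for_16 = address_lines_needed(16)
lines_for_17 = address_lines_needed(17)
lines_for_64 = address_lines_needed(64)
```

Under this assumption, adding even a single location beyond a power of two requires one more address line to be routed to every component on the bus.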
Thus, it is desirable to provide an interconnection scheme for digital processor applications that improves over one or more of the above, or other, shortcomings in the prior art.