1. Field of the Invention
The invention relates to a digital signal processor (DSP), and in particular, to a public register file in the DSP core circuits.
2. Description of the Related Art
FIG. 1 shows a conventional parallel architecture core DSP (PACDSP) comprising a plurality of clusters 200. Each cluster 200 is composed of different function units, such as a load/store unit 212 and an arithmetic unit 222, for execution of various instruction types. A program controller 108 performs instruction fetch, dispatch and flow control functions. Instructions fetched from an instruction memory 106 are dispatched to the load/store unit 212 or the arithmetic unit 222 in each cluster 200 according to their type, such that the load/store unit 212 and the arithmetic unit 222 are triggered to operate efficiently. In the cluster 200, every function unit has a dedicated register file. For example, the load/store unit 212 is associated with an address register file 214, and the arithmetic unit 222 with an accumulation register file 224. If data exchange is required between the load/store unit 212 and the arithmetic unit 222, a ping-pong register 210 is used as a bridge. The ping-pong register 210 comprises a plurality of register cells equally grouped into a ping register 202 and a pong register 204, which are accessed by the function units in a swapping fashion. Each register cell in the ping-pong register 210 can be accessed by only one function unit per cycle. Thus, when the load/store unit 212 accesses the ping register 202, the arithmetic unit 222 can access only the pong register 204, and vice versa.
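The swapping access pattern of the ping-pong register 210 can be illustrated with a minimal software sketch. This is a toy model only, not the circuit of FIG. 1: the class name PingPongRegister, the unit labels "load_store" and "arithmetic", the bank size, and the initial ownership are all assumptions made for illustration.

```python
class PingPongRegister:
    """Toy model: two equally sized banks, each owned by one unit per cycle."""

    def __init__(self, cells_per_bank=4):
        self.banks = {"ping": [0] * cells_per_bank,
                      "pong": [0] * cells_per_bank}
        # Assumed initial ownership: the load/store unit holds the ping bank
        # and the arithmetic unit holds the pong bank.
        self.owner = {"load_store": "ping", "arithmetic": "pong"}

    def swap(self):
        """Exchange bank ownership between the two function units."""
        self.owner["load_store"], self.owner["arithmetic"] = (
            self.owner["arithmetic"], self.owner["load_store"])

    def _check(self, unit, bank):
        # A bank is visible only to the unit that currently owns it.
        if self.owner[unit] != bank:
            raise PermissionError(f"{unit} cannot access {bank} in this cycle")

    def write(self, unit, bank, index, value):
        self._check(unit, bank)
        self.banks[bank][index] = value

    def read(self, unit, bank, index):
        self._check(unit, bank)
        return self.banks[bank][index]


reg = PingPongRegister()
reg.write("load_store", "ping", 0, 42)      # load/store unit fills the ping bank
reg.swap()                                  # ownership is exchanged
value = reg.read("arithmetic", "ping", 0)   # arithmetic unit reads the same cell
```

In this sketch, data written by one unit becomes visible to the other only after a swap, which is the bridging behavior described above.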
The described architecture is referred to as a distributed register file architecture and is mostly adopted in very long instruction word (VLIW) DSPs. Its advantages are reduced power consumption and fewer connection ports. When multiple function units require the same data, however, exchanging data between the distributed function units can be inefficient. The load/store unit 212 and the arithmetic unit 222 cannot use the same ping register 202 or pong register 204 at the same time; the ping register 202 and the pong register 204 are therefore accessed exclusively and in turn. Duplicate data may be stored in both the ping register 202 and the pong register 204 to serve the load/store unit 212 and the arithmetic unit 222 simultaneously, but the register capacity occupied is then doubled. In addition to these inefficiencies, in some applications, such as finite impulse response (FIR) and infinite impulse response (IIR) filters and fast Fourier transform (FFT) algorithms, consecutive data are rapidly and recursively updated. Identical instructions with different parameters are redundantly required to process the consecutive data, needlessly increasing the program code size. It is therefore necessary to improve this architecture.
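The code growth caused by identical instructions with different parameters can be sketched for one FIR output sample. The listing below is hypothetical: the mnemonics MAC and MOV, the register names x0, h0 and acc, and the helper unrolled_fir_listing are assumptions for illustration and do not represent the PACDSP instruction set.

```python
def unrolled_fir_listing(num_taps):
    """Return a hypothetical straight-line listing for one FIR output sample."""
    lines = []
    for k in range(num_taps):
        # The same multiply-accumulate operation is repeated;
        # only the register numbers differ.
        lines.append(f"MAC acc, x{k}, h{k}")
    for k in range(num_taps - 1, 0, -1):
        # Shift the sample window by one position, oldest sample first.
        lines.append(f"MOV x{k}, x{k - 1}")
    return lines


listing = unrolled_fir_listing(4)
for line in listing:
    print(line)
```

Under these assumptions, the number of instructions grows with the number of taps even though each instruction performs the same operation on a different register.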