This invention relates to digital processing apparatus, and more particularly to a digital processing apparatus having a distributed architecture.
A classical Digital Signal Processor (DSP) has two major parts, namely the core architecture and the peripherals.
The major blocks of the core architecture are:

- Program/Data Memory
- Arithmetic/Logic Unit (ALU)
- Multiplier/Accumulator (MAC)
- Barrel Shifter (BS)
- Data Address Generator (DAG)
- Program Address Generator (PAG)
- Registers (used to hold intermediary results and addresses, and to speed up access to the previous five blocks)
- Buses

Some of the peripheral blocks are:

- Serial Port(s)
- Host Interface Port (parallel port)
- Timer(s)

Somewhere between these two blocks are:

- DMA controller
- Interrupt controller

Various DSPs may use distinct ALU, MAC and BS computational blocks, or may blend them into multifunctional units.

The new generation of DSPs takes advantage of:

- newer technologies allowing faster clocking of old architectures and consequently higher processing power
- faster memories that allow improvements in the internal architecture of various blocks
- multiple internal buses
- new peripherals
One of the common problems associated with traditional DSP architectures is the uneven loading of the processors in a multiprocessor design. To cope with this problem, new DSP architectures with parallel processing capabilities have more recently been proposed and implemented.
At the heart of their design is the concept of inter-processor communication via external interface ports, globally shared memory, and shared buses. The complexity of these designs, however, translates into extremely high-cost IC implementations.
Parallel Computing (PC) increases processing power by permitting parallel processing at the routine (task) level. When a program has to execute two different routines that are independent at the data level (i.e. the data written by one routine is not read by the other routine), the two routines can be executed in parallel. This is referred to herein as macro parallelism.
Congestion can also occur at the instruction level. When a program has to execute a sequence of instructions that are independent at the data level, these instructions could be executed in parallel. Executing these instructions in parallel (herein referred to as micro parallelism) on the same processor, however, would require multiple buses and instruction words large enough to handle multiple operands.
An object of the invention is to alleviate this problem.