Digital signal processing is concerned with the representation, transformation, and manipulation of digital signals and the information they contain. Digital signal processors play a major role in such diverse fields as speech and data communication, biomedical engineering, acoustics, sonar, radar, seismology, oil exploration, instrumentation, robotics, consumer electronics, and many others. They can implement a wide range of digital signal processing algorithms, including companding, filtering, fast Fourier transforms (FFTs), and control algorithms.
Filters are a particularly important class of digital signal processors. A filter digital signal processor (DSP) can be defined as a system that implements a frequency-selective filtering algorithm to pass certain frequency components and reject all others. More broadly, a filter DSP modifies certain frequency components relative to others.
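As an illustration of frequency-selective filtering in general (not the filter of any particular DSP described here), the following Python sketch implements a direct-form finite impulse response (FIR) filter. The four-tap moving-average coefficients `h` are a hypothetical choice that acts as a crude low-pass filter: once the filter has filled, a constant input passes with unit gain, while an input alternating at the Nyquist rate is rejected.

```python
def fir_filter(x, h):
    """Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k], with x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y


# Hypothetical 4-tap moving-average coefficients (a crude low-pass filter).
h = [0.25, 0.25, 0.25, 0.25]

dc = [1.0] * 8             # zero-frequency input: passed
nyquist = [1.0, -1.0] * 4  # input alternating at the Nyquist rate: rejected

print(fir_filter(dc, h))       # [0.25, 0.5, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0]
print(fir_filter(nyquist, h))  # [0.25, 0.0, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The first few outputs are transients while the filter's delay line fills with input samples; the steady-state outputs show the pass/reject behavior.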
Referring to FIG. 1, there is shown an example of a conventional single-chip digital signal processor that can implement various signal processing algorithms. The DSP chip, which in this example is a TMS320C2x manufactured by Texas Instruments, Inc., uses a Harvard-type architecture that maximizes processing power by maintaining two separate memory bus structures, one for program memory and one for data memory, so that instructions can execute at full speed. The instruction set includes instructions for transferring data between the program and data memory spaces. Externally, the program and data memories are multiplexed over the same bus, which maximizes the address range for both spaces while minimizing the pin count of the DSP chip.
The TMS320C2x chip comprises two large on-chip data RAM blocks 32 and 34 (a total of 544 16-bit words), one of which (block 32) is configurable as either program or data memory. Programs of up to 4K words can be masked into an internal program ROM 36. A multiplier 38 performs a 16×16-bit two's-complement multiplication with a 32-bit result in a single instruction cycle. Multiplier operands can come from data memory, from program memory, or directly from an instruction word as an immediate value.
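The following Python sketch, offered only as an illustration of the arithmetic and not as a model of the multiplier hardware, shows why a 32-bit result suffices for a 16×16-bit two's-complement product: the largest-magnitude product, (-32768) × (-32768) = 2^30, still fits in a 32-bit two's-complement word. The helper names `to_signed` and `mul16x16` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mul16x16(a_word, b_word):
    """Multiply two 16-bit two's-complement words and return the 32-bit product word."""
    product = to_signed(a_word, 16) * to_signed(b_word, 16)
    # Every product of two 16-bit signed values lies in [-2**30 + 2**15, 2**30],
    # so it fits in a 32-bit two's-complement word without loss.
    return product & 0xFFFFFFFF


print(hex(mul16x16(0xFFFF, 0x0003)))  # -1 * 3 = -3 -> 0xfffffffd
print(hex(mul16x16(0x8000, 0x8000)))  # -32768 * -32768 = 2**30 -> 0x40000000
```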
The TMS320C2x performs two's-complement arithmetic using a 32-bit arithmetic logic unit (ALU) and accumulator (ACC) 40. The ALU is a general-purpose arithmetic unit that operates on 16-bit words taken from data RAM or derived from immediate instructions, or on the 32-bit result of the multiplier. The accumulator stores the output of the ALU and also serves as the ALU's second input. Instructions are provided for storing the accumulator contents in data memory.
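Because the accumulator is both the ALU's output register and its second input, a sum of products (the core of a filter program) can be built by repeatedly adding multiplier results into the accumulator. The Python sketch below assumes that the 32-bit accumulator simply wraps around on overflow; it does not model any saturation or overflow-handling mode the device may provide, and the function names are hypothetical.

```python
def to_signed32(value):
    """Interpret the low 32 bits of `value` as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def mac(samples, coeffs):
    """Accumulate samples[i] * coeffs[i] in a 32-bit accumulator that wraps on overflow."""
    acc = 0
    for x, c in zip(samples, coeffs):
        product = x * c                   # stands in for the 32-bit multiplier output
        acc = to_signed32(acc + product)  # ALU adds the product to the accumulator
    return acc


print(mac([1000, -2000, 3000], [4, 5, 6]))  # 4000 - 10000 + 18000 = 12000
# Three maximal products exceed the 32-bit range and wrap around.
print(mac([32767] * 3, [32767] * 3) == 3 * 32767 * 32767 - 2**32)  # True
```

The wraparound case shows why the overflow prevention mentioned for the scaling shifter matters in long accumulations.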
A scaling shifter 42 has a 16-bit input connected to the data bus and a 32-bit output connected to the ALU. The scaling shifter produces a left shift of 0 to 16 bits on the input data, as specified by the instruction. These shift capabilities enable the processor to perform numerical scaling, bit extraction, extended-precision arithmetic, and overflow prevention.
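The Python sketch below models the scaling shifter's data path under one stated assumption: the 16-bit input is treated as a two's-complement value and sign-extended before the shift (the sketch does not model any mode control over sign extension). Because the input has 16 bits and the maximum shift is 16, every result fits in the 32-bit output. The function name `scale_shift` is hypothetical.

```python
def scale_shift(word16, shift):
    """Sign-extend a 16-bit word, then left-shift it by 0..16 bits into a 32-bit word."""
    if not 0 <= shift <= 16:
        raise ValueError("shift must be in the range 0..16")
    word16 &= 0xFFFF
    value = word16 - 0x10000 if word16 & 0x8000 else word16  # sign extension (assumed)
    return (value << shift) & 0xFFFFFFFF  # vacated low-order bits are zero-filled


print(hex(scale_shift(0x0001, 16)))  # 0x10000
print(hex(scale_shift(0xFFFF, 4)))   # -1 << 4 -> 0xfffffff0
print(hex(scale_shift(0x8000, 16)))  # most negative input, maximum shift -> 0x80000000
```

For example, a shift of 16 places a 16-bit operand in the upper half of the 32-bit word, which is one way to line up operands for extended-precision arithmetic.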
The external memory interface of the DSP consists of a 16-bit parallel data bus 44, a 16-bit address bus 46, and pins for various control signals. A 16-bit timer 48 supports control operations. A serial interface 50 provides direct communication with serial devices such as codecs and serial A/D converters.
A multiprocessor interface 52 allows multiple TMS320C2x processors to communicate in multiprocessing applications. External user interrupts for controlling the DSP can be provided through an interrupt bus 54.
A DSP efficiently implements many application-oriented digital signal processing programs. Some of these programs, such as filter programs, occupy only a small amount of program space but run repeatedly for long periods and therefore consume a substantial portion of the processor bandwidth. It would be desirable to use a multiprocessor architecture to improve DSP performance.
However, adapting DSPs to a multiprocessing environment has presented particular problems, as will now be discussed.
Referring to FIG. 2, a conventional multiprocessor system comprises a plurality of processors 70, each having its own instruction and data streams from a corresponding memory 80. Each processor 70 can execute its own instruction stream independently of the other processors as long as no interaction with another processor is required. However, when one processor assigns some of its tasks to another processor, the processors must be synchronized. Such synchronization is usually accomplished using memory-based locking techniques, which rely on the principle that only one access to any given memory location can occur in any memory cycle. Because processors communicating through a shared location must therefore take turns accessing it, substantial bottlenecks are created during interprocessor communication.
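The bottleneck can be illustrated with a deliberately simplified model: one shared memory location that grants exactly one access per memory cycle to several processors. Round-robin arbitration and the processor names are assumptions made for this Python sketch, not details of FIG. 2. With N processors each needing one access, the last processor waits N - 1 cycles.

```python
def simulate(accesses_needed):
    """Return the memory cycle in which each processor's last access completes.

    Models one shared memory location that services exactly one access per
    memory cycle. `accesses_needed` maps a (hypothetical) processor name to the
    number of accesses it must make, e.g. to test and set a lock word.
    Round-robin arbitration is an assumption of this sketch.
    """
    remaining = {pid: n for pid, n in accesses_needed.items() if n > 0}
    order = list(remaining)
    finish = {}
    cycle = 0
    turn = 0
    while remaining:
        # Skip processors that have already finished.
        while order[turn % len(order)] not in remaining:
            turn += 1
        pid = order[turn % len(order)]
        turn += 1
        cycle += 1  # one memory cycle grants exactly one access
        remaining[pid] -= 1
        if remaining[pid] == 0:
            del remaining[pid]
            finish[pid] = cycle
    return finish


print(simulate({"P0": 1, "P1": 1, "P2": 1, "P3": 1}))
# {'P0': 1, 'P1': 2, 'P2': 3, 'P3': 4}
```

In this model the total time equals the sum of all requested accesses, so adding processors that communicate through the same location lengthens the wait instead of adding throughput.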
It would be desirable to provide a data and program memory arrangement for a multiprocessor system that reduces the communication bottlenecks inherent in multiprocessor architectures. It would also be desirable to adapt the resulting architecture to a DSP system.