1. Field of the Invention
The present invention relates to a digital signal processor for implementing computation which is mainly intended for systems of signals.
2. Description of the Prior Art
FIG. 1 shows in block diagram form a conventional digital signal processor intended mainly for audio signal processing, namely the Digital Speech Signal Processor 1 (DSSP1) described in the proceedings of the symposium of the 1986 annual convention of The Institute of Electronics and Communication Engineers of Japan, Communication Section. In the figure, indicated by 1 is a program counter (PC), which incorporates a stack pointer, for controlling the program address; 2 is a program ROM in the DSSP1; 3 is an instruction register 0 (IR0) for latching an instruction retrieved from the program ROM 2; 4 is an instruction register 1 (IR1) for latching the instruction released from the IR0 3 onto a program bus (P-Bus) 9; 5 is an instruction decoder for decoding the instruction held in the IR1 4; 6 is a bus interface register (BIR) for relaying immediate data on the P-Bus 9 onto a data bus (D-Bus) 10; 7 is an address arithmetic unit (AAU), which incorporates three address registers, for generating addresses; 8 is a page register (PR) which adds 3 high-order bits to the 9-bit address output from the AAU 7 to produce a 12-bit external RAM address; 9 is the program bus (P-Bus) for transferring instructions; 10 is the data bus (D-Bus) for transferring data separately from the P-Bus 9; 11 and 12 are selectors; 13 is a data pipeline register 0 (DPR0) for receiving one input of an execution unit (described shortly); 14 is a data pipeline register 1 (DPR1) for receiving the other input of the execution unit; 15 is an internal 2-port data memory (2P-RAM) with a capacity of 512 words; 16 is a floating point multiplier (FMPY) for performing 12E6 floating point multiplication on the outputs of the DPR0 13 and DPR1 14; 17 is a pipeline register (P register) for latching the output of the FMPY 16; 18 and 19 are selectors; 20 is a floating point arithmetic logic unit (FALU) for performing 12E6 floating point operations on two inputs to produce one output; 21 is a selector; 22 denotes accumulators 0-3 (ACC0-ACC3), each having a 4-word capacity; 23 is a loop counter (LC) for counting the number of loop iterations in a program; 24 is a status register (SR) which indicates all statuses of the processor and controls its operating mode, in particular the interrupt mode of a DMA controller 26 (the connections of the SR 24 are omitted from FIG. 1 for simplicity); 25 is a selector for selecting between the outputs of the AAU 7 and the DMA controller 26; 26 is the direct memory access (DMA) controller for performing DMA transfer between the serial I/O ports 33, 35, 37, 39 and an external RAM on the data bus 32; 45 is an R/W controller for controlling data reading from and writing to the external RAM; 27 and 28 are the read (RD) and write (WR) signals provided by the R/W controller 45; 29 is an address register for latching the address 30 of the external RAM; 31 is a data register (DR) for latching data on the data bus 32 connected to the external RAM; 33 is SO0 for outputting serial output data 0 (34); 35 is SI0 for inputting serial input data 0 (36); 37 is SO1 for outputting serial output data 1 (38); 39 is SI1 for inputting serial input data 1 (40); 41 is an interrupt controller for analyzing the interrupt signal 42 to determine the interrupt; and 43 is a bus controller for controlling, by means of the bus signal 44, the data bus 32 connected to the external data RAM.
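The address formed by the PR 8 and the AAU 7 can be illustrated with a short Python sketch. It assumes the 3 page bits are simply concatenated above the 9-bit AAU output (the description says only that they are added as high-order bits); the constant and function names are hypothetical.

```python
# Hypothetical model of the 12-bit external RAM address: the page register (PR 8)
# supplies the 3 high-order bits and the AAU 7 supplies the 9 low-order bits.
PAGE_BITS = 3     # bits contributed by the page register
OFFSET_BITS = 9   # bits contributed by the AAU output


def external_address(page: int, aau_addr: int) -> int:
    """Concatenate a 3-bit page number and a 9-bit AAU address (assumed layout)."""
    if not 0 <= page < (1 << PAGE_BITS):
        raise ValueError("page must fit in 3 bits")
    if not 0 <= aau_addr < (1 << OFFSET_BITS):
        raise ValueError("AAU address must fit in 9 bits")
    return (page << OFFSET_BITS) | aau_addr


if __name__ == "__main__":
    print(hex(external_address(5, 0x1F3)))  # 0xbf3
    print(external_address(7, 0x1FF))       # 4095, the top of a 4096-word space
```

With 9 bits alone the AAU reaches 512 words; the 3 page bits extend the reachable external space to 4096 words.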
FIG. 2 shows an example arrangement in which two DSSP1s of the type shown in FIG. 1 perform sample-by-sample processing through the serial data ports. In the figure, indicated by 50 is a master processor, which is a complete DSSP1 as shown in FIG. 1; 51 is a slave processor, likewise a DSSP1; 52 is a request signal issued by the master processor 50 to the slave processor 51; and 53 is an acknowledge signal issued by the slave processor 51 to the master processor 50.
Next, the operation of the above conventional digital signal processor will be explained briefly. Because this arrangement has separate P-Bus 9 and D-Bus 10, instruction fetching by the instruction decoder 5 and internal operations by the FMPY 16 and FALU 20 can proceed concurrently. In addition, address generation by the AAU 7 takes place in parallel with these operations, so that address generation incurs no instruction overhead. The system also provides DMA transfer, performed by the DMA controller 26, between the external data bus 32 and the two serial I/O systems 33, 35 and 37, 39. Since DMA transfer uses the D-Bus 10, internal operations halt for 6 machine cycles for each word transferred during a DMA transfer cycle. The parallel processing capability of this system is summarized as follows.
(1) Address generation (primary address) by the AAU 7.
(2) Floating point multiplication by the FMPY 16.
(3) Floating point arithmetic/logic operation by the FALU 20.
(4) Data transfer between the 2P-RAM 15 and the external RAM.
(5) Data transfer between the two serial I/O systems 33, 35/37, 39 and the external RAM (this transfer halts internal operation for 6 machine cycles for each word transferred).
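The cost of item (5) can be estimated with a small Python sketch. It takes the stated 6-cycle stall per transferred word at face value and assumes a hypothetical per-frame cycle budget; the function names and the example frame size are illustrative only.

```python
# Rough estimate of internal machine cycles lost to serial-I/O DMA transfers,
# assuming each word moved between a serial port and the external RAM stalls
# internal operation for 6 machine cycles.
STALL_CYCLES_PER_WORD = 6


def dma_stall_cycles(words: int) -> int:
    """Machine cycles during which internal operation is halted."""
    return words * STALL_CYCLES_PER_WORD


def available_fraction(words_per_frame: int, cycles_per_frame: int) -> float:
    """Fraction of a (hypothetical) frame budget left for internal computation."""
    stalled = dma_stall_cycles(words_per_frame)
    if stalled > cycles_per_frame:
        raise ValueError("DMA traffic exceeds the frame budget")
    return (cycles_per_frame - stalled) / cycles_per_frame


if __name__ == "__main__":
    # Hypothetical frame: 160 words moved within a 10,000-cycle budget.
    print(dma_stall_cycles(160))            # 960
    print(available_fraction(160, 10_000))  # 0.904
```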
By allotting one serial I/O system (33, 35 or 37, 39) to an A/D converter and the other to a transmission path, the system is well suited to implementing on a single chip a full-duplex audio codec that performs DMA transfer in units of transmission frames with buffering in an external RAM, rather than to fast processing of large volumes of data held in external memory.
The arithmetic unit formed by the FMPY 16 and FALU 20 can execute a product-sum operation, which occurs frequently in FIR filters and the FFT (fast Fourier transform), in one machine cycle, like the multiplier-accumulator pair described in the article "Packing a single processor onto a single digital board" by Louis Schirm, Electronics, Dec. 20, 1979. That is, the system achieves its maximum throughput when multiply-accumulate operations are executed continuously: with a 50 ns machine cycle (20 million cycles per second) and one multiplication plus one addition completed per cycle, it reaches 40 MIPS (mega instructions per second).
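The product-sum pattern and the 40 MIPS figure can be reproduced with a minimal Python sketch. It assumes, as the text describes, that the multiply and the add of each filter tap complete in the same machine cycle; the function names are hypothetical and the code models the arithmetic, not the DSSP1 pipeline itself.

```python
# One FIR output as a chain of product-sum steps. On an FMPY/FALU-style
# datapath each tap's multiply and accumulate are assumed to share one cycle.


def fir_output(coeffs, samples):
    """Return (sum of coeffs[k] * samples[k], multiply count, add count)."""
    acc = 0.0
    mults = adds = 0
    for c, x in zip(coeffs, samples):
        p = c * x   # multiplier step (FMPY role)
        mults += 1
        acc += p    # accumulate step (FALU role)
        adds += 1
    return acc, mults, adds


def peak_mips(cycle_ns: float, ops_per_cycle: int) -> float:
    """Millions of operations per second for a given cycle time."""
    cycles_per_second = 1e9 / cycle_ns
    return cycles_per_second * ops_per_cycle / 1e6


if __name__ == "__main__":
    print(fir_output([0.5, 0.25, 0.25], [4.0, 8.0, -4.0]))  # (3.0, 3, 3)
    print(peak_mips(50, 2))                                 # 40.0
```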
However, general signal processing involves operations other than multiply-accumulation, such as absolute value accumulation and differential absolute value evaluation. For these the FMPY 16 cannot be used, and the throughput is halved to the 20 MIPS achieved by the FALU 20 alone.
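The two multiplier-free kernels named above can be sketched in Python to show why the multiplier sits idle. The sketch assumes the ALU retires one useful operation per cycle for these kernels; whether a subtraction and an absolute value fit into a single DSSP1 ALU operation is not stated in the text, so the operation count is an assumption. All names are hypothetical.

```python
# Multiplier-free kernels: absolute value accumulation and differential
# absolute value (sum of absolute differences). Neither needs a multiply.


def abs_accumulate(xs):
    """Accumulate |x| over the input."""
    total = 0.0
    for x in xs:
        total += abs(x)
    return total


def sum_abs_diff(a, b):
    """Accumulate |a[k] - b[k]|."""
    total = 0.0
    for x, y in zip(a, b):
        total += abs(x - y)
    return total


def effective_mips(cycle_ns: float, uses_multiplier: bool) -> float:
    # Assumed: MAC kernels keep both units busy (2 ops/cycle);
    # multiplier-free kernels use the ALU alone (1 op/cycle).
    ops_per_cycle = 2 if uses_multiplier else 1
    return (1e9 / cycle_ns) * ops_per_cycle / 1e6


if __name__ == "__main__":
    print(abs_accumulate([-2.0, 3.0, -1.0]))               # 6.0
    print(sum_abs_diff([1.0, 5.0, 3.0], [4.0, 2.0, 3.0]))  # 6.0
    print(effective_mips(50, uses_multiplier=False))       # 20.0
```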
Next, the operation of the DSSP1 multiprocessor configuration will be described with reference to FIG. 2. Serial input data of system 0 (36) and system 1 (40) entering the master processor 50 undergo a certain computation in the processor 50, which yields an intermediate result. The master processor 50 issues a data transfer request 52 to the slave processor 51 and, after confirming the data transfer ready acknowledgment 53 from the slave processor 51, transfers the intermediate result held in the DR 31 to the slave processor 51 through the data bus 30. The latching of the DR 31 is timed by the RD 27 and WR 28 signals.
Subsequently, the intermediate result processed by the slave processor 51 is transferred back to the master processor 50 by the reverse procedure and, after the final computation by the processor 50, is sent to the serial data output ports 34 and 38 of system 0 and system 1.
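The request/acknowledge exchange can be modeled in Python with threads, using `threading.Event` objects in place of hardware signals. This is a hypothetical sketch: the text specifies only the request 52, the acknowledge 53, and that latching of the DR 31 is timed by RD 27 and WR 28, so the separate "strobe" event, the class and function names, and the choice to model the reverse procedure as the slave initiating the return transfer are all assumptions.

```python
import threading


class Link:
    """Hypothetical one-way link: 'req' ~ request 52, 'ack' ~ acknowledge 53,
    'strobe' ~ write timing that latches 'dr' (standing in for DR 31)."""

    def __init__(self):
        self.req = threading.Event()
        self.ack = threading.Event()
        self.strobe = threading.Event()
        self.dr = None

    def send(self, value):
        self.req.set()       # issue the transfer request
        self.ack.wait()      # wait until the receiver reports "ready"
        self.dr = value      # drive the word into the data register
        self.strobe.set()    # write strobe: the data is now valid

    def receive(self):
        self.req.wait()      # wait for a transfer request
        self.ack.set()       # acknowledge: ready to receive
        self.strobe.wait()   # wait for the write strobe
        return self.dr       # read the latched word


def process_sample(x, master_pre, slave_op, master_post):
    """Master computes, hands off to the slave, and finishes the result."""
    to_slave, to_master = Link(), Link()

    def slave():
        y = to_slave.receive()
        to_master.send(slave_op(y))  # reverse procedure (assumed slave-initiated)

    t = threading.Thread(target=slave)
    t.start()
    to_slave.send(master_pre(x))                # master -> slave
    result = master_post(to_master.receive())   # slave -> master
    t.join()
    return result


if __name__ == "__main__":
    out = [process_sample(x, lambda v: v * 2, lambda v: v + 1, lambda v: v * 10)
           for x in (1, 2, 3)]
    print(out)  # [30, 50, 70]
```

Note that the master blocks at every handshake, so each sample costs two full exchanges on top of the computation itself.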
This system arrangement is intended for complex signal processing algorithms that cannot be executed by a single DSSP1 and for throughput requirements beyond the capability of a single DSSP1. However, it involves the following problems. First, when large volumes of data are transferred over the data bus 30, the throughput does not improve as expected because of the overhead of data transfer control. Second, the master and slave processors 50 and 51 must run their programs in complete synchronism, so if the load is concentrated on one of the processors, the system throughput is limited by that maximum load. Finally, data transfer through the data bus 30 contends for the bus with any process using the external RAM connected to the data bus 30, which may significantly reduce throughput.
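The three problems can be put into a simple cycle-budget model in Python. The model is a hypothetical sketch: it assumes that, in lockstep, each sample period equals the heavier processor's work plus handshake/transfer cycles plus any bus-contention stalls. The parameter names and example cycle counts are illustrative, not measured DSSP1 figures.

```python
# Hypothetical lockstep model for a master/slave pair: the sample period is
# set by the more heavily loaded processor, plus transfer-control overhead,
# plus stalls from contention with external-RAM accesses on the shared bus.


def lockstep_sample_rate(cycle_ns, master_cycles, slave_cycles,
                         transfer_cycles, contention_cycles=0):
    """Samples per second when both processors advance in complete synchronism."""
    period_cycles = (max(master_cycles, slave_cycles)
                     + transfer_cycles + contention_cycles)
    return 1e9 / (cycle_ns * period_cycles)


if __name__ == "__main__":
    # Same 800 cycles of total work, 100 transfer cycles, 50 ns cycle.
    print(lockstep_sample_rate(50, 400, 400, 100))  # 40000.0 (balanced)
    print(lockstep_sample_rate(50, 700, 100, 100))  # 25000.0 (unbalanced)
```

Under this model, moving work from one processor to the other without changing the total still lowers the sample rate, which is the load-concentration problem described above.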
Although these problems of the multiprocessor arrangement could easily be overcome by adding an external control circuit, doing so would increase the hardware, reduce flexibility, and forfeit the advantages of a signal processor in a processor-based system. In particular, image signal processing generally requires very fast processing of large volumes of data in a multiprocessor configuration using a video frame memory, so the scheme described above poses a most significant obstacle to introducing processors into image signal processing.
As described above, the conventional digital signal processor is not always suitable for signal processing such as image signal processing. Applying it to such processing nonetheless requires more processors to compensate for the lowered throughput and more external circuitry to handle the increased load. As a result, it has been difficult to introduce processor-based systems into this technical field.