1. Field of the Invention
This invention relates to a digital signal processor (DSP) for carrying out signal processing under the control of a control processor and a digital signal processing system including the control processor and the DSP.
2. Prior Art
FIG. 1 shows the arrangement of a conventional DSP that carries out signal processing under the control of a control processor. In the figure, symbols C and D designate external registers which are storage means for storing input data to be processed by the DSP, and intermediate and final results of the processing.
A multiplication/addition block 100 is connected to the external registers C and D via buses 13 and 14. The multiplication/addition block 100 is comprised of a multiplication/addition unit 10 for executing arithmetic (or arithmetic-logic) operations on input data supplied via the buses 13 and 14, and two accumulators ACC0 and ACC1 for storing results of the arithmetic operations by the multiplication/addition unit 10.
The multiplication/addition unit 10 is comprised of two internal registers A and B, a multiplier 11, and an ALU (arithmetic-logic unit) 12. The internal registers A and B are for temporarily storing input data to be used in the arithmetic (or arithmetic-logic) operations by the ALU 12. Input data used for a multiplication operation is supplied to the multiplier 11 necessarily via the bus 13 or 14 through the internal register A or B, while input data for an arithmetic (or arithmetic-logic) operation other than a multiplication is supplied to the ALU 12 via the bus 13 or 14 through the internal register A or B.
The ALU 12 carries out arithmetic (or arithmetic-logic) operations on input data supplied from the internal registers A, B, the multiplier 11 and/or the accumulators ACC0, ACC1. The accumulators ACC0, ACC1 are for storing results of the arithmetic operations by the ALU 12. The data written into the accumulators ACC0, ACC1 are delivered to the bus 14 or again input to the ALU 12.
In FIG. 1, the buses and other signal lines are labeled with numerals, such as 24 and 48, which indicate the bit widths of these signal lines. As shown in the figure, the buses 13, 14 and the output signal lines from the internal registers A, B each have a bit width of 24 bits, while the signal line from the multiplier 11, which outputs results of multiplication of 24-bit data by 24-bit data, has a bit width of 48 bits. The signal line from the ALU 12, which occasionally accumulates the 48-bit output data from the multiplier 11, has a bit width of 56 bits, i.e. the bit width of the output data from the multiplier 11 plus an overhead of 8 bits.
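The bit widths quoted above can be checked with a little arithmetic. The following Python sketch is illustrative only: it assumes signed two's-complement data, which the text does not state, and the names `signed_range`, `largest_product` and `worst_case_sums` are hypothetical.

```python
# Assumption: data are signed two's-complement integers. The source gives
# only the bit widths (24, 48, 56), not the number format.
OPERAND_BITS = 24   # buses 13, 14 and internal registers A, B
PRODUCT_BITS = 48   # output of the multiplier 11
ACC_BITS = 56       # output of the ALU 12 and the accumulators


def signed_range(bits):
    """Return (min, max) of a signed two's-complement integer of this width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


op_min, op_max = signed_range(OPERAND_BITS)
prod_min, prod_max = signed_range(PRODUCT_BITS)
acc_min, acc_max = signed_range(ACC_BITS)

# The largest-magnitude product is (-2**23) * (-2**23) = 2**46.
largest_product = op_min * op_min
fits_in_product = prod_min <= largest_product <= prod_max

# The 8 overhead bits let the accumulator absorb many worst-case products
# before it can overflow.
guard_bits = ACC_BITS - PRODUCT_BITS
worst_case_sums = acc_max // largest_product
```

Under these assumptions, every 24-bit by 24-bit product fits in 48 bits, and the 56-bit accumulator can sum 511 worst-case products before it overflows.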
The component elements and parts of the DSP described above are controlled by a program stored in advance in memory means, by so-called pipeline control. That is, assuming, for instance, that the DSP carries out convolution of time-series sample data with a predetermined sequence of filter coefficients, this convolution operation is carried out in the following manner:
First, at a certain clock timing, the multiplier 11 multiplies a first set of sample data and a coefficient stored in the internal registers A and B, respectively, and delivers a result of the multiplication (first multiplication) to the ALU 12. At the same time, a second set of sample data and a coefficient are written into the internal registers A and B, respectively.
Then, at the next clock timing, the result of the first multiplication is written from the ALU 12 e.g. into the accumulator ACC0, and at the same time a result of multiplication of the second set of the sample data and the coefficient (second multiplication) is supplied from the multiplier 11 to the ALU 12, and further a third set of sample data and a coefficient are written into the internal registers A and B.
Then, at the next clock timing, the result of the first multiplication delivered from the accumulator ACC0 and the result of the second multiplication delivered from the multiplier 11 are added together (i.e. accumulated) by the ALU 12, and a result of this addition is written into the accumulator ACC0. At the same time, a result of multiplication of the third set of the sample data and the coefficient (third multiplication) is delivered from the multiplier 11 to the ALU 12, and further a fourth set of sample data and a coefficient are written into the internal registers A and B.
Hereafter, multiplication of sample data and a coefficient, and accumulation of a result of the multiplication are repeatedly carried out in the same manner. Then, when multiplication operations of all sets of sample data and coefficients and accumulation of all results of the multiplication operations are completed, a result of this convolution operation, which is obtained at this time point as contents of the accumulator ACC0, is delivered to the bus 14, from which it is supplied to an external device.
Thus, arithmetic operations constituting a convolution operation, such as a multiplication operation and an addition operation, are carried out in parallel by respective devices, which enables the arithmetic operations to be executed efficiently.
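The clock-by-clock behaviour described above can be sketched as a small Python simulation. This is a minimal model under stated assumptions, not a description of the actual circuit: the function name `pipelined_mac` is hypothetical, the first set of sample data and coefficient is assumed to be in the internal registers A and B before the first clock, and only the accumulator ACC0 is modelled.

```python
def pipelined_mac(samples, coeffs):
    """Simulate the load / multiply / accumulate pipeline clock by clock.

    Returns (acc0, clocks), where clocks counts clock timings from the
    first multiplication until the last product has been accumulated.
    """
    pairs = list(zip(samples, coeffs))
    if not pairs:
        return 0, 0

    reg_a, reg_b = pairs[0]   # first set assumed already loaded into A and B
    regs_valid = True
    next_idx = 1
    product = None            # multiplier output waiting at the ALU
    acc0 = 0
    clocks = 0

    while regs_valid or product is not None:
        clocks += 1
        # ALU: accumulate the product delivered at the previous clock.
        if product is not None:
            acc0 += product
        # Multiplier: multiply the current contents of A and B.
        product = reg_a * reg_b if regs_valid else None
        # Registers: at the same clock, load the next set into A and B.
        if next_idx < len(pairs):
            reg_a, reg_b = pairs[next_idx]
            next_idx += 1
        else:
            regs_valid = False

    return acc0, clocks
```

For example, `pipelined_mac([1, 2, 3, 4], [5, 6, 7, 8])` returns `(70, 5)`: four products are accumulated in five clocks, because the multiplication of one set overlaps the accumulation of the previous one.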
Although the operation of the DSP is described above by referring to an example of the convolution operation, there is a case where a further multiplication operation is carried out on output data from the multiplication/addition unit 10, depending on kinds of arithmetic processing to be carried out. In this case, the output data from the multiplication/addition unit 10 is delivered via the accumulator ACC0 or ACC1 to the bus 14, from which it is written into the internal register A or B of the multiplication/addition unit 10.
In the conventional DSP, when the ALU 12 has carried out an arithmetic operation, the ALU 12 cannot start the next arithmetic operation before a result of the arithmetic operation is stored in the accumulator ACC0 or ACC1. Therefore, if the ALU 12 has completed an arithmetic operation before results of its preceding arithmetic operations written into the accumulators ACC0 and ACC1 are transferred to another device, the ALU 12 cannot write the result of the new arithmetic operation into either of the accumulators ACC0 and ACC1, so that the ALU 12 has to wait until the accumulator ACC0 or ACC1 becomes available before starting the next arithmetic operation.
Further, some kinds of arithmetic processing require a large number of arithmetic operations to be executed within a predetermined time period. When such a kind of arithmetic processing is executed by the DSP, if the start of the next arithmetic operation is delayed due to unavailability of the accumulators ACC0 and ACC1, there can be a case where all the required arithmetic operations cannot be completed within the predetermined time period. Conventionally, in such a case, data stored in one accumulator ACC0 or ACC1 is transferred to one of the external registers C and D, and a result of an arithmetic operation by the ALU 12 is stored in the accumulator ACC0 or ACC1 which is made available by the transfer of the data therefrom, thereby enabling the ALU 12 to start the next arithmetic operation. In general, however, an accumulator of this kind has a bit accuracy corresponding to the bit width of 56 bits, which is higher than the bit accuracy required of data processed by the DSP, which corresponds to the bit width of 24 bits. Hence, if the contents of the accumulator are written into an external register, the bit accuracy of the data is degraded, and in the worst case the data itself can be lost.
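The loss of bit accuracy can be illustrated with a short Python sketch. The text does not say how the 56-bit accumulator contents are reduced to 24 bits, so the sketch assumes that the upper 24 bits are kept and the lower 32 bits are discarded; the function names `spill_to_register` and `restore_from_register` are hypothetical.

```python
ACC_BITS = 56
REG_BITS = 24
DROPPED_BITS = ACC_BITS - REG_BITS  # 32 low-order bits do not fit


def spill_to_register(acc):
    """Assumed spill: keep the upper 24 bits of a 56-bit accumulator value."""
    return acc >> DROPPED_BITS


def restore_from_register(reg):
    """Reload the 24-bit register value into the accumulator position."""
    return reg << DROPPED_BITS


acc = 0x123456789ABCDE                  # a 56-bit accumulator value
restored = restore_from_register(spill_to_register(acc))
lost = acc - restored                   # the discarded low-order part

small = 1000                            # a value that lives only in the low bits
small_restored = restore_from_register(spill_to_register(small))
```

Here `lost` is `0x789ABCDE`, i.e. the lower 32 bits never come back, and `small_restored` is 0, which corresponds to the worst case in which the data itself is lost.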
Further, there can be a case where the accumulator ACC0 or ACC1 suffers from an overflow during processing by the DSP. If the contents of this accumulator are delivered to the bus 14, the data can be lost by operation of an overflow-protect circuit if it is arranged in the path of delivery of the contents of the accumulator to the bus.
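A minimal Python sketch of how data can be lost in this way is given below. It assumes that the overflow-protect circuit behaves as a saturating clamp, which is a common arrangement but is not specified in the text; the function `saturate` and the 48-bit output range used in the example are likewise assumptions made only for illustration.

```python
def saturate(value, bits):
    """Clamp value to the signed two's-complement range of the given width.

    Assumption: the overflow-protect circuit replaces an out-of-range value
    with the nearest representable limit.
    """
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, value))


OUT_BITS = 48  # hypothetical width of the protected output path

in_range = saturate(123, OUT_BITS)           # passes through unchanged
overflow_a = saturate(2 ** 50, OUT_BITS)     # clamped to the upper limit
overflow_b = saturate(2 ** 52, OUT_BITS)     # clamped to the same limit
```

Two different overflowed accumulator values give the same clamped output, so the original contents cannot be recovered once they have passed through such a circuit.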
In the conventional DSP, it takes different time periods to execute arithmetic operations of identical contents, depending on whether the arithmetic operations are executed by using internal registers or by using external registers. FIG. 2 shows an example of two instructions for execution of arithmetic operations of identical contents by using internal registers and by using external registers, respectively.
First, an instruction shown in a left-hand column of FIG. 2, i.e. "reg_a=acc0+=reg_a*reg_b", which means "Multiply data stored in the internal register A by data stored in the internal register B, store the sum of the resulting product and data stored in the accumulator ACC0, in the accumulator ACC0, and store this data stored in the accumulator ACC0 in the internal register A", is carried out by sequentially executing the following three steps:
Step 1: The multiplier 11 multiplies the data stored in the internal register A by the data stored in the internal register B;
Step 2: The ALU 12 adds together the data stored in the accumulator ACC0 and the product from the multiplier 11, and the resulting sum is stored in the accumulator ACC0; and
Step 3: The data stored in the accumulator ACC0 is stored in the internal register A.
Each of the above steps is carried out over one clock, and therefore it takes a total of three clocks to carry out the above instruction.
In contrast, an instruction shown in a right-hand column of FIG. 2, i.e. "reg_a=acc0+=reg_c*reg_d", which means "Multiply data stored in the external register C by data stored in the external register D, store the sum of the product and data stored in the accumulator ACC0, in the accumulator ACC0, and store this data stored in the accumulator ACC0 in the internal register A", is identical in contents of arithmetic operations to the above-mentioned instruction, but it takes a total of four clocks to carry out the instruction, since it is required to sequentially execute the following four steps:
Step 1: Data stored in the external register C is transferred to the internal register A, and data stored in the external register D to the internal register B;
Step 2: The multiplier 11 multiplies the data stored in the internal register A by the data stored in the internal register B;
Step 3: The ALU 12 adds together the data stored in the accumulator ACC0 and the product from the multiplier 11, and the resulting sum is stored in the accumulator ACC0; and
Step 4: The data stored in the accumulator ACC0 is stored in the internal register A.
That is, the arithmetic operations carried out by using the external registers require transfer of data from the external registers to the internal registers, and hence it takes one clock longer to complete them than when arithmetic operations of identical contents are carried out by using the internal registers. Although multiplications are carried out in the above example, a difference in processing time equivalent to one clock similarly arises in a case where additions are carried out, between when data in the external registers are added together and when data in the internal registers are added together.
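The two step sequences can be compared with a small Python model. This is only a sketch: the function `run` and the dictionary of register contents are hypothetical, and each step is simply counted as one clock, as stated in the text.

```python
def run(kind, state):
    """Execute one FIG. 2 style instruction on a copy of the register state.

    kind is "internal" for reg_a=acc0+=reg_a*reg_b or "external" for
    reg_a=acc0+=reg_c*reg_d. Returns (new_state, clocks).
    """
    s = dict(state)
    clocks = 0

    if kind == "external":
        # Extra step: transfer C to A and D to B.
        s["reg_a"], s["reg_b"] = s["reg_c"], s["reg_d"]
        clocks += 1

    product = s["reg_a"] * s["reg_b"]   # multiplier 11
    clocks += 1

    s["acc0"] += product                # ALU 12 accumulates into ACC0
    clocks += 1

    s["reg_a"] = s["acc0"]              # ACC0 is stored in register A
    clocks += 1

    return s, clocks


# Same operand values placed in both the internal and the external registers.
start = {"reg_a": 3, "reg_b": 4, "reg_c": 3, "reg_d": 4, "acc0": 10}
internal_state, internal_clocks = run("internal", start)
external_state, external_clocks = run("external", start)
```

Both runs leave 22 in ACC0 and in the internal register A, but the first takes three clocks and the second takes four.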
As described above, according to the conventional DSP, even if arithmetic operations of identical contents are carried out, it takes different time periods to carry out the arithmetic operations, depending on whether internal registers or external registers are used. This makes it necessary to carry out timing adjustment, such as changing timing of start of execution of each arithmetic operation (changing a bus request line number) depending on whether the arithmetic operation uses internal registers or external registers, which makes the timing design of the program even more difficult. For example, if an instruction for arithmetic operations using the external registers C and D is first carried out, and then another instruction for arithmetic operations using the internal registers A and B is carried out one clock later, there arises an inconvenience that results of multiplications carried out according to these instructions are delivered to the bus 14 at the same timing. In addition to such an inconvenience related to the timing of outputting results of arithmetic operations carried out according to instructions, there arises, depending on how a program is written, an inconvenience that data which is prepared by executing a preceding instruction and stored in an accumulator is overwritten by data prepared by executing a subsequent instruction, if the latter data is written into the accumulator before the data prepared by the preceding instruction is delivered from the accumulator to a proper destination. Therefore, the programmer has to take great care that each instruction of a program is carried out at a timing that will not cause the above-mentioned inconveniences, by always confirming the time period required to execute each instruction. This requires a great deal of labor on the part of the programmer.
FIG. 3 shows the arrangement of a digital signal processing system comprised of a DSP of a kind described above and a control processor. More specifically, the digital signal processing system is comprised of a DSP 1, a RISC (reduced instruction set computer)-CPU (central processing unit) 2, and a RAM (random access memory) 3, all built in a single chip.
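The timing conflict in the example above can be reproduced with a few lines of Python. The sketch assumes the three-clock and four-clock execution times given for the two instructions of FIG. 2 and treats the clock at which an instruction completes as the clock at which its result is delivered; the names `LATENCY`, `completion_clock` and `find_collisions` are hypothetical.

```python
from collections import Counter

# Clocks needed per instruction, taken from the FIG. 2 example.
LATENCY = {"internal": 3, "external": 4}


def completion_clock(issue_clock, kind):
    """Clock at which an instruction issued at issue_clock delivers its result."""
    return issue_clock + LATENCY[kind]


def find_collisions(schedule):
    """Return the clocks at which more than one result is delivered.

    schedule is a list of (issue_clock, kind) tuples.
    """
    done = Counter(completion_clock(t, kind) for t, kind in schedule)
    return sorted(clock for clock, count in done.items() if count > 1)


# External-register instruction first, internal-register instruction one clock later.
conflicting = find_collisions([(0, "external"), (1, "internal")])
# Two internal-register instructions one clock apart do not conflict.
safe = find_collisions([(0, "internal"), (1, "internal")])
```

With this schedule, `conflicting` is `[4]`, since both results become ready at clock 4, whereas `safe` is empty. A check of this kind is what the programmer has to perform by hand for every instruction.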
The RAM 3 is a dual port RAM which can be accessed both by the DSP 1 and the RISC-CPU 2. The DSP 1 is provided with an MMU (memory management unit) 15 for controlling writing of data into the RAM 3 and reading of data therefrom. Further, the RISC-CPU 2 is also provided with an MMU similar to the MMU 15. The RAM 3 is used not only as a work RAM used both by the DSP 1 and the RISC-CPU 2 but also as means for passing data between the DSP 1 and the RISC-CPU 2.
The DSP 1 is identical in construction with that described above with reference to FIG. 1 except that the external registers C and D are connected to the MMU 15, and therefore detailed description thereof is omitted.
When data is transferred between the DSP 1 and the RISC-CPU 2, the external registers C and D are used by the DSP 1 as means for storing data to be transferred therefrom to the RISC-CPU 2 and vice versa. More specifically, data to be transferred from the DSP 1 to the RISC-CPU 2 is stored in advance of the transfer in the register C or D, and then the MMU 15 stores the data stored in the register C or D at desired addresses within the RAM 3. The data stored in the RAM 3 is subsequently read by the MMU of the RISC-CPU 2. On the other hand, data to be transferred from the RISC-CPU 2 to the DSP 1 is first stored in the RAM 3, and the data stored in the RAM 3 is read by the MMU 15 of the DSP 1 and stored in the register C or D, for use in arithmetic operations or the like.
FIG. 4 shows another type of digital signal processing system, which is distinguished from the FIG. 3 digital signal processing system in that the external registers C and D of the DSP 1 are connected to the RISC-CPU 2 and a RAM 5 via a bus 4. According to this arrangement of the system, the RISC-CPU 2 directly writes data in the external registers C, D, whereby the data is transferred from the RISC-CPU 2 to the DSP 1, while on the other hand the RISC-CPU 2 reads data stored in the registers C and D, whereby the data is transferred from the DSP 1 to the RISC-CPU 2. The RAM 5 is a single port RAM which is connected to the bus 4 and used by the DSP 1 and the RISC-CPU 2 as a work RAM.
Now, to carry out a multiplication/addition operation by the conventional digital signal processing system shown in FIG. 3, two pieces of data are required for each multiplication/addition operation. However, the DSP 1 can only read out one piece of data from the RAM 3 per one reading operation, so that two reading operations are required to obtain data required by one multiplication/addition operation. This has been an obstacle to continuous high-speed execution of multiplication/addition operations.
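The cost of fetching both operands from the same RAM can be made concrete with a short Python sketch. The class `CountingRam` and the function `mac_from_ram` are hypothetical names, and the model simply counts one reading operation per piece of data, as stated above.

```python
class CountingRam:
    """A RAM model that delivers one piece of data per reading operation."""

    def __init__(self, contents):
        self.contents = list(contents)
        self.reads = 0

    def read(self, address):
        self.reads += 1
        return self.contents[address]


def mac_from_ram(ram, sample_addresses, coeff_addresses):
    """Accumulate sample * coefficient, fetching both operands from the RAM."""
    acc = 0
    for sample_addr, coeff_addr in zip(sample_addresses, coeff_addresses):
        sample = ram.read(sample_addr)   # first reading operation
        coeff = ram.read(coeff_addr)     # second reading operation
        acc += sample * coeff
    return acc


ram = CountingRam([1, 2, 3, 10, 20, 30])   # samples at 0..2, coefficients at 3..5
result = mac_from_ram(ram, [0, 1, 2], [3, 4, 5])
```

Three multiplication/addition operations need six reading operations here, i.e. twice as many reads as arithmetic operations.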
One typical use of the digital signal processing system is a filtering operation in which externally-supplied time-series sample data are convolved with a predetermined sequence of coefficients. In most cases, one of two pieces of data for a multiplication/addition operation is data which incessantly varies, while the other is data of coefficients which are fixed in value. However, in the FIG. 3 system, the MMU 15 of the DSP 1 uniformly controls reading of two kinds of data which are thus different in characteristics, which imposes a heavy burden on the MMU 15 and forms a bottleneck to efficient supply of data and coefficients to the arithmetic operations of multiplication/addition. In view of the efficiency of the processing, it is desirable that data which is incessantly updated and data of coefficients having fixed values should be read under different types of reading control suitable for their respective characteristics. However, insofar as the common RAM is used for managing the storing of such data and coefficients, it is very difficult to carry out such different types of reading control.
On the other hand, in the FIG. 4 digital signal processing system, data required by the DSP for a multiplication/addition operation are written into the external registers C and D by the RISC-CPU 2. However, these external registers C, D are connected to the RISC-CPU 2 via the common bus 4, so that the data has to be written into these registers through two writing operations separately carried out. Therefore, the DSP 1 has to wait for the RISC-CPU 2 to carry out writing of data two times before it starts the arithmetic operations of multiplication/addition.
Further, when the RISC-CPU 2 and the DSP 1 are interfaced by way of the registers C and D as in the FIG. 4 system, the RISC-CPU 2 and the DSP 1 are required to operate in close coordination with each other, which complicates control to be executed by each of these devices.