1. Field of the Invention
The present invention relates to a digital signal processor capable of efficiently carrying out the arithmetic or interruption processing mainly of successive signals at a high speed through a small number of steps.
2. Description of the Prior Art
FIG. 1 is a block diagram showing the constitution of DSSP1 (Digital Speech Signal Processor 1), an exemplary conventional digital signal processor described in "A High-speed VLSI Signal Processor with Normalizing Floating-point Systems", Proceedings of the Annual Communication Symposium of the Institute of Electronics and Communication Engineers of Japan (IECEJ), 1986, Japan. This exemplary conventional digital signal processor corresponds to a first embodiment of the present invention.
Shown in FIG. 1 are a program counter (PC) 1 internally provided with a stack for instruction address control, an instruction mask ROM 2 storing microinstructions, an instruction register (IR0) 3 for receiving one microinstruction provided by the instruction mask ROM 2 or one external microinstruction every machine cycle, an instruction register (IR1) 4 for receiving only the bit field requiring decoding included in a microinstruction given to the instruction register (IR0) 3, an instruction decoder 5 for decoding the microinstruction given to the instruction register (IR1) 4, a program bus (P-Bus) 6 for distributing microinstructions to the functional units, a register (BI) 7 which receives immediate data (18-bit width) included in a microinstruction provided on the program bus (P-Bus) 6 and applies the same to a data bus (D-Bus) 8 (18-bit width) for internally transferring data obtained by operation, a register (AM) 9 which receives an address mode instruction of data memory through the program bus (P-Bus) 6, a register (AD) 10 (4w×16-bit width) for holding address pointer information for generating an indirect address, a page register (PR) 11 (3-bit width) which specifies the page in an external data memory, an address computation unit (AAU) 12 (9-bit width) capable of simultaneously generating three addresses at the maximum, an address register (AR0) 13, an address register (AR1) 14, an address register (AR2) 15, an address selector (RAS) 16, a loop counter (LC) 17, a status register (SR) 18 for indicating the operating mode and status of the processor, a DMA control unit 19 for controlling the direct data transfer between serial I/O ports (SI0/1, SO0/1) 32 and an external data memory, an address register (AR) 20 for holding addresses of 12-bit width to be given to an external data memory, a dual port internal data memory (2P-RAM) 21 of 512w×18 bits capacity capable of simultaneous read and write of two data, a register (DP0) 22 for holding input data of operand, a register (DP1) 23 for holding input data of operator, a multiplier (FMPL) 24 for multiplying floating-point data of 12E6 bit format, a register (P) 25 for holding the results of operation of the multiplier (FMPL) 24, a selector 26, a selector 27, a floating-point arithmetic logical operation unit (FALU) 28 mainly for carrying out the floating-point operation of 12E6 bit format, accumulators (ACC0 to ACC3) 29 of 4w×18 bits for holding and accumulating the outputs of the floating-point arithmetic logical operation unit (FALU) 28, a data register (DR) 30 connected to the data bus (D-Bus) 8 to temporarily hold data to be read from and to be written in an external data memory, a read/write control circuit (R/W Cont) 31 for reading data from and writing data in an external data memory, serial I/O ports (SI0/1, SO0/1) 32 for full-duplex two-channel data transfer with external devices, an interrupt control circuit (Int Cont) 33, an external data memory bus control circuit (Bus Cont) 34, a clock control circuit (CLK Cont) 35 for controlling internal timing, and a selector 36.
FIG. 2 is a time chart of assistance in explaining the microinstruction execution sequence of the digital signal processor DSSP1 shown in FIG. 1. Shown in FIG. 2 are cycle timing 40 consisting of four phases of clocks, fetch stage timing 41 showing stages of the address output of the program counter (PC) 1 and the microinstruction input of the instruction register (IR0) 3, decode stage timing 42 of decoding the input microinstruction of the instruction register (IR1) 4 by the instruction decoder 5, timing 43 of updating the address computation unit 12 in the decode stage, timing 44 of operation of the floating-point multiplier (FMPL) 24, timing 45 of operation of the floating-point arithmetic logical operation unit (FALU) 28, timing 46 of transferring data through the data bus (D-Bus) 8 between the registers, and timing 47 of reading data from and writing data in the external data memory through the data register (DR) 30.
Referring to FIG. 3 showing the respective constructions of microinstructions of 32-bit width per word representing four groups of microinstructions of the digital signal processor DSSP1 of FIG. 1, indicated at 50 is a sequence instruction for controlling instruction processing steps, at 51 is a mode instruction for initializing and setting modes of the status register (SR) 18, the address computation unit (AAU) 12 and the DMA control unit 19, at 52 is an operation instruction mainly for controlling the operation of the floating-point arithmetic logical operation unit (FALU) 28 and parallel data transfer accompanying the operation of the floating-point arithmetic logical operation unit (FALU) 28, and at 53 is a load instruction for loading immediate data on an optional register or a data memory.
The operation of the digital signal processor (DSSP1) will be described hereinafter, in which the components will be denoted by the abbreviated designations shown in the foregoing description.
First, the general mode of operation will be described with reference to FIG. 1. In this digital signal processor DSSP1, the P-Bus 6 and the D-Bus 8 are provided individually. The application of the microinstruction to the IR0 3, the transfer of the microinstruction through the P-Bus 6, the decoding of the microinstruction by the instruction decoder 5, and execution of the instruction by the D-Bus 8, the FMPL 24 and the FALU 28 are carried out in parallel through a pipeline process. The D-Bus 8 and all the execution units including the 2P-RAM 21 are register-based, namely, all the inputs and outputs are connected to registers. In the timing of access to the registers, outputs are provided at the leading edge of the machine cycle and the outputs are set in the registers at the trailing edge of the machine cycle. The data actually processed are not the data set in the register by the same microinstruction, but the data set in the register by the preceding microinstruction. Such a mode of operation is called delayed operation. The interior of the arithmetic unit is partitioned into sections by the registers to enable the parallel operation of the sections. For example, the FMPL 24 continually executes the floating-point multiplication once every machine cycle. In applying data to the FMPL 24, the data is set in the DP0 22 and DP1 23 by the preceding microinstruction, and the contents of the P 25 are fetched by the succeeding or later microinstruction to obtain the results of multiplication. While the contents of the P 25 are being fetched, the data is held by the DP0 22, DP1 23 and the P 25. Accordingly, one multiplication operation formerly requiring three microinstructions for data input, multiplication and data output can be carried out by one microinstruction when the process is executed continuously.
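The delayed operation described above can be illustrated with a minimal cycle-level sketch. The register names DP0, DP1 and P come from the text; the class `DelayedMultiplier`, the function `run` and the use of ordinary Python numbers in place of 12E6 floating-point words are assumptions made only for illustration.

```python
class DelayedMultiplier:
    """Hypothetical model of a register-based multiplier (DP0, DP1 -> P)."""

    def __init__(self):
        self.dp0 = 0.0  # input register DP0
        self.dp1 = 0.0  # input register DP1
        self.p = 0.0    # product register P

    def cycle(self, a, b):
        """Advance one machine cycle and return the value of P that the
        current microinstruction can see (latched in the previous cycle)."""
        visible_p = self.p
        # At the trailing edge, P latches the product of the operands that
        # the preceding microinstruction placed in DP0 and DP1 ...
        next_p = self.dp0 * self.dp1
        # ... while DP0 and DP1 latch the operands of this microinstruction.
        self.dp0, self.dp1 = a, b
        self.p = next_p
        return visible_p


def run(pairs):
    """Feed operand pairs back to back and record P as seen in each cycle."""
    m = DelayedMultiplier()
    seen = [m.cycle(a, b) for a, b in pairs]
    # Two further cycles drain the products still in flight.
    seen += [m.cycle(0.0, 0.0), m.cycle(0.0, 0.0)]
    return seen
```

In this sketch the product of a pair loaded in one cycle becomes visible two cycles later, yet a new product appears every cycle once the registers are kept full, which is the sense in which one multiplication is completed per microinstruction.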
In this DSSP1, the FMPL 24 and the FALU 28 are connected by the P 25. The FALU 28 is able to accumulate the contents of the P 25 in the ACC0 29 to the ACC3 29 to execute one term of the product-sum operation, which is often used in filtering and in the butterfly operation of the fast Fourier transform (FFT), in one machine cycle, similarly to the multiplier-accumulator pair shown in "Packing a signal processor onto a single digital board", Louis Schirm, Electronics, Dec. 20, 1979. For example, the product-sum is calculated by using an expression: $\sum_{i=1}^{N} a_i b_i$ where N is an integer not less than 1 (one), and $a_i$ and $b_i$ are input data. In this processor, three microinstructions, for applying data to the DP0 22 and DP1 23, for executing multiplication by the FMPL 24 and for accumulating the results of multiplication set in the P 25 by the FALU 28 in the ACC0 to ACC3, are required to obtain one term of the product-sum. Naturally, when the operation is carried out continuously, one term of the product-sum can be obtained for one microinstruction. Thus, to obtain one term of the product-sum for one microinstruction, two input data corresponding to the input data $a_i$ and $b_i$ must be given respectively to the DP0 22 and the DP1 23 every one microinstruction. Accordingly, the 2P-RAM 21 is enabled to supply the two input data, and a bus is provided to transfer the data read from the 2P-RAM 21 directly to the DP0 22 and DP1 23 without using the D-Bus 8 to avoid bus contention in the D-Bus 8. The AAU 12 has output means for selectively providing two address data among address data of 9-bit width provided through the AR0 13, AR1 14 and AR2 15 mainly to address the input data of the 2P-RAM 21. The AAU 12 is able to specify three addresses simultaneously at the maximum only in generating addresses for the two input data given thereto from the 2P-RAM 21, and an address for one output data given through the DR 30 and the AR 20 to the external data memory. 
Each addressing is a so-called indirect addressing system using an address pointer internally set in the AAU 12. The AR0 13 supports increment, modulo, bit-reverse and repeat addressing, as well as updating of the base address and of the increment, while the AR1 14 and the AR2 15 support only simple increment. The AAU 12 is able to perform address operation only in a 9-bit natural binary system. In specifying a 12-bit address in the external data memory, three bits for specifying a memory page are added to the nine bits to specify twelve bits.
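The address arithmetic described above may be sketched as follows. The 3-bit page and 9-bit address widths come from the text; placing the page in the upper bits, and all function names and parameters, are assumptions made for illustration rather than the actual behavior of the AAU 12.

```python
PAGE_BITS = 3      # width of the page register PR (from the text)
OFFSET_BITS = 9    # width of the AAU address arithmetic (from the text)
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def external_address(page, offset):
    """Form a 12-bit address from a 3-bit page and a 9-bit offset.
    Assumption: the page occupies the upper three bits."""
    assert 0 <= page < (1 << PAGE_BITS)
    return (page << OFFSET_BITS) | (offset & OFFSET_MASK)


def increment(pointer, step=1):
    """Simple increment, wrapping within the 9-bit address space."""
    return (pointer + step) & OFFSET_MASK


def modulo_increment(pointer, step, base, length):
    """Circular update that keeps the pointer inside [base, base + length)."""
    return base + (pointer - base + step) % length


def bit_reverse(index, bits=OFFSET_BITS):
    """Reverse the low `bits` bits of index (FFT-style reordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result
```

For example, under these assumptions `external_address(0b101, 3)` selects offset 3 of page 5, and `bit_reverse(1)` maps the second sample of a 512-point block to position 256.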
On the other hand, since the FMPL 24 and the FALU 28 execute operation in a normalized floating-point system of 12E6, all the data for the 2P-RAM 21, the DP0 22, the DP1 23, the ACC0 29 through ACC3 29, the DR 30, the D-Bus 8 and BI 7 are of 18-bit width, and hence the FALU 28 needs a special operation mode for calculating a special address initial value. Accordingly, the data representing the result of operation stored in the AR0 13, the AR1 14, the AR2 15, the AR 20 and the ACC0 29 through the ACC3 29 are not compatible with those data.
The DMA control unit 19 controls, independently of the microinstruction, full-duplex 2-channel data transfer between the serial I/O ports SI0/1 32 and SO0/1 32, and the external data memories through the D-Bus 8 and the AR 20 and DR 30. Therefore it is possible that the microinstruction operation controlled by the instruction decoder 5 and the internal resource contend with each other.
To avoid the contention, the instruction decoder 5 is held inoperative for six machine cycles for every word to interrupt operation according to the microinstruction in transferring data by the DMA control unit 19.
The DSSP1 is capable of performing the following operations in parallel within one microinstruction in executing microinstructions.
(1) The 9-bit address operation of three kinds at the maximum by the AAU 12.
(2) The floating-point multiplication of 12E6 by the FMPL 24.
(3) The floating-point operation of 12E6 by the FALU 28.
(4) Data transfer through the D-Bus 8 and the DR 30 between the external memories.
(5) DMA data transfer through the full-duplex 2-channel serial I/O ports SI0/1 32 and SO0/1 32, D-Bus 8 and the DR 30 between the external memories.
The microinstruction execution timing of the DSSP1 will be described with reference to FIG. 2. The machine cycle 40 of the DSSP1 is divided into four phases of timing P0 through P3. The nominal machine cycle time is as short as 50 nsec. Accordingly, it is practically difficult to accomplish three operations, namely, reading a microinstruction from the instruction mask ROM 2, decoding the microinstruction by the instruction decoder 5 and execution of the instruction by the internal resources such as the FMPL 24 and the FALU 28, within one machine cycle. Therefore, the three operations are divided into stages, one for each machine cycle, to form a three-stage pipeline to enable high-speed operation. The following operations are performed in the stages of the three-stage pipeline.
(1) Fetch stage 41:
A microinstruction address is provided by the PC 1, a microinstruction is read from the instruction mask ROM 2 and the microinstruction is set in the IR0 3.
(2) Decode stage 42 and 43:
The microinstruction is transferred from the IR0 3 to the IR1 4, the microinstruction is decoded by the instruction decoder 5, the program control mode is set, the microinstruction is transferred from the IR0 3 to the P-Bus 6, and the address operation of the AAU 12 is performed through the AM 9 and the AD 10.
(3) Execution stage 44, 45, 46 and 47:
Operation of data by the FMPL 24 and FALU 28, data transfer through the D-Bus 8, and access to the external data memories through the AR 20 and the DR 30.
Thus, the DSSP1 needs three machine cycles to execute one microinstruction, although, owing to the pipeline method, it equivalently executes one microinstruction per machine cycle. Accordingly, the actual execution of the microinstruction is delayed by two machine cycles from the read of the microinstruction from the instruction mask ROM 2. To avoid the timing contention between the internal resources, the internal buses are divided into the P-Bus 6 and the D-Bus 8, and the instruction mask ROM 2 and the 2P-RAM 21 are separated. However, since a branch instruction is actually executed in the decode stage, the microinstruction being set in the IR0 3 at that time is also executed. That is, an instruction succeeding a branch instruction is executed unconditionally. To avoid such unconditional execution of instructions, the DSSP1 changes the instruction succeeding the branch instruction automatically into a no-operation instruction (NOP) during the execution of the branch instruction. Such a function is aimed at simplifying microinstruction description; however, one machine cycle is wasted in the branching operation and two machine cycles are wasted in indirect branching operation using the D-Bus 8. Generally, no problem arises in about 80% of unconditional branching operations even if the succeeding instruction is executed, provided that the sequence of instruction description is arranged properly, and hence the loss of machine cycles could be avoided. However, the DSSP1 is unable to avoid the loss of machine cycles.
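The three-stage pipeline and the automatic NOP substitution described above can be sketched as a small simulation. The register names IR0 and IR1 follow the text; the tuple encoding of instructions, the `simulate` function and the `max_cycles` guard are hypothetical and serve only to show where the wasted cycle arises.

```python
NOP = ("nop",)


def simulate(program, max_cycles=1000):
    """Return the sequence of instructions reaching the execution stage.

    `program` is a list of ("op", label) or ("jmp", target) tuples.
    """
    pc, ir0, ir1 = 0, None, None  # PC, fetch-stage and decode-stage outputs
    trace = []
    for _ in range(max_cycles):
        if pc >= len(program) and ir0 is None and ir1 is None:
            break
        # Execution stage: the instruction decoded in the previous cycle.
        if ir1 is not None:
            trace.append(ir1)
        # Fetch stage: read the instruction at the current PC, if any.
        fetched = program[pc] if pc < len(program) else None
        next_pc = pc + 1 if fetched is not None else pc
        # Decode stage: a branch takes effect here, so the instruction
        # fetched behind it is replaced by a NOP.
        if ir0 is not None and ir0[0] == "jmp":
            next_pc = ir0[1]
            fetched = NOP
        ir1, ir0, pc = ir0, fetched, next_pc
    return trace
```

Running `simulate([("op", "a"), ("jmp", 3), ("op", "b"), ("op", "c")])` in this model places a NOP in the execution slot immediately after the branch, so instruction "b" never executes and one machine cycle is lost.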
The microinstruction set of the DSSP1 will be described hereinafter with reference to FIG. 3. The microinstruction set includes only four instructions, namely, a sequence instruction, a mode instruction, an operation instruction and a load instruction.
The sequence instruction controls the PC 1 for loop and subroutine call. The mode instruction initializes and sets the modes of the AAU 12, the selector 16, the LC 17, the SR 18 and the DMA control unit 19. The load instruction is used for loading immediate data of 18-bit width on the registers connected to the D-Bus 8 through the BI 7. The objective resources of the foregoing three microinstructions are fixed depending on instruction operation. On the other hand, as regards the operation instruction, all the internal resources capable of parallel operation must be specified directly. Accordingly, the bit length of the instructions is dependent on the bit length of the operation instruction. The DSSP1 uses horizontal microinstructions of 32-bit width. The FMPL 24 is free to run and no instruction is given thereto directly. The operation of the FALU 28 is specified directly by an instruction. For example, operations of the FALU 28 are controlled by the following instructions.
(1) Absolute value instruction: |X|
(2) Signum function instruction: Sign(Y)·X
(3) Addition instruction: X+Y
(4) Subtraction instruction: X-Y
(5) Maximum value instruction: MAX (X, Y)
(6) Minimum value instruction: MIN (X, Y)
(7) Fixed-to-floating translation instruction: FLT (X)
(8) Floating-to-fixed translation instruction: FIX (X)
(9) Shift instruction: R1, L1 to L8
(10) Logic instruction: AND, OR, EOR, NOT
(11) Mantissa addition instruction: X_M + Y_M
(12) Characteristic subtraction instruction: X_E - Y_E
Instruction steps for the binary tree search of FIG. 4:
Input data conversion: N+2 steps
Evaluation value calculation for one vector: 9N+2 steps
Evaluation value rounding: About 3 steps
Evaluation value comparison: 4 steps
Calculation of reference vector address for the next node: About 9 steps
Total: (18N+14) steps per stage + (N+2) steps
However, it is a problem in the DSSP1 that the operation of the DSSP1 is based on the floating-point operation, while the DSSP1 carries out logic and address operations on the basis of fixed-point operation. As mentioned above, the floating-point operation and the fixed-point operation are not compatible with each other. In addressing in the memory, for example, on the basis of the results of operation, the instruction (8) must be executed by the FALU 28. Furthermore, since floating-point data is not handled in general data input and output operation, the instruction (7) or the instruction (8) must be executed for every data input or output operation to translate the data.
Another problem in the DSSP1 is that bits are always truncated in normalizing floating-point data, which entails errors in operation, because the accuracy of operation of the signal processor is limited. Moreover, when floating-point data is normalized only by truncating bits, the absolute value of the results of operation is always smaller than the true value, and hence the distribution of the errors is not random. The errors can be made negligibly small by increasing the operation word length. However, since the ordinary signal processor is required to operate at a high speed, increase in the operation word length is limited.
Such a problem cannot be ignored particularly in image signal processing in which interframe processing is performed by an IIR digital filter (recursive digital filter), and the DSSP1 must round the results of operation by a logical operation instruction or the like. Furthermore, in a general signal processing algorithm, in most cases, the accuracy of operation is regulated specifically for every unit process, and hence the accuracy of operation does not necessarily coincide with the operation word length of the signal processor. In such a case, the format of the operation data is converted repeatedly by the FALU 28 for every unit process.
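The one-sided error caused by truncation, as opposed to rounding, can be illustrated with a short sketch. The 12-bit mantissa length is borrowed from the 12E6 format named in the text, but the exact bit layout of that format is not given, so `quantize`, its `mant_bits` parameter and the use of Python floats are assumptions for illustration only.

```python
import math


def quantize(x, mant_bits=12, mode="truncate"):
    """Normalize x and keep `mant_bits` bits of mantissa magnitude."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                   # x = m * 2**e, 0.5 <= |m| < 1
    scaled = m * (1 << mant_bits)
    if mode == "truncate":
        q = math.trunc(scaled)             # discard low bits, toward zero
    else:
        q = math.floor(scaled + 0.5)       # round to the nearest step
    return math.ldexp(q, e - mant_bits)


def mean_error(values, mode):
    """Average signed error introduced by quantizing each value."""
    return sum(quantize(v, mode=mode) - v for v in values) / len(values)
```

With truncation the quantized magnitude never exceeds the true magnitude, so for positive inputs the mean error is negative, whereas rounding to nearest spreads the errors on both sides of zero. This is the kind of systematic bias that accumulates in a recursive filter.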
It is a further problem in the DSSP1 that the operations capable of high-speed processing are limited to the product-sum operation. Such a limitation is not a problem for FFT and FIR filters. However, recent signal processing algorithms require an operation to determine the degree of approximation of vectors A and B, i.e., distance calculation, such as expressed by the following expressions, to be carried out at a high processing speed: $\sum_{i=1}^{N} |a_i - b_i|$ and $\sum_{i=1}^{N} (a_i - b_i)^2$, where $a_i$ and $b_i$ are the elements of the vectors and N is the number of elements of each vector.
The DSSP1 is unable to support such an operation directly, and hence such an operation must be decomposed into the individual four arithmetic operations for processing. Therefore, three separate operations must be executed to calculate a single term. When each term is calculated by using the foregoing expressions, nine instructions (9 = 3×3) must be provided for one term due to delay, which degrades the processing throughput excessively. Naturally, the processing throughput can be improved by separating the process into differential calculation and square accumulation and saving interim results in the 2P-RAM 21. However, it is difficult to use the limited space of the data memory effectively and hence it is impossible to process a large quantity of data.
Consider, for example, binary tree search as shown in FIG. 4. Suppose that an input vector A is set in the 2P-RAM 21, and a reference vector B of a tree construction is allocated to the nodes indicated by reference numerals in an external memory as shown in FIG. 5. An evaluation function expressing the degree of approximation between the input vector A and the reference vector B is the absolute differential sum: $\sum_{i=1}^{N} |a_i - b_i|$
A reference vector which provides the least absolute differential sum is selected at each node of the binary tree and finally a reference vector which provides the highest degree of approximation is obtained. In this binary tree search, when the number of the present node is n, the degree of approximation between two reference vectors B at a node 2n+1 and 2n+2 is determined, and then the node number of a reference vector to be compared at the next stage is calculated on the basis of the degree of approximation. To carry out the foregoing binary tree search by the DSSP1, the following instruction steps are necessary.
This total number of steps is approximately nine times the number of steps when the ideal number of steps necessary for evaluation value calculation is 2N and the conversion of address and input data is unnecessary. In such a process, since the same process is not performed successively, it is necessary to be always conscious of the context of instructions. Consequently, process efficiency is deteriorated significantly, a very complex program is necessary and, obviously, a problem arises in the quantity of work necessary for developing software.
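The binary tree search described above can be sketched in a few lines. The child node numbers 2n+1 and 2n+2 and the absolute differential sum come from the text; starting at node 0, storing the reference vectors in a list indexed by node number, and the function names are assumptions for illustration.

```python
def abs_diff_sum(a, b):
    """Evaluation function: sum of |a_i - b_i| (smaller means closer)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def tree_search(input_vec, ref_vectors, stages):
    """Descend a binary tree of reference vectors and return the node
    number of the closest reference vector at the final stage.

    ref_vectors[k] is the reference vector allocated to node k.
    Assumption: the search starts from node 0, which holds no vector.
    """
    n = 0
    for _ in range(stages):
        left, right = 2 * n + 1, 2 * n + 2
        d_left = abs_diff_sum(input_vec, ref_vectors[left])
        d_right = abs_diff_sum(input_vec, ref_vectors[right])
        # The next node address depends on the comparison result, which
        # is why an address must be derived from an operation result.
        n = left if d_left <= d_right else right
    return n
```

Each stage evaluates two vectors of N elements, so an ideal machine would spend about 2N steps per stage on the evaluation itself; the step counts quoted in the text show how far the DSSP1 is from that figure.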
The conventional digital signal processor thus constituted has the following problems.
It is necessary to be always conscious of the context of instructions in producing a program, and the same instructions must be executed successively; otherwise the process efficiency cannot be improved.
Address and data format are not compatible with each other, and hence format needs to be converted for every data, for example, in table look-up.
Designed particularly for obtaining product-sum, the arithmetic unit is unable to operate at a high efficiency and a complex program is necessary in operation other than that for obtaining product-sum.
The control of the accuracy of data operation is difficult and automatic rounding is impossible.
Simultaneous reading from and writing to the data memory for 2-input 1-output operation is impossible, and the efficiency is deteriorated excessively, for example, in vector data processing.
Immediate specification of an indirect address mode in the instruction is impossible and hence the process needs to be interrupted for every address mode change.
A second exemplary conventional digital signal processor, which corresponds to a second embodiment of the present invention, will be described hereinafter.
FIG. 6 is a schematic block diagram showing the constitution of a digital signal processor (DSSP1) mainly for voice signal processing, published in the preprint No. S10-1 for the Denshi Tsushin Gakkai Tsushin Bumon Zenkoku Taikai Symposium, 1986.
Referring to FIG. 6, the DSSP1 comprises a program counter (PC) 61 for holding instruction execution addresses (hereinafter referred to as "instruction addresses"), an instruction memory 62 for storing instruction words, a decoder 63 for decoding instruction words, a program bus 64 for transferring decoded control data, a data memory 65 for storing data, a data bus 66 for transferring main data, a bus interface register (hereinafter abbreviated to "BIR") 67 for interconnecting the program bus 64 and the data bus 66, a processing circuit (hereinafter referred to as "EU") 68 which performs arithmetical operations, a register (flag register) 69 having a flip-flop for holding the status of the results of arithmetical operations, namely, a flag, an adder 70 which adds 1 (one) to an input, a switching circuit 71, and a condition decision unit 72.
The operation of the processor will be described hereinafter with reference to FIG. 6. Generally, the signal processor has a pipeline construction to improve the processing speed. This exemplary digital signal processor has a three-stage pipeline construction.
An ordinary processor decodes and executes an instruction word, and then decodes and executes the next instruction word. A processor of a pipeline construction decodes the succeeding instruction word during the execution of the preceding instruction word. Accordingly, the processing speed of the processor of a pipeline system is higher than that of the ordinary processor. However, since advanced decoding is useless in executing instruction words including many jump instructions such as conditional branch instructions, the processing speed of the processor of a pipeline system is reduced in executing such instruction words.
A pipeline processing mode will be described hereinafter.
In the first stage of the pipeline, an instruction word 62A stored in the instruction memory 62 at an instruction address specified by an instruction address 70A provided by the PC 61 is read and is applied to the decoder 63.
In the second stage of the pipeline, a control signal produced by the decoder 63 by decoding the instruction word 62A is provided on the program bus 64 and a necessary control code is given to the BIR 67 via the program bus 64.
In the third stage of the pipeline, the control signal controls operations such as reading data 65A on the data bus 66 from the data memory 65, writing data provided on the data bus 66 in the data memory 65, and processing the data by the EU 68.
The EU 68 provides a flag 68B indicating the status of the result 68A of operation after processing the data. Generally, the flag 68B is a sign flag, a zero flag, an overflow flag or a carry flag.
The sign flag is a logical value "0" when the result 68A of operation is positive, and is a logical value "1" when the result 68A of operation is negative.
The zero flag is a logical value "0" when the result 68A of operation is zero, and is a logical value "1" when the result 68A of operation is not zero.
The overflow flag is a logical value "1" when overflow occurs in the result 68A of operation, and is a logical value "0" when overflow does not occur in the result 68A of operation.
The carry flag is a logical value "1" when carry or digit borrow occurs in the result 68A, and is a logical value "0" in cases other than carry and digit borrow.
The flag 68B is applied to and held by the flag register 69 until a new flag 68B is applied to the flag register 69 after the EU 68 has executed the next operation.
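The four flags described above can be sketched for a subtraction as follows. The flag polarities, including the zero flag being "0" for a zero result, follow the text; the 18-bit two's-complement word length and the function name `subtract_with_flags` are assumptions, since the word format of the EU 68 is not stated.

```python
WIDTH = 18                      # hypothetical word length, for illustration
MASK = (1 << WIDTH) - 1
SIGN_BIT = 1 << (WIDTH - 1)


def subtract_with_flags(a, b):
    """Compute a - b on WIDTH-bit two's-complement words and derive flags."""
    a &= MASK
    b &= MASK
    raw = a - b
    result = raw & MASK
    sign_a, sign_b, sign_r = a & SIGN_BIT, b & SIGN_BIT, result & SIGN_BIT
    flags = {
        "sign": 1 if sign_r else 0,
        # Polarity as stated in the text: "0" for zero, "1" otherwise.
        "zero": 0 if result == 0 else 1,
        # Signed overflow: the operands differ in sign and the sign of
        # the result differs from that of the minuend.
        "overflow": 1 if (sign_a != sign_b and sign_r != sign_a) else 0,
        # A digit borrow occurs when the unsigned minuend is the smaller.
        "carry": 1 if raw < 0 else 0,
    }
    return result, flags
```

In a comparison such as the one discussed later, only the zero flag of `subtract_with_flags(a, b)` is of interest: it is "0" exactly when the two inputs are equal.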
Ordinarily, an instruction word following an instruction word which has been executed is stored in an address greater by "1" than the instruction address 70A in which the executed instruction word was stored, when the executed instruction word does not specify a branch operation.
Accordingly, in the first stage of the pipeline, the instruction address 70A provided by the PC 61 is incremented by "1" by the adder 70 to make an address 71A greater than the instruction address 70A by "1". If the instruction decoded by the decoder 63 does not specify a branch operation, a control signal is applied to the switching circuit 71 to select the address 71A which is greater than the instruction address 70A by "1", and the logical value of a branch completion signal 72A becomes "0", and the address 71A obtained by adding "1" to the instruction address 70A is given to the PC 61.
The operation when the instruction word decoded in the second stage of the pipeline is a conditional branch instruction will be described hereinafter.
A conditional branch instruction is an instruction which specifies executing an instruction word at a branched address specified by the instruction when the specified branch condition is met, and executing the instruction word at the next address when the branch condition is not met.
When the conditional branch instruction is decoded, a flag 69A held by the flag register 69 is read and is applied to a condition decision unit 72. The condition decision unit 72 decides whether or not the branch condition 64A specified by the instruction is met. When the branch condition 64A is met, the logical value of a branch signal 72A becomes "1", the switching circuit 71 selects the branched address 64B specified by an instruction, and then the branched address 64B is given to the PC 61.
When the branch condition 64A is not met, the logical value of the branch signal 72A becomes "0", the switching circuit 71 selects the address 71A greater than the instruction address 70A by "1", and then the address 71A is given to the PC 61.
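The selection of the next instruction address described above can be sketched as follows. The roles of the adder, the condition decision unit and the switching circuit follow the text; the encoding of a branch as a `(condition, target)` pair and of a condition as a `(flag_name, wanted_value)` pair is hypothetical.

```python
def condition_met(condition, flags):
    """Condition decision: compare one held flag with the wanted value."""
    flag_name, wanted = condition
    return flags[flag_name] == wanted


def next_pc(pc, instruction, flags):
    """Select the address to be given to the program counter.

    `instruction` is None for a non-branch instruction, or a pair
    (condition, target) for a conditional branch instruction.
    """
    incremented = pc + 1                      # output of the adder
    if instruction is None:
        return incremented
    condition, target = instruction
    branch_signal = 1 if condition_met(condition, flags) else 0
    # The switching circuit selects the branched address or PC + 1.
    return target if branch_signal else incremented
```

For instance, `next_pc(10, (("zero", 1), 40), {"zero": 1})` yields the branched address 40, while the same instruction with the zero flag at "0" yields 11.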
The operation of the processor in a case where the only information required is whether or not A=B holds (A and B being input data) will be described by way of example with reference to FIG. 7.
In the conventional processor, such information can be stored only in the data memory, and hence the following operation is performed.
First, data A_0 and B_0 are compared. When the data A_0 is equal to the data B_0, the value of a predetermined address TS(0) in the data memory is made "1". When the data A_0 is not equal to the data B_0, the value of the address TS(0) is made "0".
Then, data A_1 and B_1 are compared and the result of the comparison is written at an address TS(1). The result of comparison of data A_2 and B_2 is written at an address TS(2).
FIG. 8 shows the sequence of operations of the PC 61, the decoder 63 and the EU 68 for processes shown in FIG. 7.
As shown in FIG. 8, in a machine cycle (hereinafter abbreviated to "M.C.") T, the PC 61 provides an instruction address N, and an instruction specifying the comparison of the data A_0 and B_0 is read from the address N in M.C. T+1.
Then, the EU 68 calculates the difference between the data A_0 and B_0. A zero flag 802 is provided in a M.C. T+2, and the zero flag 802 is set at the start of a M.C. T+3.
That is, when a conditional branch instruction is given, the condition decision unit 72 tests the zero flag and decides a branched address in the M.C. T+3.
Accordingly, to forbid the advanced decoding operation of the decoder 63, a NOP (no-operation) instruction is stored at an address N+1, and a conditional branch instruction which causes a branch when the logical value of the zero flag is "1" is stored at an address N+2.
That is, when A_0 = B_0, a load instruction stored at an address N+3 (an instruction specifying storing the result of decision in the data memory 65) is executed in a M.C. T+5 as shown in (a) of FIG. 8, and then the address TS(0) is set to "1".
When A_0 ≠ B_0, the load instruction at the address N+3 is replaced with a NOP instruction to branch the program to an address M as shown in (b) of FIG. 8, and a load instruction at the address M is executed in a M.C. T+6 to make the value of the address TS(0) "0". After the completion of the operation following the decision A_0 ≠ B_0, an unconditional branch instruction stored at an address M+1 is executed.
Accordingly, the PC 61 provides an instruction address N+4 in a M.C. T+7, which is delayed by three machine cycles from the machine cycle where the instruction address N+4 is provided when A_0 = B_0.
Thus, to set the result of comparison of the data A_0 and B_0 in the address TS(0), seven instruction steps and four or seven machine cycles are necessary.
Twenty-one (21 = 7×3) instruction steps and twelve machine cycles at the minimum and twenty-one machine cycles at the maximum are necessary to complete the process shown in FIG. 7.
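The step and cycle counts stated above can be checked with a small sketch. The figures of seven instruction steps and four or seven machine cycles per comparison are taken from the text; the function `compare_all` and the list `ts` standing for the addresses TS(0) to TS(2) are hypothetical names used only for illustration.

```python
STEPS_PER_COMPARISON = 7   # instruction steps per comparison (from the text)
CYCLES_EQUAL = 4           # machine cycles when A_i = B_i (from the text)
CYCLES_NOT_EQUAL = 7       # machine cycles when A_i != B_i (from the text)


def compare_all(a, b):
    """Return (ts, steps, cycles) for comparing each pair (A_i, B_i).

    ts[i] is 1 when A_i equals B_i and 0 otherwise, mirroring the value
    written at address TS(i).
    """
    ts = []
    cycles = 0
    for x, y in zip(a, b):
        equal = (x == y)
        ts.append(1 if equal else 0)
        cycles += CYCLES_EQUAL if equal else CYCLES_NOT_EQUAL
    steps = STEPS_PER_COMPARISON * len(ts)
    return ts, steps, cycles
```

For three pairs the function reproduces the bounds in the text: twelve machine cycles when all pairs are equal and twenty-one when none are, with twenty-one instruction steps in either case.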
Since the conventional signal processor of the foregoing signal processing system operates in the foregoing mode and the information can be stored only in the data memory, a conditional test process which merely obtains information as to whether or not the result of operation meets a predetermined condition, such as the process shown in FIG. 7, requires executing a comparison instruction or the like and then, by using a conditional branch instruction, providing instruction words for two kinds of processes according to the status of a flag. Consequently, the number of instruction steps is increased, the execution time varies greatly depending on the result of decision, and the processing efficiency is deteriorated.
Another mode of operation of the conventional processor shown in FIG. 6 will be described hereinafter. This mode of operation corresponds to that of a third embodiment of the present invention. Generally, a signal processor has a pipeline construction to improve the processing speed. This processor has a three-stage pipeline construction.
An ordinary processor decodes and executes an instruction word, and then decodes and executes the next instruction word, while the processor of a pipeline system decodes a succeeding instruction word during the execution of the preceding instruction word.
Accordingly, the processing speed of the processor of a pipeline system is higher than that of the ordinary processor. However, since advanced decoding is useless in executing instruction words including many jump instructions such as conditional branch instructions, the processing speed is reduced.
The pipeline processing mode will be described hereinafter. In the first stage of the pipeline, an instruction word 62A stored at an address specified by an instruction address 70A provided by the PC 61 is read from the instruction memory 62 and then the instruction word 62A is given to the decoder 63.
In the second stage of the pipeline, a control signal produced by the decoder 63 by decoding the instruction word 62A is provided on the program bus 64 and a necessary control code is given to the BIR 67 via the program bus 64.
In the third stage of the pipeline, the control signal controls operations such as reading data 65A on the data bus 66 from the data memory 65, writing data provided on the data bus 66 in the data memory 65, and processing the data by the EU 68.
The EU 68 provides a flag 68B indicating the status of the result 68A of operation after processing the data. Generally, the flag 68B is a sign flag, a zero flag, an overflow flag or a carry flag.
The sign flag is a logical value "0" when the result 68A of operation is positive, and is a logical value "1" when the result 68A of operation is negative.
The zero flag is a logical value "0" when the result 68A of operation is zero, and is a logical value "1" when the result 68A of operation is not zero.
The overflow flag is a logical value "1" when overflow occurs in the result 68A of operation, and is a logical value "0" when overflow does not occur in the result 68A of operation.
The carry flag is a logical value "1" when carry or digit borrow occurs in the result 68A, and is a logical value "0" in cases other than carry and digit borrow.
The flag 68B is applied to and held by the flag register 69 until a new flag 68B is applied to the flag register 69 after the EU 68 has executed the next operation.
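The four flags described above may be illustrated by the following minimal sketch of an addition. The word width of 16 bits, the two's-complement representation and the function name are assumptions made for illustration only; the polarity of the zero flag follows the description given above ("0" when the result is zero).

```python
def add_with_flags(a, b, width=16):
    """Add two words and return (result, flags) in the style of flag 68B.

    width is an assumed word width; the text does not state one.
    """
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    a &= mask
    b &= mask
    raw = a + b
    result = raw & mask
    flags = {
        # "1" when the result is negative (most significant bit set)
        "sign": 1 if result & sign_bit else 0,
        # polarity as stated in the text: "0" when the result is zero
        "zero": 0 if result == 0 else 1,
        # operands of equal sign producing a result of the opposite sign
        "overflow": 1 if (~(a ^ b) & (a ^ result) & sign_bit) else 0,
        # carry out of the most significant bit
        "carry": 1 if raw > mask else 0,
    }
    return result, flags
```

In this sketch the returned dictionary plays the role of the contents held by the flag register 69 until the next operation replaces them.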
Ordinarily, an instruction word following an instruction word which has been executed is stored in an address greater by "1" than the instruction address 70A in which the executed instruction word was stored, when the executed instruction word does not specify a branch operation.
Accordingly, in the first stage of the pipeline, the instruction address 70A provided by the PC 61 is incremented by "1" by the adder 70 to make an address 71A greater than the instruction address 70A by "1". If the instruction decoded by the decoder 63 does not specify a branch operation, a control signal is applied to the switching circuit 71 to select the address 71A which is greater than the instruction address 70A by "1", and the logical value of a branch completion signal 72A becomes "0" and the address 71A obtained by adding "1" to the instruction address 70A is given to the PC 61.
The operation when the instruction word decoded in the second stage of the pipeline is a conditional branch instruction will be described hereinafter. A conditional branch instruction is an instruction to branch the program to a branched address specified by an instruction when a branch condition specified by an instruction is met and not to branch the program when the branch condition is not met.
First a flag 69A held in the flag register 69 is read and is given to the condition decision unit 72 when a conditional branch instruction is decoded.
Then, the condition decision unit 72 decides whether or not a branch condition 64A specified by an instruction is met. When the branch condition 64A is met, the logical value of a branch signal 72A becomes "1", the switching circuit 71 selects a branched address 64B specified by an instruction, and then the branched address 64B is given to the PC 61. On the other hand, when the branch condition 64A is not met, the logical value of the branch signal 72A becomes "0". Then, the switching circuit 71 selects an address 71A greater than the instruction address 70A by "1", and then the address 71A is given to the PC 61.
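The selection of the next instruction address described above may be sketched as follows. The function and parameter names are hypothetical; the comments relate each step to the adder 70, the condition decision unit 72 and the switching circuit 71.

```python
def next_instruction_address(current_address, is_conditional_branch=False,
                             condition_met=False, branched_address=None):
    """Return the address given to the PC 61 for the next machine cycle."""
    # Adder 70: address 71A is the instruction address 70A plus one.
    incremented = current_address + 1
    # Condition decision unit 72: signal 72A is "1" only when a
    # conditional branch instruction is decoded and its condition is met.
    branch_signal = 1 if (is_conditional_branch and condition_met) else 0
    # Switching circuit 71: select the branched address 64B or 71A.
    if branch_signal == 1:
        return branched_address
    return incremented
```

For an instruction that does not specify a branch, or for a conditional branch whose condition is not met, the sketch returns the incremented address, which corresponds to the signal 72A being "0".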
Generally, when a processor having a pipeline construction executes a branch operation, the operation is delayed by the pipeline. For example, suppose that a conditional branch instruction stored in the instruction memory 62 at an address N is read in a M.C. T. Then, the PC 61 provides an instruction address N+1 to read an instruction stored at an address N+1 while a decision is being made in a M.C. T+1.
When the branch condition 64A is met, an instruction word stored at the address N+1 in the decoder is invalidated and is replaced with a NOP instruction.
When the branch condition 64A is not met, the instruction word stored at the address N+1 is decoded and executed.
Suppose that the processor executes conditional branch operations, for example, conditional branch operations according to a program having branch conditions A and B of different priority as shown in FIG. 9, in which a process X is executed when the branch condition A is met, a process Y is executed when the branch condition A is not met and the branch condition B is met, and a process Z is executed when both the branch conditions A and B are not met. FIG. 10 illustrates the timing of the conditional branch instructions executed by the processor according to FIG. 9.
A conditional branch instruction A specifying branching the program to an address A for the process X when the branch condition A is met is stored in the address N, a conditional branch instruction B specifying branching the program to an address B for the process Y when the branch condition B is met is stored in the address N+1, and an unconditional branch instruction C specifying branching the program to an address C for the process Z is stored in the address N+2. When the address N is provided by the PC 61 in a M.C. T, the conditional branch instruction A is decoded and executed in a M.C. T+1. When the branch condition A is met, the PC 61 provides the branched address A in a M.C. T+2, and the instruction at the address N+1 in the decoder is replaced with a NOP instruction. When the branch condition A is not met, the PC 61 provides the address N+2 and the conditional branch instruction B is decoded and executed. When the branch condition B is met, the PC 61 provides the branched address B in a M.C. T+3, and the instruction at the address N+2 in the decoder is replaced with a NOP instruction. When the branch condition B is not met, the PC 61 provides the address N+3 and the unconditional branch instruction C is decoded and executed. Such a conditional branch instruction is able to specify merely binary decisions. Therefore, multipoint branch operation requires many machine cycles.
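The growth in machine cycles with the number of branch conditions may be sketched as follows. The function name is hypothetical. The offsets for the processes X and Y are taken from the description above, while the offset for the final unconditional branch (T+4 for the process Z) is an extrapolation of the same pattern and is not stated in the text.

```python
def chained_branch(conditions, processes, default_process):
    """Return (process, k) where the PC provides the branched address
    of the selected process in machine cycle T+k.

    conditions : branch conditions in order of priority (True = met)
    processes  : process selected by each condition, in the same order
    """
    for index, met in enumerate(conditions):
        if met:
            # The branch stored at address N+index is decoded in
            # M.C. T+index+1 and its target is provided one cycle later.
            return processes[index], index + 2
    # Extrapolated: the closing unconditional branch is decoded one
    # cycle after the last conditional branch.
    return default_process, len(conditions) + 2
```

Each additional branch condition adds one instruction and, in the worst case, one machine cycle before the selected process can start.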
According to the signal processing system of the conventional processor, many conditional branch instructions need to be executed to accomplish a multipoint conditional branch process. Thus, the conventional signal processor of a signal processing system has problems that the number of instruction steps is increased, the execution time is extended uselessly and the processing speed is reduced.
Furthermore, since those problems prevent the effective use of the instruction memory, reduction in the processing efficiency is remarkable particularly in the image signal processing field in which a large quantity of data need to be operated at a high speed and multipoint conditional branch processes need to be executed on the basis of the operation.
FIG. 11 is a schematic block diagram of a fourth exemplary conventional signal processor employing the digital signal processor (DSSP1) mainly for voice signal processing published in the preprint No. S10-1 for the Denshi Tsushin Gakkai Tsushin Bumon Zenkoku Taikai Symposium, 1985. In this example, the digital signal processor is controlled by a host processor. The fourth exemplary conventional signal processor corresponds to a fourth embodiment of the present invention.
Referring to FIG. 11, there are shown a signal processor 82 mainly for signal processing, a host processor 81 for controlling the signal processor 82, an instruction memory selection signal 83, a reset signal 84 for initializing the signal processor 82, a program counter (PC) 85, an instruction address 86, an internal instruction memory 87, such as a ROM, storing instruction words, an external instruction memory 88 storing instruction words, a switching circuit 89 for selecting one of two instruction words according to the instruction memory selection signal 83, an instruction register (IR) 90 for holding an instruction word, a decoder 91 for decoding instruction words, an arithmetic unit 92 which carries out arithmetical operations, a control signal 93, a data memory 94 storing data to be subjected to signal processing operation, and data 95.
FIG. 12 is a flow chart of assistance in explaining the operation of the signal processor.
The operation of this signal processor will be described with reference to FIGS. 11 and 12. Upon the connection of the signal processor to a power supply, first the host processor starts operation and gives a selection signal 83 to the signal processor 82 to specify the internal instruction memory 87 or the external instruction memory 88. The internal instruction memory 87 is selected when the logical value of the selection signal 83 is "0", while the external instruction memory 88 is selected when the logical value of the selection signal 83 is "1". Then, the host processor 81 gives the reset signal 84 to the signal processor 82. Upon the reception of the reset signal 84, the devices including the internal instruction register are initialized and the PC 85 is cleared. Then, the PC 85 gives an instruction address 86 specifying an address 0 to the internal instruction memory 87 of the signal processor 82 and to the external instruction memory 88 to read instruction words stored at the specified address 0 and the instruction words are given to the switching circuit 89. The switching circuit 89 selects either the instruction word read from the internal instruction memory 87 or the instruction word read from the external instruction memory 88 according to the selection signal 83 given thereto from the host processor 81 and gives the selected instruction word to the IR 90. The instruction word held by the IR 90 is decoded by the decoder 91 to provide control signals to the devices. The internal arithmetic unit 92 of the signal processor 82 is controlled by the control signal 93 provided by the decoder 91 to process the data 95 stored in the data memory 94 through arithmetical operations.
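The selection between the two instruction memories described above may be sketched as follows. The function name and the representation of the memories as Python lists are assumptions made for illustration only.

```python
def fetch_instruction(selection_signal, instruction_address,
                      internal_memory, external_memory):
    """Return the instruction word given to the IR 90.

    Both memories are read at the same instruction address 86; the
    switching circuit 89 then passes on one of the two words.
    """
    internal_word = internal_memory[instruction_address]
    external_word = external_memory[instruction_address]
    # Selection signal 83: "0" selects the internal instruction memory
    # 87 and "1" selects the external instruction memory 88.
    if selection_signal == 1:
        return external_word
    return internal_word
```

After the reset signal 84 clears the PC 85, the first fetch in this sketch would be made with an instruction address of 0.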
The conventional signal processor needs a comparatively large-scale control program for complex signal processing and the capacity of the internal instruction memory 87 of the signal processor 82 is insufficient to store such a large-scale control program, and hence the external instruction memory 88 is necessary. When the external instruction memory 88 is employed, an input/output (I/O) device is necessary for transferring the internal signals of the signal processor 82 and the external signals through external terminals to provide the instruction address 86 and to receive instruction words. In such a case, a long time is necessary for reading instruction words from the external instruction memory 88 because the instruction words are read through an additional device. Accordingly, a clock signal having a long period must be applied to the signal processor 82 when the external instruction memory 88 is employed, and the signal processor 82 must be initialized by the reset signal 84 after switching the instruction memories or the signal processor will malfunction.
The internal instruction memory 87 of the signal processor 82 is a so-called mask ROM in which a program is written beforehand in fabricating the signal processor 82, and a special program for special processes is stored in the internal instruction memory 87.
The conventional signal processor thus constituted needs to read instruction words from the external instruction memory in executing complex processes, and the additional time required for reading the instruction words from the external instruction memory reduces the processing efficiency. Furthermore, since the internal instruction memory is a read-only memory, the program cannot be changed after the completion of the signal processor and, when errors are found in the program or when the program needs correction, the signal processor needs to be replaced with a new one, which deteriorates the efficiency of development and is uneconomical.
FIG. 13 is a schematic block diagram of assistance in explaining a typical interruption process, showing a fifth exemplary conventional signal processor published in "Television Gakkai-shi, DSP, Minor Special Issue" pp. 219-233, 1987/3. The constitution of the signal processor per se is not related directly with the interrupt process. This fifth exemplary conventional signal processor corresponds to a fifth embodiment of the present invention.
Referring to FIG. 13, there are shown an external interruption request signal (hereinafter abbreviated to "INTR") 101, an interrupt control circuit 102 which starts interruption process upon the reception of INTR 101, an interruption response signal (hereinafter abbreviated to "INTA") 103 given through the interrupt control circuit 102 to an external device, an interrupt mask register 104 which holds the status of interrupt enable or interrupt disable, an interruption process start signal 105, an interrupt address register 106 which holds an interruption process start address, an interruption process start address 107, a multiplexer 108, a program counter (hereinafter abbreviated to "PC") 109 which holds instruction execution addresses, a stacker (hereinafter abbreviated to "STK") 110 of a last-in first-out system (LIFO system) which keeps the instruction address immediately before interruption process on standby, an instruction address 111 provided by the PC 109, an address register (AR) 126, a data address 112 provided by the AR 126, an instruction memory 114 storing execution control instructions, a main bus 115 for transferring main data, a data memory 119 storing data, data 116 written in or read from the data memory 119, an instruction 117 read from the instruction memory 114, an instruction register (IR) 120 for decoding the instruction 117, a sequence control circuit 121 for distributing predetermined control signals to the component devices according to instructions, a temporary register (TR) 122 which receives data through the main bus 115, an arithmetic circuit (EU) 123 for arithmetical operations, a pipeline register (PR) 124 for temporarily storing the output signals of the EU 123, a working register (WR) 125 for storing the results of operation of the EU 123, and an address generating circuit (AGU) 127 which calculates the data address 112.
FIG. 14 is a flow chart showing the steps of an interruption process to be executed by the signal processor of FIG. 13.
The operation of this system will be described hereinafter with reference to FIG. 13. When there is not any interruption request, the PC 109 gives the instruction address 111 to the instruction memory 114, and the instruction 117 is given to the IR 120. Then, the sequence control circuit 121 distributes control signals according to the instruction given to the IR 120 to control the devices for executing predetermined processes. The AGU 127 and the arithmetic circuit 123 process the data address 112 and the data 116 according to the control signals. The TR 122, the PR 124, the WR 125 and the AR 126 temporarily hold data necessary for the processes to carry out the processes efficiently.
When the interruption request signal INTR 101 is given to the interrupt control circuit 102, the PC 109 interrupts the operation temporarily, keeps the instruction address 111 presently being executed on standby in the STK 110, and changes the instruction address 111 to the interruption process start address 107 held in the interrupt address register 106 to start the execution of the interruption process. Since the contents of the registers which are being used at the moment of interruption of the operation of the PC 109 among the registers to be used for interruption process, i.e., the TR 122, the PR 124, the WR 125 and the AR 126, need to be restored at the end of the interruption process, an instruction is provided to keep the contents of those registers on standby in the memory before starting the interruption process. An instruction is provided to return the contents kept on standby from the memory to the corresponding registers immediately before the end of the interruption process. Then, the instruction address kept on standby in the STK 110 at the start of the interruption process is fetched and is stored in the PC 109 to restart the process. The sequence of the interruption process is shown in FIG. 14. Operations to keep the contents of the registers on standby and to restore the standby contents to the corresponding registers for interruption process are carried out in response to instructions. Accordingly, when the arithmetic circuit 123 is, for example, of a pipeline construction, the contents of the pipeline register 124 and the like, which cannot be saved and restored by instructions, cannot be kept on standby.
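The overhead of keeping registers on standby by instructions may be illustrated by the following minimal sketch. The class and method names are hypothetical, and the assumption that one instruction step is spent per register for saving and one for restoring is made for illustration only; the case in which the pipeline register cannot be reached by instructions is not modelled.

```python
class InterruptModel:
    """Illustrative model of the interruption sequence of FIG. 14."""

    def __init__(self):
        self.pc = 0                      # PC 109
        self.stack = []                  # STK 110, last-in first-out
        self.registers = {"TR": 0, "PR": 0, "WR": 0, "AR": 0}
        self.memory = {}                 # standby area in the memory
        self.overhead_steps = 0          # steps spent on save/restore

    def enter(self, start_address):
        # Keep the current instruction address on standby, then jump.
        self.stack.append(self.pc)
        self.pc = start_address
        # Assumed: one instruction step per register kept on standby.
        for name, value in self.registers.items():
            self.memory[name] = value
            self.overhead_steps += 1

    def leave(self):
        # Assumed: one instruction step per register restored.
        for name in self.registers:
            self.registers[name] = self.memory[name]
            self.overhead_steps += 1
        # Fetch the instruction address kept on standby.
        self.pc = self.stack.pop()
```

With four registers, every interruption in this sketch costs eight steps that perform no useful processing, which is the overhead that becomes dominant when interruption requests are frequent.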
In such a case, since return from the interruption process is impossible when an instruction which uses the arithmetic circuit 123 is being executed in an ordinary process, mask data is written in the interrupt mask register 104 to forbid an interrupt input. While interruption is forbidden, the interrupt control circuit 102 does not provide the INTA 103 in response to the INTR 101; consequently, the external device which generated the interrupt request signal is kept on standby until the INTA 103 is provided.
Since the conventional interruption processing system carries out the interruption process in the foregoing manner, response to an interrupt request is delayed. Particularly, when the conventional interruption processing system is applied to a multiprocessor or a real-time signal processing system, the general processing efficiency of the system is reduced. Since the data is kept on standby by instructions, the interruption process requires an increased time and, when interruption requests are given frequently to the system, in particular, most steps of the process are used for keeping the data on standby and restoring the data, significantly reducing the efficiency of the interruption process.
FIG. 15 shows the constitution of an address generator of a conventional address control system as a sixth example of the prior art published in "TMS32020 User's Manual", issued by Texas Instruments. The sixth example corresponds to a sixth embodiment of the present invention.
Shown in FIG. 15 are a data bus 131 for data transfer, an auxiliary register pointer standby register (ARB) 133, a data path 132 connecting the ARB 133 to the data bus 131, an auxiliary register pointer (ARP) 136, a data standby path 134 extending from the ARP 136, a data path 135 connecting the ARP 136 to the data bus 131, auxiliary registers (AR0, AR1, AR2, AR3, AR4) 138 having five words, a selection signal 137 provided by the ARP 136 to select one of the ARs 138, an indirect address data 139 provided by the AR 138, an address data 140 provided by the AR 138, an arithmetic unit (ARAU) 142 for the auxiliary registers ARs 138, an updated address data 141 produced by updating the address data 140 by the ARAU 142, a data memory page pointer (DP) 143 for the direct address, a data memory page data 144 provided by the DP 143, a multiplexer 145 which multiplexes direct address data (dma) 146 indicated by an immediate value and the data memory page data 144 to generate a direct address, a direct address 147, a selector 148 which selects either the direct address 147 or the indirect address 139, an address output 149, an address control code 150, an auxiliary register pointer control signal 152, a decoder 153 for decoding an address control code and providing a control signal 151 to the ARB 133, a control signal 154 for controlling the ARAU 142, and a control signal 155 for selecting either the direct address 147 or the indirect address 139. FIG. 16 shows the contents of the address control code 150 applied to the address generator of FIG. 15. In FIG. 16, indicated at 157 is an indirect address specifying code and at 158 is a direct address specifying code.
FIG. 17 is a table showing address control codes 150 and the corresponding operations.
The operation of the address generator will be described with reference to FIG. 15, in which the devices will be denoted by abbreviations for simplicity. When an address control code 150 specifying a direct address is applied to the address generator, the decoder 153 provides direct address data 146 indicated by an immediate value of seven bits in the address control code 150. The multiplexer 145 multiplexes the direct address data 146 and data memory page data 144 of nine bits held in the DP 143 to generate a direct address 147. Finally, the selector 148 selects the direct address 147 according to a selection control signal 155 to provide an address 149.
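The generation of the direct address 147 may be sketched as follows, assuming that the nine-bit data memory page data 144 forms the upper bits and the seven-bit direct address data 146 forms the lower bits of a sixteen-bit address. The function name is hypothetical.

```python
def direct_address(page_data, dma):
    """Combine 9-bit page data and 7-bit direct address data.

    The placement of the page data in the upper bits is an assumption
    consistent with its role as a page pointer.
    """
    if not 0 <= page_data < (1 << 9):
        raise ValueError("page data must fit in nine bits")
    if not 0 <= dma < (1 << 7):
        raise ValueError("direct address data must fit in seven bits")
    return (page_data << 7) | dma
```

Under this assumption each page spans 128 words, so that changing the page requires the DP 143 to be reloaded.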
When the address control code 150 applied to the address generator specifies an indirect address, the decoder 153 provides an ARP control signal 152 indicating one of the AR0 to AR4 as an indirect address 139. The selector 148 selects the indirect address 139 according to a selection control signal 155 to provide an address 149. Then, the ARAU 142 executes a predetermined updating process to update the selected AR 138 among the AR0 to AR4 in order to calculate an indirect address data to be used by the next instruction. There are five indirect modes as follows.
1. The AR 138 indicated by the ARP 136 is used as a data memory address.
2. The data memory is accessed for the contents of the AR 138 indicated by the ARP 136, and then the contents are decremented by one.
3. The data memory is accessed for the contents of the AR 138 indicated by the ARP 136, and then the contents are incremented by one.
4. The data memory is accessed for the contents of the AR 138 indicated by the ARP 136, and then the contents of the AR0 138 are subtracted from the contents of the AR 138 indicated by the ARP 136.
5. The data memory is accessed for the contents of the AR 138 indicated by the ARP 136, and then the contents of the AR0 138 are added to the contents of the AR 138 indicated by the ARP 136.
That is, in this example, the indirect addressing modes using the AR0 to AR4 are classified roughly into two operating modes in which the ARAU 142 operates.
1. Ordinary addressing by incrementing or decrementing the AR0 to AR4 by one.
2. Indirect addressing with index modification on the basis of the contents of the AR0.
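The five indirect modes listed above may be sketched as follows. The function name, the representation of the auxiliary registers as a Python list and the register width of 16 bits are assumptions made for illustration only.

```python
def indirect_access(ar, arp, mode, width=16):
    """Return the data memory address and update ar[arp] in place.

    ar    : list holding the contents of AR0 to AR4
    arp   : index of the auxiliary register indicated by the ARP
    mode  : 1 to 5, numbered as in the five indirect modes above
    width : assumed register width in bits
    """
    mask = (1 << width) - 1
    address = ar[arp]                        # address used for this access
    if mode == 1:
        pass                                 # no update
    elif mode == 2:
        ar[arp] = (ar[arp] - 1) & mask       # decrement by one
    elif mode == 3:
        ar[arp] = (ar[arp] + 1) & mask       # increment by one
    elif mode == 4:
        ar[arp] = (ar[arp] - ar[0]) & mask   # subtract contents of AR0
    elif mode == 5:
        ar[arp] = (ar[arp] + ar[0]) & mask   # add contents of AR0
    else:
        raise ValueError("mode must be 1 to 5")
    return address
```

In every mode the access uses the contents of the register before the update; the update only prepares the address for the next instruction.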
These addressing modes are considered to be suitable mainly for simplified one-dimensional data addresses for aural signal processing. However, these addressing modes are unable to deal with complex one-dimensional data addresses such as a bit reverse address used for fast Fourier transformation (FFT) stated in C. S. Burrus, T. W. Parks, "DFT/FFT and Convolution Algorithms--Theory and Implementation", John Wiley and Sons, 1985. To deal with such an addressing mode, the desired address must first be calculated by a data operating unit and then converted into an address that can be handled by the ARAU 142. However, this procedure requires additional data processing time.
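A bit reverse address of the kind referred to above may be computed as in the following minimal sketch. The function name is hypothetical; the loop corresponds to the work that would have to be done by a data operating unit when the address generator provides no such mode.

```python
def bit_reverse(index, bits):
    """Return index with its lowest `bits` bits in reversed order."""
    reversed_index = 0
    for _ in range(bits):
        # Move the least significant bit of index into the result.
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


# Access order for an eight-point FFT (three address bits).
order = [bit_reverse(i, 3) for i in range(8)]
```

For eight points the resulting order is 0, 4, 2, 6, 1, 5, 3, 7, which cannot be produced by incrementing, decrementing or adding a fixed index.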
Similarly, these addressing modes are unable to deal with two-dimensional address for addressing data of a matrix of n-rows.times.m-columns. In this case, the data processing time is increased remarkably in many cases because the data operating unit is used for calculating every address. Furthermore, the control code description is complex and the production of a program employing complex control codes is difficult. These problems are disadvantages in application to the image signal processing field in which a large quantity of data needs to be processed at a high data processing speed.
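The two-dimensional addressing referred to above may be illustrated as follows. The sketch assumes that the matrix is stored row by row (row-major order) starting at a base address; this storage order and all names are assumptions and are not stated in the text.

```python
def block_addresses(base, columns, top, left, height, width):
    """Return the addresses of a height x width block of a matrix.

    base    : address of the element in row 0, column 0
    columns : number of columns m of the whole matrix
    """
    addresses = []
    for row in range(top, top + height):
        for column in range(left, left + width):
            # One multiplication and two additions per address.
            addresses.append(base + row * columns + column)
    return addresses
```

Within a row of the block the address advances by one, but at the end of each row it jumps by a different amount, so a single fixed increment or index is not sufficient to scan the block.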
The conventional address control system thus constituted needs to use the data operating unit for address calculation in many cases when applied to a two-dimensional process such as an image signal process and requires complex program software for a signal processing algorithm.
FIG. 18 is a block diagram showing the constitution of a multiport memory circuit for a digital signal processor, employing ordinary single-port memories, as a seventh example of the prior art. The seventh example corresponds to a seventh embodiment of the present invention.
Referring to FIG. 18, there are shown a random access memory (RAM) 171, a RAM access unit 190 including an address selector 177, a read/write (R/W) timing control circuit 178 and a bilateral data selector 179, an address signal (AD signal) 172 given from the address selector 177 to the RAM 171, a timing signal 173 for controlling the address selector 177, a timing signal 174 for controlling the data selector 179, a R/W control signal 175 applied to the RAM 171, a data signal (D signal) 176 provided by the RAM 171, an input/output (I/O) unit 189 including address registers (AR1 to AR3) 180 and data registers (DR1 to DR3) 181 respectively connected to ports, a data signal (DP1) 182 at the access port 1, a data signal (DP2) 183 at the access port 2, a data signal (DP3) 184 at the access port 3, a clock signal (CLK) 185, an address signal (AP1) 186 at the access port 1, an address signal (AP2) 187 at the access port 2, and an address signal (AP3) 188 at the access port 3.
FIG. 19 is a time chart showing the timing of operation of the multiport memory circuit of FIG. 18.
The operation of the multiport memory circuit of FIG. 18 will be described hereinafter. This multiport memory circuit has three ports. The address signals AP1, AP2 and AP3 at the ports are applied respectively to the corresponding address registers AR1, AR2 and AR3 at a period one-third the period of the CLK 185. The address selector 177 selects the address registers AR1, AR2 and AR3 in a predetermined sequence in a time sharing mode according to the timing signal 173 from the R/W timing control circuit 178 to provide address signals ADs. The R/W control signal is provided according to a R/W identification signal superposed on the address signals AP1, AP2 and AP3 at the corresponding ports to control the RAM 171 for R/W operation.
Likewise, the data signals DP1, DP2 and DP3 at the corresponding ports are applied respectively to the corresponding data registers DR1, DR2 and DR3 at a period one-third the period of the CLK 185 in synchronism with the address signals AP1, AP2 and AP3. Then, the data selector 179 selects the data registers DR1, DR2 and DR3 in a predetermined sequence in a time sharing mode according to the timing signal 174 provided by the R/W timing control circuit 178 to provide a data signal D. In read operation, the direction of output of data is reversed while the rest of the operations are the same as those for write operation.
The RAM 171 reads the data signal D or writes the data signal D in an address specified by the address signal AD according to the R/W control signal R/W.
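The time sharing access described above may be sketched as follows. The function name and the representation of each port request as a tuple are assumptions made for illustration only; the ports are serviced one after another in a fixed sequence, as the address selector 177 and the data selector 179 do.

```python
def time_shared_access(ram, requests):
    """Service one request per port in a fixed sequence.

    ram      : list modelling the single-port RAM 171
    requests : one (address, is_write, data) tuple per access port
    Returns the data read at each port (None for a write).
    """
    read_data = []
    for address, is_write, data in requests:
        # Each iteration occupies one cycle of the single-port RAM.
        if is_write:
            ram[address] = data
            read_data.append(None)
        else:
            read_data.append(ram[address])
    return read_data
```

Because the requests are handled strictly in sequence, a port that reads an address written by an earlier port in the same round obtains the newly written data in this sketch.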
Thus, the conventional multiport memory circuit comprises the RAM 171, the RAM access unit 190 and the I/O unit 189 so that the respective ports of the ordinary single-port memories can be accessed in a time sharing mode. The cycle time t.sub.p of each port is given by
t.sub.p =n.times.t.sub.cy (sec)
where t.sub.cy is the cycle time of the RAM 171, and n (an integer not less than one) is the number of access ports. That is, in view of the cycle time t.sub.p of each port, the RAM 171 functions as a multiport memory circuit capable of pseudosimultaneous R/W operation. When the same address is specified by two different ports for read operation, no problem arises. However, when either or both of the ports specify a write operation, a known control method in many cases assigns a priority to each port and applies a BUSY signal to the port of lower priority to temporarily forbid access by that port. FIG. 19 is a time chart of assistance in explaining such a cycle timing operation.
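The relation between the port cycle time and the number of access ports may be illustrated as follows. The function name and the RAM cycle time of 50 ns used in the example are assumptions chosen for illustration and do not appear in the text.

```python
def port_cycle_time(num_ports, ram_cycle_time):
    """Return the cycle time of each port, n times the RAM cycle time."""
    if num_ports < 1:
        raise ValueError("the number of access ports must be at least one")
    return num_ports * ram_cycle_time


# Assumed RAM cycle time of 50 ns, expressed in nanoseconds.
three_port = port_cycle_time(3, 50)
```

With the assumed figure, the three-port circuit of FIG. 18 would offer each port one access every 150 ns, and every additional port lengthens that period by a further RAM cycle.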
Since the conventional multiport memory circuit is constituted as stated above, the cycle time of each port increases in proportion to the number of access ports and thereby the operating speed of the multiport memory circuit is reduced. Furthermore, the circuit configuration of the conventional multiport memory circuit is complex and the scale of the circuit increases progressively with the increase in the number of access ports.
FIG. 20 shows an eighth example of the prior art and is a block diagram indicating the structure of an address generator based on the address control system of the prior art disclosed, for example, in the "USER's MANUAL TMS32020" issued by Texas Instruments Inc. This exemplary conventional digital signal processor corresponds to an eighth embodiment of this invention.
In this figure, 701 is data bus for data transfer, 702 is data bus to an auxiliary register pointer save register (ARB) 191 from the data bus 701, 191 is auxiliary register pointer save register (ARB), 704 is data save bus extending from auxiliary register pointer (ARP) 192, 705 is data bus between auxiliary register pointer (ARP) 192 and data bus 701, 707 is selection signal to select auxiliary register (AR) 193 from the auxiliary register pointer (ARP) 192, 193 is auxiliary register (AR=AR0-AR4) providing 5 words, 709 is relative (indirect) address output from auxiliary register (AR) 193, 710 is address data sent from the auxiliary register (AR) 193, 711 is new address data obtained by updating address data 710 in the operation unit only for auxiliary register (ARAU) 194, 195 is data memory page pointer for direct address (DP), 714 is data memory page data output from data memory page pointer for direct address (DP) 195, 196 is multiplexer (MUX) which generates direct address 717 by multiplexing the direct address data 716 indicated by immediate value and data memory page data 714, 717 is direct address, 197 is selector which selects direct address 717 and relative address 709, 719 is address output, 720 is address control code, 721 is auxiliary register pointer save register control signal, 722 is auxiliary register pointer control signal, 198 is decoder for decoding address control code 720, 724 is control signal which controls operation unit only for auxiliary register (ARAU) 194, 725 is selection control signal which selects direct address 717 and relative address 709, 726 is data input/output bus between auxiliary register (AR) 193 and data bus 701, 727 is data input/output bus between data memory page pointer for direct address (DP) 195 and data bus 701.
FIG. 21 is a table for explaining operation of address generator of FIG. 20 by the address control code.
Next, operations of the address generator of the prior art are explained. When the input address control code 720 designates a direct address, the decoder 198 outputs the direct address data 716 indicated by the immediate value of 7 bits in the instruction code. This direct address data 716 and the data memory page data 714 of 9 bits held by the data memory page pointer for direct address (DP) 195 are multiplexed by the multiplexer (MUX) 196 to generate the direct address 717. Finally, the selector 197 selects the direct address 717 in accordance with the selection control signal 725 and outputs it as the address output 719.
Next, when the input address control code 720 designates relative addressing, the content of the one auxiliary register (AR) 193 indicated by the auxiliary register pointer control signal 722 output from the decoder 198 is output as the relative address 709. The selector 197 selects it in response to the selection control signal 725 and delivers it as the address output 719. Thereafter, the operation unit only for auxiliary register (ARAU) 194 executes the specified update processing on the selected auxiliary register (AR) 193 to calculate the relative address 709 to be used by the next instruction. The relative address 709 is used in the following five modes.
1. The auxiliary register (AR) 193 indicated by the auxiliary register pointer (ARP) 192 is used as the data memory address.
2. Access to the data memory is made with the content of the auxiliary register (AR) 193 indicated by the auxiliary register pointer (ARP) 192, and thereafter "1" is subtracted from such content.
3. Access to the data memory is made with the content of the auxiliary register (AR) 193 indicated by the auxiliary register pointer (ARP) 192, and thereafter "1" is added to such content.
4. Access to the data memory is made with the content of the auxiliary register (AR) 193 indicated by the auxiliary register pointer (ARP) 192, and thereafter the content of the auxiliary register (AR0) 193 is subtracted from such content.
5. Access to the data memory is made with the content of the auxiliary register (AR) 193 indicated by the auxiliary register pointer (ARP) 192, and thereafter the content of the auxiliary register (AR0) 193 is added to such content.
Namely, addressing using the auxiliary register (AR) 193 is roughly classified into the following two kinds in the prior art, which correspond to the kinds of calculation performed by the operation unit only for auxiliary register (ARAU) 194.
A. Ordinary relative addressing by addition of "1"/subtraction of "1" for the auxiliary register (AR) 193
B. Relative addressing with index modification based on the content of the auxiliary register (AR0)
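The five relative addressing modes may be sketched as follows. This is an illustrative model only: the function name and the list representation of AR0-AR4 are hypothetical, register width and wraparound are not modeled, and mode 3 is assumed to add "1" after the access, in line with the addition of "1" named in class A above.

```python
def access_and_update(ar: list, arp: int, mode: int) -> int:
    """Return the relative address for this access, then update AR[ARP].

    ar   : contents of the auxiliary registers AR0-AR4 (modified in place)
    arp  : index held by the auxiliary register pointer
    mode : 1-5, numbered as in the list of modes in the text
    """
    address = ar[arp]          # address is output before the update
    if mode == 1:
        pass                   # no update
    elif mode == 2:
        ar[arp] -= 1           # class A: subtraction of "1"
    elif mode == 3:
        ar[arp] += 1           # class A: addition of "1" (assumed)
    elif mode == 4:
        ar[arp] -= ar[0]       # class B: index modification, minus AR0
    elif mode == 5:
        ar[arp] += ar[0]       # class B: index modification, plus AR0
    else:
        raise ValueError("mode must be 1 to 5")
    return address
```

For example, with AR0 = 2 and AR1 = 100, an access in mode 5 with ARP = 1 uses address 100 and leaves AR1 = 102 for the next instruction.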
These address modes are considered suited to simple addressing of unidimensional data, mainly for voice signal processing.
However, when addressing is carried out for data in a bidimensional space, the address of the leading reference point of each line must be set in the address register, so that the pipeline of a series of calculations breaks and efficiency deteriorates.
FIG. 22 shows a data series in a bidimensional space, and FIG. 23 is a flowchart of the addressing operation performed by the address generator for the data series of FIG. 22.
In the data series of FIG. 22, it is assumed that bidimensional data consisting of M data in the horizontal direction and L data in the vertical direction is stored in the data memory with unidimensional addresses. It is further assumed that the reference points in a block of this bidimensional data (2m data in the horizontal direction, one--data in the vertical direction) are sampled in the order of sequential horizontal scanning.
First, the address of the point P.sub.1 is initially set in one auxiliary register AR1 (ARP=1) of the auxiliary registers (AR) 193 serving as the address register, and "2" is set in AR0. For the reference points on the same horizontal line, addressing is carried out according to the address control mode No. 8 shown in FIG. 23, namely AR1 ← AR1 + AR0.
However, the address of the leading reference point P.sub.m+1 of the next line cannot be generated by updating the value of AR1 and must be newly set. Accordingly, a step is required for calculating the leading address of the line in the data calculator and setting it in AR1. The subsequent processing is carried out in the same way as for the first line. As can be seen from the above procedure, the essential calculation processing is interrupted every time the line changes in the bidimensional data, so that pipeline efficiency is lowered and the processing time increases with the increased number of instruction steps.
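The procedure above may be sketched as follows to show where the extra steps arise. The sketch assumes that the data is stored line by line (M consecutive addresses per horizontal line) and that the block spans a hypothetical number of lines given by the parameter `lines`; the function and parameter names are illustrative and do not appear in the figures.

```python
def scan_block(base: int, M: int, m: int, lines: int, step: int = 2):
    """Generate reference-point addresses with the prior-art procedure.

    base  : address of the first reference point P1
    M     : number of data per horizontal line in the data memory
    m     : number of reference points sampled on each line
    lines : number of lines in the block (hypothetical parameter)
    step  : value set in AR0 ("2" in the text)

    Returns the list of addresses and the number of times AR1 had to be
    newly set at a line change.
    """
    ar0 = step
    ar1 = base                    # initial setting of AR1 to P1
    reloads = 0
    addresses = []
    for line in range(lines):
        if line > 0:
            # Leading address of the next line cannot be reached by
            # AR1 <- AR1 + AR0; it is computed separately and set in AR1.
            ar1 = base + line * M
            reloads += 1
        for _ in range(m):
            addresses.append(ar1)
            ar1 += ar0            # AR1 <- AR1 + AR0
    return addresses, reloads
```

Under these assumptions, a block of three lines needs two separate settings of AR1 in addition to the initial one, and each of these interrupts the sequence of data calculations.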
Since the address control system of the prior art is constituted as explained above, when it is used for bidimensional signal processing such as the processing of video signals, address calculation must often be carried out in the data calculation section, and the program software of the signal processing algorithm becomes complicated.