1. Field of the Invention
The invention generally relates to a processor, and particularly to a processor having a preceding operation circuit connected to the output of a data register. The invention is particularly applicable to a microprocessor that performs butterfly operations at high speed.
2. Description of the Background Art
As computer systems, microcomputers and the like have come into increasingly wide use throughout society, demand has grown for faster operational processing on such systems. Operations in a computer system or a microcomputer are generally carried out by a processor or microprocessor, which is a logic integrated circuit, and various efforts have therefore been made to make these processors operate faster.
FIG. 20 is a block diagram showing a conventional microprocessor. A processor of the kind shown in FIG. 20 is described, for example, in an article by Nakagawa et al. titled "A 50ns Video Signal Processor" (ISSCC89, Digest of Technical Papers, pp. 168-169), where it is presented as a digital signal processor (DSP).
Referring to FIG. 20, this microprocessor comprises a data operating part 1 for carrying out various operations, a bus line 2 for transmitting data, a data memory part 24 for storing data, an instruction memory part 23 for storing the instruction programs that define the operations, a program sequence control part 22 for receiving externally applied control signals and decoding the instruction programs, an address operating part 8 for computing addresses, and an interface part 21 for inputting and outputting data to and from the outside. Data operating part 1 comprises an arithmetic logic unit 3 (hereinafter ALU) for carrying out arithmetic and logic operations on data applied through bus line 2, a multiplier 5 for multiplying data applied through bus line 2, and a register part 4 for temporarily holding the output data of ALU 3 and multiplier 5.
In operation, program sequence control part 22 decodes an instruction program stored in instruction memory part 23 and applies control signals S10 and S20 to data operating part 1, address operating part 8, data memory part 24 and interface part 21. In response to control signal S10, address operating part 8 computes the source address of the data to be processed in data operating part 1 and the destination address of the processed data. The source and destination addresses output from address operating part 8 are transmitted to each part through bus line 2. Data memory part 24 supplies the data designated by an address from address operating part 8 to data operating part 1 through bus line 2. In data operating part 1, ALU 3 and multiplier 5 operate on the applied data and apply the results to register part 4. Register part 4 holds the applied data temporarily and outputs the held data to bus line 2 in response to source designation signals S1 to Sn applied from address operating part 8 through bus line 2. The data placed on bus line 2 is transmitted to the part designated by address operating part 8, e.g. data memory part 24 or interface part 21. Processed data applied to interface part 21 through bus line 2 is then stored in, for example, an external storage device.
FIG. 21 is a schematic block diagram of register part 4 shown in FIG. 20. Referring to FIG. 21, register part 4 comprises n registers R1 to Rn. As indicated in FIG. 20, registers R1 to Rn are assumed to be connected to receive processed data from ALU 3 and multiplier 5 (not shown). Registers R1 to Rn are also connected to receive source designation signals S1 to Sn, respectively, from address operating part 8 through bus line 2 (not shown). Data memory part 24 is connected to receive address signals AD from address operating part 8. Data a1 to a8 to be operated on are assumed to be stored at addresses M1 to M8 of data memory part 24, respectively.
In operation, data b1 computed in ALU 3 or multiplier 5, for example, is held in register R1. When address operating part 8 outputs source designation signal S1, register R1 responds by supplying the held data b1 to bus line 2. Likewise, data memory part 24 supplies to bus line 2 the data a1 designated by address signal AD from address operating part 8. The data supplied to bus line 2 is then applied to data operating part 1, and the operations continue.
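The register-and-bus behavior just described can be illustrated with a small Python model. This is only a sketch under simplifying assumptions: the class RegisterPart, the helper read_memory, and the representation of S1 to Sn as a one-hot list of booleans are hypothetical, and the bus is modeled simply as the value returned.

```python
class RegisterPart:
    """Hypothetical model of register part 4: registers R1..Rn, each of which
    drives the bus only while its source designation signal S1..Sn is active."""

    def __init__(self, n):
        self.regs = [0] * n

    def load(self, k, value):
        # Latch a result from ALU 3 or multiplier 5 into register Rk (1-based).
        self.regs[k - 1] = value

    def drive_bus(self, signals):
        # signals is one-hot: signals[k - 1] is True while Sk is asserted.
        if len(signals) != len(self.regs):
            raise ValueError("one source designation signal per register")
        active = [k for k, s in enumerate(signals, start=1) if s]
        if len(active) != 1:
            raise ValueError("exactly one register may drive the bus")
        return self.regs[active[0] - 1]


def read_memory(memory, ad):
    # Data memory part 24: return the word at the address AD supplied by part 8.
    return memory[ad]


# Usage: b1 is held in R1 and a1 is stored at address M1 (example values).
rp = RegisterPart(8)
rp.load(1, 42)                                     # b1 = 42 latched into R1
memory = {"M1": 7}                                 # a1 = 7 stored at M1
bus_from_reg = rp.drive_bus([True] + [False] * 7)  # S1 asserted
bus_from_mem = read_memory(memory, "M1")
```

Requiring exactly one active signal mirrors the fact that only one source may drive a shared bus at a time.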
In a microprocessor, an addressing method is generally used to designate the storage location of source data. Known addressing methods include direct addressing, indirect addressing, relative addressing, immediate addressing, offset addressing and indexed addressing.
For example, in direct addressing, the data at the address specified in the operand part of an instruction is designated as the source data. In indirect addressing, the storage location of the source data is written in a register or data memory location designated by the operand part of an instruction, and the source data is fetched from that location. In relative addressing, the address of the source data is obtained by adding some value to the program counter, which holds the address of the instruction currently being executed. In immediate addressing, the source data itself is written directly in the operand part of an instruction. In offset addressing or indexed addressing, the address at which the source data is stored is "qualified", that is, modified by an additional value. These addressing methods are generally well known and are described in other literature.
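As an illustration only, the following Python sketch shows how each of these addressing methods could locate a source operand on a hypothetical machine whose data memory is a dictionary and whose registers form a list. The names Mode and fetch_source are invented for this example, and offset addressing is treated here as equivalent to indexed addressing (address = operand field + register contents), which is an assumption rather than a definition taken from this document.

```python
from enum import Enum, auto


class Mode(Enum):
    DIRECT = auto()     # operand field is the address of the source data
    INDIRECT = auto()   # operand field names a register that holds the address
    RELATIVE = auto()   # address = program counter + operand field
    IMMEDIATE = auto()  # operand field is the source data itself
    INDEXED = auto()    # address = operand field + index register (offset treated alike)


def fetch_source(mode, operand, memory, regs, pc=0, index_reg=0):
    """Return the source datum on a hypothetical machine (dict memory, list registers)."""
    if mode is Mode.IMMEDIATE:
        return operand
    if mode is Mode.DIRECT:
        address = operand
    elif mode is Mode.INDIRECT:
        address = regs[operand]
    elif mode is Mode.RELATIVE:
        address = pc + operand
    elif mode is Mode.INDEXED:
        address = operand + regs[index_reg]
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return memory[address]


memory = {10: 5, 11: 7, 20: 9}
regs = [10, 1]
fetch_source(Mode.DIRECT, 20, memory, regs)                # -> 9
fetch_source(Mode.INDIRECT, 0, memory, regs)               # -> 5 (memory[regs[0]])
fetch_source(Mode.RELATIVE, 1, memory, regs, pc=10)        # -> 7 (memory[10 + 1])
fetch_source(Mode.IMMEDIATE, 3, memory, regs)              # -> 3
fetch_source(Mode.INDEXED, 10, memory, regs, index_reg=1)  # -> 7 (memory[10 + 1])
```

In every mode except immediate, the instruction yields an address and the operand still has to be read from memory, which matches the three-way choice of source described next.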
Each of the addressing methods above selects one of three sources: a register in which the source data is stored, an address in the data memory at which the source data is stored, or immediate data written directly in the operand part of an instruction. In other words, the source data processed in data operating part 1 is either immediate data written in the operand part of the instruction, data held in a register, or data stored in the data memory.
FIG. 23 is a diagram showing, in operation symbols, the butterfly operation of the discrete Fourier transform by the frequency domain dividing (decimation-in-frequency) method. As FIG. 23 shows, the butterfly operation obtains two output data X and Y from two input data a and b according to the following equations:

$$X = a + b \qquad (1)$$

$$Y = (a - b)\,W_N \qquad (2)$$
where $W_N$ is a coefficient called the "twist factor" (more commonly known as the twiddle factor).
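Equations (1) and (2) translate directly into code. The sketch below is a minimal Python illustration; the function names are hypothetical, and twist_factor assumes the conventional form W = exp(-j2πk/N), which this document does not state explicitly.

```python
import cmath


def butterfly(a, b, w):
    """Butterfly of equations (1) and (2): X = a + b, Y = (a - b) * W_N.
    Uses exactly one addition, one subtraction and one multiplication."""
    x = a + b          # equation (1)
    y = (a - b) * w    # equation (2)
    return x, y


def twist_factor(k, n):
    # Assumed conventional form of the twist (twiddle) factor: exp(-j*2*pi*k/n).
    return cmath.exp(-2j * cmath.pi * k / n)


x, y = butterfly(1 + 1j, 1 - 1j, 1j)   # x == 2, y == -2
```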
In the fast Fourier transform (FFT) with N sampling points, where N is a power of 2, $\log_2 N$ operational stages, each containing N/2 butterfly operations, are connected in series. The fast DCT algorithms described below likewise perform butterfly operations over $\log_2 N$ stages.
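To show how N/2 butterflies per stage over log2 N stages add up to a full transform, here is a minimal radix-2 decimation-in-frequency FFT in Python built only from the butterfly of equations (1) and (2). It is a textbook-style sketch, not the circuit of this invention; the function names and the bit-reversal reordering step are assumptions of this example. A naive DFT is included only to check the result.

```python
import cmath


def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT using only X = a + b, Y = (a - b) * W.
    Returns (spectrum in natural order, number of butterflies performed)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    data = [complex(v) for v in x]
    count = 0
    span = n // 2
    while span >= 1:                              # log2(n) stages
        for start in range(0, n, 2 * span):
            for k in range(span):                 # n/2 butterflies per stage in total
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                a, b = data[start + k], data[start + k + span]
                data[start + k] = a + b           # equation (1)
                data[start + k + span] = (a - b) * w  # equation (2)
                count += 1
        span //= 2
    # Decimation in frequency leaves the outputs in bit-reversed order.
    bits = n.bit_length() - 1
    out = [0j] * n
    for i, v in enumerate(data):
        rev = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        out[rev] = v
    return out, count


def dft_naive(x):
    """Direct O(N^2) DFT, used only to check the butterfly version."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


spectrum, butterflies = fft_dif([1, 2, 3, 4, 0, -1, 2, 5])
# butterflies == 12, i.e. (8 / 2) * log2(8)
```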
As FIG. 23 and equations (1) and (2) show, each butterfly operation requires one addition, one subtraction and one multiplication.
On the other hand, the discrete cosine transform (hereinafter DCT), a kind of orthogonal transform, generally transforms strongly correlated video data efficiently and is therefore used for video data compression. Another reason DCT is used for video compression is that fast algorithms exist for it. Many fast algorithms have been presented to date; one example appears in an article by W. H. Chen et al. titled "A Fast Computational Algorithm for the Discrete Cosine Transform" (IEEE Transactions on Communications, Vol. COM-25, No. 9, September 1977). Most of these fast algorithms are built essentially from the butterfly operation shown in FIG. 23.
According to the above article by Chen et al., the transform of a discrete function f(j) sampled at N points, where j = 0, 1, ..., N-1, is given as follows: ##EQU1##
Accordingly, for a discrete function f(j) with four sampling points, equation (6) below is obtained by substituting N = 4 into equation (3). ##EQU2##
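Because equation (3) is not reproduced in this text, the Python sketch below assumes the normalization commonly associated with Chen et al., F(u) = (2c(u)/N) Σ f(j) cos((2j+1)uπ/2N) for u = 0, ..., N-1, with c(0) = 1/√2 and c(u) = 1 otherwise. The patent's own equation may differ by a scale factor, and the function name dct_direct is hypothetical.

```python
import math


def dct_direct(f):
    """Direct evaluation of the assumed forward DCT:
    F(u) = (2 c(u) / N) * sum_j f(j) cos((2j + 1) u pi / (2N)),
    with c(0) = 1/sqrt(2) and c(u) = 1 for u > 0."""
    n = len(f)
    out = []
    for u in range(n):
        c = 1 / math.sqrt(2) if u == 0 else 1.0
        s = sum(f[j] * math.cos((2 * j + 1) * u * math.pi / (2 * n)) for j in range(n))
        out.append(2 * c / n * s)
    return out


coeffs = dct_direct([1.0, 1.0, 1.0, 1.0])   # F(0) = sqrt(2), other terms vanish
```

Evaluated this way, the transform needs N² multiply-accumulate operations (16 for N = 4); the butterfly-based fast algorithms exist to cut that count.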
In a conventional microprocessor, source data is designated as described above, and the processing time is determined mainly by the number of operations required and the operating speed of data operating part 1. That is, the operating speed of the microprocessor is limited by the processing speed of data operating part 1.
Various methods have been employed to achieve faster operational processing, such as shortening the period of one instruction cycle (that is, raising the clock frequency), and carrying out a sophisticated operation (for example multiplication, division or rational function operations) in response to a single instruction by providing arithmetic units in parallel, as in a floating point arithmetic apparatus. However, the former method is limited by semiconductor manufacturing technology and by the performance of peripheral devices. The latter method is undesirable because it requires a large number of devices to carry out the operations, and control of input and output becomes considerably complicated. Moreover, numeric processing often consists of simple operations such as addition and subtraction repeated many times; in such cases the latter method, using a floating point arithmetic apparatus, is unsuitable because the circuits for complicated operations go unused.
Next, as a simple example of numeric operation, consider summing eight data a1 to a8:

$$X = a_1 + a_2 + a_3 + \cdots + a_8 \qquad (7)$$
Data a1 to a8 are assumed to be stored at addresses M1 to M8 of data memory part 24, respectively, and the resulting data X is assumed to be stored in register R7.
FIG. 22 is an operational flow chart showing how the summation of equation (7) is carried out using register part 4 of FIG. 21. In this figure, for example, "a1 (M1) + a2 (M2) → b1 (R1)" indicates that data a1 stored at address M1 and data a2 stored at address M2 of data memory part 24 are added and the sum b1 is stored in register R1.
As shown in FIG. 22, the addition is carried out over seven operational steps, and the sum X is obtained in register R7. Consequently, adding the eight data a1 to a8 with register part 4 configured as in FIG. 21 requires an operation time of seven instruction cycles, which prevents the operation time from being shortened.
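The cycle count of FIG. 22 can be reproduced with a short Python sketch. Since only the first step is spelled out above, it assumes that each step adds the next memory word to the previous partial sum and writes the result to the next register; the function name serial_sum is hypothetical, and one two-operand addition is taken to cost one instruction cycle.

```python
def serial_sum(values):
    """Add the values one at a time, one two-operand addition per instruction cycle.
    Partial sum k is latched into register R<k> (assumed order):
    a1 + a2 -> R1, R1 + a3 -> R2, ..., R6 + a8 -> R7.
    Returns (register contents, number of cycles)."""
    if len(values) < 2:
        raise ValueError("need at least two operands")
    registers = {}
    cycles = 0
    acc = values[0]
    for k, v in enumerate(values[1:], start=1):
        acc = acc + v                # one addition in ALU 3
        registers[f"R{k}"] = acc     # result latched into register Rk
        cycles += 1                  # one instruction cycle consumed
    return registers, cycles


# Data a1..a8 held at addresses M1..M8 (example values 1..8).
memory = {f"M{i}": i for i in range(1, 9)}
registers, cycles = serial_sum([memory[f"M{i}"] for i in range(1, 9)])
# registers["R7"] == 36 and cycles == 7
```

Reordering the additions, for example as a balanced tree, does not help on a single ALU: summing eight values with two-input additions always takes seven additions, so seven cycles remain.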
Next, the operation time required for a 4-point discrete cosine transform using register part 4 of FIG. 21 will be described.
FIG. 24 is an operational flow chart for the case in which butterfly operations are carried out using register part 4 of FIG. 21. In this figure, for example, "x0 (R1) + x3 (R4) → a1 (R6)" in step 1 indicates that data x0 held in register R1 and data x3 held in register R4 are added and the sum a1 is stored in register R6. Input data x0 to x3 are assumed to be held beforehand in registers R1 to R4, respectively.
As shown in FIG. 24, the operations are carried out over 14 arithmetic steps in total. They include addition, subtraction and multiplication, which are performed in ALU 3 or multiplier 5 of data operating part 1 shown in FIG. 20. After the 14 steps, output data z0 to z3 representing the results are held in registers R5 to R8, respectively. Consequently, carrying out the butterfly operations with register part 4 configured as in FIG. 21 requires an operation time of 14 instruction cycles in total, which prevents the operation time from being reduced.
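For comparison, the Python sketch below computes a 4-point DCT from butterflies and counts each two-operand addition, subtraction or constant multiplication as one step. It assumes the normalization F(u) = (2c(u)/N) Σ f(j) cos((2j+1)uπ/2N) with c(0) = 1/√2 and c(u) = 1 otherwise, and that output zu corresponds to F(u). The function names, intermediate names and step order are hypothetical and are not taken from FIG. 24. This particular decomposition also comes to 14 steps, but the text does not confirm that it matches FIG. 24 step for step.

```python
import math


def dct4_butterfly(x):
    """4-point DCT via butterflies under the assumed normalization.
    Counts every two-operand add, subtract or constant multiply as one step.
    Returns ([z0, z1, z2, z3], steps), with z_u taken to be F(u)."""
    x0, x1, x2, x3 = x
    steps = 0

    def step(value):
        # Each call stands for one instruction cycle in ALU 3 or multiplier 5.
        nonlocal steps
        steps += 1
        return value

    # Constants absorb the 2c(u)/N scale factor for N = 4.
    k0 = 0.5 / math.sqrt(2)
    k1 = 0.5 * math.cos(math.pi / 8)
    k3 = 0.5 * math.cos(3 * math.pi / 8)

    # Stage 1: butterflies on the input pairs (x0, x3) and (x1, x2).
    s03 = step(x0 + x3)
    s12 = step(x1 + x2)
    d12 = step(x1 - x2)
    d03 = step(x0 - x3)
    # Even outputs: a second butterfly followed by scaling.
    e_sum = step(s03 + s12)
    e_diff = step(s03 - s12)
    z0 = step(e_sum * k0)
    z2 = step(e_diff * k0)
    # Odd outputs: a rotation of (d03, d12) by the cosine constants.
    p = step(d03 * k1)
    q = step(d12 * k3)
    z1 = step(p + q)
    r = step(d03 * k3)
    t = step(d12 * k1)
    z3 = step(r - t)
    return [z0, z1, z2, z3], steps


def dct4_direct(x):
    """Direct evaluation of the same assumed formula, for checking only."""
    return [(2 * (1 / math.sqrt(2) if u == 0 else 1.0) / 4)
            * sum(x[j] * math.cos((2 * j + 1) * u * math.pi / 8) for j in range(4))
            for u in range(4)]


z, steps = dct4_butterfly([1.0, 2.0, -3.0, 0.5])   # steps == 14
```

On a machine that executes one such step per instruction cycle, the count of 14 translates directly into 14 cycles, which is the bottleneck described above.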