1. Field of the Invention
The present invention relates to a microprocessor capable of realizing exclusive control and a data flow microprocessor having vector operation function making use thereof.
2. Description of the Related Art
With the progress of VLSI technology, microprocessors have become cheaper and more reliable, and it has become common to couple multiple microprocessors and process data in parallel at high speed. However, when the processes distributed among multiple microprocessors are executed entirely independently, problems may occur. In particular, when shared resources such as memory devices are used by plural processes simultaneously, consistency of processing often cannot be assured unless use of the resource is restricted to one process at a time, that is, unless the resource is used exclusively.
To solve this problem, conventional microprocessors have managed processor resources by, for example, test and set instructions.
For example, on pages 222 and 223 of "User's Manual for Mitsubishi M32 Family MPU M32/100 M33210" published by Mitsubishi Denki Kabushikikaisya, the BSETI instruction for realizing exclusive control is described. The function of the BSETI instruction is to set a bit with interlock; that is, the inverted value of a specified bit is copied into the Z flag, and then that bit is set. These two operations are performed with the bus locked.
A similar description is found on pages 6-185 to 6-187 of "Series 32000 Programmer's Reference Manual" published by Prentice-Hall Inc., Englewood Cliffs, N.J. 07632, in which the content of the memory or register is copied into the F flag of the processor status register (PSR) by the SBITI instruction, and the content is then set to "1". During this period the interlocked operation output pin of the CPU is held active so as to interlock access to the semaphore bit.
These prior-art microprocessors belong to the so-called complex instruction set computer (CISC) type, and macro instructions such as BSETI and SBITI are realized by executing a microprogram of plural steps. The bus is locked or the interlock operation signal is made active in order to prevent processing from being suspended by an interrupt from another processor and to preserve the integrity of instruction execution.
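The test and set behavior described above for BSETI can be illustrated by the following minimal sketch. It is not taken from either manual: the class `InterlockedWord`, its methods, and the use of a Python lock to stand in for the locked bus are all assumptions made for illustration.

```python
import threading


class InterlockedWord:
    """Hypothetical model of a memory word supporting an interlocked bit set."""

    def __init__(self, value=0):
        self.value = value
        # Assumption: a mutex stands in for the locked bus / interlock signal.
        self._bus_lock = threading.Lock()

    def bset_interlocked(self, bit):
        """Copy the inverted bit into a Z flag, then set the bit, as one unit."""
        mask = 1 << bit
        with self._bus_lock:
            z_flag = 0 if (self.value & mask) else 1  # inverted old bit value
            self.value |= mask                        # the bit is always set
        return z_flag

    def clear(self, bit):
        """Release the resource by clearing the bit."""
        with self._bus_lock:
            self.value &= ~(1 << bit)


word = InterlockedWord()
first = word.bset_interlocked(3)   # bit was 0, so the caller obtains the resource
second = word.bset_interlocked(3)  # bit was already 1, so the caller must wait
```

In this sketch a returned Z flag of 1 means the caller found the bit clear and now owns the resource, while 0 means another process already holds it.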
In such methods, however, in a general processor having a pipeline structure that processes a flow such as instruction fetch, data fetch, execution, and storing of the result, the number of cycles spent in the instruction execution stage increases, the stage following it becomes empty, and movement of data in the preceding stages is stopped, which may result in pipeline stalls and lowered processing efficiency.
To solve such problems, processors of the so-called RISC (reduced instruction set computer) type have been proposed, in which one instruction is, as a rule, executed within one machine cycle, and commercial microprocessors based on the RISC architecture are already on the market.
The instruction set of the Am29000, known as a typical RISC microprocessor, is disclosed on pages MC1-303-151 to MC1-303-163 of "Nikkei Data Processor, Microprocessor" published by Nikkei BP. In the Am29000, which executes an instruction in one clock cycle, instructions requiring complicated execution steps, such as test and set, are not supported. According to this publication, by making active the ninth bit of the exclusive register (the existing processor status register), which can be accessed only in the privileged mode called the supervisor mode, the lock pin of the processor is made active, and the bus is not released even when bus release requests are input by BREQ signals from other processors. By this function, the integrity of processing is guaranteed.
In the conventional CISC processor, as mentioned above, the problem is that pipeline processing efficiency is lowered when a complicated instruction such as test and set is executed. By contrast, in a RISC processor that executes one instruction in one clock cycle, a complicated instruction such as test and set cannot be realized in hardware.
In the case of the Am29000, exclusive control must be realized in software by the following procedure: first the supervisor mode is set; the bus is exclusively occupied by writing into the exclusive register to make the lock signal pin active; the memory address representing the resource under exclusive control is read out; the read-out result is judged; in the case where the read-out result is, for example, "0", "1" is written into the same address; and finally the exclusive register is written again to make the lock signal pin inactive, so that the bus is released to other processors. Thus, a complicated procedure must be executed, the program running efficiency is very poor, and because the bus is kept occupied throughout this period, execution of other processes may be prevented.
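The software procedure above can be sketched as follows. This is a toy model written only to make the step count visible; the class `ToyProcessor`, its attributes, and the address used are hypothetical and do not represent the actual Am29000 instruction set or register layout.

```python
class ToyProcessor:
    """Hypothetical RISC CPU whose bus lock pin is controlled by software."""

    def __init__(self, memory):
        self.memory = memory      # shared memory, modeled as a dict
        self.supervisor = False
        self.lock_pin = False
        self.trace = []           # (step name, lock pin state after the step)

    def _step(self, name):
        self.trace.append((name, self.lock_pin))

    def try_acquire(self, address):
        """Run the multi-step software test and set; True if acquired."""
        self.supervisor = True
        self._step("set supervisor mode")
        self.lock_pin = True                   # write to the exclusive register
        self._step("make lock pin active")
        value = self.memory.get(address, 0)
        self._step("read resource address")
        acquired = (value == 0)
        self._step("judge read-out result")
        if acquired:
            self.memory[address] = 1
            self._step("write 1 to resource address")
        self.lock_pin = False                  # write to the exclusive register
        self._step("make lock pin inactive")
        return acquired


shared = {}
cpu = ToyProcessor(shared)
got_it = cpu.try_acquire(0x100)
locked_steps = sum(1 for _, locked in cpu.trace if locked)
```

Under these assumptions a successful acquisition takes six software steps, and the bus stays locked for four of them, which is the inefficiency the paragraph above points out.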
An example of a data flow computer introducing a vector operation mechanism is disclosed in the preprints for the 38th (first-half 1989) national meeting of Japan Society of Information Processing published in March 1989 under the title of "Outline of high parallel data flow type computer EDDEN."
This paper points out, as one of the problems of the data flow computer, that performance is lowered in typical calculations that repeat simple processing on arrays or other regular structures. It suggests that this problem can be solved by introducing a vector operation mechanism in the instruction executing unit and executing vector operation instructions locally on the array data stored in external memory. Furthermore, it shows that the filling rate of the operation pipeline can be enhanced by having the usual scalar data processing and the vector operation control mechanism share the arithmetic unit by time sharing. The data flow computer presented in this publication is composed as shown in FIG. 1. By reference to this diagram, the operation of the prior art is described below.
A one-chip data flow computer shown in FIG. 1 comprises network control unit NC, input control unit IC, queue unit Q, program storage unit PS, output control unit OC, firing control/color management unit FCCM, instruction executing unit EXE, and vector operation control unit VC.
FIG. 2 shows a simple example of a program (data flow graph) for explaining the practical operation of the known data flow computer, showing the processing of delivering the operation result of A+B as C. The data flow graph is composed of plural nodes allocated with node numbers, and arcs showing the data dependence relations among them. In the diagram, the pentagonal nodes are special nodes representing input from and output to the outside, and are not responsible for operation. On the other hand, the circular node performs the operation shown in the node on the input data.
Packet (data having tag information) A, entered from outside through the network control unit NC, is provided with #0 as its destination node number by the host computer. The other input packet B has the destination node number #1. These packets are temporarily stored in the queue unit Q via the input control unit IC. In the program storage unit PS, the program memory is read out using the respective destination node numbers as the input addresses, and the next destination node number, that is, #2, and the instruction code "+" corresponding to the node of #2 are read out. Afterwards, these packets reach the firing control/color management unit FCCM through the output control unit OC.
In the firing control/color management unit FCCM, since the destination node numbers of these packets are both #2, as soon as both packets A and B reach the firing control/color management unit FCCM, firing is processed, and an executing packet having two operands is generated and sent to the instruction executing unit EXE.
In the instruction executing unit EXE, the operation A+B is performed according to the instruction code stored in this packet, that is, "+", and a result packet C containing result data C is delivered. The result packet reaches the program storage unit PS via the input control unit IC and queue unit Q.
In the program storage unit PS, the program memory is read out with the destination node number possessed by the packet C as the input address, and the next destination node number #3 and instruction code "OUT" are read out. This packet having the instruction code "OUT" is branched to the outside by the output control unit OC, and is sent to the network control unit NC in order to be delivered outside the processor.
By this chain of processing, the operation corresponding to the data flow graph shown in FIG. 2 is performed, and the program execution is terminated. The simplest example is shown in FIG. 2, but execution proceeds in exactly the same manner even for a more complicated data flow graph composed of multiple instruction nodes and arcs showing the data dependence relations among these instruction nodes.
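The packet flow described for FIG. 2 can be traced with the following minimal sketch. It is an illustration only: the table `PROGRAM`, the function `run`, and the operand values are hypothetical, color management is omitted, and the left operand is assumed to be whichever packet arrives first.

```python
from collections import deque

# Program memory: destination node number -> (next destination node, instruction code)
PROGRAM = {0: (2, "+"), 1: (2, "+"), 2: (3, "OUT")}
OPERATIONS = {"+": lambda a, b: a + b}


def run(input_packets):
    """Circulate (destination node, data) packets until all have left the processor."""
    queue = deque(input_packets)   # queue unit Q
    waiting = {}                   # operands waiting in FCCM for their partner
    outputs = []
    while queue:
        node, data = queue.popleft()
        next_node, opcode = PROGRAM[node]        # program storage unit PS
        if opcode == "OUT":
            outputs.append((next_node, data))    # branched outside by OC
        elif next_node in waiting:
            left = waiting.pop(next_node)        # firing: both operands present
            result = OPERATIONS[opcode](left, data)  # instruction executing unit EXE
            queue.append((next_node, result))    # result packet re-enters the ring
        else:
            waiting[next_node] = data            # first operand waits
    return outputs


result = run([(0, 5), (1, 7)])   # A = 5 at node #0, B = 7 at node #1
```

With A = 5 and B = 7, the single output packet carries the value 12 and the node number #3, matching the path PS, FCCM, EXE, PS, OC described above.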
During program execution, processing of nodes having a data dependence relation is executed sequentially, but processing of nodes having no data dependence relation can be executed in parallel as far as the processing resources permit. The data dependence relation herein means a connective relation between two nodes in which the input data necessary for one node is supplied only after processing of the other node is completed.
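As an illustration of this rule, the following sketch groups the nodes of a small graph into successive sets that could execute in parallel. The function `parallel_waves` and the example graph are hypothetical, and the sketch assumes unlimited processing resources and that every node appears as a key of the dependency table.

```python
def parallel_waves(dependencies):
    """Group nodes into waves; nodes within one wave have no mutual data dependence.

    dependencies maps each node to the set of nodes whose results it needs.
    """
    remaining = {node: set(deps) for node, deps in dependencies.items()}
    waves = []
    while remaining:
        ready = sorted(node for node, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cyclic or unresolved data dependence")
        waves.append(ready)
        for node in ready:
            del remaining[node]
        for deps in remaining.values():
            deps.difference_update(ready)   # these inputs are now available
    return waves


graph = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"a"}, "e": {"c", "d"}}
waves = parallel_waves(graph)
```

Here nodes a and b can run together, then c and d, and finally e, which must wait for both c and d.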
The flow of scalar data processing has been described so far. In addition, the same paper discloses a method in which the instruction executing unit EXE is shared between scalar data processing and vector data processing by time sharing. The vector operation control unit VC is responsible for execution and control of vector operation related instructions and ordinary memory access instructions. Bypass lines for structure (vector) communications are disposed among the vector operation control unit VC, input control unit IC and output control unit OC. The external data memory is the data memory for storing structures and the like.
However, the data flow computer having the conventional vector operation mechanism disclosed in the above publication involves two problems.
The first problem is that the throughput in vector operation execution is low because there is no data memory inside the chip. For example, when adding vector X and vector Y and storing the result as vector Z in the data memory, the process of reading element xi of vector X and element yi of vector Y and writing the result of the operation into the data memory as element zi of vector Z must be repeated as many times as there are elements in the vector. However, the one-chip data flow computer disclosed in the publication has no internal data memory; the chip possesses only the access control function for the external data memory. A vector operation on one element therefore requires three sequential memory accesses, so that access to the data memory becomes the bottleneck in processing, and the high performance of the instruction executing unit EXE cannot be fully utilized by the vector operation alone.
The second problem is that, in each pass of the cyclic pipeline, the instruction executing unit EXE can perform only one of two alternatives: access to the data memory or execution of an instruction. Accordingly, when data stored in the data memory is used in an operation, one cycle of cyclic pipeline processing is required for reading out the data from the memory, and another cycle is needed for execution of the operation, so the efficiency is poor.
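The access count behind the first problem can be made concrete with a short sketch. The class `ExternalMemory`, the function `vector_add`, and the memory layout are hypothetical and serve only to count accesses; they do not model the actual chip.

```python
class ExternalMemory:
    """Hypothetical external data memory that counts every access."""

    def __init__(self, size):
        self.cells = [0] * size
        self.accesses = 0

    def read(self, address):
        self.accesses += 1
        return self.cells[address]

    def write(self, address, value):
        self.accesses += 1
        self.cells[address] = value


def vector_add(memory, x_base, y_base, z_base, length):
    """Compute Z = X + Y element by element, entirely through external memory."""
    for i in range(length):
        xi = memory.read(x_base + i)         # first access
        yi = memory.read(y_base + i)         # second access
        memory.write(z_base + i, xi + yi)    # third access


memory = ExternalMemory(12)
memory.cells[0:4] = [1, 2, 3, 4]         # vector X
memory.cells[4:8] = [10, 20, 30, 40]     # vector Y
vector_add(memory, 0, 4, 8, 4)
```

For the four-element vectors in this sketch, the single addition per element is accompanied by three memory accesses, twelve in total, which is why the memory rather than the arithmetic unit limits throughput.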