The present invention relates to a parallel computer, and more particularly to a parallel computer having physically distributed sharable memories, as required when the number of processor elements is large.
In a parallel computer, as the number of processor elements increases, respective memory modules are directly connected to a number of physically distributed processor elements, which are coupled by a network to form a loosely coupled parallel computer. In a tightly coupled parallel computer, in which a number of processor elements access a memory at one location, performance is low because of access contention. From the standpoint of software, however, it is advisable that the processor elements share a data structure at logically one location. Recently, therefore, memories which are physically distributed but logically shared (hereinafter called a sharable distributed memory) have been proposed. JP-A-61-103258 discloses a global addressing memory system in which data is distributed word by word over distributed memories which are shared by all processor elements. U.S. application Ser. No. 85646, filed on Aug. 14, 1987, now U.S. Pat. No. 4,951,193, discloses a parallel computer with a local addressing distributed memory system in which data is distributed with overlapping of data segments.
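A word-by-word distribution over the memory modules can be pictured with the following minimal Python sketch. It assumes a simple round-robin (interleaved) mapping from a global word address to an owning processor element; the cited application may use a different mapping, and the name `owner_of` is hypothetical.

```python
def owner_of(global_address, num_processor_elements):
    """Map a global word address to (owning processor element, local offset).

    Assumption for illustration only: consecutive words are dealt out to the
    processor elements in round-robin order.
    """
    if num_processor_elements <= 0:
        raise ValueError("num_processor_elements must be positive")
    if global_address < 0:
        raise ValueError("global_address must be non-negative")
    owner = global_address % num_processor_elements
    local_offset = global_address // num_processor_elements
    return owner, local_offset


def is_local(global_address, my_pe, num_processor_elements):
    """True when the word lives in the requesting element's own memory module."""
    owner, _ = owner_of(global_address, num_processor_elements)
    return owner == my_pe
```

Under this assumed mapping, only about one word in every `num_processor_elements` is local to a given processor element, so most accesses through an arbitrary index have to cross the network.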
In such a sharable distributed memory, if data to be defined or referenced is not present in the memory module of the requesting processor element, a memory module in another processor element must be accessed through an interconnection network. As a result, processing time is long. Accessing data A in the memory module of the other processor element consists of the following three operation primitives (which are undecomposable operations):
(1) Reference: R ← A (load A into a register of the requesting processor element)
(2) Definition: A ← R (store the content of a register to the location of A)
(3) Recursive definition: A ← OP(A, R) (return the result of operating on A with the content of a register to the location of A), or A ← OP(A) (return the result of processing A to the location of A)

where R is a register of the requesting processor element, and OP is an operation.

The recursive definition may be decomposed into a reference and a definition. As will be described later, however, in distributed processing other operation primitives may be inserted between the reference and the definition, so that the result may change. Accordingly, it is assumed here that the operation is executed as a single undecomposable primitive. Of the above three operation primitives, primitive (1) may be processed efficiently by exchanging a reference request message and a response message between processors having message communication functions, and primitive (2) may be processed efficiently by transmitting a store request message. Primitive (3), however, needs complex exclusive control. When data is not in a memory module directly connected to the processor element requesting the update, it is necessary in the above proposed method for the processor element to issue a recursive operation instruction to transfer the data to its own memory through the network, update it, and send the result back to the memory module through the network. For example, if the i-th processor element is to execute the program

A(L(I)) = A(L(I)) + B(I)

it is not always possible to allocate A(L(I)) to the i-th processor element in the compiling phase, because the element address of the array A is determined by the indirect index L(I). In such a case, there is no assurance that the program and the data are allocated to the same processor element, and normally other memory modules must be read. In order to inhibit other processor elements from accessing that data, the memory module must be accessed exclusively. For the above program, the exclusive execution procedure is as follows.

1. The i-th processor element calculates the address of A(L(I)).
2. A request to send the data is issued through the network to the processor element (the j-th processor element) connected to the memory module in which A(L(I)) is located.
3. The j-th processor element places that memory module under exclusive control.
4. The j-th processor element reads the data and sends it to the i-th processor element through the network.
5. The i-th processor element updates the transmitted data.
6. The i-th processor element sends the updated A(L(I)) back to the j-th processor element through the network and requests writing into the memory module of the j-th processor element.
7. The j-th processor element stores A(L(I)) in the memory and releases the exclusive control.

The following points remain to be resolved.

(1) A long time is required for updating because the data is updated through the network (steps 2, 4 and 6).
(2) Since the memory module is accessed exclusively, the memory module of the j-th processor element cannot be used while the data is on the network or being processed in the i-th processor element, and parallel processing is impeded (steps 3 to 7).
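The seven-step exclusive procedure for A(L(I)) = A(L(I)) + B(I) and its two drawbacks can be illustrated with a small sequential Python model. This is only a sketch under stated assumptions: `MemoryModule`, `run_exclusive_update` and `decomposed_update_loses_a_write` are hypothetical names, network traffic is modelled as a simple message counter, and exclusive control is modelled as a single owner field rather than any mechanism described in the source.

```python
class MemoryModule:
    """Hypothetical model of the memory module attached to the j-th processor element."""

    def __init__(self, values):
        self.values = dict(values)
        self.owner = None  # processor element currently holding exclusive control

    def acquire(self, pe):
        """Grant exclusive control to `pe` unless another element already holds it."""
        if self.owner is not None:
            return False
        self.owner = pe
        return True

    def release(self, pe):
        """Give up exclusive control; only the current holder may do so."""
        if self.owner != pe:
            raise RuntimeError("release by a processor element that is not the owner")
        self.owner = None


def run_exclusive_update():
    """Walk through steps 1 to 7 for processor element i = 1."""
    module_j = MemoryModule({3: 10.0})  # holds A(3)
    L = {1: 3}                          # indirect index: L(1) = 3
    B = {1: 2.5}
    i = 1
    network_messages = 0

    addr = L[i]                               # step 1: compute address of A(L(I))
    network_messages += 1                     # step 2: request sent to j-th element
    if not module_j.acquire(i):               # step 3: exclusive control
        raise RuntimeError("module busy")
    value = module_j.values[addr]             # step 4: read and send data to i
    network_messages += 1
    # While the data is away, another element (id 2) cannot use the module.
    other_was_blocked = not module_j.acquire(2)
    value = value + B[i]                      # step 5: update on the i-th element
    network_messages += 1                     # step 6: send updated value back
    module_j.values[addr] = value             # step 7: store and release
    module_j.release(i)
    return module_j.values[addr], network_messages, other_was_blocked


def decomposed_update_loses_a_write():
    """Recursive definition split into reference + definition without exclusion."""
    a = {0: 10}
    r1 = a[0]        # element 1: reference
    r2 = a[0]        # element 2: reference slips in between
    a[0] = r1 + 1    # element 1: definition
    a[0] = r2 + 2    # element 2: definition overwrites element 1's update
    return a[0]      # 12, whereas two undecomposable updates would give 13
```

In this model the update costs three network traversals (steps 2, 4 and 6), and the second processor element is refused access for the whole interval between steps 3 and 7. The second function shows why the exclusion cannot simply be dropped: an interleaved reference causes one of the two updates to be lost.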
The disclosure of the Japanese patent application mentioned above is included in the disclosure of the present application by reference and is not intended as prior art.