1. Field of the Invention
The present invention relates to a method and apparatus which implement high-speed, high-reliability message transfer between processors in a multiprocessor system with a distributed shared memory and, more particularly, to a parallel multiprocessor system suitable for handling highly frequent, relatively small message communications and a message passing method using the same.
2. Description of the Prior Art
With a view to effectively improving the efficiency of software development, object-oriented programming techniques are now coming into widespread use in which the objects, each encapsulating a procedure and data, are executed by exchanging messages.
A description will be given, with reference to FIGS. 1 and 2, of a conventional multiprocessor system which executes object-oriented programs. For detailed information, see T. Shimizu et al., "Low-Latency Communication Support for the AP1000," Proceedings of the 19th International Symposium on Computer Architecture, pp. 288-297, 1992. In FIG. 1, reference numerals 1-1 and 1-2 denote processor modules (hereinafter referred to simply as PMs) in the multiprocessor system; 2-1 and 2-2 denote processors; 3-1 and 3-2 denote local memories accessible from the processors 2-1 and 2-2, respectively; 4-1 and 4-2 denote DMA (Direct Memory Access) controllers for message transfer between the PMs 1-1 and 1-2; 5 denotes an interprocessor communication network; 3B-1 denotes a message buffer area provided on the local memory 3-1; and 3B-2 denotes a message buffer area provided on the local memory 3-2. The local memories 3-1 and 3-2 have areas for storing objects to be executed by the processors 2-1 and 2-2 and areas for storing kernels which control and manage the execution of the processors 2-1 and 2-2.
Now, consider a message transfer from the PM 1-1 to the PM 1-2 in the system of FIG. 1. In response to a request (step S1 in FIG. 2) from a sender object (10 in FIG. 2) which is being executed by the processor 2-1, a kernel (11-1 in FIG. 2) which is a main program of an operating system reserves the message buffer area 3B-1 in the local memory 3-1 in step S2 and reports it to the sender object 10. Then, the sender object 10 writes a message in the message buffer area 3B-1 in step S3 in FIG. 2. This is indicated by the thick arrow L10 in FIG. 1. The sender object 10 issues a send request to the kernel 11-1 in step S4, and in step S5 the kernel 11-1 sets in the DMA controller 4-1 control information such as the base address (ADR1 in FIG. 1) of the message buffer area 3B-1, the message size (n) and the destination processor module number. This is indicated by the thick arrow L11 in FIG. 1.
In step S6 the kernel 11-1 activates the DMA controller 4-1, which reads the message from the message buffer area 3B-1 based on the control information set therein and transmits it to the DMA controller 4-2 in the receiver's side PM 1-2 via the interprocessor communication network 5 in steps S7 and S8. This is indicated by the thick arrow L12 in FIG. 1.
The DMA controller 4-2 stores the message in a temporary storage in step S9 while generating an interrupt to the kernel (11-2 in FIG. 2) of the processor 2-2 in step S10 in FIG. 2. This is indicated by the thick arrow L13 in FIG. 1. In step S11 the kernel 11-2 reserves the message buffer area 3B-2 on the local memory 3-2 of the receiver's side PM 1-2 and in step S12 it sets in the DMA controller 4-2 control information such as the address (ADR2 in FIG. 1) of the message buffer area 3B-2 and the message size (n). Upon DMA initiation by the kernel 11-2 in step S13, the DMA controller 4-2 transfers the message to the message buffer area 3B-2 in step S14 (which is indicated by the thick arrow L14 in FIG. 1) and indicates the message transfer completion to the kernel 11-2 in step S15.
In step S16 the kernel 11-2 identifies the receiver object (12 in FIG. 2) on the basis of the destination included in the message and activates the receiver object 12 in step S17. In step S18 the receiver object 12 reads out the message from the message buffer area 3B-2 in the local memory 3-2 (which is indicated by the thick arrow L15 in FIG. 1). In this way, the message passing is implemented between the PMs.
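The sequence of FIGS. 1 and 2 described above can be summarized as a table of steps. The following is a minimal sketch in Python and is not part of the cited system: the names CONVENTIONAL_PATH and kernel_steps and the tuple layout are assumptions made for illustration, and the flag marking kernel involvement is read off the description above (it is set where the kernel performs the step or must service an interrupt).

```python
# Illustrative model of the conventional message path of FIG. 2.
# Each entry is (step, actor, action, kernel_involved). The tuple layout and
# all names here are assumptions made for this sketch only.
CONVENTIONAL_PATH = [
    ("S1", "sender object 10", "request a message buffer", False),
    ("S2", "kernel 11-1", "reserve buffer area 3B-1 and report it", True),
    ("S3", "sender object 10", "write the message into 3B-1", False),
    ("S4", "sender object 10", "issue a send request", False),
    ("S5", "kernel 11-1", "set ADR1, size n and destination PM in DMA 4-1", True),
    ("S6", "kernel 11-1", "activate DMA controller 4-1", True),
    ("S7-S8", "DMA controller 4-1", "read the message and send it over network 5", False),
    ("S9", "DMA controller 4-2", "store the message in temporary storage", False),
    ("S10", "DMA controller 4-2", "interrupt kernel 11-2", True),
    ("S11", "kernel 11-2", "reserve buffer area 3B-2", True),
    ("S12", "kernel 11-2", "set ADR2 and size n in DMA 4-2", True),
    ("S13", "kernel 11-2", "initiate DMA", True),
    ("S14", "DMA controller 4-2", "transfer the message to 3B-2", False),
    ("S15", "DMA controller 4-2", "report completion to kernel 11-2", True),
    ("S16", "kernel 11-2", "identify receiver object 12 from the destination", True),
    ("S17", "kernel 11-2", "activate receiver object 12", True),
    ("S18", "receiver object 12", "read the message from 3B-2", False),
]


def kernel_steps(path=CONVENTIONAL_PATH):
    """Return the labels of the steps that require kernel processing."""
    return [step for step, _actor, _action, kernel_involved in path if kernel_involved]


print(kernel_steps())
```

Under this reading, 10 of the 17 entries involve the kernel, whereas the sender and receiver objects themselves appear only in steps S1, S3, S4 and S18.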
It is desirable that the message transfer to the receiver's side PM be initiated as soon as possible after the sender object 10 writes the message into the message buffer area 3B-1 in step S3 in FIG. 2. It is likewise desirable that the receiver's side deliver the message to the receiver object as soon as possible after receiving it in step S9 in FIG. 2. In the prior art, however, the message handling involves many kernel processes such as the preparation for the DMA initiation in steps S5 and S12, the allocation (or acquisition) of the message buffer area at the receiver's side in step S11 and the interrupt handling in steps S10 and S15. This inevitably results in high latency of message transfer and large overhead of kernel execution. In particular, when the bandwidth of the interprocessor communication network 5 is high, the kernel processing overhead accounts for a larger share of the message transfer latency; hence, it is essential to reduce the processing overhead.
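The effect of network bandwidth on the share of latency attributable to the kernel can be illustrated with a simple calculation. The sketch below assumes that the total latency is the sum of a fixed kernel processing time and the time on the network (message size divided by bandwidth); the function name and all numerical figures are hypothetical and are not taken from the cited literature.

```python
def kernel_overhead_share(kernel_overhead_us, message_bytes, bandwidth_bytes_per_us):
    """Return the fraction of the total transfer latency spent in kernel processing.

    Assumes latency = kernel overhead + message size / bandwidth.
    """
    wire_time_us = message_bytes / bandwidth_bytes_per_us
    return kernel_overhead_us / (kernel_overhead_us + wire_time_us)


# Hypothetical figures: 20 us of kernel processing and a 100-byte message.
slow = kernel_overhead_share(20.0, 100, 10.0)    # network time is 10 us
fast = kernel_overhead_share(20.0, 100, 100.0)   # network time is 1 us
print(f"{slow:.2f} {fast:.2f}")
```

With these assumed figures, a tenfold increase in bandwidth raises the kernel's share of the latency from about 0.67 to about 0.95, so that a faster network alone yields little benefit for small messages.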
In steps S10 and S15 in FIG. 2 the prior art utilizes the interrupt as a means to report the arrival of the message from the sender's PM to the kernel. To acknowledge the interrupt request, however, it is necessary to save information of the object currently being executed and switch the context to interrupt handling, which incurs large kernel execution overhead. The interrupt scheme is therefore not suited to a massively parallel processor system, which must handle a large number of interrupts per unit time: the processing overhead becomes very large and the transfer latency also increases.
To avoid the interrupt, there has been proposed a request acceptance scheme called polling. The polling scheme is one wherein processing requests are prestored in predetermined memories or registers and a processor reads them out afterward when it is ready to process them. With this scheme, the context switching overhead is smaller than with the interrupt scheme, but since it is necessary to read all the memories or registers which store processing requests, memories or registers with no processing requests stored therein are read in vain when processing requests are generated infrequently. In massively parallel processor systems, in particular, a large number and a large variety of processing requests are generated; consequently, a large overhead is needed to read out all of them at regular time intervals and the number of needless readouts increases accordingly, impairing the processing efficiency.
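The cost of needless readouts in the polling scheme can be sketched as follows. This is a minimal illustration, assuming that the request locations are modeled as a Python list in which an empty location holds None; the function name poll_once and the request string are hypothetical.

```python
def poll_once(slots):
    """Scan every request location once.

    Returns (found, wasted), where found lists the (index, request) pairs
    that were pending and wasted counts the locations read in vain.
    """
    found = []
    wasted = 0
    for index, request in enumerate(slots):
        if request is None:
            wasted += 1          # location read although nothing was pending
        else:
            found.append((index, request))
            slots[index] = None  # mark the request as accepted
    return found, wasted


# Hypothetical case: 1000 request locations, only one of which holds a request.
slots = [None] * 1000
slots[7] = "receive"
found, wasted = poll_once(slots)
print(found, wasted)
```

In this assumed case a single pending request costs 999 needless readouts, which corresponds to the inefficiency noted above when requests are infrequent and the number of locations is large.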
In U.S. Pat. No. 4,951,193 there is disclosed a method which transfers data and control information between processors for processing in parallel a plurality of tasks partitioned from a process in a DO loop in scientific computations. In U.S. Pat. No. 5,276,806 there is disclosed a technique by which, when data is written to a certain location of a distributed shared memory of a certain processor node in a distributed shared memory system, a copy of the data is also written into the locations of distributed shared memory having the same address at all the other nodes. In Japanese Patent Application Laid-Open No. 19785/94 (filed in the name of J. Sandberg claiming priority based on U.S. patent application No. 859,087 dated Mar. 27, 1992) there is disclosed a system in which distributed shared memories, which form a shared virtual memory (SVM) address space, are shared by a plurality of nodes (work stations) through a network. In this system, each page of the shared virtual address space is mapped to data location memories in the system so that when data is written to a certain memory location, the same data is copied as well to the same address location of another distributed shared memory specified by an n-bit vector. In a publication by J. Sandberg et al., entitled "Virtual Memory Mapped Network Interface for the SHRIMP Multicomputer," Proceedings of the 21st International Symposium on Computer Architecture, pp. 142-152, 1994, there is set forth a system which is basically similar to that proposed in the above-mentioned Japanese patent application laid-open gazette but implements a network interface environment that minimizes the message passing overhead. Another publication by A. W. Wilson, Jr., entitled "Hardware Assist for Distributed Shared Memory," Proceedings of the 13th International Conference on Distributed Computing Systems, pp. 246-255, 1993, concerns a software distributed shared memory (SDSM) system which reduces the frequency of page transfer by using an update-based coherency protocol that sends write data to the copy destination each time the shared data is updated. Still another publication by L. D. Wittie et al., entitled "Eager Sharing for Efficient Massive Parallelism," Proceedings of the 21st International Conference on Parallel Processing, pp. II-251-255, 1992, concerns a massively parallel processor system in which each work station is provided with an interface to monitor a memory bus; when data is written to a local memory, its address is compared with preset directory entries and, if it falls in a shared variable memory area, a copy of the update data is sent to all the nodes in the multicasting group.
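The write-propagation idea common to the update-based schemes above, in which the address of each local write is compared with preset directory entries and a copy is sent to the other nodes when the address lies in a shared area, can be sketched as follows. This is only an illustration of the general idea under simplifying assumptions and does not reproduce any of the cited designs: the class Node, its methods and the addresses are hypothetical, memories are modeled as dictionaries, and the network is replaced by a direct write into the peer's memory.

```python
class Node:
    """Hypothetical node with a local memory and a directory of shared ranges."""

    def __init__(self, name):
        self.name = name
        self.memory = {}     # address -> value
        self.directory = []  # entries of (start, end, multicast group)

    def share(self, start, end, group):
        """Register the address range [start, end) as shared with the group."""
        self.directory.append((start, end, group))

    def write(self, address, value):
        """Write locally; copy to the group if the address is shared.

        Returns True when the write was propagated to other nodes.
        """
        self.memory[address] = value
        for start, end, group in self.directory:
            if start <= address < end:
                for peer in group:
                    if peer is not self:
                        # Stands in for sending the update over the network.
                        peer.memory[address] = value
                return True
        return False


# Hypothetical three-node group sharing addresses 0x100 to 0x1FF.
a, b, c = Node("a"), Node("b"), Node("c")
a.share(0x100, 0x200, [a, b, c])
shared = a.write(0x104, 7)   # inside the shared range: copied to b and c
private = a.write(0x300, 9)  # outside the shared range: stays local to a
```

In this sketch only the data copy is performed in response to the write; buffer management and the detection of send and receive requests are not addressed.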
However, these prior art publications make no mention of a method which improves the efficiency of all the functions necessary for message passing on an all-inclusive basis, including the management of message buffers on the distributed shared memory and the detection of message send and receive requests.