The present invention relates to a parallel processor system with distributed memories. More particularly, the present invention relates to a message passing parallel processor method and apparatus that realizes coherence control on main memory locations storing data to be sent and received data to ensure that the contents of a cache memory and main memory are identical when data is transferred among processor elements that form the message passing parallel processor system.
Demands on computers for faster processing performance have prompted the advent of a parallel processor system using a plurality of instruction processors in combination. The parallel processor system designed for applications in so-called supercomputing fields, such as science and technology computing, has hundreds to thousands of processors interconnected via an inter-processor network. The parallel processor system of this configuration adopts a memory system in which instruction processors are each provided with an independent main memory with all these main memories systematically controlled. Such a memory system is generally called a distributed memory system.
In the distributed memory type parallel processor systems, when an instruction processor is to reference data on a main memory controlled by another instruction processor, the reference is achieved by data transfer between the instruction processors via an inter-processor network. The distributed memory parallel processor systems are classified into two groups, a distributed shared memory type and a message passing type, according to how this data transfer is realized.
In the distributed shared memory type parallel processor systems, the execution of load/store instructions (machine language instructions) involving destination addresses of main memory locations in other instruction processors automatically invokes data transfer. The amount of data transferred at this time is approximately one word, the unit handled by the load/store instruction, although it may be several tens of words depending on the system. In the message passing type parallel processor system, data transfer is realized by explicitly activating a data transfer mechanism by a program procedure. The amount of data transferred at this time, though limited by hardware, is arbitrary within this limit, and a large amount of data, more than several kilo-words, can be transferred.
The distributed shared memory type and the message passing type differ in the data transfer starting mechanism and the amount of data transferred, as described above. The difference in the control mechanism becomes even wider when the instruction processors have a cache memory. In the distributed shared memory type parallel processor system, coherence control on the cache memory and the main memory is automatically performed by hardware during the course of data transfer, as with bus-connected multiprocessors (symmetric multiprocessors: SMP). For example, a distributed shared memory type parallel processor system named Dash, developed at Stanford University and disclosed in "The Stanford Dash Multiprocessor", by D. Lenoski, et al., IEEE Computer, March 1992, pp. 63-79, realizes coherence control by adopting a directory method. Realizing such a control mechanism, however, is disadvantageous in terms of hardware cost. In the message passing type parallel processor system, the coherence control accompanying data transfer is not performed by hardware but is instead explicitly performed by software. More specifically, the data area to be transferred is erased from the cache memory before the data transfer by using a flush instruction or a purge instruction in order to prevent disagreement in contents between the cache memory and the main memory. The flush instruction, when executed and when data on the cache memory differs from that of the main memory, causes the system to copy back the cache memory content to the main memory and then erase the data location from the cache memory. The purge instruction, when executed, causes the system to erase a data location from the cache memory without copying its content back to the main memory.
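The flush and purge operations described above may be illustrated by the following minimal sketch, which models a write-back cache and a main memory as simple dictionaries. The class name ToyCache, the function demo, and the use of a dirty flag to stand for "cache content differs from main memory" are assumptions introduced solely for illustration; they do not represent any particular hardware or instruction set.

```python
class ToyCache:
    """Hypothetical write-back cache in front of a main memory (illustration only)."""

    def __init__(self, memory):
        self.memory = memory   # address -> value held in main memory
        self.lines = {}        # address -> (value, dirty flag)

    def load(self, addr):
        # On a miss, bring the word in from main memory as a clean line.
        if addr not in self.lines:
            self.lines[addr] = (self.memory[addr], False)
        return self.lines[addr][0]

    def store(self, addr, value):
        # Write-back policy: only the cache is updated; main memory becomes stale.
        self.lines[addr] = (value, True)

    def flush(self, addr):
        # Copy back the line if it is dirty, then erase it from the cache.
        line = self.lines.pop(addr, None)
        if line is not None and line[1]:
            self.memory[addr] = line[0]

    def purge(self, addr):
        # Erase the line from the cache without copying it back.
        self.lines.pop(addr, None)


def demo():
    memory = {0: 10, 1: 20}
    cache = ToyCache(memory)

    # Send side: the newest value lives only in the cache until it is flushed.
    cache.store(0, 11)
    before_flush = memory[0]   # what a transfer reading main memory would send
    cache.flush(0)
    after_flush = memory[0]

    # Receive side: purge the cached copy, then let the transfer mechanism
    # write the received data directly into main memory.
    cache.load(1)
    cache.purge(1)
    memory[1] = 99
    received = cache.load(1)   # misses in the cache, so the new data is read

    return before_flush, after_flush, received
```

In this sketch, omitting the flush on the send side would cause the stale value in main memory to be transferred, and omitting the purge on the receive side would cause the processor to keep reading the old cached value after the received data arrives.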
As described above, in the message passing type parallel processor system, the cache memory needs to be subjected to explicit flush or purge processing by software before data transfer (send or receive operation) to prevent disagreement in contents between the cache memory and the main memory at the time of data transfer. In terms of data transfer performance, such software processing is itself a large overhead (performance degradation factor). In addition, the fact that the cache memory, which should normally be handled by hardware, needs to be recognized and controlled by software at the time of data transfer constitutes a large limiting factor for a program or algorithm as a whole. These factors combine to lower the processing efficiency of the system.