1. Field of the Invention
The present invention relates generally to a Distributed Shared Memory (DSM) system, and more particularly, to a communication protocol to transfer/receive data between respective nodes.
Furthermore, the present invention relates to a new Adaptive Granularity communication protocol that integrates fine- and coarse-grain communication in a distributed shared memory, so that communication can be handled adaptively according to the size of the data exchanged between respective nodes. According to the present invention, standard load-store communication performance is obtained by employing cache line transfer for fine-grain sharing, and bulk transfer performance is obtained by supporting a spectrum of granularities for bulk transfer.
Generally speaking, the Distributed Shared Memory system has attracted attention as a recent multiprocessor architecture owing to its scalability and programmability. In addition, most hardware distributed shared memory (DSM) machines achieve high performance by allowing the caching of shared writable data as well as read-only data.
The general concept of the above Distributed Shared Memory system is described below with reference to the accompanying FIG. 1.
A Distributed Shared Memory system is a multiprocessor computer system in which each node can refer to the memory of other nodes (i.e., distributed memory) as if it were its own memory. The architecture considered here is a DSM system based on cache coherence management. In such a system, a node obtains good performance by storing a block fetched from the memory of a remote node in its own cache; when that block is needed again, the node refers to the copy in its cache instead of referring directly to the memory of the remote node.
However, when several nodes share a memory block of the home node and one of them modifies its own cached copy of that block, the other nodes are left referring to the old, unmodified data. Implementing a cache in each node therefore introduces a coherence problem.
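The stale-data problem described above can be illustrated with a minimal Python sketch. The class names HomeMemory and CachingNode, and the write-through behavior of the writer, are assumptions made only for this illustration; they are not part of the described system.

```python
class HomeMemory:
    """Memory of the home node: block address -> value (hypothetical model)."""

    def __init__(self):
        self.blocks = {}


class CachingNode:
    """A node that caches remote blocks but runs no coherence protocol."""

    def __init__(self, home):
        self.home = home
        self.cache = {}

    def read(self, addr):
        # On a miss, fetch the block from the home node; afterwards use the cached copy.
        if addr not in self.cache:
            self.cache[addr] = self.home.blocks[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # The writer updates its own copy and the home memory,
        # but nothing tells the other sharers that the block changed.
        self.cache[addr] = value
        self.home.blocks[addr] = value


home = HomeMemory()
home.blocks[0x10] = 1
na, nb = CachingNode(home), CachingNode(home)

nb.read(0x10)            # NB caches the block
na.write(0x10, 2)        # NA modifies the same block
stale = nb.read(0x10)    # NB is served from its own, now outdated, cached copy
```

In this sketch NB keeps reading the value it cached before NA's write, which is the situation a cache coherence protocol is meant to prevent.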
To solve the above problem, many cache coherence protocols have been implemented. With reference to FIG. 1, respective nodes (NA, NB, NC, . . . , NK) are tightly connected to each other by an interconnection network (IC Net), which is designed to speed up message transfer. Preferred examples of the IC Net include a k-ary n-cube network and a wormhole-routed network.
The internal structure of each node is similar to that of a conventional uniprocessor (e.g., personal computer, Sparc20, and so on), as shown for NK in FIG. 1, except that each node has a node controller and a directory to support DSM.
When a read or write miss occurs in the cache of a node, the node controller sends a request to the home node for cache coherence management. When the reply to the request arrives from the home node, the controller handles it according to the corresponding protocol.
In addition, the directory must hold the sharing state of its own memory in order to maintain cache coherence. The directory must contain one entry for each memory block, and each entry identifies the nodes sharing that memory block. If another node tries to change a memory block of the home node, it must first acquire write approval from the home node, and the home node sends invalidation requests to all nodes sharing the corresponding block before granting the approval. Once the home node has received all acknowledge messages, it sends the write approval to the node requesting the write.
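The write approval sequence described above can be sketched in Python as follows. This is a simplified, synchronous model under stated assumptions: the class and method names (Node, HomeNode, handle_read, handle_write) are hypothetical, and a real node controller would exchange these messages asynchronously over the interconnection network.

```python
class Node:
    """A sharer node; `cached` holds the block ids present in its cache."""

    def __init__(self, name):
        self.name = name
        self.cached = set()

    def invalidate(self, block):
        # Drop the local copy and acknowledge the invalidation request.
        self.cached.discard(block)
        return "ACK"


class HomeNode:
    """Home node with a directory: block id -> list of nodes sharing the block."""

    def __init__(self):
        self.sharers = {}

    def handle_read(self, block, node):
        entry = self.sharers.setdefault(block, [])
        if node not in entry:
            entry.append(node)
        node.cached.add(block)

    def handle_write(self, block, writer):
        # Send an invalidation request to every other sharer and collect the ACKs.
        others = [n for n in self.sharers.get(block, []) if n is not writer]
        acks = [n.invalidate(block) for n in others]
        if len(acks) != len(others) or any(a != "ACK" for a in acks):
            return False  # approval is withheld until every sharer has acknowledged
        # All ACKs received: grant write approval and record the single owner.
        self.sharers[block] = [writer]
        writer.cached.add(block)
        return True


home = HomeNode()
na, nb, nc = Node("NA"), Node("NB"), Node("NC")
home.handle_read(7, na)
home.handle_read(7, nb)
granted = home.handle_write(7, nc)  # NA and NB are invalidated before NC may write
```

After the call, only NC holds block 7, and the directory entry lists NC as its sole holder.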
The protocol briefly described above is a Hardware DSM Protocol (HDP), a transfer/receive protocol for fine-grain data.
One disadvantage of this kind of cache-coherent machine is that it uses a fixed-size block (i.e., a cache line for loads and stores) as the unit of communication. While this works well for fine-grain data, in some applications a parallelism-oriented communication scheme that permits bulk data transfer can be more effective than caching.
Hence, to solve the problems described above, methods that support the above two types simultaneously, i.e., methods that exploit the advantages of both fine- and coarse-grain communication, have recently been proposed; they are briefly described as follows.
More recent shared memory machines have begun to integrate both models within a single architecture and to implement coherence protocols in software rather than in hardware. In order to use the bulk transfer facility on these machines, several approaches have been proposed, such as explicit messages and new programming models. In explicit message passing, communication primitives such as send-receive or memory-copy are used selectively to communicate coarse-grain data, while load-store communication is used for fine-grain data. In other words, two communication paradigms coexist in the program, and it is the user's responsibility to select the appropriate model.
Though these two approaches support an arbitrarily variable granularity and thus may potentially lead to large performance gains, they suffer from decreased programmability and increased hardware complexity. In other words, there is a tradeoff between the support of arbitrary size granularities and programmability.
The primary object of the present invention, which solves the problems of the prior art, is to provide a new Adaptive Granularity communication protocol that integrates fine- and coarse-grain communication in a distributed shared memory, so that the communication protocol can be set adaptively according to the size of the data exchanged between respective nodes. According to the present invention, standard load-store communication performance is obtained by employing cache line transfer for fine-grain sharing, and bulk transfer performance is obtained by supporting a spectrum of granularities for bulk transfer. In addition, by efficiently supporting transparent bulk data communication, it is possible to reduce the programmer's burden of using variable-size granularity.
The present invention is characterized as a data communication method for reading/writing data between memories in a distributed shared memory system wherein said protocol selectively performs bulk transfer by supporting spectrum granularity for coarse-grain sharing or standard load-store transfer by employing cache line transfer for fine-grain sharing.
In accordance with an aspect of the present invention, a bulk data communication method for reading/writing data between memories in a distributed shared memory system is provided, which comprises the steps of: a) determining only the communication type according to the data type, without designating the requested data size, and transferring the request to the home node, when the node controller is instructed by the local cache; b) determining the granularity depending on the sharing pattern and transferring the bulk data to the requesting node, when the home node receives the bulk request; c) merging two adjacent blocks into one buddy when the two blocks are owned by the same node; and d) when the data arrive, writing, by the node controller, only the requested data into the cache line and the rest into local memory for use on future cache misses.
Another object of the present invention is to provide a bulk data communication method which, after the step b), further comprises the step of dividing by the home node the block into two parts in order to reduce the false sharing when the ownership of the block is changed.
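The splitting and merging of blocks in steps b) and c) can be sketched in Python as follows. The sketch assumes a buddy-system layout in which block sizes are powers of two and only aligned, equal-size neighbors are merged; the text above states only that a block is divided into two parts and that adjacent blocks with the same owner are merged, so these details, and the names Block, split_on_write, and merge_buddies, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Block:
    start: int   # index of the first cache line in the block
    size: int    # number of cache lines (assumed to be a power of two)
    owner: str


def split_on_write(blocks, line, new_owner):
    """Halve the block containing `line` and give that half to `new_owner`."""
    result = []
    for b in blocks:
        inside = b.start <= line < b.start + b.size
        if not inside or b.owner == new_owner:
            result.append(b)
        elif b.size == 1:
            # A single cache line cannot be divided further; only the owner changes.
            result.append(Block(b.start, 1, new_owner))
        else:
            half = b.size // 2
            lo = Block(b.start, half, b.owner)
            hi = Block(b.start + half, half, b.owner)
            if line < hi.start:
                lo.owner = new_owner
            else:
                hi.owner = new_owner
            result.extend([lo, hi])
    return result


def merge_buddies(blocks):
    """Repeatedly merge aligned, adjacent, equal-size blocks with the same owner."""
    blocks = sorted(blocks, key=lambda b: b.start)
    merged = True
    while merged:
        merged = False
        for i in range(len(blocks) - 1):
            a, b = blocks[i], blocks[i + 1]
            is_buddy = (a.size == b.size
                        and a.start + a.size == b.start
                        and a.start % (2 * a.size) == 0)
            if is_buddy and a.owner == b.owner:
                blocks[i:i + 2] = [Block(a.start, 2 * a.size, a.owner)]
                merged = True
                break
    return blocks


page = [Block(0, 8, "home")]                 # a page of 8 cache lines
after_split = split_on_write(page, 5, "NB")  # NB takes ownership of line 5
remerged = merge_buddies([Block(0, 4, "NB"), Block(4, 4, "NB")])
```

Halving on each ownership change narrows the region in which false sharing can occur, while merging restores a coarse granularity once a single node owns both halves again.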
Another object of the present invention is to provide a data communication method from local or remote nodes to the home node for reading/writing data between memories in a distributed shared memory system in which a plurality of nodes are connected on an interconnection network, and each said node comprises processors with certain functions, memories, caches and node controllers for communicating data between said node and other nodes, which comprises the states of
INVALID state; READ TRANSIT state; WRITE TRANSIT state; READ ONLY state; and READ WRITE state,
wherein the state transits to "INVALID" state when the cache line does not have the data to be referred to (read/written) by the processor, wherein a cache line fault handler requests bulk data transfer or fine data transfer from another node (remote computer or remote node),
the state transits to "READ TRANSIT" state when the cache does not hold the corresponding data block while the processor tries to read a certain memory block, wherein the node cache controller sends a Read Data Request (RREQ) to the home node and then the state transits to "READ TRANSIT" state; the procedure is blocked in "READ TRANSIT" state until the data for the corresponding block are transferred; and the state of the cache line transits to "READ ONLY" state when the home node has transferred the requested data and the data are loaded into the faulting cache line,
the state transits to "WRITE TRANSIT" state when the cache does not hold the corresponding data block while the processor tries to write a certain memory block, wherein the node cache controller sends a Write Data Request (WREQ) to the home node and then the state transits to "WRITE TRANSIT" state; the procedure is blocked in "WRITE TRANSIT" state until the data for the corresponding block are transferred; and the state of the cache line transits to "READ WRITE" state when the home node has transferred the requested data and the data are loaded into the faulting cache line,
the state transits to "READ ONLY" state when the home node has transferred the corresponding block in "READ TRANSIT" state, wherein if the state of the corresponding cache line is "READ ONLY" state, it means that the memory data of the home node are shared; and if another node attempts to write the same memory data of the home node and the home node sends an INV request, then the state of the corresponding line returns to "INVALID" state, and
the state transits to "READ WRITE" state when the home node has transferred the corresponding block and write approval and then the data have been stored in the faulting cache line in said "WRITE TRANSIT" state, wherein if the state of one cache line is "READ WRITE" state, then this node has ownership of the memory data of the home node; if another node requests to write this memory data of the home node, then the home node sends an INV request (invalidate request) to the node having the ownership and being in "READ WRITE" state; and the node receiving the INV request returns the state of the corresponding cache line to "INVALID" state.
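The cache line state transitions listed above can be summarized as a small transition table. The following Python sketch is illustrative only: the event names ("read_miss", "write_miss", "data", "INV") and the ACK reply to an invalidate request are assumptions chosen for this example, while the state names and the RREQ/WREQ messages follow the text.

```python
from enum import Enum


class State(Enum):
    INVALID = "INVALID"
    READ_TRANSIT = "READ TRANSIT"
    WRITE_TRANSIT = "WRITE TRANSIT"
    READ_ONLY = "READ ONLY"
    READ_WRITE = "READ WRITE"


# (current state, event) -> (next state, message sent to the home node or None)
TRANSITIONS = {
    (State.INVALID, "read_miss"): (State.READ_TRANSIT, "RREQ"),
    (State.INVALID, "write_miss"): (State.WRITE_TRANSIT, "WREQ"),
    (State.READ_TRANSIT, "data"): (State.READ_ONLY, None),
    (State.WRITE_TRANSIT, "data"): (State.READ_WRITE, None),
    (State.READ_ONLY, "INV"): (State.INVALID, "ACK"),
    (State.READ_WRITE, "INV"): (State.INVALID, "ACK"),
}


def step(state, event):
    """Return (next_state, outgoing_message) for one cache line."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.value} on {event}") from None


# A write miss followed by the data reply, then an invalidation by the home node.
s, msg1 = step(State.INVALID, "write_miss")
s, msg2 = step(s, "data")
owned = s
s, msg3 = step(s, "INV")
```

Keeping the transitions in a table makes it easy to check that every state named above is reachable and that both transit states are left only when the data arrive.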
Another object of the present invention is to provide a data communication method from the home node to local or remote nodes for reading/writing data between memories in a distributed shared memory system in which a plurality of nodes are connected on an interconnection network, and each said node comprises processors with certain functions, memories, caches and node controllers for communicating data between said node and other nodes, which comprises the states of
EMPTY state; READ ONLY state; READ WRITE state; READ TRANSIT state; and WRITE TRANSIT state,
wherein
the state transits to "EMPTY" state when a read/write request is sent to the home node, wherein said "EMPTY" state means the protocol state of the directory when the page for the requested data is referred to; the home node refers to the directory holding the memory information for the corresponding data, determines from that information whether the corresponding data are bulk transfer data or fine transfer data, sends the requested cache line to the requesting node first, and then transfers the rest of the page as a sequence of cache lines,
the state transits to "READ ONLY" state after the corresponding requested data have been transferred, wherein the requested cache-line-size data are sent first from the home node to the node requesting said data, and the rest of the page including the requested data is transferred as a sequence of cache lines; if an RREQ for the same block is sent to the home node, then the directory for the requested data is referred to and the corresponding cache-line-size block is transferred to the requester; and if the request is for bulk data transfer, then the whole page including the block is sent to the requester,
the state transits to "READ WRITE" state when a WREQ is transferred from another node in "READ ONLY" state, wherein, when the request to the home node is a bulk data request, the write data request WREQ requests a change of the data ownership; the home node transfers the half of the page including this data to the requester, so that not only does the requester have the ownership of the transferred half page, but the data other than the requested cache line are also kept for future use; an INV (invalidate request) is sent to any other node holding the transferred half of the block in order to maintain cache coherency; and the state of the home node transits from "READ WRITE" state to "EMPTY" state,
the state transits to "READ TRANSIT" state when the home node, while waiting for ACKs, receives an RREQ for the same block from another node, wherein the home node waits until it receives all ACKs; if the home node receives all ACKs for the invalidate requests, then the state transits to "READ WRITE" state; and the home node handles the RREQ/WREQ received in "READ TRANSIT" state, and
the state transits to "WRITE TRANSIT" state when the home node, while waiting for ACKs, receives a WREQ, wherein if the home node receives all ACKs for the invalidate requests, then the home node again handles the RREQ/WREQ received while it was waiting for ACKs.
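The bulk transfer order described for the "EMPTY" and "READ ONLY" states, together with the placement of the arriving data at the requester, can be sketched in Python as follows. The function names bulk_reply_order and receive_bulk and the dictionary-based model of the cache and local memory are hypothetical; the sketch shows only the ordering and placement rules, not the message format.

```python
def bulk_reply_order(page_lines, requested):
    """Order in which the home node sends the cache lines of a page:
    the requested line first, then the rest of the page in sequence."""
    lines = list(page_lines)
    if requested not in lines:
        raise ValueError("requested line is not part of the page")
    return [requested] + [i for i in lines if i != requested]


def receive_bulk(order, requested, page_data):
    """Requester side: only the requested line goes into the cache;
    the remaining lines are kept in local memory for future cache misses."""
    cache, local_memory = {}, {}
    for i in order:
        if i == requested:
            cache[i] = page_data[i]
        else:
            local_memory[i] = page_data[i]
    return cache, local_memory


page_data = {i: f"line{i}" for i in range(4)}   # a page of 4 cache lines
order = bulk_reply_order(range(4), requested=2)
cache, local_memory = receive_bulk(order, 2, page_data)
```

Sending the requested line first lets the faulting processor resume as soon as that line arrives, while the remainder of the page follows in the background.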