The invention relates to computer processors and memory systems. More particularly, the invention relates to optimizing coherent memory access operations within multiprocessor computer systems having distributed shared memory architectures.
Multiprocessor, or parallel processing, computer systems rely on a plurality of microprocessors to handle computing tasks in parallel to reduce overall execution time. One common implementation of a multiprocessor system is the "single bus architecture", in which a plurality of processors are interconnected through a single bus. However, because the limited bandwidth of the single bus also limits the number of processors that can be interconnected thereto, networked multiprocessor systems have recently been developed, which utilize processors or groups of processors connected to one another across an interconnection fabric, e.g., a network, and communicating via "packets" or messages.
Typically, a networked multiprocessor system includes a plurality of nodes or clusters interconnected via a network. For example, FIG. 1 shows an exemplary networked multiprocessor system 100, in which a plurality of nodes 102 are interconnected to each other via the interconnection fabric 101, e.g., a network. By way of example, only two nodes are shown. However, the networked multiprocessor system 100 may have any number of nodes. Moreover, although the interconnection fabric 101 is shown in FIG. 1 to provide interconnections only between the nodes 102, all system entities, including the cells 103, the processors 105 and the memories 104, are interconnected, and communicate, with the rest of the system through the interconnection fabric 101.
Each of the nodes 102 of the networked multiprocessor system 100 may be further divided into smaller hierarchical units, referred to herein as "cells" 103, each of which comprises a plurality of processors 105 and a shared memory 104. Each processor 105 may comprise any processing element that may share data within the distributed shared memory in the system, e.g., a microprocessor, an I/O device or the like. The grouping of the system entities into nodes and/or cells may be made physically and/or logically.
Each of the shared memories 104 may comprise a portion of the shared memory for the system 100, and may include a memory controller (not shown) and/or a coherency controller (not shown) to control memory accesses thereto from various processors in the system, and to monitor the status of local copies of the memory stored in caches of various processors in the system.
In a networked multiprocessor system such as the one described above, multiple copies of a piece of data from the shared memory may be stored in the caches of various processors. Each processor that has a copy of the data in its cache is said to "share" the data; the data is often referred to as one or more "cache lines". In order to maintain proper operation of the networked multiprocessor system, it is critical that all copies of any shared data be identical to the data in the shared memory, i.e., coherency between the copies and the data in the memory must be ensured.
Prior attempts to address the above coherency problem broadcast an "invalidate" signal, whenever a shared memory location is updated, to every entity, e.g., the processors 105, in the system 100 that may potentially share the memory location, so that each entity may "invalidate" the copy in its cache, and the data would be obtained from the memory rather than from the entity's cache in a subsequent access.
Unfortunately, however, broadcasting the invalidate signal to all potential sharers, e.g., all processors 105, is wasteful of the system bandwidth, since as many invalidate messages as there are processors in the system, together with the resulting invalidate response messages from each of the processors, must be sent across the system interconnection fabric 101 and/or the data paths connecting each processor to the system. This lowers the system performance.
Prior attempts were made to address the above waste of system bandwidth by restricting the sharing of memory to within one of the nodes 102 at a time. This approach is inefficient and inflexible in that, if a new sharer from a different node were to be added to the list of sharers, the other sharers in the list must be invalidated before the new sharer can be added. This tends to increase the invalidate message traffic, and thus has negative system performance implications, particularly for cache lines that should preferably be shared as read-only by all processors.
Moreover, in a distributed shared memory system, an address aliasing error, e.g., an error during a translation from a physical address to a virtual address, may result in duplicate copies of a cache line, the addresses of the copies being different from each other. This may disturb the data coherency of the system, and eventually cause data corruption, which often results in a fatal system crash.
Prior attempts to address this aliasing error problem include running a large test suite under system software and looking for signs of data corruption. Unfortunately, however, this prior solution is an after-the-fact approach that can detect an aliasing error only after a data corruption has already happened.
Thus, there is a need for a more efficient method and device for tracking the system entities that may share a cache line to maintain data coherency in a multiprocessor system, which avoid sending coherency messages to all entities in the multiprocessor system.
There is also a need for a more efficient method and device for detecting an address aliasing error before a corruption of data occurs.
In accordance with the principles of the present invention, a method of data sharing in a distributed computing system having a plurality of processing elements and at least one memory having stored therein a plurality of cache lines comprises providing a plurality of shared masks, each of the plurality of shared masks corresponding to an associated one of the plurality of cache lines in the at least one memory, and each of the plurality of shared masks having a plurality of bits, each of the plurality of bits indicating whether at least one of the plurality of processing elements may be sharing the associated one of the plurality of cache lines, wherein the number of the plurality of bits is less than the number of the plurality of processing elements.
In addition, in accordance with the principles of the present invention, an apparatus for data sharing in a distributed computing system having a plurality of processing elements and at least one memory having stored therein a plurality of cache lines comprises a plurality of shared masks, each of the plurality of shared masks corresponding to an associated one of the plurality of cache lines in the at least one memory, and each of the plurality of shared masks having a plurality of bits, each of the plurality of bits being associated with one or more of the plurality of processing elements, and each of the plurality of bits indicating whether the respective associated one or more of the plurality of processing elements may have a copy of the associated one of the plurality of cache lines, wherein the number of the plurality of bits is less than the number of the plurality of processing elements.
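By way of illustration only, the shared mask described above may be sketched in software as follows. The sketch assumes that processing elements are numbered consecutively and that each bit of a shared mask covers a contiguous group of processing elements; the class name SharedMaskDirectory, its method names, and the grouping rule are hypothetical and are not taken from the present disclosure, which may associate bits with processing elements in other ways.

```python
class SharedMaskDirectory:
    """Illustrative per-cache-line shared masks (hypothetical names throughout)."""

    def __init__(self, num_processors, mask_bits):
        # The mask must have fewer bits than there are processing elements.
        if not 0 < mask_bits < num_processors:
            raise ValueError("mask_bits must be positive and less than num_processors")
        self.num_processors = num_processors
        self.mask_bits = mask_bits
        # Assumption: each bit covers a contiguous group of processors
        # (ceiling division gives the group size).
        self.group_size = -(-num_processors // mask_bits)
        self.masks = {}  # cache line address -> integer bit mask

    def bit_for(self, proc_id):
        """Return the index of the mask bit associated with a processor."""
        return proc_id // self.group_size

    def record_sharer(self, line, proc_id):
        """Set the bit indicating that proc_id's group may share the line."""
        self.masks[line] = self.masks.get(line, 0) | (1 << self.bit_for(proc_id))

    def invalidation_targets(self, line):
        """List every processor whose bit is set, i.e. every potential sharer."""
        mask = self.masks.get(line, 0)
        return [p for p in range(self.num_processors)
                if (mask >> self.bit_for(p)) & 1]

    def clear(self, line):
        """Forget all sharers of a line, e.g. after it has been invalidated."""
        self.masks.pop(line, None)
```

In this sketch, with sixteen processing elements and a four-bit mask, recording processing element 5 as a sharer sets one bit, and a later update of the cache line would send invalidate messages to processing elements 4 through 7 only, rather than to all sixteen. The mask is conservative: a set bit indicates that a group may hold a copy, not that every member of the group does.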
In accordance with another aspect of the principles of the present invention, a method of detecting an address aliasing error in a computing system having at least one memory having stored therein at least one cache line comprises providing a directory tag alias signature for each of the at least one cache line, the directory tag alias signature having encoded therein a signature of an address information of the at least one cache line, detecting a request to access the at least one cache line, the request including a requested address information of the at least one cache line, computing a computed alias signature based on the requested address information, comparing the directory tag alias signature with the computed alias signature to determine if the directory tag alias signature and the computed alias signature match each other, and indicating an occurrence of the address aliasing error if the directory tag alias signature and the computed alias signature do not match each other.
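By way of illustration only, the providing, computing, comparing and indicating steps recited above may be sketched as follows. The present disclosure does not specify at this point how the signature is computed; the sketch assumes, purely for illustration, that a directory entry is selected by the low-order address bits and that the signature is an XOR fold of the remaining high-order bits. The names INDEX_BITS, SIGNATURE_BITS, alias_signature and Directory are hypothetical.

```python
INDEX_BITS = 8       # assumption: low-order bits select the directory entry
SIGNATURE_BITS = 4   # assumption: width of the stored alias signature


def alias_signature(address):
    """XOR-fold the high-order address bits into a short signature (illustrative)."""
    upper = address >> INDEX_BITS
    signature = 0
    while upper:
        signature ^= upper & ((1 << SIGNATURE_BITS) - 1)
        upper >>= SIGNATURE_BITS
    return signature


class Directory:
    """Holds one directory tag alias signature per entry (hypothetical model)."""

    def __init__(self):
        self.tags = {}  # entry index -> stored alias signature

    @staticmethod
    def index_of(address):
        return address & ((1 << INDEX_BITS) - 1)

    def install(self, address):
        """Provide the directory tag alias signature for a cache line."""
        self.tags[self.index_of(address)] = alias_signature(address)

    def check_access(self, address):
        """Return True if the signatures match, False to indicate an aliasing error."""
        stored = self.tags.get(self.index_of(address))
        if stored is None:
            raise KeyError("no directory entry for this address")
        return stored == alias_signature(address)
```

In this sketch, after a cache line is installed at address 0x1234, a request for address 0x1234 yields matching signatures, whereas a request for address 0x1334, which selects the same directory entry, yields a mismatch and is reported before any data is returned. Because the signature is shorter than the address information it summarizes, distinct addresses may occasionally produce the same signature, so a match in this sketch does not by itself prove the absence of aliasing.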