1. Technical Field
The present invention relates in general to the field of computers, and in particular to shared-memory multiprocessors. More particularly, the present invention relates to a Region Coherence Array (RCA) for shared-memory multiprocessor systems having subregions and subregion prefetching.
2. Description of the Related Art
Processors in a single-level or multi-level interconnect hierarchy architecture are typically divided into groups called symmetric multiprocessing nodes (SMP nodes), such that processors within the same SMP node may share a physical cabinet, a printed circuit board, a multi-chip module, or an integrated circuit chip, thereby enabling low-latency, high-bandwidth communication between processors in the same SMP node.
Coarse-Grain Coherence Tracking (CGCT) using Region Coherence Arrays (RCAs) is a technique that has the potential to optimize bandwidth, power consumption, and latency in shared-memory multiprocessor systems by identifying the level(s) of the interconnect hierarchy to which to send memory requests for a line of data, and sending memory requests only to those identified interconnect level(s), if any. Where applicable, other levels of the interconnect hierarchy can be skipped without sending a memory request, thereby reducing memory request traffic and power consumption at those interconnect levels, and avoiding the latency of checking those interconnect levels for a line of data stored among the processors in the system.
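The level-selection idea described above can be illustrated with a minimal sketch. The region states, the two-level hierarchy ("node" and "global"), the 4096-byte region size, and the function names are all assumptions made for illustration; they are not taken from any particular RCA design.

```python
from enum import Enum

# Hypothetical region states; real RCA designs define their own state sets.
class RegionState(Enum):
    UNKNOWN = 0          # no RCA entry: nothing is known about the region
    NOT_SHARED = 1       # no other processor caches lines of the region
    SHARED_IN_NODE = 2   # lines cached only by processors in the same SMP node
    SHARED_GLOBALLY = 3  # lines may be cached by processors in other SMP nodes

REGION_SIZE = 4096  # bytes per region (assumed value)

def region_of(address):
    """Map a byte address to the number of the region that contains it."""
    return address // REGION_SIZE

def levels_to_snoop(rca, address):
    """Return the interconnect levels a request for this address is sent to.

    `rca` is modeled as a plain dict from region number to RegionState.
    """
    state = rca.get(region_of(address), RegionState.UNKNOWN)
    if state is RegionState.NOT_SHARED:
        return []                    # every level can be skipped
    if state is RegionState.SHARED_IN_NODE:
        return ["node"]              # the global level can be skipped
    return ["node", "global"]        # unknown or globally shared: check all

rca = {
    region_of(0x1000): RegionState.NOT_SHARED,
    region_of(0x2000): RegionState.SHARED_IN_NODE,
}
print(levels_to_snoop(rca, 0x1040))  # region known to be unshared
print(levels_to_snoop(rca, 0x9000))  # region with no RCA entry
```

In this sketch a region with no entry is treated conservatively and the request is sent to every level, which is why the number of regions an RCA can track matters for how often levels can be skipped.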
There are, however, two main performance-limiting aspects of CGCT using RCAs: reach and precision. That is, an RCA is limited by how much data it can map (reach), and by how precisely it tracks the coherence status of lines not cached by a processor associated with the RCA (precision). To exploit more spatial locality and temporal locality, RCAs need to have more reach, and hence must utilize large region sizes. However, large region sizes can result in more false sharing of regions and less precise tracking of coherence status, in which some lines in a region are shared while other lines in the same region are not (region false sharing). Hence, to improve the performance of CGCT using RCAs, there is a need for RCAs which use large region sizes while tracking the coherence status of lines in the regions with increased precision.
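The tension between reach and precision can be made concrete with a small arithmetic sketch. The 128-byte line size, the 8192-entry array, and the helper names below are assumed values chosen for illustration only.

```python
LINE_SIZE = 128  # bytes per cache line (assumed value)

def reach_bytes(num_entries, region_size):
    """Amount of memory an RCA with `num_entries` entries can map."""
    return num_entries * region_size

def falsely_shared_lines(region_size, shared_line_offsets):
    """Lines treated as shared only because they sit in a shared region.

    `shared_line_offsets` holds the indices, within one region, of the
    lines that really are shared. With one status per region, a single
    shared line causes every other line in the region to be treated as
    shared as well.
    """
    truly_shared = set(shared_line_offsets)
    if not truly_shared:
        return 0
    lines_per_region = region_size // LINE_SIZE
    return lines_per_region - len(truly_shared)

# Same number of entries, two candidate region sizes.
for region_size in (512, 4096):
    print(region_size,
          reach_bytes(8192, region_size),
          falsely_shared_lines(region_size, {0}))
```

Under these assumptions, growing the region from 512 bytes to 4096 bytes multiplies the reach by eight, but one truly shared line then causes 31 other lines to be treated as shared instead of 3.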
Because the physical page size of the memory architecture is fixed, the region size has an effective upper bound. The operating system is not required to place related pages together in system memory, so increasing the region size beyond the physical page size is likely to be less effective, not more. Thus, the smallest physical page size supported by the system architecture enables the most reach and the most efficient utilization of data storage, and is therefore a practical choice of region size. However, a smaller region size would increase precision and reduce false sharing, at the cost of spatial locality, temporal locality, and space efficiency. Therefore, what is needed to improve effectiveness in shared-memory multiprocessor systems is more precise tracking of coherence status while using a large region size.