1. Technical Field
The present invention generally relates to data processing systems and in particular to clustered shared-memory multiprocessors. More particularly, the present invention relates to an efficient region coherence protocol for clustered shared-memory multiprocessor systems.
2. Description of the Related Art
To reduce global bandwidth requirements within a computer system, many modern shared-memory multiprocessor systems are clustered. The processors are divided into groups called symmetric multiprocessing nodes (SMP nodes), such that processors within the same SMP node may share a physical cabinet, a circuit board, a multi-chip module, or a chip, thereby enabling low-latency, high-bandwidth communication between processors in the same SMP node. Two-level cache coherence protocols exploit this clustering configuration to conserve global bandwidth. They first broadcast a processor's memory request for a line of data to the local SMP node, and they send the request to other SMP nodes only if necessary (e.g., if the responses to the first broadcast show that the requested line is not cached on the local SMP node). While this type of two-level cache coherence protocol reduces global bandwidth requirements, memory requests that must eventually be broadcast to other SMP nodes are delayed by first checking the local SMP node for the requested line, and that local check consumes additional SMP node bandwidth and power. For performance, scalability, and power consumption, it is therefore important to send each memory request first to the portion of the shared-memory computer system where the cached data is most likely to be found.
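The two-level broadcast described above can be illustrated with a small model. This is a sketch under simplifying assumptions, not an implementation of any particular protocol: each processor's cache is modeled as a set of line addresses, and the names `Processor` and `two_level_request` are hypothetical. The model shows why a request for a line cached only on a remote SMP node needs two broadcast phases, and therefore incurs the delay and the extra local snoop traffic noted above.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    pid: int
    node: int                                   # SMP node this processor belongs to
    cache: set = field(default_factory=set)     # line addresses currently cached


def two_level_request(requester, processors, line):
    """Model a two-level broadcast for `line` (hypothetical helper).

    Returns (scope, phases, snoops): scope is "local" if a processor on the
    requester's SMP node caches the line and "global" otherwise; phases is the
    number of broadcast rounds; snoops counts processors that saw the request.
    """
    # Phase 1: broadcast only within the requester's own SMP node.
    local = [p for p in processors if p.node == requester.node and p is not requester]
    snoops = len(local)
    if any(line in p.cache for p in local):
        return "local", 1, snoops
    # Phase 2: the local broadcast failed, so re-broadcast to the other nodes.
    remote = [p for p in processors if p.node != requester.node]
    snoops += len(remote)
    return "global", 2, snoops
```

With two SMP nodes of two processors each, a request for a line cached by a neighbor on the same node completes in one phase, while a request for a line cached only on the other node needs a second phase and also pays for the failed local snoop.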
Coarse-Grain Coherence Tracking using Region Coherence Arrays may avoid unnecessary broadcasts of memory requests in broadcast-based, shared-memory multiprocessor systems. However, a key problem with Region Coherence Arrays is that, to operate correctly, they must maintain inclusion over a processor's cache hierarchy, so lines must occasionally be evicted from that hierarchy. When a region is evicted from the Region Coherence Array to make room for another region, all of the evicted region's lines must also be evicted from the processor's cache hierarchy.
Evicting cache lines to maintain inclusion is very difficult to implement and detrimental to performance: it reduces cache hit rates and offsets the benefits of Region Coherence Arrays. Although a Region Coherence Array may favor regions with no cached lines when selecting a region for replacement, the line-eviction problem remains. The problem worsens as the Region Coherence Array is made smaller, so large Region Coherence Arrays are generally required.
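The inclusion requirement and the replacement preference described above can be sketched as follows. This is a minimal model under stated assumptions, not a hardware design: the region size `LINES_PER_REGION`, the class `RegionCoherenceArray`, and its methods are hypothetical; the cache is modeled as a Python set; each region entry tracks its cached lines as a set rather than a hardware count; and replacement is LRU with a preference for regions that have no cached lines. The key behavior is that evicting a region with cached lines forces those lines out of the cache.

```python
from collections import OrderedDict

LINES_PER_REGION = 4  # hypothetical region size, in cache lines


class RegionCoherenceArray:
    def __init__(self, capacity):
        self.capacity = capacity
        # region -> set of cached lines; iteration order is LRU (oldest first)
        self.regions = OrderedDict()

    def _choose_victim(self):
        # Prefer the least recently used region that has no cached lines.
        for region, lines in self.regions.items():
            if not lines:
                return region
        # Otherwise fall back to plain LRU.
        return next(iter(self.regions))

    def fill_line(self, line, cache):
        """Bring `line` into `cache` (a set) while keeping the RCA inclusive.

        Returns the lines forcibly evicted from the cache to preserve inclusion.
        """
        region = line // LINES_PER_REGION
        evicted = []
        if region in self.regions:
            self.regions.move_to_end(region)
        else:
            if len(self.regions) >= self.capacity:
                victim = self._choose_victim()
                victim_lines = self.regions.pop(victim)
                evicted = sorted(victim_lines)
                cache.difference_update(victim_lines)  # inclusion eviction
            self.regions[region] = set()
        self.regions[region].add(line)
        cache.add(line)
        return evicted

    def evict_line(self, line, cache):
        """Ordinary cache replacement: drop the line but keep the region entry."""
        cache.discard(line)
        self.regions.get(line // LINES_PER_REGION, set()).discard(line)
```

With a two-entry array, filling lines 0, 5, and 9 (regions 0, 1, and 2) forces line 0 out of the cache. If line 9 is then replaced normally, a later fill of line 13 reuses the now-empty region 2 entry instead of the LRU region 1, so no cached line is lost. Shrinking `capacity` makes forced evictions like the first one more frequent.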
An alternative implementation of Coarse-Grain Coherence Tracking is Region Scout Filters. Region Scout Filters consist of non-tagged, address-indexed hash tables of counts that track lines in the processor's cache hierarchy (Cached Region Hash/CRH), and separate, tagged arrays that hold the addresses of non-shared regions recently touched by the processor (Non-Shared Region Table/NSRT). By using non-tagged hash tables of counts, Region Scout Filters can maintain inclusion over the cache hierarchy without evicting lines, provided each count is wide enough to represent all cache lines that may hash to its entry. This benefit comes at the cost of precision. A count in the Cached Region Hash is the sum of the cached lines from all regions that hash to that entry, so a non-zero count means only that the processor may cache lines from a requested region. Once a processor brings a line into its cache hierarchy and increments the count in the corresponding CRH entry, all regions mapping to that CRH entry are considered “shared” by the rest of the system, and other processors must broadcast requests for lines in those regions. The smaller the CRH, the higher the percentage of non-zero counts and the less effective the Region Scout Filter. In practice, very large hash tables are required to make Region Scout Filters effective, because a Region Scout Filter is effective only if most of its counts are zero.
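The following sketch models the CRH/NSRT interplay under stated assumptions, to show how aliasing in an untagged count table costs precision. The names `CachedRegionHash`, `Node`, and `request_line`, the modulo index function, the region size, and the NSRT policy are all hypothetical. The assumed NSRT policy is: a requester records a region as non-shared when every other processor's CRH reports a zero count for it, and a processor drops a region from its NSRT when it snoops another processor's request for that region.

```python
from collections import OrderedDict

LINES_PER_REGION = 4  # hypothetical region size, in cache lines


class CachedRegionHash:
    """Untagged, address-indexed table of cached-line counts (CRH)."""

    def __init__(self, entries):
        self.counts = [0] * entries

    def _index(self, region):
        # Assumed modulo hash; distinct regions may alias to one entry.
        return region % len(self.counts)

    def line_filled(self, line):
        self.counts[self._index(line // LINES_PER_REGION)] += 1

    def line_evicted(self, line):
        idx = self._index(line // LINES_PER_REGION)
        if self.counts[idx] == 0:
            raise ValueError("CRH count underflow")
        self.counts[idx] -= 1

    def may_cache_region(self, region):
        # Non-zero only means *some* region hashing here may have cached lines.
        return self.counts[self._index(region)] != 0


class Node:
    """One processor with its CRH and a small tagged NSRT (LRU-managed)."""

    def __init__(self, crh_entries, nsrt_capacity):
        self.crh = CachedRegionHash(crh_entries)
        self.nsrt = OrderedDict()  # region -> None, oldest first
        self.nsrt_capacity = nsrt_capacity

    def remember_non_shared(self, region):
        self.nsrt[region] = None
        self.nsrt.move_to_end(region)
        if len(self.nsrt) > self.nsrt_capacity:
            self.nsrt.popitem(last=False)  # drop the LRU entry


def request_line(requester, others, line):
    """Fetch `line` for `requester`; return True if a broadcast was needed."""
    region = line // LINES_PER_REGION
    if region in requester.nsrt:
        requester.crh.line_filled(line)
        return False  # region known to be non-shared: skip the broadcast
    shared = False
    for p in others:
        p.nsrt.pop(region, None)  # the requester may now cache this region
        shared = shared or p.crh.may_cache_region(region)
    if not shared:
        requester.remember_non_shared(region)
    requester.crh.line_filled(line)
    return True
```

With 4-entry CRHs, if processor `b` caches line 0 (region 0), then processor `a`'s request for line 16 (region 4) is reported as possibly shared, because region 4 aliases to the same CRH entry even though `b` caches nothing from it. A request for line 4 (region 1) finds a zero count, so region 1 enters `a`'s NSRT, and a later request for line 5 skips the broadcast.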