1. Field of the Invention
The present invention relates to the design of shared-memory multiprocessor systems. More specifically, the present invention relates to a method and an apparatus that reduces coherence traffic in a shared-memory multiprocessor system by supporting both coherent and non-coherent memory accesses.
2. Related Art
In shared-memory multiprocessor systems, cache coherence problems can arise if multiple copies of the same data item exist in local caches attached to different processors. If this is the case, modifying a first copy of the data item in a first local cache will cause the first copy to be different from a second copy of the same data item in a second local cache. Hence, the first and second copies of the data item will not be “coherent.”
To prevent the above-described coherence problem, multiprocessor systems often provide a cache-coherence mechanism, which uses a specific cache-coherence protocol, and operates on a system bus that interconnects the coherent caches and a system memory. The cache-coherence protocol ensures that if one copy of a data item is modified in a local cache, other copies of the same data item in other caches (and possibly in the system memory) are updated or invalidated to reflect the modification. The associated messages generated on the system bus by the coherence protocol are typically referred to as “coherence traffic.”
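As an illustration only, the write-invalidate behavior described above can be sketched with a toy model. The class names `Bus` and `Cache`, the write-through memory update, and the one-message-per-write accounting are assumptions made for this sketch; they do not correspond to any particular coherence protocol or to the claimed apparatus.

```python
class Bus:
    """Toy system bus (hypothetical) linking caches and system memory."""

    def __init__(self):
        self.caches = []
        self.memory = {}             # address -> value in system memory
        self.coherence_messages = 0  # invalidations placed on the bus

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, sender, addr):
        # Assumption: every write places exactly one invalidation on the bus.
        self.coherence_messages += 1
        for cache in self.caches:
            if cache is not sender:
                cache.lines.pop(addr, None)  # drop the stale copy, if any


class Cache:
    """Toy local cache (hypothetical) attached to one processor."""

    def __init__(self, bus):
        self.lines = {}  # address -> locally cached value
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from system memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_invalidate(self, addr)
        self.lines[addr] = value
        self.bus.memory[addr] = value  # write-through, for simplicity


bus = Bus()
cache0, cache1 = Cache(bus), Cache(bus)
bus.memory[0x40] = 1
cache0.read(0x40)
cache1.read(0x40)        # both caches now hold a copy of the data item
cache0.write(0x40, 2)    # the copy in cache1 is invalidated
value_seen_by_cache1 = cache1.read(0x40)  # re-fetched after invalidation
```

In this sketch the second cache observes the modified value only because its stale copy was invalidated, and that invalidation is the coherence traffic referred to above.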
As multiprocessor systems begin to include larger numbers of processors, coherence traffic is becoming progressively heavier and is consuming more system bus bandwidth.
However, some of this coherence traffic is unnecessary. For example, if a data item in a local cache does not have any copies in other caches, there is no need to send an invalidation message to other caches when the data item is modified.
Unfortunately, such invalidation messages are automatically generated by conventional cache-coherence protocols, and hence some of these invalidation messages cause unnecessary coherence traffic, which can degrade overall system performance.
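The following sketch counts how many invalidation messages in a short access trace reach no other copy. The function name `count_invalidations`, the trace format, and the rule that every write generates one invalidation are assumptions chosen for illustration; actual protocols differ in when they generate such messages.

```python
def count_invalidations(trace):
    """Return (total, unnecessary) invalidations for a list of accesses.

    Each access is a tuple (cpu, op, addr), where op is "r" or "w".
    Assumption: a conventional protocol sends one invalidation per write,
    whether or not any other cache holds a copy of the data item.
    """
    sharers = {}  # addr -> set of cpu ids whose caches hold a copy
    total = 0
    unnecessary = 0
    for cpu, op, addr in trace:
        holders = sharers.setdefault(addr, set())
        if op == "w":
            total += 1
            if not (holders - {cpu}):
                unnecessary += 1  # no other cache had a copy to invalidate
            holders.clear()       # all other copies are now invalid
        holders.add(cpu)          # the accessing cache holds the item
    return total, unnecessary


example_trace = [
    (0, "r", 0x100),
    (1, "r", 0x100),
    (0, "w", 0x100),  # necessary: cpu 1 holds a copy
    (0, "w", 0x200),  # unnecessary: no copies exist elsewhere
    (0, "w", 0x200),  # unnecessary: only cpu 0 holds the item
    (1, "w", 0x300),  # unnecessary: no copies exist elsewhere
]
total, unnecessary = count_invalidations(example_trace)
```

For this hypothetical trace, three of the four invalidations reach no other copy, which is the kind of avoidable bus traffic described above.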
In many cases, cache coherence is not necessary. For example, during new object allocation in a Java Virtual Machine (JVM), a newly allocated object is accessible only to the thread that creates it, and thus may be allocated in a memory space that is not globally accessible, such as a thread-local heap (TLH). The allocation of such a new object may cause a significant number of cache misses, and each cache miss will cause unnecessary invalidation messages to be sent over the system bus.
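To give a rough sense of scale, the sketch below estimates the number of cache misses incurred while allocating objects in a thread-local heap. The 64-byte line size, the bump-pointer allocation policy, and the name `allocation_misses` are all assumptions made for this example; the number of invalidation messages generated per miss depends on the protocol and is not modeled here.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes


def allocation_misses(object_sizes, heap_start=0, line_size=LINE_SIZE):
    """Count first-touch cache misses for bump-pointer allocation.

    Assumption: objects are laid out back to back starting at heap_start,
    and the first write to each cache line misses in the local cache.
    """
    top = heap_start
    touched = set()  # cache-line indices already written by this thread
    misses = 0
    for size in object_sizes:
        first_line = top // line_size
        last_line = (top + size - 1) // line_size
        for line in range(first_line, last_line + 1):
            if line not in touched:
                touched.add(line)
                misses += 1
        top += size  # bump the allocation pointer
    return misses


# Four hypothetical objects totaling 208 bytes span four 64-byte lines.
misses = allocation_misses([24, 40, 128, 16])
```

Under these assumptions every one of the misses concerns memory that no other thread can reach, so any invalidation traffic it triggers serves no purpose.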
Hence, what is needed is a method and apparatus for performing memory accesses in a shared-memory multiprocessor system without the above-described performance problems.