1. Field of the Invention
This invention relates to the field of multiprocessor computer systems and, more particularly, to mechanisms and methods for prefetching data in multiprocessor computer systems.
2. Description of the Related Art
Cache-based computer architectures are typically associated with various features to support efficient utilization of the cache memory. A cache memory is a high-speed memory unit interposed in a memory hierarchy between a slower system memory and the microprocessor to improve effective memory transfer rates and, accordingly, improve system performance. The name refers to the fact that the small memory unit is essentially hidden and appears transparent to the user, who is aware only of a larger system memory.
An important consideration in the design of a cache memory subsystem is the choice of key design parameters, such as cache line size, degree of subblocking, cache associativity, and prefetch strategy. The difficulty in finding an “optimum setting” for these design parameters is that improving one property may degrade others. For example, an excessively small cache line may result in a relatively high number of capacity misses and in relatively high address traffic. A slightly longer cache line often decreases the cache miss rate and address traffic, while the data bandwidth requirement increases. Enlarging the cache lines even more can result in increased data traffic as well as increased address traffic, since misses caused by false sharing may start to dominate. A further complication is that application behavior can differ greatly: a setting that works well for one application may work poorly for another.
It is also well known that large cache lines are often beneficial for data that exhibit spatial locality and would otherwise cause capacity misses. Data that are involved in communication (true sharing) can sometimes also take advantage of large cache lines. However, the risk of false sharing misses increases with large cache lines.
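The relationship between cache line size and false sharing can be illustrated with a minimal sketch. The function name, the addresses, and the line sizes below are hypothetical values chosen only for illustration; they are not taken from any particular system.

```python
def shares_line(addr_a, addr_b, line_size):
    """Return True if two byte addresses fall within the same cache line.

    Assumes cache lines are aligned to multiples of line_size bytes.
    """
    return addr_a // line_size == addr_b // line_size


# Hypothetical example: two unrelated variables, each written by a
# different processor, located at byte addresses 0 and 40.
ADDR_X = 0
ADDR_Y = 40

# With 32-byte lines the variables occupy different lines, so writes by
# one processor do not invalidate the line held by the other.
separate_with_small_lines = not shares_line(ADDR_X, ADDR_Y, 32)

# With 64-byte lines both variables occupy the same line, so the line may
# be invalidated back and forth even though no data are actually shared.
false_sharing_with_large_lines = shares_line(ADDR_X, ADDR_Y, 64)
```

Under these assumptions, doubling the line size from 32 to 64 bytes is enough to place the two unrelated variables in the same line, which is the situation in which false sharing misses can arise.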
Prefetching in multiprocessors has been studied by several researchers as a method of reducing the miss penalty. Numerous prefetching schemes have been proposed, both software-based and hardware-based.
Hardware approaches to prefetching in multiprocessors usually employ either sequential prefetching or stride prefetching. On a cache miss, sequential prefetching fetches the immediately following addresses, whereas stride prefetching fetches addresses that are a certain distance (the stride) away from the previous cache miss. Stride prefetching requires a certain learning time, during which the prefetcher computes which address to prefetch next. The efficiency of both sequential and stride prefetching depends on the access pattern.
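The difference between the two hardware approaches can be sketched as follows. This is a simplified illustration, not a description of any specific prefetcher: the 64-byte line size, the prefetch degree, and the rule that a stride must be observed on two consecutive misses before a prediction is made are all assumptions chosen for the example.

```python
LINE_SIZE = 64  # bytes per cache line (assumed value for illustration)


def sequential_prefetch(miss_addr, degree=1, line_size=LINE_SIZE):
    """Return the start addresses of the `degree` lines that immediately
    follow the line containing the missed address."""
    base = (miss_addr // line_size) * line_size
    return [base + i * line_size for i in range(1, degree + 1)]


class StridePrefetcher:
    """Predict the next miss address from the distance between misses.

    Assumed policy: a prediction is issued only after the same nonzero
    stride has been seen on two consecutive misses, which models the
    learning time of stride prefetching.
    """

    def __init__(self):
        self.last_addr = None
        self.last_stride = None

    def on_miss(self, addr):
        """Record a miss and return a predicted address, or None while
        the prefetcher is still learning."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                prediction = addr + stride
            self.last_stride = stride
        self.last_addr = addr
        return prediction


# Sequential: a miss at address 130 lies in the line starting at 128,
# so the next two lines start at 192 and 256.
sequential_result = sequential_prefetch(130, degree=2)

# Stride: misses at 0, 256, and 512 establish a stride of 256; only the
# third miss yields a prediction.
prefetcher = StridePrefetcher()
stride_results = [prefetcher.on_miss(a) for a in (0, 256, 512)]
```

In this sketch the sequential scheme can issue a prefetch on the very first miss, while the stride scheme returns no prediction for the first two misses. Each returned address would correspond to a separate prefetch request in a real system.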
In systems that employ either sequential prefetching or stride prefetching, the address and data traffic may increase, since a new message is sent on the network for each prefetch. In some instances the prefetch may be performed unnecessarily. Bus-based multiprocessors are especially sensitive to a heavy increase in address traffic since the available snoop bandwidth is limited. Thus, although various prefetch strategies have been successful in reducing the miss penalty in multiprocessing systems, it would be desirable to increase the efficiency of prefetching even further by improving prefetch accuracy. It would be particularly desirable to avoid cache misses introduced by communicating cache lines and associated false sharing.