1. Field of the Invention
The present invention generally relates to multiple processor shared memory computer systems and more specifically to managing cache memory in such systems.
2. Description of the Related Art
Users of data processing systems continue to demand greater performance for handling increasingly complex and difficult tasks. Greater performance from the processors that operate such systems may be obtained through faster clock speeds, so that individual instructions are processed more quickly. However, processor speed has increased much more quickly than the speed of main memory. Regardless of processor speed, a major bottleneck on computer performance is the transfer of information between the processor and memory. Therefore, cache memories, or caches, are used in many data processing systems to increase performance in a relatively cost-effective manner.
A cache is typically a relatively fast memory that is coupled between one or more processors and a bank of slower main memory. A cache speeds processing by maintaining a copy of repetitively used information in its faster memory. Whenever an access request is received for information not stored in the cache, the cache typically retrieves the information from main memory and forwards it to the processor. If the cache is full, the least recently used information is typically discarded or written back to main memory to make room for more recently accessed information.
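The miss-handling and least-recently-used replacement behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `LRUCache` is hypothetical, main memory is modeled as a plain dictionary, only reads are modeled (so evicted entries are simply discarded rather than written back), and hits and misses are counted so the effect of replacement is visible.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical small cache in front of a slower backing store."""

    def __init__(self, capacity, main_memory):
        self.capacity = capacity
        self.main_memory = main_memory  # dict standing in for slower main memory
        self.lines = OrderedDict()      # address -> value, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            # Cache hit: mark this entry as most recently used.
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Cache miss: retrieve the information from main memory.
        self.misses += 1
        value = self.main_memory[address]
        if len(self.lines) >= self.capacity:
            # Cache full: discard the least recently used entry.
            self.lines.popitem(last=False)
        self.lines[address] = value
        return value


memory = {addr: addr * 10 for addr in range(8)}
cache = LRUCache(capacity=2, main_memory=memory)
for addr in [0, 1, 0, 2, 1]:
    cache.read(addr)
```

With a two-entry cache, the access sequence 0, 1, 0, 2, 1 yields one hit and four misses: reading address 2 evicts address 1 (the least recently used), so the final read of address 1 misses again.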
The benefits of a cache are realized when the number of requests to address locations of cached information (known as "cache hits") is maximized relative to the number of requests to memory locations containing non-cached information (known as "cache misses"). Despite the added overhead incurred on each cache miss, as long as the percentage of cache hits (known as the "hit rate") is high, the overall processing speed of the system is increased.
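The trade-off between hit rate and miss overhead can be made concrete with a simple average-access-time calculation. The sketch below assumes a common simplified model in which every access first checks the cache and each miss additionally pays a full main-memory access; the 1 ns and 100 ns access times are hypothetical values chosen only for illustration.

```python
def average_access_time(hit_rate, cache_ns, memory_ns):
    # Every access checks the cache first; a miss additionally pays
    # the main-memory access time (the miss overhead).
    miss_rate = 1.0 - hit_rate
    return cache_ns + miss_rate * memory_ns


CACHE_NS = 1.0     # hypothetical cache access time
MEMORY_NS = 100.0  # hypothetical main-memory access time

for hit_rate in (0.0, 0.5, 0.95):
    t = average_access_time(hit_rate, CACHE_NS, MEMORY_NS)
    print(f"hit rate {hit_rate:.2f}: {t:.1f} ns (no cache: {MEMORY_NS:.1f} ns)")
```

Under these assumed numbers, a 95% hit rate gives an average of about 6 ns per access instead of 100 ns, whereas a 0% hit rate is slightly slower than having no cache at all (101 ns), which is the miss overhead the paragraph refers to.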
One method of increasing the hit rate of a cache is to increase its size. However, cache memory is relatively expensive and its size is limited by design constraints, particularly if the cache is integrated with a processor on the same physical integrated circuit.
One cost-effective alternative is to chain together multiple caches of varying speeds, in which a smaller but faster primary cache is chained to a larger but slower secondary cache. Furthermore, instructions and data may be held in separate instruction and data caches. For example, some processors implement a small internal level one (L1) instruction cache with an additional external level two (L2) cache, and so on.
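A chained two-level lookup can be sketched as below. This is an illustrative model only: the per-level latencies are hypothetical, the function name `lookup` is invented for the example, each level is a plain dictionary, and capacity limits and eviction are omitted for brevity. On a miss in the faster level, the slower level (or main memory) is consulted and the faster levels are filled on the way back.

```python
# Hypothetical per-level access times in nanoseconds.
LATENCY_NS = {"L1": 1, "L2": 10, "memory": 100}


def lookup(address, l1, l2, memory):
    """Search L1, then L2, then main memory; return (value, total_ns, level)."""
    total = LATENCY_NS["L1"]
    if address in l1:
        return l1[address], total, "L1"
    total += LATENCY_NS["L2"]
    if address in l2:
        value = l2[address]
        l1[address] = value  # fill the faster primary cache
        return value, total, "L2"
    total += LATENCY_NS["memory"]
    value = memory[address]
    l2[address] = value      # fill both cache levels on a full miss
    l1[address] = value
    return value, total, "memory"


memory = {a: a * 2 for a in range(16)}
l1, l2 = {}, {}
first = lookup(3, l1, l2, memory)   # misses both caches: 1 + 10 + 100 ns
second = lookup(3, l1, l2, memory)  # now hits in L1: 1 ns
```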
Shared-memory multiprocessor systems present special issues regarding cache implementation and management. In shared-memory multiprocessor systems, all processors can access all memory, including main and cache memory. This enables the tasks on all of the processors to share data with one another efficiently and easily. However, this sharing must be controlled to produce predictable results. Conventionally, shared-memory multiprocessor systems have hardware that maintains cache coherence and provide software instructions that can be used to control which processor is writing to a particular memory location. To prevent multiple processors from storing to the same memory location (or cache line) at the same time, most shared-memory multiprocessors use a snoop-invalidate cache protocol, which allows a processor to write data to a memory location (or cache line) only if it has an exclusive copy of the cache line containing that memory location.
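A snoop-invalidate protocol can be sketched with the textbook three-state (Modified/Shared/Invalid, or MSI) scheme. The source does not specify which states a particular protocol uses, so MSI is an assumption here, as are the names `SnoopingBus` and `transfers`. Each cache line is modeled as a single value (so a write overwrites the whole line without first fetching it), and timing and bus arbitration are ignored. A write first invalidates every other copy, giving the writer an exclusive copy; a read miss is snooped by the other caches, and a cache holding a modified copy supplies the line, which is the cache-to-cache movement counted by `transfers`.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


class SnoopingBus:
    """Hypothetical bus connecting per-processor caches that snoop each other."""

    def __init__(self, num_cpus):
        self.memory = {}                             # line -> value
        self.caches = [{} for _ in range(num_cpus)]  # line -> (State, value)
        self.transfers = 0                           # cache-to-cache line moves

    def read(self, cpu, line):
        state, value = self.caches[cpu].get(line, (State.INVALID, None))
        if state is not State.INVALID:
            return value                             # hit in own cache
        value = self.memory.get(line, 0)
        # Miss: the other caches snoop the request; a modified owner
        # supplies the data, writes it back, and downgrades to SHARED.
        for other, cache in enumerate(self.caches):
            if other != cpu and cache.get(line, (State.INVALID,))[0] is State.MODIFIED:
                value = cache[line][1]
                self.memory[line] = value
                cache[line] = (State.SHARED, value)
                self.transfers += 1
        self.caches[cpu][line] = (State.SHARED, value)
        return value

    def write(self, cpu, line, value):
        # A write requires an exclusive copy: invalidate all other copies.
        for other, cache in enumerate(self.caches):
            if other != cpu and line in cache:
                cache[line] = (State.INVALID, None)
        self.caches[cpu][line] = (State.MODIFIED, value)


bus = SnoopingBus(num_cpus=2)
bus.write(0, 0x40, 7)      # CPU 0 holds the only (modified) copy
value = bus.read(1, 0x40)  # CPU 1 misses; CPU 0 supplies the line
bus.write(1, 0x40, 9)      # CPU 1 invalidates CPU 0's copy, then writes
```

In this short sequence the line migrates from CPU 0's cache to CPU 1's cache, and CPU 0's copy ends up invalid, so a subsequent access by CPU 0 would miss and require the line to move back.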
In a system with a large number of processors, the next processor to read and/or write a memory location is often not the processor whose cache holds the cache line. The cache line must then be moved between the caches of different processors, so moving cache lines efficiently between caches is critical to multiprocessor performance.
On a shared-memory multiple processor system with 16 megabytes of level two (L2) cache per processor, about forty percent of the cache misses are due to reading and/or writing of shared data. Making the cache larger or adding additional levels of cache does not reduce the number of these misses. Instead, as the cache grows larger, misses due to shared data account for a larger percentage of all cache misses, and the movement of cache lines between caches reduces the performance of multiple processor systems.
Therefore, there is a need for a mechanism that reduces the number of cache misses in shared-memory multiple processor systems and improves overall system performance.