A known way to increase the performance of a computer system is to include a local, high-speed memory known as a cache. A cache increases system performance because there is a high probability that once the central processing unit (CPU) accesses a data element at a particular address, its next access will be to an adjacent address. On an access, the cache therefore fetches the requested data together with adjacent data from a slower main memory or lower-level cache and stores it. In very high performance computer systems, several caches may be placed in a hierarchy. The cache that is closest to the CPU, known as the upper-level or “L1” cache, is the highest-level cache in the hierarchy and is generally the fastest. Other, generally slower caches are then placed in descending order in the hierarchy, starting with the “L2” cache and continuing down to the lowest-level cache, which is connected to main memory.
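The benefit of fetching adjacent data can be illustrated with a minimal sketch. The 64-byte line size, the unbounded cache capacity, and the name `count_misses` are assumptions made only for illustration and are not taken from any particular processor.

```python
LINE_SIZE = 64  # bytes fetched together on a miss (assumed value)


def count_misses(addresses, line_size=LINE_SIZE):
    """Count accesses that miss in an idealized cache of unlimited capacity."""
    cached_lines = set()
    misses = 0
    for addr in addresses:
        line = addr // line_size  # all addresses in one line share this index
        if line not in cached_lines:
            misses += 1
            cached_lines.add(line)  # the whole line is now cached
    return misses


# 64 sequential 4-byte reads covering 256 bytes touch only 4 distinct lines.
sequential = range(0, 256, 4)
misses = count_misses(sequential)
print(misses, "misses out of", len(sequential), "accesses")  # 4 misses out of 64 accesses
```

Under these assumptions only the first access to each line goes to the slower memory; the remaining accesses to adjacent addresses are served from the cache.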
A cache follows certain policies when storing and discarding data. For example, many processors follow an “allocate-on-write” policy, which dictates that the cache line corresponding to a memory location written by the CPU will be stored in the cache. Caches typically follow a policy known as least-recently-used (LRU) to determine which location to discard to make room for a new data element once all locations have been filled.
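The two policies described above can be sketched in a small model. This is an illustrative, fully associative model only; the class name `LRUCache`, the 64-byte line size, and the use of an ordered dictionary to track recency are assumptions and do not describe any specific hardware implementation.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed value)


class LRUCache:
    """Fully associative cache model with allocate-on-write and LRU replacement."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # line index -> dirty flag, least recent first
        self.evicted = []           # line indices discarded so far

    def _access(self, line, is_write):
        hit = line in self.lines
        if hit:
            was_dirty = self.lines.pop(line)  # removed here, re-added below as most recent
        else:
            was_dirty = False
            if len(self.lines) >= self.num_lines:
                victim, _ = self.lines.popitem(last=False)  # least recently used
                self.evicted.append(victim)
        self.lines[line] = was_dirty or is_write
        return hit

    def read(self, addr):
        return self._access(addr // LINE_SIZE, is_write=False)

    def write(self, addr):
        # Allocate-on-write: a write miss still places the line in the cache.
        return self._access(addr // LINE_SIZE, is_write=True)


cache = LRUCache(num_lines=2)
results = [
    cache.write(0),    # miss, line 0 allocated by the write
    cache.read(64),    # miss, line 1 allocated
    cache.read(8),     # hit on line 0, which becomes most recently used
    cache.write(128),  # miss, cache full, line 1 (least recently used) discarded
    cache.read(0),     # hit, line 0 was retained
]
print(results, cache.evicted)  # [False, False, True, False, True] [1]
```

The model records only which lines are present and whether they have been written; it does not hold data or model write-back traffic.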
To maintain data coherency throughout the system, caches typically store multiple status bits with each cache line to indicate the state of that line. One common coherency protocol is known as the “MOESI” protocol. According to this protocol, each cache line includes status bits to indicate which MOESI state the line is in, including bits that indicate that the cache line has been modified (M), that the cache line is exclusive (E) or shared (S), or that the cache line is invalid (I). The Owned (O) state indicates that the line is modified in one cache, that there may be shared copies in other caches, and that the data in memory is stale.
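The five states can be summarized in a short sketch. The enumeration and the helper names `must_write_back` and `other_copies_possible` are hypothetical and serve only to restate the properties described above; they do not model the state transitions of the protocol.

```python
from enum import Enum


class MOESI(Enum):
    MODIFIED = "M"   # line modified; no other cache holds a copy
    OWNED = "O"      # line modified; other caches may hold shared copies
    EXCLUSIVE = "E"  # line unmodified; no other cache holds a copy
    SHARED = "S"     # other caches may hold copies
    INVALID = "I"    # line holds no valid data


def must_write_back(state):
    """True if this cache holds modified data that memory does not yet reflect."""
    return state in (MOESI.MODIFIED, MOESI.OWNED)


def other_copies_possible(state):
    """True if other caches may also hold a copy of the line."""
    return state in (MOESI.OWNED, MOESI.SHARED)


for state in MOESI:
    print(state.value, must_write_back(state), other_copies_possible(state))
```

In this summary the Owned state is the only one for which both helpers return true, which reflects its role of holding modified data while shared copies may exist elsewhere.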
In a typical configuration, all caches are combined with the CPU in the same integrated circuit and main memory is located off-chip. Main memory is the slowest and least expensive memory in the system and may be constructed of inexpensive but relatively slow dynamic random access memory (DRAM) chips. This slow access speed results in bottlenecks in accessing the off-chip DRAM main memory, and it is desirable to avoid these bottlenecks whenever possible. Furthermore, in recent years microprocessor speeds have increased faster than DRAM access speeds, compounding the bottleneck problem, a trend known as the “memory wall”. What is needed, then, is a method and a data processor that can reduce the main memory access requirements in order to improve system performance. Such a method and data processor are provided by the present invention, whose features and advantages will become more clearly understood from the following detailed description taken in conjunction with the accompanying drawings.