The present invention relates generally to data processing systems, and more particularly, to a data processor having cache memory.
In general, data processing systems comprise a central processing unit (CPU) that executes instructions fetched from a main memory. One method of improving the performance of the CPU is to use cache memory. Cache memory is high speed memory that works in conjunction with the CPU and the main memory to provide the necessary data to the CPU. With this architecture, a faster response time is possible than if the CPU fetches all instructions and operands directly from main memory. The improved performance is possible because the cache is typically much faster than the main memory and usually contains the data that the CPU is most likely to request in the next bus cycle; the cache can therefore usually provide that data much sooner than the main memory could. Part of the methodology used to load data into the cache is to predict and store the data that is frequently used by the CPU and is likely to be used by the CPU in the next bus cycle.
When the cache contains the data requested by the CPU, the access is referred to as a cache hit. If the cache does not contain the requested data, the access is referred to as a cache miss. On a miss, the data is loaded from the main memory into the cache and is also provided to the CPU. The data is loaded into the cache in anticipation that the CPU will request the data again in an upcoming bus cycle. This process continues throughout the operation of the data processing system.
Caches typically consist of a cache tag array and a cache data array. Each array is organized into a number of cache lines. Each cache line consists of a tag portion (contained in the cache tag array) and a data portion (contained in the cache data array). The tag value in a line is compared with the address of a memory request from the CPU to determine if the requested data is present in the data portion of that cache line. Validity information is associated with each cache line to indicate whether the line contains currently valid data. In addition, for caches which can operate in a copyback or writeback mode, additional status information is retained to indicate whether the cache line is modified (dirty) relative to the value stored in main memory.
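By way of illustration only, the tag comparison described above can be sketched as follows. The sketch assumes a direct-mapped organization, a 16-byte line, and 64 lines; none of these parameters is taken from the description above, and the names CacheLine, split_address, and lookup are hypothetical.

```python
from dataclasses import dataclass

LINE_BYTES = 16  # assumed bytes in the data portion of a line
NUM_LINES = 64   # assumed number of cache lines


@dataclass
class CacheLine:
    tag: int = 0         # tag portion (cache tag array)
    valid: bool = False  # line holds currently valid data
    dirty: bool = False  # line is modified relative to main memory


def split_address(addr):
    """Split an address into (tag, line index, byte offset)."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % NUM_LINES
    tag = addr // (LINE_BYTES * NUM_LINES)
    return tag, index, offset


def lookup(lines, addr):
    """Return True on a cache hit: the line is valid and its tag matches."""
    tag, index, _ = split_address(addr)
    line = lines[index]
    return line.valid and line.tag == tag
```

In this sketch a request hits only when the indexed line is valid and its stored tag equals the tag bits of the requested address; the dirty flag plays no part in the comparison and matters only when a modified line is replaced in a copyback cache.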
One cache memory organization technique is known as sector caching. Sector caches reduce the overhead associated with the cache tag array by associating multiple blocks of data with a single tag. Multiple blocks of data are contained within a given cache line, with a validity bit and a dirty bit associated with each block. On a cache miss, a block of data in the cache line is filled.
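The sector organization described above can be illustrated with a minimal sketch. It assumes four blocks per cache line and models only the status bits, not the data itself; the names SectorLine and access are hypothetical, and the write-back of dirty blocks on replacement is omitted.

```python
BLOCKS_PER_LINE = 4  # assumed number of blocks sharing one tag


class SectorLine:
    """One sector cache line: a single tag plus per-block status bits."""

    def __init__(self):
        self.tag = None
        self.valid = [False] * BLOCKS_PER_LINE
        self.dirty = [False] * BLOCKS_PER_LINE


def access(line, tag, block):
    """Classify a read access and fill only the requested block on a miss."""
    if line.tag != tag:
        # Tag miss: the line is reallocated to the new tag, so every
        # block previously held under the old tag becomes invalid.
        line.tag = tag
        line.valid = [False] * BLOCKS_PER_LINE
        line.dirty = [False] * BLOCKS_PER_LINE
        line.valid[block] = True
        return "tag_miss"
    if not line.valid[block]:
        # Tag matches, but this block has not been filled yet.
        line.valid[block] = True
        return "block_miss"
    return "hit"
```

The sketch shows why a tag match alone is not sufficient for a hit in a sector cache: the validity bit of the individual block must also be set.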
One problem with prior art sector caches relates to memory designs that support burst accesses of a given length (or fill size) to fill the cache after a miss. Burst accesses are characterized by an initial access latency that is longer than the latency of the successive accesses within the burst. The successive accesses typically follow the initial access in sequential or modulo order, and can be completed in a relatively short time. Many memory designs supporting burst mode accesses contain internal modulo counters that increment the initial access address modulo N as the burst proceeds, so that the counter has wrapped around to the initial address once the burst is complete. Burst transfers require significantly fewer cycles to complete than an equivalent set of individual memory transfers, and thus improve system performance when used in conjunction with a cache memory. Burst lengths of 4 or 8 units are common for both caches and memory devices. Typical caches implement a single line size and fill each line with a fixed number of transfers, typically 4 or 8 units. However, some memory devices support bursts of either 4 or 8 units, but not both in a single device. As a result, a cache designed for one fill length is unable to utilize the high performance burst transfers of memory devices designed for the other length.
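The modulo addressing and the cycle savings of a burst can be illustrated with a short sketch. The 4-byte transfer unit and the latencies of 5 cycles for the initial access and 1 cycle for each successive access are assumed values chosen only for illustration, and the function names are hypothetical.

```python
def burst_order(start_addr, burst_len, unit_bytes=4):
    """Addresses visited by a modulo-N burst that begins at start_addr."""
    span = burst_len * unit_bytes
    base = start_addr - (start_addr % span)       # aligned start of the burst
    first = (start_addr // unit_bytes) % burst_len
    return [base + ((first + i) % burst_len) * unit_bytes
            for i in range(burst_len)]


def burst_cycles(burst_len, first_latency=5, next_latency=1):
    """Cycles for one burst: a long initial access, then short successors."""
    return first_latency + (burst_len - 1) * next_latency


def individual_cycles(count, first_latency=5):
    """Cycles for the same units fetched as separate, non-burst accesses."""
    return count * first_latency
```

For example, under these assumptions a burst of 4 starting at address 0x108 visits 0x108, 0x10C, 0x100, and 0x104, and completes in 8 cycles rather than the 20 cycles required by four individual accesses.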
Another problem with prior art caches relates to stalls of the CPU that can occur while line fills are performed. A stall can arise when a line fill requires 4 or 8 units of transfer and the CPU does not request all of these units in sequence. In many cases, the CPU must wait for all of the data to be fetched before continuing with the next access, since the cache is busy with the fill. Thus, to minimize these cache-busy stalls, a small fill size is beneficial.
As a result of these problems, defining a cache fill policy involves a trade-off between associating a large amount of data with each tag (for less tag overhead) and keeping the cache fill fetch time short. In other words, it is desirable to associate more data with each tag, but doing so increases the fetch time since more data must be read.
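The trade-off can be made concrete with a rough calculation. The sketch below assumes a 4096-byte cache, a fixed 20-bit tag with 2 status bits per line, 4-byte transfer units, and burst latencies of 5 cycles for the initial access and 1 cycle for each successive access; all of these figures and both function names are hypothetical, and in a real design the tag width would itself vary with the cache organization.

```python
def tag_overhead_bits(cache_bytes, line_bytes, tag_bits=20, status_bits=2):
    """Total tag-array storage, given one tag and status field per line."""
    num_lines = cache_bytes // line_bytes
    return num_lines * (tag_bits + status_bits)


def fill_cycles(line_bytes, unit_bytes=4, first_latency=5, next_latency=1):
    """Cycles for which the cache is busy filling one complete line."""
    units = line_bytes // unit_bytes
    return first_latency + (units - 1) * next_latency


# Compare two candidate line sizes for the same cache capacity.
for line_bytes in (16, 32):
    print(line_bytes,
          tag_overhead_bits(4096, line_bytes),
          fill_cycles(line_bytes))
```

Under these assumptions, doubling the line size from 16 to 32 bytes halves the tag storage from 5632 bits to 2816 bits, but lengthens each fill from 8 cycles to 12 cycles.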