1. Field of the Invention
This invention relates generally to processor-based systems, and, more particularly, to a partially sectored cache that may be implemented in a processor-based system.
2. Description of the Related Art
Many processing devices utilize caches to reduce the average time required to access information stored in a memory. A cache is a smaller and faster memory that stores copies of instructions and/or data that are expected to be used relatively frequently. For example, central processing units (CPUs), which are but one type of processor, are generally associated with a cache or a hierarchy of cache memory elements. Other processors, such as graphics processing units, can also implement cache systems. Instructions or data that are expected to be used by the CPU are moved from (relatively large and slow) main memory into the cache. When the CPU needs to read or write a location in the main memory, it first checks to see whether a copy of the desired memory location is included in the cache memory. If this location is included in the cache (a cache hit), then the CPU can perform the read or write operation on the copy in the cache memory location. If this location is not included in the cache (a cache miss), then the CPU needs to access the information stored in the main memory and, in some cases, the information can be copied from the main memory and added to the cache. Proper configuration and operation of the cache can reduce the average latency of memory accesses to a value below the main memory latency and close to the cache access latency.
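The effect of the hit rate on average latency can be illustrated with a short calculation. The following is a minimal sketch, assuming a single cache level in which every access pays the cache access latency and each miss additionally pays the main memory latency. The function name, parameters, and cycle counts are hypothetical and are chosen only for illustration.

```python
def average_access_latency(hit_rate, cache_latency, memory_latency):
    """Average latency when a miss costs the cache check plus a main-memory access.

    Assumed model (not from the source): every access checks the cache first,
    and only misses go on to main memory.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return cache_latency + miss_rate * memory_latency


# Illustrative numbers only (in cycles), not taken from any real device.
for rate in (0.50, 0.90, 0.99):
    print(rate, average_access_latency(rate, cache_latency=4, memory_latency=200))
```

Under this model, the average latency approaches the cache access latency as the hit rate approaches one, and it exceeds the main memory latency only slightly even when every access misses.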
One widely used architecture for a CPU cache memory is a hierarchical cache that divides the cache into two levels known as the L1 cache and the L2 cache. The L1 cache is typically a smaller and faster memory than the L2 cache, which is smaller and faster than the main memory. The CPU first attempts to locate needed memory locations in the L1 cache and then proceeds to look successively in the L2 cache and the main memory when it is unable to find the memory location in the L1 cache. The L1 cache can be further subdivided into separate L1 caches for storing instructions (L1-I) and data (L1-D). The L1-I cache can be placed near entities that require more frequent access to instructions than data, whereas the L1-D cache can be placed closer to entities that require more frequent access to data than instructions. The L2 cache is typically associated with both the L1-I and L1-D caches and can store copies of instructions or data that are retrieved from the main memory. Frequently used instructions can be copied from the L2 cache into the L1-I cache and frequently used data can be copied from the L2 cache into the L1-D cache. Some CPU architectures also implement additional cache levels such as the higher-level L3 cache, which is typically larger and slower than the L2 cache.
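The lookup order through the hierarchy can be sketched as follows. This is a simplified model, assuming each level is an unbounded dictionary keyed by address, a unified L1 cache, and a policy that copies data into the faster levels on a miss. The function name `read` and the fill policy are assumptions made for illustration; capacity limits and replacement are omitted.

```python
def read(address, l1, l2, main_memory):
    """Return (value, level) by searching L1, then L2, then main memory."""
    if address in l1:
        return l1[address], "L1"
    if address in l2:
        value = l2[address]
        l1[address] = value  # assumed policy: copy into L1 for later accesses
        return value, "L2"
    value = main_memory[address]
    l2[address] = value      # assumed policy: fill both levels on a miss
    l1[address] = value
    return value, "memory"


main_memory = {0x10: 7}
l1, l2 = {}, {}
print(read(0x10, l1, l2, main_memory))  # first access is served from main memory
print(read(0x10, l1, l2, main_memory))  # repeated access is served from L1
```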
A conventional caching architecture uses tags to identify the addresses of information stored in the lines of the cache. In physically tagged caches, the tag represents the upper bits of the physical address of a memory location. For example, when the CPU attempts to access information at a particular physical address, it first checks the tag array to see if the information located at that physical address has been copied into a line or block of the data array of the cache. The CPU determines whether the desired information has been stored in a line of the cache by comparing the cache tags with the tag bits of the desired memory location. If there is a tag match, the CPU can access the information directly from the cache. In a conventional (non-sectored) data array, each cache line is associated with a tag that is stored in a tag array. The tag array occupies a chip area that increases in proportion to the size of the non-sectored cache because of the one-to-one relationship between tags and lines in the data array. The power consumed by the tag array also increases in proportion to the size of the non-sectored cache. The large area and the large power consumption of tag arrays may be detrimental to the design and/or performance of larger caches such as L2 and L3 cache arrays.
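The tag comparison described above can be sketched for the simplest case. The example below assumes a direct-mapped, physically tagged, non-sectored cache with 64-byte lines and 1024 lines; these sizes, the class name, and the method names are hypothetical. It models only the tag array, which holds exactly one tag per line.

```python
LINE_BYTES = 64   # assumed line size
NUM_LINES = 1024  # assumed number of lines (direct-mapped, so one line per index)

OFFSET_BITS = LINE_BYTES.bit_length() - 1  # log2(64) = 6
INDEX_BITS = NUM_LINES.bit_length() - 1    # log2(1024) = 10


def split_address(addr):
    """Split a physical address into (tag, index, offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)  # upper bits of the address
    return tag, index, offset


class DirectMappedTagArray:
    def __init__(self):
        self.tags = [None] * NUM_LINES  # one tag per line of the data array

    def fill(self, addr):
        """Record that the line containing addr has been copied into the cache."""
        tag, index, _ = split_address(addr)
        self.tags[index] = tag

    def is_hit(self, addr):
        """Compare the stored tag with the tag bits of the requested address."""
        tag, index, _ = split_address(addr)
        return self.tags[index] == tag


cache = DirectMappedTagArray()
cache.fill(0x12345678)
print(cache.is_hit(0x12345678))              # same address
print(cache.is_hit(0x12345678 + (1 << 16)))  # same index, different tag
```

Because `self.tags` has one entry per line, doubling `NUM_LINES` doubles the number of stored tags, which is the proportional growth in tag array area noted above.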
The size and power consumption of the tag array can be reduced by using sectored caches. In a sectored cache, each tag refers to more than one line (or sub-block) in the data array. A CPU can determine whether information at a particular physical address is located in the cache by accessing the tag array to determine whether the information at the particular physical address is stored in any of the multiple lines associated with a tag in the tag array. The one-to-many association of tags to lines can reduce the size and power consumption of the tag array for a given number of cache lines because fewer tags are needed to identify the information stored in the data array. However, fully sectored caches have higher latency because a wider granularity of data must be read from and written to the main memory. For example, data for all the lines identified by a tag is copied each time information in one line associated with the tag is modified. Moreover, the reduction in the power consumption of the tag array must be balanced against the power penalty incurred by always having to fetch all of the sub-blocks associated with a tag in the fully sectored cache even when the CPU only requests a subset of the sub-blocks identified by the tag. One option to reduce the power penalty is to fetch only the sub-block requested by the CPU. However, that approach suffers from performance degradation due to unused sub-blocks (holes) within a sector.
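The trade-off between tag storage and fetch granularity can be made concrete with a rough estimate. The sketch below assumes that each line keeps its own state bit (for example, a valid bit) regardless of sectoring and that the tag width is held constant; the function names and all numeric values are hypothetical and serve only to illustrate the scaling.

```python
def tag_array_bits(num_lines, tag_bits, lines_per_tag=1, state_bits_per_line=1):
    """Approximate tag-array storage: one tag per sector plus per-line state bits."""
    if num_lines % lines_per_tag != 0:
        raise ValueError("num_lines must be a multiple of lines_per_tag")
    num_tags = num_lines // lines_per_tag
    return num_tags * tag_bits + num_lines * state_bits_per_line


def fill_bytes(line_bytes, lines_per_tag, fetch_whole_sector=True):
    """Bytes moved from main memory on a miss under the assumed fill policy."""
    return line_bytes * (lines_per_tag if fetch_whole_sector else 1)


# Illustrative configuration only: 16384 lines, 20-bit tags, 64-byte lines.
print(tag_array_bits(16384, 20))                   # non-sectored
print(tag_array_bits(16384, 20, lines_per_tag=4))  # four lines per tag
print(fill_bytes(64, 4))                           # whole-sector fetch
print(fill_bytes(64, 4, fetch_whole_sector=False)) # requested sub-block only
```

With these assumed values, sectoring cuts the tag storage to well under half, while a whole-sector fill moves four times as many bytes per miss as fetching a single sub-block.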
The cache tag array can also be decoupled from the data array so that tags are dynamically allocated to data lines. This approach can create a one-to-many mapping between the tag array and the data array using pointers to connect lines that include sequentially accessed information. For example, a first tag can be assigned to a line of the data array when information is copied from the main memory to this line of the data array. If information is accessed sequentially from the main memory and copied to a second line in the data array, then the first tag can also be used to indicate the data in the second line using a pointer from the first line to the second line. Additional pointers can be used to link additional sequentially accessed lines. When the CPU checks the tag array, the physical address can be compared to the first tag to determine if the physical address of the data requested by the CPU is stored in the first or second line. If there is a cache hit, the first tag and the pointers can be used to access the information in the requested line. Using pointers to link the cache lines associated with a single tag can reduce the size and power consumption of the tag array. However, the area and power savings are mitigated by the additional pointer bits used to connect the data and the tags, as well as the additional logic that is needed to traverse the pointers during line replacement.
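The pointer-based association between one tag and several sequentially accessed lines can be sketched in software. The model below is a simplified illustration, not a description of any particular hardware design: the class `DecoupledCache`, its fields, and the use of line addresses as tags are all assumptions, and line replacement is omitted.

```python
class DecoupledCache:
    """Toy model: one tag heads a chain of sequential lines linked by pointers."""

    def __init__(self):
        self.tags = {}      # base line address -> index of first data line
        self.data = []      # data lines
        self.next_ptr = []  # per data line: index of next sequential line, or None

    def _walk(self, base, steps):
        """Follow `steps` pointers from the line headed by `base`; None if the chain ends."""
        idx = self.tags[base]
        for _ in range(steps):
            if idx is None:
                return None
            idx = self.next_ptr[idx]
        return idx

    def lookup(self, line_addr):
        """Return the cached line for line_addr, or None on a miss."""
        for base in self.tags:
            steps = line_addr - base
            if steps >= 0:
                idx = self._walk(base, steps)
                if idx is not None:
                    return self.data[idx]
        return None

    def allocate(self, line_addr, value):
        """Store a line, extending an existing chain when line_addr follows its tail."""
        self.data.append(value)
        self.next_ptr.append(None)
        new_idx = len(self.data) - 1
        for base in self.tags:
            steps = line_addr - base - 1
            if steps >= 0:
                tail = self._walk(base, steps)
                if tail is not None and self.next_ptr[tail] is None:
                    self.next_ptr[tail] = new_idx  # link from the previous line
                    return
        self.tags[line_addr] = new_idx  # no chain to extend: allocate a new tag


cache = DecoupledCache()
for addr, value in [(100, "a"), (101, "b"), (102, "c"), (200, "z")]:
    cache.allocate(addr, value)
print(len(cache.tags), cache.lookup(102), cache.lookup(103))
```

In this sketch, four lines are covered by two tags, but every data line carries a pointer field and each lookup may traverse a chain, which mirrors the pointer storage and traversal overhead noted above.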