Referring to FIG. 1, a typical computer system includes a microprocessor (10) having, among other things, a CPU (12), a memory controller (14), and an on-board cache memory (16). The microprocessor (10) is connected to external cache memory (17) and main memory (18) that holds data and program instructions to be executed by the microprocessor (10). Internally, the execution of program instructions is carried out by the CPU (12). Data needed by the CPU (12) to carry out an instruction are fetched by the memory controller (14). Upon command from the CPU (12), the memory controller (14) searches for the data first in the cache memory (16), next in the external cache (17), and finally in the main memory (18). Finding the data in the cache memory is referred to as a “hit.” Not finding the data in the cache memory is referred to as a “miss.”
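By way of illustration only, the search order described above may be sketched as follows. The function name `fetch`, the use of dictionaries to stand in for the memories, and the sample addresses and data are assumptions made for this sketch and do not appear in the figures.

```python
def fetch(address, cache_levels, main_memory):
    """Search each cache level in order, then fall back to main memory."""
    for name, cache in cache_levels:
        if address in cache:
            return cache[address], name  # hit at this level
    # Miss in every cache level: the data must come from main memory.
    return main_memory[address], "main memory"


# Hypothetical contents; each dictionary maps an address to its data.
on_board_cache = {0x10: "A"}
external_cache = {0x20: "B"}
main_memory = {0x10: "A", 0x20: "B", 0x30: "C"}

# Order matters: on-board cache first, external cache next.
cache_levels = [
    ("on-board cache", on_board_cache),
    ("external cache", external_cache),
]

for addr in (0x10, 0x20, 0x30):
    data, source = fetch(addr, cache_levels, main_memory)
    print(f"address {addr:#x}: data {data!r} found in {source}")
```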
The hit rate, i.e., the fraction of accesses that result in a hit, depends in no small part on the caching scheme or policy employed by the computer system, e.g., direct-mapped or set associative. Generally, a set associative caching policy provides a higher hit rate than a direct-mapped policy. However, for some computer applications, a direct-mapped policy may provide better system performance due to a better hit rate. Which policy performs better depends on the address sequences used by the application, the allocation of memory pages to an application by the operating system, and whether virtual or physical addresses are used for addressing the cache.
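The dependence of the hit rate on the address sequence may be illustrated with a small simulation. The following is a minimal sketch only: the function name `hit_rate`, the modulo split of an address into index and tag, the least-recently-used replacement, and the two sample address sequences are all assumptions made for illustration. Both configurations hold four entries in total.

```python
def hit_rate(trace, num_sets, ways):
    """Fraction of accesses in `trace` that hit in a simulated cache.

    Assumptions for this sketch: index = address % num_sets,
    tag = address // num_sets, and least-recently-used replacement.
    """
    sets = [[] for _ in range(num_sets)]  # each set lists tags, oldest first
    hits = 0
    for address in trace:
        index = address % num_sets
        tag = address // num_sets
        entries = sets[index]
        if tag in entries:
            hits += 1
            entries.remove(tag)  # re-appended below as most recently used
        elif len(entries) == ways:
            entries.pop(0)       # evict the least recently used tag
        entries.append(tag)
    return hits / len(trace)


# Two addresses that collide in the direct-mapped cache.
trace_a = [0, 8] * 4
print(hit_rate(trace_a, num_sets=4, ways=1))  # direct-mapped: 0.0
print(hit_rate(trace_a, num_sets=2, ways=2))  # 2-way: 0.75

# Three addresses cycling through a single 2-way set.
trace_b = [0, 2, 4] * 3
print(hit_rate(trace_b, num_sets=4, ways=1))  # direct-mapped: 2 hits in 9
print(hit_rate(trace_b, num_sets=2, ways=2))  # 2-way: 0.0
```

Under these assumptions, the first sequence favors the 2-way set associative configuration, while the second sequence favors the direct-mapped configuration, because all three of its addresses compete for one two-entry set.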
An example of a direct-mapped cache memory is functionally depicted in FIG. 2A. In this example, a portion of the main memory (18) is stored or cached in a cache memory (20) having a tag part (22) and a data part (24). The tag part (22) and the data part (24) may be a single cache memory logically partitioned into two parts, or two actual, physical cache memories. In general, the tag part (22) stores the physical addresses of the locations in main memory being cached, and the data part (24) stores the data residing in those locations. Both the tag part (22) and the data part (24) share a common index that is used to reference the two parts.
In operation, the CPU requests data by issuing to the load/store unit an address which includes an index component and a tag component. The load/store unit then goes to the tag part (22) of the cache (20) and checks the entry at the specified index to see if that particular tag entry matches the specified tag. If yes, a hit has occurred, and the data corresponding to the specified index is retrieved and provided to the CPU. If no, a miss has occurred, and the requested data has to be obtained from main memory. For example, an address having an index component of ‘0’ and a tag component of ‘32’ will result in a hit, and data ‘A’ will be retrieved and sent to the CPU. However, there can only be one tag entry per index number and, therefore, a subsequent address having an index component of ‘0’ and a tag component of ‘24’ will result in a miss. A set associative policy generally has a higher hit rate per access, as will be explained below.
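The direct-mapped lookup just described may be sketched as follows. The class name `DirectMappedCache`, its method names, and the choice of four index numbers are assumptions made for this sketch; the index ‘0’, tags ‘32’ and ‘24’, and data ‘A’ follow the example in the text.

```python
class DirectMappedCache:
    """One tag entry and one data entry per index number."""

    def __init__(self, num_indexes):
        self.tags = [None] * num_indexes  # tag part
        self.data = [None] * num_indexes  # data part, same index

    def fill(self, index, tag, value):
        # Storing a new entry overwrites whatever was at this index.
        self.tags[index] = tag
        self.data[index] = value

    def lookup(self, index, tag):
        """Return (hit, data); data is None on a miss."""
        if self.tags[index] is not None and self.tags[index] == tag:
            return True, self.data[index]
        return False, None


cache = DirectMappedCache(num_indexes=4)
cache.fill(index=0, tag=32, value="A")

print(cache.lookup(index=0, tag=32))  # (True, 'A'): hit
print(cache.lookup(index=0, tag=24))  # (False, None): miss, same index
```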
An example of a set associative cache is functionally depicted in FIG. 2B. As in the previous example, a cache memory (26) is partitioned into a tag part (28) and a data part (30), with both parts sharing a common index. However, instead of a single entry per index, the tag part (28) and the data part (30) each have four entries, best shown here as rows and columns. A row of entries is called a “set” so that there are as many sets as there are index numbers, and a column of entries is called a “way” so that there are four ways for each index number. This particular cache policy, therefore, is commonly referred to as 4-way set associative. Those skilled in the art will appreciate that set associative policies are commonly, but not necessarily, 2-way to 8-way. Herein, examples are presented for 4-way set associativity, but the concepts are equally applicable to n-way set associativity.
In operation, when the load/store unit goes to search the tag part (28) at the specified index number, all four ways are compared to the specified tag component. If one of the four ways matches (a hit occurs), the corresponding way of the corresponding set in the data part (30) is sent to the CPU. Thus, in the previous example, an address having an index component of ‘0’ and a tag component of ‘24’ will be a hit because there are four tag entries per index number. If the first tag entry does not match, there are three more chances to find a match per access. Thus, effectively, the 4-way set associative policy allows the CPU to find cached data in one of four ways.
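The 4-way lookup may be sketched in the same manner. The class name `SetAssociativeCache`, its method names, the data value ‘B’ stored under tag ‘24’, and the first-in, first-out eviction used when a set is full are assumptions made for this sketch; the text above does not specify a replacement policy.

```python
class SetAssociativeCache:
    """Each index number selects a set holding up to `ways` entries."""

    def __init__(self, num_sets, ways=4):
        self.ways = ways
        # Each set is a list of (tag, data) pairs, oldest first.
        self.sets = [[] for _ in range(num_sets)]

    def fill(self, index, tag, value):
        entries = self.sets[index]
        if len(entries) == self.ways:
            entries.pop(0)  # assumed first-in, first-out eviction
        entries.append((tag, value))

    def lookup(self, index, tag):
        """Compare the tag against every way of the selected set."""
        for way_tag, value in self.sets[index]:
            if way_tag == tag:
                return True, value
        return False, None


cache = SetAssociativeCache(num_sets=4, ways=4)
cache.fill(index=0, tag=32, value="A")
cache.fill(index=0, tag=24, value="B")  # 'B' is a hypothetical data value

print(cache.lookup(index=0, tag=32))  # (True, 'A')
print(cache.lookup(index=0, tag=24))  # (True, 'B'): both tags share index 0
print(cache.lookup(index=0, tag=16))  # (False, None): tag not in any way
```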
More than one CPU may share cache memory. In such situations, each CPU acts as described above and simply searches in the same memory space as the other CPUs. Depending on the programs being executed on the CPUs, different configurations of associativity, cache size, and resource sharing result in differing degrees of performance.
Referring to FIG. 3, a typical CPU (50) is shown having a functional unit (52) and three levels of cache: an L1 cache (54), an L2 cache (56), and an L3 cache (58). When a program (60) is executed on the CPU (50), an output (62) is generated. In order to determine the optimal configuration of associativity, cache size, and resource sharing, testing is performed on the program (60) to determine the specific workload required. Once the specific workload requirements are determined through testing, a configuration of components for optimal performance can be found and a system having the appropriate characteristics can be manufactured.