A primary factor in the utility of a computer system is its speed in executing application programs. A high-performance computer system is expected to be responsive to user inputs and to accurately provide processed results within real-time constraints. The speed and responsiveness of a computer system depend, in turn, largely on the efficiency of its processor. Accordingly, an enormous amount of investment has gone into the development of very high-speed processors.
It is important to provide software instructions and data to a processor (e.g., a central processing unit, or CPU) at least as fast as the rate at which the CPU processes such instructions and data. Failure to provide the needed instructions/data results in the CPU idling as it waits for them. Modern integrated circuit fabrication technology has enabled the production of CPUs that function at extremely high speeds (e.g., 3 gigahertz and above). Consequently, it has become very challenging for system designers to ensure that the needed instructions/data are provided to a modern high-speed CPU from the system memory without imposing substantial CPU idle time penalties.
A widely used solution for reducing CPU idle time penalties involves the incorporation of highly optimized memory caches within the CPU die. In general, a memory cache is used to speed up data transfer and may be either temporary or permanent. Memory caches are well known and widely used to speed up instruction execution and data retrieval. These temporary caches serve as staging areas, and are optimized to reduce data access latency in comparison to system memory.
In a typical computer system implementation, a memory cache functions as a low latency storage area that bridges main memory and the CPU. Modern CPUs typically include two specialized memory caches: a level one cache and a level two cache. A level one (L1) cache is a very high-speed memory bank that is typically located directly within the CPU die (fabricated using the same highly optimized semiconductor fabrication process as the CPU) and is therefore much faster than main memory. A level two (L2) cache is a secondary staging area that feeds the L1 cache. The L2 cache is generally not as fast as the L1 cache. Because the circuitry of the L2 cache is less complex than that of the L1 cache, the L2 cache is usually larger. The L2 cache may be built into the CPU chip, may reside on a separate chip in a multichip package module, or may be a separate bank of chips. The objective of both the L1 and the L2 caches is to keep staging more instructions and data in high-speed memory closer to the CPU.
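The staging relationship between the L1 cache, the L2 cache, and main memory can be illustrated with a minimal software sketch. The class names, the cache capacities, the least-recently-used replacement policy, and the cycle counts below are all assumptions chosen for illustration; none of them is taken from the description above or from any particular CPU.

```python
from collections import OrderedDict

# Assumed, illustrative access costs in CPU cycles (hypothetical values).
L1_LATENCY, L2_LATENCY, MEMORY_LATENCY = 4, 12, 200


class CacheLevel:
    """A tiny cache holding line numbers, with LRU replacement (an assumption)."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # oldest entry first

    def lookup(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            return True
        return False

    def fill(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[line] = True


class Hierarchy:
    """L1 backed by a larger L2, which is backed by main memory."""

    def __init__(self, l1_lines=4, l2_lines=16):
        self.l1 = CacheLevel(l1_lines)
        self.l2 = CacheLevel(l2_lines)

    def read(self, line):
        """Return (where the line was found, cycles spent on the access)."""
        if self.l1.lookup(line):
            return "L1", L1_LATENCY
        if self.l2.lookup(line):
            self.l1.fill(line)  # L2 feeds L1
            return "L2", L1_LATENCY + L2_LATENCY
        # Miss in both caches: fetch from main memory and stage in L2 and L1.
        self.l2.fill(line)
        self.l1.fill(line)
        return "memory", L1_LATENCY + L2_LATENCY + MEMORY_LATENCY
```

In this sketch, a line that has been evicted from the small L1 cache can still be served from the larger L2 cache at a fraction of the assumed main-memory cost, which is the sense in which the L2 cache acts as a secondary staging area.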
Instructions and data are transferred from main memory to the cache in blocks. These blocks are usually referred to as cache lines, and usually represent the smallest unit of memory that can be transferred between the main memory and the cache. To increase efficiency, when data needs to be transferred from main memory to the cache (e.g., L1 cache or L2 cache), a number of cache lines are transferred at once. Typically, some kind of look-ahead sequence is used to fetch the desired cache line plus a number of additional cache lines. This technique is referred to as prefetching. The more sequential the instructions in the routine being executed or the more sequential the data being read, the greater the chance that the next required item will already be in the cache, thereby resulting in better performance.
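The effect of look-ahead fetching on sequential versus non-sequential accesses can be sketched with a small simulation. This is only an illustrative model under stated assumptions: the 64-byte line size, the cache capacity, the least-recently-used replacement policy, the "fetch the next N lines on a miss" rule, and the names `PrefetchingCache` and `hit_rate` are all hypothetical and are not drawn from any specific hardware.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed value)


class PrefetchingCache:
    """Cache that, on a miss, fetches the missed line plus the next `depth` lines."""

    def __init__(self, capacity_lines, prefetch_depth):
        self.capacity = capacity_lines
        self.depth = prefetch_depth
        self.lines = OrderedDict()  # oldest entry first (LRU order)
        self.hits = 0
        self.misses = 0

    def _install(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[line] = True

    def access(self, address):
        line = address // LINE_SIZE
        if line in self.lines:
            self.hits += 1
            self.lines.move_to_end(line)
            return True
        self.misses += 1
        # Look-ahead: bring in the desired line and `depth` following lines.
        for n in range(line, line + self.depth + 1):
            self._install(n)
        return False


def hit_rate(addresses, capacity_lines=64, prefetch_depth=3):
    """Fraction of accesses served from the cache for a given address trace."""
    cache = PrefetchingCache(capacity_lines, prefetch_depth)
    for address in addresses:
        cache.access(address)
    return cache.hits / (cache.hits + cache.misses)


# Sequential 8-byte reads over 100 cache lines, and a strided trace that
# touches a new, distant line on every access.
sequential = list(range(0, LINE_SIZE * 100, 8))
strided = list(range(0, LINE_SIZE * 10 * 100, LINE_SIZE * 10))
```

Under these assumptions, `hit_rate(sequential, prefetch_depth=3)` is higher than `hit_rate(sequential, prefetch_depth=0)`, while `hit_rate(strided)` gains nothing from the look-ahead, which mirrors the observation that prefetching pays off most when accesses are sequential.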
A problem exists, however, in that even with the implementation of L1 and L2 caches, high-speed CPUs are on many occasions still starved of data and are forced to idle while needed data is fetched. Prior art solutions to this problem have involved increasing the size of the caches, increasing the speed of the system memory, increasing the bandwidth of the system memory bus, and the like. These prior art solutions have not proven entirely successful. For example, increasing the size of the caches of the CPU has a very significant impact on the overall cost of the CPU. A larger cache leads to a larger CPU die size, and a correspondingly more expensive CPU chip. Increasing the system memory speed also impacts cost in that high-performance memory (e.g., DDR chips, RDRAM chips, etc.) is expensive and can be in short supply. Increasing the bandwidth of the system memory bus impacts the architecture of the overall computer system in that the support chips which interface the CPU to the other components of the computer system may also need to be redesigned to function properly with higher bus speeds/bus widths.
Another prior art solution, utilized in computer systems implementing a Northbridge/Southbridge chip set architecture, involves placing an additional cache within the Northbridge of the computer system's chip set. The Northbridge typically functions as the memory controller for the CPU, interfacing data reads/writes from the CPU with the system memory. System designers have incorporated a small cache within the Northbridge (e.g., less than 2 kB) in an attempt to alleviate the CPU data starvation problem. This solution has not proven successful since the size of the cache in the Northbridge is typically much smaller than the L1 and L2 caches of the CPU. An additional problem is the fact that the accesses to the system memory by the Northbridge cache and by the CPU are basically uncoordinated, leading to bandwidth contention, duplication, and similar problems.