The processing capabilities of computers have increased dramatically over the last ten years. CPUs available in both personal computer and workstation class computers commonly operate at 300 megahertz (MHz) and higher and are capable of executing 100 million instructions per second (MIPS). However, the realization of the full potential of these processors has been limited by the memory subsystem inside computers. The memory subsystem includes cache memories on the CPU chip, known as level 1 cache, and cache memories external to the CPU chip, known as level 2 and level 3 cache. Random access memory (RAM) and primary storage (hard disk) round out a computer memory subsystem. The memory subsystem is unable to supply data and instructions at the rate at which the CPU can consume them. CPUs are rarely busy more than 33% of the time and spend most of their time idle, waiting for memory to supply data and instructions. RAM has an access time of approximately 60 nanoseconds (ns). A modern reduced instruction set computer (RISC) CPU running at 250 MHz can execute up to four instructions involving many bytes of data in 4 ns, or 15 times the rate at which RAM can supply data and instructions. Without any other components, this CPU would typically be idle 56 out of every 60 ns, or 93.3% of the time. A number of techniques have been implemented to span the speed gap between RAM and the CPU in order to keep the CPU supplied with data and instructions at a higher rate than RAM alone can provide. Cache memory is the main technique employed to bridge the speed gap. Cache memory relies on the principle of locality of reference in order to anticipate the data and instructions required by the CPU in the near future. The data and instructions required by the CPU in executing application programs tend to be located in adjacent memory locations. 
As the CPU executes instructions and consumes and generates data, the instructions and data tend to be read from or written to adjacent memory locations. The next required memory access tends to be very near the last memory location accessed. This is the principle of locality. As a result, cache memory is used to fetch and hold not only the immediately required data and instructions, but also some amount of data and instructions near the locations required by the CPU at a given time. While the CPU is busy executing current instructions on current data, cache memory is loading instructions and data from RAM locations near those currently used by the CPU, in anticipation of near-term CPU data and instruction needs. Fetching data and instructions from RAM is overlapped with CPU execution of current instructions and data, allowing the CPU to continue executing instead of waiting for slow RAM accesses to complete. Because cache memory can itself access RAM only at the RAM speed of approximately 60 ns, several levels of cache memory are used in order to keep the memory pipeline nearly full with required data and instructions. A level 2 cache is relatively large and loads large amounts of data and instructions from RAM into its memory. A level 1 cache is relatively small and loads smaller amounts of data and instructions into its memory from the level 2 cache. Each level of cache memory is progressively smaller and faster in access time the farther it sits from RAM in the memory pipeline. Level 2 caches are approximately 1 megabyte (MB) in size and have access times approximately two to three times faster than RAM, typically in the 20 ns range. A level 1 cache is relatively small since it must be located on the CPU chip, approximately 64 kilobytes (KB) in size, and has an access time typically equal to one CPU clock cycle, in the range of 4 ns. 
If present, a level 3 cache would sit between level 2 and RAM, would hold 8 MB or more, and would have an access time near that of RAM, approximately 60 ns. The net result of this elaborate memory subsystem is to improve the CPU utilization from 6% without cache memories to approximately 33%.
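The utilization figures quoted above can be checked with a short calculation. The following is a minimal sketch in Python, assuming a deliberately simplified model in which each 4 ns burst of CPU work must wait for exactly one memory access and nothing is overlapped; the function names and the model itself are illustrative assumptions, not part of the described system.

```python
# Figures quoted in the text
RAM_ACCESS_NS = 60.0  # approximate RAM access time
CPU_BURST_NS = 4.0    # time the 250 MHz RISC CPU needs to consume one fetch


def utilization(cpu_busy_ns, effective_access_ns):
    """Fraction of time the CPU is busy under the simplified model:
    every burst of work waits for one memory access (assumption)."""
    return cpu_busy_ns / max(cpu_busy_ns, effective_access_ns)


def effective_access_for(cpu_busy_ns, target_utilization):
    """Effective access time (ns) that would yield the target utilization
    under the same simplified model (hypothetical helper)."""
    return cpu_busy_ns / target_utilization


no_cache = utilization(CPU_BURST_NS, RAM_ACCESS_NS)
print(f"RAM only: busy {no_cache:.1%}, idle {1 - no_cache:.1%}")
print(f"Speed gap: {RAM_ACCESS_NS / CPU_BURST_NS:.0f}x")
print(f"Effective access time implied by 33% utilization: "
      f"{effective_access_for(CPU_BURST_NS, 0.33):.1f} ns")
```

Under this model the RAM-only case gives about 6.7% utilization (93.3% idle), and the quoted 33% utilization corresponds to an effective access time of roughly 12 ns, between the level 1 and level 2 access times given in the text.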
Present cache memory subsystems do not solve three main problems associated with cache memories:
1. Maintaining or improving CPU utilization rates as CPU speeds increase;
2. Providing larger caches while maintaining cache access times of one CPU clock cycle; and
3. Providing high CPU utilization rates for those processing applications where locality of memory references is poor.
As CPU speeds continue to increase and memory speeds stay relatively constant, as they have for the last 10 years, the rate of CPU utilization continues to drop because the CPU spends more and more time waiting for cache memory to be filled with the required data and instructions. If CPU utilization decreases with increasing CPU clock speed, the CPU performance advancements are negated. As CPU speed increases, the cache memory subsystem of a computer must supply data and instructions at a faster rate in order to keep the CPU supplied with the data and instructions it needs to process. There are only two ways to increase the rate of cache memory transfer: speed up the cache memory access times or increase the size of the cache memory. These two options are at odds with one another. Increasing the cache memory size, though feasible with reduced chip feature sizes, increases access time as the square of the size. For those cases where locality of reference is not good, e.g., applications performing network data processing, the rate of CPU utilization drops significantly, below the 10% mark. A simple scaling of the present cache memory architecture is not a viable approach to improving or maintaining present performance levels in an environment of ever faster CPU speeds. Increasing the size of level 2 or higher level cache memories (or even increasing RAM) provides little or no performance improvement. In-line or backside level 2 caches have been implemented that improve performance substantially for the cases where good locality of reference exists. This approach uses a separate memory bus between the level 2 cache and the CPU that can operate level 2 caches at the speed of the CPU clock. With this approach, level 1 cache performance is the limiting factor, and the limitations on level 1 cache halt further performance improvement. 
Increasing level 1 cache size to 1 MB or more would yield substantial performance improvement only in the cases where locality of reference exists. Increasing level 1 cache size is limited by two factors: a) the number of components that can be placed in a given area of a CPU chip is limited by heat dissipation and physical constraints, and b) as memory size is increased, access time is increased exponentially. In current systems, nothing has been done to provide good performance either as CPU speed increases or in cases where poor locality of reference exists.
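The dependence of cache effectiveness on locality of reference can be illustrated with a toy simulation. The sketch below, in Python, models a direct-mapped cache sized like the 64 KB level 1 cache mentioned earlier; the 64-byte line size, the 64 MB address range, the access patterns, and all names are assumptions chosen for illustration and do not describe any particular CPU.

```python
import random


def hit_rate(addresses, num_lines=1024, line_bytes=64):
    """Simulate a direct-mapped cache (1024 lines x 64 bytes = 64 KB,
    an assumed geometry) and return the fraction of accesses that hit."""
    tags = [None] * num_lines          # block number currently held by each line
    hits = 0
    for addr in addresses:
        block = addr // line_bytes     # which memory block the address falls in
        index = block % num_lines      # the single cache line that block maps to
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block        # miss: fetch the whole block into the line
    return hits / len(addresses)


N = 100_000
MEMORY_BYTES = 64 * 1024 * 1024        # assumed 64 MB address range

# Good locality: walk memory one 4-byte word at a time.
sequential = [4 * i for i in range(N)]

# Poor locality: addresses scattered uniformly over the address range.
rng = random.Random(0)
scattered = [rng.randrange(MEMORY_BYTES) for _ in range(N)]

seq_rate = hit_rate(sequential)
rand_rate = hit_rate(scattered)
print(f"sequential access hit rate: {seq_rate:.2%}")
print(f"scattered access hit rate:  {rand_rate:.2%}")
```

In the sequential walk, each fetched line serves the next fifteen word accesses, so most accesses hit. With scattered addresses, a fetched line is rarely reused before it is evicted, and the hit rate collapses regardless of how quickly the cache itself can be read.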
What is needed is a cache memory architecture and design that 1) will provide at least the current level of memory subsystem performance at increased CPU speeds, 2) will provide larger level 1 caches while maintaining cache access times of one CPU clock cycle, and 3) will provide a substantial performance improvement for executing an application or mix of applications that exhibit poor locality of reference. This invention meets all three needs in a simple and straightforward fashion through the concept and design of "cache windowing".