1. Technical Field
The present invention relates in general to an improved method and system for accessing cache memory within a data processing system, and in particular, to an improved method and system for high speed access to a banked cache memory. Still more particularly, the present invention relates to an improved method and system for high speed access to a high latency remote banked cache memory wherein the access time to the cache memory is minimized.
2. Description of the Related Art
A digital computer system utilizes a central processing unit and a computer main memory to perform a wide range of functions. Such circuitry permits repetitive functions to be carried out at rates far higher than would be possible if the corresponding functions were performed manually. Memory locations provide storage from which data can be read or to which data can be written.
As technology develops, multiple processors and multiple levels of memory and cache have been added to digital computer systems. In addition, the utilization of on-chip and off-chip cache continues to increase processing capabilities. In general, requests come from a system bus to a processor system that includes a processor core and multiple levels of on-chip and off-chip cache. The processor system then performs a snoop to detect whether the requested data is available in the on-chip cache. In addition, the processor system typically snoops an off-chip cache associated therewith. From the snoop return, the request may be directed to a particular cache in order to access the requested data.
Several types of memory have been developed which may be utilized as on-chip cache, off-chip cache and/or main memory. These random access memories (RAM) are preferably semiconductor-based memories that can be read from and written to by the central processing unit and other hardware devices. The storage locations within RAM can be accessed in any order. For example, one type of RAM which is well known in the art is dynamic RAM (DRAM). DRAM is typically utilized for storing large increments of data. In particular, DRAMs store information in integrated circuits containing capacitors. Because capacitors lose their charge over time, DRAM circuits typically include logic to refresh the DRAM chips continuously. While a DRAM chip is being refreshed, the chip cannot be read by the processor, which leads to wait states. Another type of RAM which is well known in the art is static RAM (SRAM). SRAMs store information in logic circuits known as flip-flops, which retain information as long as there is enough power to run the device. SRAMs do not incur the wait states inherent in DRAMs; however, SRAM circuitry is more complex than DRAM circuitry and is typically utilized in smaller increments.
In general, memory devices such as SRAM and DRAM are formed in memory locations which form memory arrays. The memory locations of the memory arrays are identified by memory addresses. When memory locations of a memory array are to be accessed, the addresses of the memory locations are provided to decoder circuitry of the memory device, as is well known in the art. The decoder circuitry decodes the address signals applied thereto to permit access to the memory locations identified by the address signals. Typically, multiple banks of SRAM or DRAM may be placed together whereby a controller controls access to each bank of memory and routes addresses to the proper bank of memory within the banked cache memory.
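The address routing described above can be illustrated with a brief sketch. The bank count, cache-line size, and address layout below are illustrative assumptions for exposition only; they are not specified by the text, and real controllers may partition the address bits differently.

```python
# Hypothetical address-to-bank routing in a banked cache controller.
# NUM_BANKS and LINE_SIZE are assumed values, not taken from the text.

NUM_BANKS = 8        # assumed power-of-two number of banks
LINE_SIZE = 128      # assumed cache-line size in bytes

def route_address(address):
    """Split an address into (bank number, index within that bank)."""
    line = address // LINE_SIZE   # discard the byte offset within the line
    bank = line % NUM_BANKS       # low-order line bits select the bank
    index = line // NUM_BANKS     # remaining bits index within the bank
    return bank, index
```

Under this interleaved layout, consecutive cache lines fall into consecutive banks, so `route_address(0)` selects bank 0 and `route_address(128)` selects bank 1.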
In a recent configuration of processor/memory devices, a processor accesses an on-chip level-one (L1) cache which comprises small, fast SRAM, an on-chip level-two (L2) cache which comprises banked SRAM and an off-chip level-three (L3) cache which comprises banked DRAM cache. In addition, the processor may access a main memory which is shared among multiple devices. There is a greater latency inherent in accessing data from off-chip memories than from on-chip memories. However, off-chip memories are typically larger than on-chip memories and thus can provide large amounts of data for a single access. Among the off-chip memories, a processor can access an L3 cache much more quickly than a main memory, particularly when the main memory is shared by multiple processors.
Several methods have been developed to reduce the latency inherent in accessing a remote L3 cache and, in particular, in determining which bank of a banked DRAM cache to access next. According to one method, known as bank parking, for each request from a system bus a speculative access is made to the bank of memory that was previously accessed. This speculation is beneficial only if the previously accessed bank of memory is requested again. According to another method, known as redundant access, each bank of memory is a clone of all other banks of memory. Therefore, for each request, the same address is passed to each bank of memory, whereby any available bank of memory can respond to the request. For redundant access to be beneficial, a small number of banks of memory and a small amount of cache in each bank of memory must be utilized. However, it is preferable to utilize a large number of banks and a large amount of cache in each bank for an L3 cache.
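The bank-parking heuristic above can be sketched as follows. This is a minimal model, not an implementation from the text; the class and method names are invented for illustration, and the "directory lookup" that later reveals the true bank is abstracted into a single argument.

```python
# Sketch of the "bank parking" heuristic: each new request speculatively
# re-accesses whichever bank served the previous request, before the
# true bank is known. Names are illustrative assumptions.

class BankParkingController:
    def __init__(self, num_banks):
        self.num_banks = num_banks
        self.parked_bank = 0      # bank accessed by the previous request

    def speculate(self):
        # Begin a speculative access to the previously accessed bank.
        return self.parked_bank

    def resolve(self, actual_bank):
        # The later (non-speculative) lookup reveals the true bank;
        # the speculation paid off only if the two match.
        hit = (actual_bank == self.parked_bank)
        self.parked_bank = actual_bank
        return hit
```

The sketch makes the limitation plain: `resolve` reports a useful speculation only when the same bank is requested twice in a row, which is exactly the condition the text identifies.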
In view of the foregoing, it is therefore desirable to provide a method of accessing a high latency banked cache memory and in particular accessing off-chip banked DRAM cache from a processor, whereby fast access to the cache is provided.
In view of the foregoing, it is therefore an object of the present invention to provide an improved method and system for accessing cache memory within a data processing system.
It is another object of the present invention to provide an improved method and system for high speed access to a banked cache memory.
It is yet another object of the present invention to provide an improved method and system for high speed access to a high latency remote banked cache memory wherein the access time to the cache is minimized.
In accordance with the method and system of the present invention, during a first cycle, in response to receipt of a request address at an access controller, the request address is speculatively transmitted to a banked cache memory, where the speculative transmission has at least one cycle of latency. Concurrently, the request address is snooped in a directory associated with the banked cache memory. Thereafter, during a second cycle, the speculatively transmitted request address is distributed to each of multiple banks of memory within the banked cache memory. In addition, in response to a bank hit from snooping the directory, the banked cache memory is provided with a bank indication indicating which bank of memory among the multiple banks of memory contains the data associated with the request address. Thereafter, in response to the bank indication, the data associated with the request address is output from the banked cache memory, such that access time to a high latency remote banked cache memory is minimized.
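The two-cycle scheme described above can be sketched cycle by cycle. The sketch below is an illustrative software model under assumed names, not the claimed hardware: the directory is modeled as a mapping from addresses to bank numbers, each bank as a mapping from addresses to data, and the one-cycle transmission latency as the boundary between the two commented phases.

```python
# Minimal model of the described access scheme: the request address is
# speculatively sent toward the banked cache while the directory is
# snooped in parallel; one cycle later the address is distributed to
# all banks, and the snoop's bank indication selects which bank drives
# its data out. All names here are illustrative assumptions.

def access(directory, banks, request_address):
    # Cycle 1: speculative transmission and directory snoop occur
    # concurrently, hiding the transmission latency behind the snoop.
    in_flight = request_address                  # address in transit to the cache
    bank_hit = directory.get(request_address)    # snoop result: bank number or None

    # Cycle 2: the address arrives and is distributed to every bank.
    distributed = {b: in_flight for b in range(len(banks))}

    # The bank indication from the snoop steers the output; on a miss,
    # the speculative access is simply discarded.
    if bank_hit is None:
        return None
    return banks[bank_hit][distributed[bank_hit]]
```

Because the snoop and the transmission overlap rather than run back-to-back, the bank indication is already available when the address reaches the banks, which is how the scheme hides the remote cache's transmission latency.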
All objects, features, and advantages of the present invention will become apparent in the following detailed written description.