1. Field of the Invention
The present invention relates to cache memory systems and methods for managing cache memory systems, and more particularly, to improving performance of a cache memory system by reducing cache misses.
2. Description of the Related Art
In a typical computer system, the memory hierarchy as illustrated in FIG. 1 includes a register set 10 in a processor, a cache system 11, a main memory 12, a disk drive 13, and a magnetic tape drive 14 used as a back-up device. In the respective devices of the memory hierarchy, upper layers such as register set 10 have higher operating speeds but store less information than lower layers such as disk drive 13 or tape drive 14. Most high-performance computer systems improve performance by appropriately setting the sizes, structures, and operation methods of register set 10 and cache system 11.
The hierarchical memory system using cache system 11 can obtain excellent performance by basing the operation of cache system 11 on the property of locality. Programs accessing memory typically demonstrate two types of locality, spatial locality and temporal locality. Spatial locality refers to the tendency of accesses of adjacent memory locations to be close together in time. Temporal locality refers to the high probability that a program will access recently accessed items again in the near future. Caches exploit temporal locality by retaining recently referenced data and exploit spatial locality by fetching multiple neighboring words as a cache line or block whenever a cache miss occurs.
The two types of locality serve as important elements in determining the size of a cache block when designing a cache. Large cache blocks take advantage of spatial locality, while many smaller cache blocks better exploit temporal locality. However, the approaches for exploiting spatial or temporal locality contradict each other. In particular, increasing the block size improves the chance that data adjacent to the most recent access will be in the cache but, in a fixed-size cache, also decreases the number of cache blocks and thus the number of recently accessed data items in the cache.
FIG. 2 conceptually illustrates spatial locality of a cache. In FIG. 2, X and Y axes respectively denote memory addresses and the access probability following an access of a memory address A. The probability function of FIG. 2 is not accurate but does illustrate spatial locality, where the probability of accessing an address decreases as the distance from the last address accessed increases.
FIG. 2 also illustrates aspects of different choices of cache block sizes. If a large block 22 is fetched for a cache miss, the miss-to-hit ratio may decrease because of the spatial locality of accesses. However, the mean expected utilization of elements in a cache block is low for a large block because the probability of accessing addresses drops with distance from memory address A. If a small block 20 is fetched for a cache miss, the mean expected utilization of elements in the cache block 20 is greater because the addresses of elements in the cache block 20 are all close to the last accessed address A. Also, if software tends to access several spatially separated memory locations on a regular basis, the smaller cache block 20 allows representation of more data locations in a fixed-size cache system and thereby reduces cache misses for programs having a greater tendency for temporal locality.
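The block-size trade-off described above can be illustrated with a minimal simulation. The sketch below is only an illustration under stated assumptions: the fully associative organization, the least-recently-used replacement, the 32-word capacity, and the two access patterns are all hypothetical choices that do not come from FIG. 2 or from any cited study.

```python
from collections import OrderedDict

def count_misses(addresses, capacity_words, block_words):
    """Count misses in a fully associative LRU cache of fixed capacity.

    All parameters are illustrative assumptions, not values from the text.
    """
    num_blocks = capacity_words // block_words
    cache = OrderedDict()  # block number -> None, ordered oldest to newest
    misses = 0
    for addr in addresses:
        block = addr // block_words
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            cache[block] = None
            if len(cache) > num_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return misses

# Spatial pattern: one pass over 64 consecutive words.
sequential = list(range(64))
# Temporal pattern: 8 widely separated words, revisited 4 times.
scattered = [i * 64 for i in range(8)] * 4

for block_words in (4, 16):
    print(block_words,
          count_misses(sequential, 32, block_words),
          count_misses(scattered, 32, block_words))
```

Under these assumptions, the larger block lowers the miss count for the sequential pattern (4 misses instead of 16) but raises it for the scattered pattern (32 misses instead of 8), because the fixed capacity then holds only two blocks.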
For any specific cache size, selection of a cache block size involves a trade-off between the exploitation of spatial and temporal locality. Considering this effect, studies have investigated the optimal size of a cache block for a given cache capacity. Caches constructed with the optimal block size perform well, but this performance is highly dependent on the memory access patterns of executed programs. Some programs perform best with cache blocks of a specific block size, while other programs suffer severe performance degradation when a cache uses the same block size. To solve this problem, a dual data cache can include a spatial cache and a temporal cache, which have different cache block sizes to respectively exploit the spatial locality and the temporal locality of memory accesses. One dual cache operating method classifies different accesses as having primarily spatial locality or primarily temporal locality and fetches blocks of information into the cache (spatial or temporal) that exploits that type of locality.
FIG. 3 shows the structure of a dual cache memory. This dual cache memory system is according to a study described in “International Conference On Supercomputing ICS '95,” pages 338-347. The cache of FIG. 3 includes a memory device 34 that stores information for a central processing unit (CPU) 33, a spatial cache 30, a temporal cache 31, a prediction table 32 for determining whether to store the information from the memory device 34 in spatial cache 30 or temporal cache 31, a multiplexer 35 for selecting spatial cache 30 or temporal cache 31 when CPU 33 accesses the information, and a demultiplexer 36 that is under control of prediction table 32 and directs information from memory device 34 to spatial cache 30 or temporal cache 31. In the study of the above structure, prediction table 32 determines which cache 30 or 31 receives the information fetched from memory device 34. Accordingly, the performance is improved when prediction table 32 selects the appropriate cache 30 or 31.
The entries in prediction table 32 select cache 30 or 31 based on factors such as an instruction address, a last accessed address, a stride, a length, a current state, and a predicted value. Prediction table 32 obtains the stride using a difference between addresses of data accessed by the same instruction, and the stride indicates or selects the cache 30 or 31 for storage of information fetched from memory device 34 for the instruction. For example, assume that one execution of an instruction at an address A references or accesses data at an address B, and the next execution of the instruction at address A accesses data at an address B+α. When a following execution of the instruction at address A accesses information at an address B+2α, prediction table 32 determines that the instruction at address A has uniform stride α. The information fetched for the instruction at address A is stored in either spatial cache 30 or temporal cache 31 according to the value of stride α. Accordingly, an entry in prediction table 32 corresponds to an instruction, indicates the instruction's address, and selects a cache 30 or 31 when the instruction requires data from memory device 34. In particular, searching prediction table 32 by instruction address locates the entry that indicates either cache 30 or 31. The address of the information that the instruction accessed can be stored in another address field of the entry. A difference between the address of currently accessed data and the address of previously accessed data by the instruction is stored in a stride field of the entry in prediction table 32. For example, when a uniform stride separates three accessed addresses such as B, B+α, and B+2α, it is possible to predict whether spatial cache 30 or temporal cache 31 is more efficient for future accesses by the instruction. Thus, storing the information in spatial cache 30 or temporal cache 31 according to the type
However, in the cache memory system of FIG. 3, many accesses do not have a uniform stride, and the stride of the addresses accessed by the same instruction may change. Accordingly, improving the performance of the cache system of FIG. 3 is difficult when instructions do not have uniform strides for data accesses.
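The stride bookkeeping attributed to prediction table 32 can be sketched as follows. This is a minimal illustration, not the implementation of the cited study: the names `Entry`, `StrideTable`, and `record` are hypothetical, only the last-address and stride fields are modeled, and the rule that maps a stride value to spatial cache 30 or temporal cache 31 is deliberately left out because the text does not specify it.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Entry:
    """One hypothetical table entry, keyed by instruction address."""
    last_address: int
    stride: Optional[int] = None   # difference between the last two data addresses
    uniform: bool = False          # True when the last two strides were equal

class StrideTable:
    def __init__(self) -> None:
        self.entries: Dict[int, Entry] = {}

    def record(self, instr_addr: int, data_addr: int) -> bool:
        """Record one data access; return True if the stride is currently uniform."""
        entry = self.entries.get(instr_addr)
        if entry is None:
            # First execution of this instruction: no stride is known yet.
            self.entries[instr_addr] = Entry(last_address=data_addr)
            return False
        new_stride = data_addr - entry.last_address
        entry.uniform = entry.stride is not None and new_stride == entry.stride
        entry.stride = new_stride
        entry.last_address = data_addr
        return entry.uniform

table = StrideTable()
A, B, alpha = 0x400, 1000, 8       # illustrative values only
for data_addr in (B, B + alpha, B + 2 * alpha, B + 5 * alpha):
    print(hex(A), data_addr, table.record(A, data_addr))
```

In this sketch the third access (B+2α) is the first one reported as uniform, and the fourth access, which breaks the pattern, is reported as non-uniform again. That behavior mirrors the difficulty noted above when instructions do not keep a uniform stride.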
To solve the above problem, a method for managing a cache memory selectively determines the amount of information fetched during a cache miss according to the state of information in a cache and selectively stores fetched information so that information having a high probability of being accessed stays in the cache longer than information having a low probability of being accessed. Thus, the frequency of cache misses is reduced and the efficiency of memory traffic is improved.
In accordance with one embodiment of the invention, a cache memory includes: a lower level memory device that stores information for a central controller; a first auxiliary storage device that stores first information blocks; and a second auxiliary storage device that stores second information blocks. Operating the cache system includes selectively fetching a first information block or a second information block as a fetched block from the lower level memory device and selectively storing the fetched information block in the first auxiliary storage device or the second auxiliary storage device. Whether the fetched block is a first or second information block is selected according to whether the first auxiliary storage device holds a first information block that does not include the target data being accessed but is included in the second information block that includes the target data. The first and second information blocks are respectively of a first size and a second size, which are different from each other and selected to respectively take advantage of temporal locality and spatial locality in memory accesses. Accordingly, selectively fetching and storing information blocks in the first auxiliary storage device and/or the second auxiliary storage device allows the first auxiliary storage device to perform well with programs exhibiting temporal locality and allows the second auxiliary storage device to perform well with programs exhibiting spatial locality. Overall performance is thus improved.
In the above, the cache system further includes a state storage device having state information showing the numbers of first information blocks included in specific second information blocks and stored in the first auxiliary storage device. Using the state information, the above selectively fetching and storing involves: determining whether the target data being accessed by the central controller is in the first auxiliary storage device or the second auxiliary storage device; accessing the target data from the first auxiliary storage device when the target data is in the first auxiliary storage device; copying into the first auxiliary storage device a first information block that includes the target data, from the second information block including the target data, when the target data is not in the first auxiliary storage device but is in the second auxiliary storage device; determining, when the target data is in neither the first auxiliary storage device nor the second auxiliary storage device, whether a first information block which does not include the target data but is included in the second information block that includes the target data is in the first auxiliary storage device; fetching the first information block including the target data from the lower level memory device and storing that first information block in the first auxiliary storage device when the first information block that does not include the target data and is included in the second information block including the target data is in the first auxiliary storage device; and fetching the second information block including the target data from the lower level memory device, storing that second information block in the second auxiliary storage device, and copying the first information block including the target data from the second information block in the second auxiliary storage device to the first auxiliary storage device when the first information block that does not include the target data and is included in the second information block including the target data is not in the first auxiliary storage device. The state information is updated after the first information block containing the target data is stored in the first auxiliary storage device. The central controller can access the target data from the second auxiliary storage device or the first auxiliary storage device after the block containing the target data is copied to the first auxiliary storage device from the second auxiliary storage device.
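The selective fetching and storing steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the claimed implementation: the class `DualCache`, the block sizes `SMALL` and `LARGE`, the returned labels, and the explicit `evict_second` helper are hypothetical, capacity limits and replacement in the first auxiliary storage device are omitted, and only block numbers (not data) are tracked.

```python
SMALL = 4    # words per first information block (assumed)
LARGE = 16   # words per second information block (assumed)

class DualCache:
    def __init__(self):
        self.first = set()    # numbers of first (small) blocks held in the first store
        self.second = set()   # numbers of second (large) blocks held in the second store
        self.state = {}       # large-block number -> count of its small blocks in first

    def _install_small(self, small, large):
        # Store the small block in the first store, then update the state information.
        self.first.add(small)
        self.state[large] = self.state.get(large, 0) + 1

    def evict_second(self, large):
        # Stand-in for replacement in the second store (policy not modeled).
        self.second.discard(large)

    def access(self, addr):
        small, large = addr // SMALL, addr // LARGE
        if small in self.first:
            return "hit-first"                 # target data is in the first store
        if large in self.second:
            self._install_small(small, large)  # copy the small block out of the large block
            return "hit-second"
        if self.state.get(large, 0) > 0:
            # A sibling small block is already in the first store: fetch only the small block.
            self._install_small(small, large)
            return "fetch-small"
        # No sibling present: fetch the large block, then copy the small block to first.
        self.second.add(large)
        self._install_small(small, large)
        return "fetch-large"

cache = DualCache()
print(cache.access(0), cache.access(1), cache.access(4))
cache.evict_second(0)
print(cache.access(8), cache.access(16))
```

In this sketch a sibling small block can be present in the first store without its large block in the second store only after the large block has been replaced, which is why the example calls `evict_second` before the access that fetches a small block directly.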
Alternatively, selectively fetching and storing includes: determining whether the target data to be referenced by the central controller is in the first auxiliary storage device or the second auxiliary storage device; referencing the target data from the first auxiliary storage device when the target data is in the first auxiliary storage device; copying a first information block including the target data to the first auxiliary storage device from the second information block in the second auxiliary storage device when the target data is not in the first auxiliary storage device but is in the second auxiliary storage device; determining whether the second information block including the target data is in a lower level memory device when the target data is in neither the first auxiliary storage device nor the second auxiliary storage device; fetching the first information block including the target data from the lower level memory device and storing the first information block in the first auxiliary storage device when the second information block including the target data is in the lower level memory device; and fetching the second information block including the target data from another lower level memory device, storing the second information block in the second auxiliary storage device, and copying the first information block including the target data from the second information block stored in the second auxiliary storage device to the first auxiliary storage device when the second information block including the target data is not in the first lower level memory device.
Alternatively, selectively fetching and storing includes: providing a state storage device having state information showing the numbers of first information blocks included in specific second information blocks and stored in the first auxiliary storage device; determining whether the target data referenced by the central controller is in the first auxiliary storage device or the second auxiliary storage device; referencing the target data from the first auxiliary storage device when the target data is in the first auxiliary storage device; copying the first information block including the target data from a second information block stored in the second auxiliary storage device to the first auxiliary storage device when the target data is not in the first auxiliary storage device but is in the second auxiliary storage device; determining how many first information blocks that do not include the target data and are in the second information block including the target data are in the first auxiliary storage device when the target data is in neither the first auxiliary storage device nor the second auxiliary storage device; fetching the first information block including the target data from the lower level memory device and storing that first information block in the first auxiliary storage device when no fewer than a specific upper limit number of first information blocks that do not include the target data and are included in the second information block including the target data are in the first auxiliary storage device; and fetching the second information block including the target data from the lower level memory device, storing that second information block in the second auxiliary storage device, and copying the first information block including the target data from the second auxiliary storage device to the first auxiliary storage device when fewer than the specific upper limit number of first information blocks that do not include the target data and are included in the second information block including the target data are in the first auxiliary storage device.
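The threshold decision in this alternative can be sketched as a small function. The sketch is illustrative only: the function names, the representation of the first auxiliary storage device as a set of block numbers, and the parameter `smalls_per_large` are hypothetical, and the text does not give a value for the upper limit.

```python
def siblings_in_first(first_store, target_small, smalls_per_large):
    """Count first blocks of the same second block, other than the target, in the first store."""
    large = target_small // smalls_per_large
    start = large * smalls_per_large
    return sum(
        1
        for s in range(start, start + smalls_per_large)
        if s != target_small and s in first_store
    )

def choose_fetch(first_store, target_small, smalls_per_large, upper_limit):
    """Return 'small' to fetch only the first block, 'large' to fetch the second block."""
    count = siblings_in_first(first_store, target_small, smalls_per_large)
    return "small" if count >= upper_limit else "large"

resident = {0, 1}   # first blocks 0 and 1 are already in the first store (assumed)
print(choose_fetch(resident, 2, 4, upper_limit=2))
print(choose_fetch(resident, 2, 4, upper_limit=3))
```

With an upper limit of one, this rule reduces to the earlier check of whether any sibling first information block is present.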
In accordance with another embodiment of the invention, a cache memory system for storing some information referenced by a central controller of a computer system from information stored in a lower level memory device includes a second auxiliary storage device that stores second information blocks that are fetched from the lower level memory device, a first auxiliary storage device that stores first information blocks fetched from the second auxiliary storage device or the lower level memory device, and a control unit that selectively fetches the first information block or the second information block that contains target data from the lower level memory device and selectively stores the first or second information block in the first auxiliary storage device and/or the second auxiliary storage device. The control unit operates according to whether the target data is in the first auxiliary storage device or the second auxiliary storage device and whether a first information block which does not include the target data is included in the second information block including the target data and is in the first auxiliary storage device.
The control unit includes a state storage device for state information showing the numbers of first information blocks included in specific second information blocks and stored in the first auxiliary storage device and a demultiplexer for routing the fetched block to the first auxiliary storage device or the second auxiliary storage device for storage. The control unit controls the demultiplexer according to the state information of the state storage device.
In accordance with another embodiment of the invention, a method for managing a cache memory includes: providing information in a lower level memory device for a central controller; providing a first auxiliary storage device that stores first information blocks and a second auxiliary storage device that stores second information blocks; determining whether target data to be referenced by the central controller is in the first auxiliary storage device or the second auxiliary storage device; referencing the target data from the first auxiliary storage device when the target data is in the first auxiliary storage device; copying to the first auxiliary storage device a first information block including the target data from a second information block that includes the target data and is stored in the second auxiliary storage device when the target data is not in the first auxiliary storage device but is in the second auxiliary storage device; fetching the second information block including the target data, storing that second information block in the second auxiliary storage device, and copying the first information block including the target data from the second information block in the second auxiliary storage device to the first auxiliary storage device when the target data is in neither the first auxiliary storage device nor the second auxiliary storage device; and when a first information block stored in the first auxiliary storage device is replaced, updating a first information block included in a second information block stored in the second auxiliary storage device only when the second information block including the replaced first information block is stored in the second auxiliary storage device, the replaced first information block is modified in the first auxiliary storage device, and the replaced first information block has a value different from that of the first information block included in the second information block stored in the second auxiliary storage device.
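The three conditions for updating the second auxiliary storage device on replacement can be sketched as follows. This is a hedged illustration only: the function `update_on_replace`, the dictionary that stands in for the second auxiliary storage device, and the explicit `dirty` flag are hypothetical, and any write-back to the lower level memory device is not modeled.

```python
def update_on_replace(second_store, large_id, offset, victim_words, dirty):
    """Copy a replaced first block into its second block only when all conditions hold.

    second_store: dict mapping a second-block number to its list of words (assumed model).
    Returns True if the second store was updated.
    """
    large_block = second_store.get(large_id)
    if large_block is None:
        return False                 # the enclosing second block is not in the second store
    if not dirty:
        return False                 # the replaced first block was never modified
    end = offset + len(victim_words)
    if large_block[offset:end] == victim_words:
        return False                 # the values already match, so no update is needed
    large_block[offset:end] = victim_words
    return True

second_store = {0: [1, 2, 3, 4, 5, 6, 7, 8]}
print(update_on_replace(second_store, 0, 4, [5, 6, 7, 8], dirty=True))
print(update_on_replace(second_store, 0, 4, [9, 9, 9, 9], dirty=True))
print(second_store[0])
```

Checking the value difference last keeps the comparison off the common path in which the replaced block is clean or its enclosing second information block is absent.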