The present invention relates to the field of multi-level hierarchy memory systems, and more particularly to reducing resource collisions associated with memory units, e.g., cache memories, in a multi-level hierarchy memory system.
A conventional processing system may employ a multi-level hierarchy memory system. A multi-level memory processing system may comprise a processor having various execution units and registers, as well as branch and dispatch units which forward instructions to appropriate execution units. The processor may further comprise a level one (L1) instruction cache and a level one (L1) data cache. It is noted that those skilled in the art will recognize that a single, unified L1 cache may be implemented. The L1 instruction cache and L1 data cache may be configured to temporarily store instruction and data values, respectively, that may be repeatedly accessed by the processor. By storing instruction and data values that are repeatedly accessed, processing speed may be improved by avoiding the step of loading the values from the system memory, e.g., Random Access Memory (RAM). In order to minimize data access latency, one or more additional levels of cache memory may be implemented within the processing system, such as a level two (L2) cache and a level three (L3) cache. The lower cache levels, e.g., L2, L3, may be employed to stage data to the L1 cache and typically have progressively larger storage capacities but longer access latencies.
Typically, a cache may be organized as a collection of spatially mapped, fixed-size storage region pools commonly referred to as “rows.” Each of these storage region pools typically comprises one or more storage regions of fixed granularity. These storage regions may be freely associated with any equally granular storage region in the system, as long as that storage region spatially maps to the row containing the storage region pool. The position of a storage region within the pool may be referred to as the “column.” The intersection of each row and column contains a cache line. The size of the storage granule may be referred to as the “cache line size.” A unique tag may be derived from the address of a given storage granule to indicate its residency in a given row/column position.
A particular cache line within a cache may be accessed using the following addressing scheme. A subset of the address bits, commonly referred to as the row selector, may be used to form an index into the cache that uniquely identifies a row within the cache. The high-order address bits above the row selector, commonly referred to as the address tag, may be used to select a particular column within that row, thereby defining a physical address for the cache line. It is noted that a cache may also be organized as one or more banks, where each bank may be indexed by a subset of the row selector bits commonly referred to as the bank selector bits. Typically, the bank selector bits are the lowest-order bits of the row selector.
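The addressing scheme above can be illustrated with a minimal sketch. It assumes byte addressing, a hypothetical geometry of 128-byte lines, 1024 rows and 4 banks, and a low-order byte-offset field below the row selector; the function name `split_address` is illustrative only and is not part of the described system.

```python
# Hypothetical geometry (assumption): 128-byte lines, 1024 rows, 4 banks.
LINE_BITS = 7    # log2(128): byte offset within a cache line
ROW_BITS = 10    # log2(1024): row selector (index into the cache)
BANK_BITS = 2    # log2(4): bank selector, the lowest row-selector bits


def split_address(addr: int) -> dict:
    """Split an address into tag, row selector, bank selector and byte offset."""
    offset = addr & ((1 << LINE_BITS) - 1)
    row = (addr >> LINE_BITS) & ((1 << ROW_BITS) - 1)
    bank = row & ((1 << BANK_BITS) - 1)   # bank bits lie inside the row bits
    tag = addr >> (LINE_BITS + ROW_BITS)  # remaining high-order bits
    return {"tag": tag, "row": row, "bank": bank, "offset": offset}
```

For example, `split_address(0x01234567)` yields tag 145, row 650, bank 2 and offset 103. Because the bank bits are the lowest row-selector bits, consecutive cache lines fall into consecutive banks.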
When a request accesses a cache structure, the address referenced by the request is mapped to determine which row in the cache is allowed to hold the referenced granule. The tag is then derived and compared with the tag associated with each freely associative region (or column position) within that row. If a match is found, the matching column position may hold a valid copy of the memory granule, and the requested operation may be permitted to be carried out at this cache. This is commonly referred to as a “cache hit.” Otherwise, a “cache miss” is said to have occurred. In the case of a cache miss, one or more requests may typically be spawned to other memory structures, e.g., caches, in the storage hierarchy to carry out the request for information.
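The lookup just described (map the address to a row, derive the tag, compare it against every column in that row) might be sketched as follows. This is a toy model, not the described hardware: the class name, the power-of-two row count, and storing bare tags in place of full line state are all assumptions made for illustration.

```python
class SetAssociativeCache:
    """Toy model: `rows` x `columns` slots, each holding a tag or None."""

    def __init__(self, rows: int, columns: int, line_bits: int):
        assert rows & (rows - 1) == 0, "rows assumed to be a power of two"
        self.rows, self.columns, self.line_bits = rows, columns, line_bits
        self.row_bits = rows.bit_length() - 1
        # Each row holds `columns` freely associative regions.
        self.table = [[None] * columns for _ in range(rows)]

    def locate(self, addr: int) -> tuple[int, int]:
        """Map an address to the (row, tag) pair used for the lookup."""
        row = (addr >> self.line_bits) & (self.rows - 1)
        tag = addr >> (self.line_bits + self.row_bits)
        return row, tag

    def lookup(self, addr: int):
        """Return the matching column on a cache hit, or None on a cache miss."""
        row, tag = self.locate(addr)
        for column, stored_tag in enumerate(self.table[row]):
            if stored_tag == tag:
                return column
        return None
```

Any address within the same 128-byte granule hits the same column, while an address that maps to the same row with a different tag misses.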
The amount of concurrent request traffic through a multi-level hierarchy memory system tends to increase because of increased speculative instruction execution, increased use of prefetching mechanisms, and the tendency of a single operation, e.g., a request, at a given level of the hierarchy to spawn multiple operations, e.g., requests, to the next level of the hierarchy, as in a cache miss.
General caching theory postulates that an ensuing temporally correlated request, i.e., a request arriving soon after a previous request, to access a granule of memory from the same requester may be spatially correlated such that it may be likely to map to a memory granule that is spatially proximate to the memory granule accessed by the previous request.
For this reason, caches are typically organized such that spatially proximate memory granules map to different rows, minimizing the overutilization of any given row. Likewise, they are typically organized such that spatially proximate memory granules map to different banks, increasing the likelihood that concurrent requests do not result in resource collisions.
For example, a processing system may have a contiguous memory of 1 gigabyte. A given memory unit in the multi-level memory hierarchy system may have a cache structure capable of retaining 2 megabytes. The cache structure may be organized into 1024 rows, where each row may comprise 16 freely associative storage regions (or lines) and each storage region may comprise a 128-byte granule of memory, i.e., a 128-byte line size. The byte capacity of the cache may thus be calculated by multiplying the number of rows, the number of columns, and the line size: 1024 × 16 × 128 bytes = 2,097,152 bytes, i.e., 2 megabytes.
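The arithmetic in this example can be checked directly, and the same figures also fix the width of each address field. The field widths below are derived under the assumption (not stated above) of byte addressing with power-of-two sizes; `log2_exact` is a hypothetical helper.

```python
MEMORY_BYTES = 1 << 30   # 1 gigabyte of contiguous memory
ROWS = 1024
COLUMNS = 16             # freely associative storage regions per row
LINE_SIZE = 128          # bytes per granule (cache line size)


def log2_exact(n: int) -> int:
    """Base-2 logarithm of a power of two, computed without floating point."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


capacity = ROWS * COLUMNS * LINE_SIZE          # bytes held by the cache
address_bits = log2_exact(MEMORY_BYTES)        # bits needed to address 1 GB
offset_bits = log2_exact(LINE_SIZE)            # byte within a line
row_bits = log2_exact(ROWS)                    # row selector width
tag_bits = address_bits - row_bits - offset_bits
```

Under these assumptions the cache holds 2,097,152 bytes, and a 30-bit address splits into a 13-bit tag, a 10-bit row selector and a 7-bit byte offset.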
As stated above, in the case of a cache miss, one or more requests may typically be spawned to other memory levels in the multi-level memory hierarchy to carry out the requested operation. More specifically, in the case of a cache miss, a fetch request may be spawned to the next level of the cache hierarchy. The purpose of the fetch is to retrieve the memory granule, i.e., requested information, that was not found at the current cache. In most cases the requested memory granule may be installed, i.e., reloaded, into the cache structure at the current cache.
When the memory granule is installed into the current cache structure, it may displace another memory granule (called the “victim”) currently residing in the same row. Typically, the victim is selected by a hash that attempts to choose the least recently used granule, since that granule has the lowest temporal correlation. Once a victim is selected, its status may typically be assessed. Normally, if the victim has been locally modified, i.e., it is the only valid copy of the memory granule in the system, it must be preserved. The operation used to ensure the preservation of the victim is commonly referred to as a castout. A castout operation spawns a write request to the next level of the cache hierarchy, the purpose of which is to safely place the victim memory granule into the cache structure at that level.
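The reload, victim selection and castout decision for a single row can be sketched as below. Exact least-recently-used ordering stands in for the hash that "attempts to choose the least recently used granule"; the `Row` class, its methods, and tracking lines by tag alone are hypothetical simplifications.

```python
from collections import OrderedDict
from typing import Optional


class Row:
    """One cache row; entries are kept in least- to most-recently-used order."""

    def __init__(self, columns: int):
        self.columns = columns
        self.lines: "OrderedDict[int, bool]" = OrderedDict()  # tag -> modified?

    def access(self, tag: int, write: bool = False) -> None:
        """Record a hit, marking the line modified on a write and making it MRU."""
        self.lines[tag] = self.lines[tag] or write
        self.lines.move_to_end(tag)

    def install(self, tag: int) -> Optional[int]:
        """Reload a fetched granule; return the victim tag if a castout is needed."""
        castout = None
        if len(self.lines) >= self.columns:
            victim, modified = self.lines.popitem(last=False)  # LRU victim
            if modified:  # only valid copy in the system: must be preserved
                castout = victim
        self.lines[tag] = False  # a freshly reloaded granule is unmodified
        return castout
```

In a two-column row holding tags 1 and 2, writing to tag 1 and then installing tag 3 evicts the unmodified tag 2 silently; installing tag 4 afterwards evicts the modified tag 1 and requests a castout for it.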
The above two operations, i.e., the fetch and castout operations, are specific examples of operations that may be temporally correlated, in that they tend to be issued almost simultaneously. The fetch and castout operations are also spatially correlated, in that the memory granules they reference map to the same row in the cache from which they are initiated. It is noted that other operations, such as fetch operations, castout operations, invalidate/kill operations, or any combination thereof that occur almost simultaneously, may also be temporally correlated.
Referring to the fetch and castout example, call the cache from which the operations are initiated cache X, and the cache at the next level of the hierarchy at which they arrive cache Y. When the fetch and castout operations arrive at cache Y, their temporal and spatial correlation to each other changes the level of collision risk they introduce there. Because the victim and the requested information map to the same row in cache X, the row selector bits of the address associated with the modified information in the victim equal the row selector bits of the address associated with the requested information. Since the bank selector bits used at cache Y lie within these row selector bits, the bank selector bits of the two addresses must also be equal. This introduces the unfortunate property that fetches and associated castouts spawned at cache X and serviced at cache Y have a 100% likelihood of colliding on the same bank in cache Y, commonly referred to as a resource collision. That is, the information requested by the fetch request issued by cache X is stored in the same bank of cache Y that will store the modified information sent by cache X in the castout request.
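The 100% collision property can be demonstrated with a small simulation. It assumes 128-byte lines at both levels, 1024 rows in cache X, and 4 banks in cache Y selected by the lowest row-selector bits; every name below is hypothetical. Each trial draws a fetch address and a victim address that share a row in cache X but carry distinct tags.

```python
import random

LINE_BITS = 7      # 128-byte lines at both levels (assumption)
X_ROW_BITS = 10    # cache X: 1024 rows (assumption)
Y_BANK_BITS = 2    # cache Y: 4 banks, lowest row-selector bits (assumption)
TAG_BITS = 13      # remaining bits of a 30-bit address


def x_row(addr: int) -> int:
    return (addr >> LINE_BITS) & ((1 << X_ROW_BITS) - 1)


def y_bank(addr: int) -> int:
    """Conventional bank selection at cache Y: no hashing."""
    return (addr >> LINE_BITS) & ((1 << Y_BANK_BITS) - 1)


def random_pair(rng: random.Random) -> tuple[int, int]:
    """A fetch address and a victim address that share a row in cache X."""
    row = rng.randrange(1 << X_ROW_BITS)
    fetch_tag, victim_tag = rng.sample(range(1 << TAG_BITS), 2)  # distinct tags
    high = LINE_BITS + X_ROW_BITS
    fetch = (fetch_tag << high) | (row << LINE_BITS)
    victim = (victim_tag << high) | (row << LINE_BITS)
    return fetch, victim


def collision_rate(trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of fetch/castout pairs that land in the same bank of cache Y."""
    rng = random.Random(seed)
    collisions = 0
    for _ in range(trials):
        fetch, victim = random_pair(rng)
        assert x_row(fetch) == x_row(victim)
        collisions += y_bank(fetch) == y_bank(victim)
    return collisions / trials
```

Whatever seed is used, `collision_rate()` returns 1.0, because the bank bits are copied from row-selector bits that the pair shares by construction.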
A resource collision may also occur when fetch operations, castout operations, invalidate/kill operations, or any combination thereof occur almost simultaneously. When no spatial correlation exists between the operations, the probability of a resource collision is 1/N, where N is the number of banks in the cache. Spatial correlation between the operations may increase this probability, up to the 100% likelihood described above for fetches and their associated castouts. Therefore, there exists a general problem of resource collisions in hierarchical memories based upon spatial and temporal proximity.
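The 1/N figure follows under the assumption that two uncorrelated operations select banks $B_1$ and $B_2$ independently and uniformly from the N banks; the collision probability is the chance that both pick the same bank:

```latex
P(B_1 = B_2) = \sum_{b=0}^{N-1} P(B_1 = b)\,P(B_2 = b)
             = \sum_{b=0}^{N-1} \frac{1}{N}\cdot\frac{1}{N}
             = \frac{1}{N}
```

For a fetch and its associated castout, $B_1 = B_2$ by construction, so the same probability is 1.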
It would therefore be desirable to develop a hash that reduces resource collisions associated with memory units, e.g., caches, in a multi-level hierarchy memory system.
The problems outlined above may at least in part be solved in some embodiments by a hash that combines the N bits of the bank selector field with M bits adjacent to the row selector field of an address to determine which bank may store the information associated with the address.
In one embodiment of the present invention, a method for reducing resource collisions associated with address flows through hierarchical memory units comprises the step of receiving a first operation, e.g., a fetch request, by a first memory unit, where the first operation is a first request to access information identified by a first address. The first memory unit may be configured to determine a particular entry, e.g., a cache line, to store the requested information, which may be retrieved from a lower level memory unit if the requested information is not present within the first memory unit. If the information in the particular entry selected by the first memory unit to store the requested information has been modified, then the first memory unit may issue a second operation, e.g., a castout request, to a lower memory level, i.e., a second memory unit, to store the modified information, associated with a second address, in a first bank of the second memory unit. The first bank may be determined by implementing a hash that combines the N bits of the bank selector field with M bits adjacent to a region of correlation, e.g., the row selector field, in the second address associated with the modified information. Furthermore, if the requested information is not present within the first memory unit, the first memory unit may issue a third operation, e.g., a fetch request, to the second memory unit almost concurrently with the second operation. The third operation may request access to the information identified by the first address.
The information requested by the third operation may be located in a second bank of the second memory unit by implementing the same hash, which combines the N bits of the bank selector field with the M bits adjacent to the region of correlation, e.g., the row selector field, in the first address associated with the requested information. Upon identifying the requested information, the second memory unit may write back or reload the requested information into the particular entry in the first memory unit selected to store it. If the value of the M bits in the first address differs from the value of the M bits in the second address, then the first bank and the second bank differ.
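The bank selection for the castout (second operation) and the fetch (third operation) might be sketched as follows, using XOR as the combining function, which is one of the functions named in the embodiments. Assumptions: 128-byte lines, the region of correlation is cache X's 10-bit row selector (bits 7 to 16), cache Y uses N = 2 bank selector bits (bits 7 and 8), and M = N adjacent bits (bits 17 and 18). The helper names are hypothetical.

```python
LINE_BITS = 7     # 128-byte lines at both levels (assumption)
X_ROW_BITS = 10   # region of correlation: cache X row selector, bits 7..16
N = 2             # cache Y bank selector: 4 banks, bits 7..8 (assumption)
M = N             # adjacent bits 17..18, just above the row selector (assumption)


def bits(addr: int, low: int, width: int) -> int:
    """Extract `width` bits of `addr` starting at bit position `low`."""
    return (addr >> low) & ((1 << width) - 1)


def hashed_bank(addr: int) -> int:
    """Bank in cache Y: N bank-selector bits XORed with the M adjacent bits."""
    bank_bits = bits(addr, LINE_BITS, N)
    adjacent = bits(addr, LINE_BITS + X_ROW_BITS, M)
    return bank_bits ^ adjacent


def banks_for_miss(fetch_addr: int, victim_addr: int) -> tuple[int, int]:
    """Banks used at cache Y by the castout (second op) and the fetch (third op)."""
    # Both addresses map to the same row of cache X, so their bank bits match.
    assert bits(fetch_addr, LINE_BITS, X_ROW_BITS) == bits(victim_addr, LINE_BITS, X_ROW_BITS)
    return hashed_bank(victim_addr), hashed_bank(fetch_addr)
```

With row 650, a fetch tag of 145 and a victim tag of 146, the castout goes to bank 0 and the fetch to bank 3. The pair still collides whenever the two M-bit fields happen to be equal, so the hash lowers the collision likelihood from certainty rather than eliminating it.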
In another embodiment of the present invention, the hash may combine the N bits of the bank selector field with the M bits adjacent to a region of correlation, e.g., the row selector field, using an XOR function. In yet another embodiment, the hash may combine them by adding the N bits of the bank selector field to the M bits adjacent to the region of correlation, e.g., the row selector field, and discarding the carry bit.
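The two combining functions can be compared side by side in a short sketch. It assumes N = M = 2; the function names are hypothetical, and `separates_banks` checks the property relied on above, namely that for a fixed value of the bank selector bits, distinct adjacent-bit values always yield distinct banks.

```python
N = 2   # bank selector width (assumption: 4 banks)
M = N   # adjacent-bit width, chosen equal to N (assumption)


def combine_xor(bank_bits: int, adjacent: int) -> int:
    """XOR embodiment: bitwise combination, no carries involved."""
    return bank_bits ^ adjacent


def combine_add(bank_bits: int, adjacent: int) -> int:
    """Addition embodiment: add the fields and discard the carry out of bit N-1."""
    return (bank_bits + adjacent) & ((1 << N) - 1)


def separates_banks(combine) -> bool:
    """True if, for every fixed bank-bit value, distinct adjacent bits give distinct banks."""
    size = 1 << N
    return all(len({combine(b, m) for m in range(size)}) == size for b in range(size))
```

Both functions pass the check, since for a fixed bank-bit value each is a permutation of the M-bit input, whereas conventional selection (`lambda b, m: b`) fails it. XOR avoids a carry chain entirely; modular addition yields a different bank assignment with the same separation property.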
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.