This invention relates generally to memory management, and more particularly to managing a cache in a processing system including multiple processors.
Processing systems often employ caches to store data that is frequently required for calculations or is likely to be accessed in the near future. A cache is generally built from fast memory chips, so that accessing the cache requires less time than accessing a storage device, for example, disks. Storing data in the caches, therefore, speeds up data access and increases system throughput.
When data is read from a storage device, a copy of the data is also saved in the cache, along with the address from which the data is read. The cache monitors addresses of subsequent read operations to see if any of the required addresses is already in the cache. If a required address is in the cache (i.e., a read hit occurs), then the cache immediately returns the data having the required address. Otherwise, the data is fetched from the storage device, and a copy of the data along with its address is saved in the cache.
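The read path described above can be illustrated with a minimal sketch. The class name ReadCache, the dictionary standing in for the storage device, and the hit and miss counters are assumptions made for illustration only; they are not taken from the described system.

```python
class ReadCache:
    """Toy read cache keyed by address (illustrative names, not from the source)."""

    def __init__(self, storage):
        self.storage = storage  # hypothetical slow device: address -> data
        self.slots = {}         # cached copies: address -> data
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.slots:
            # Read hit: the data is returned straight from the cache.
            self.hits += 1
            return self.slots[address]
        # Read miss: fetch from storage and keep a copy with its address.
        self.misses += 1
        data = self.storage[address]
        self.slots[address] = data
        return data


disk = {0x10: "alpha", 0x20: "beta"}
cache = ReadCache(disk)
first = cache.read(0x10)   # miss, fetched from the device
second = cache.read(0x10)  # hit, served from the cache
```

The sketch never removes anything from the cache, so it sidesteps the question of which slot to replace when the cache is full.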
For increased data access speed, it is generally desirable for a cache to have read hits as frequently as possible. The performance of the cache can be measured by its hit ratio, the frequency of read hits relative to all data accesses to the cache. In addition to the speed and size of the cache hardware, the hit ratio also depends on data access patterns (i.e., the sequence of addresses being read and written). Cache designs often depend on two properties of the access patterns: temporal locality and spatial locality. Temporal locality means that if a data item is accessed once, it is likely to be accessed again soon, while spatial locality means that if one address is accessed, then nearby addresses are also likely to be accessed. To exploit temporal locality, when the processor writes data to a storage device, the data should also be written to the cache to speed up subsequent access. To exploit spatial locality, caches often operate on several words, i.e., a line of data, at a time. A line of data in the cache is called a “cache line” or a “cache slot”.
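As a rough illustration of how line-based caching turns spatial locality into read hits, the following sketch computes a hit ratio for a sequence of word addresses. The function name, the unbounded cache capacity, and the choice to count the first access to a line as a miss are all assumptions made for the example.

```python
def hit_ratio(addresses, words_per_line):
    """Fraction of accesses that hit, assuming an unbounded line-based cache."""
    cached_lines = set()
    hits = 0
    for address in addresses:
        line = address // words_per_line  # line containing this word
        if line in cached_lines:
            hits += 1
        else:
            cached_lines.add(line)        # miss brings in the whole line
    return hits / len(addresses) if addresses else 0.0


sequential = list(range(16))
ratio_lines_of_four = hit_ratio(sequential, 4)
ratio_single_words = hit_ratio(sequential, 1)
```

For sixteen sequential word addresses, four-word lines miss only on the first word of each line, whereas single-word lines never hit because no address repeats.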
When the cache is full and must remove a cache slot to accommodate new data, the cache selects a cache slot to be replaced and writes it back to the storage device. The new data is then written to the cache, stored in the location where the replaced cache slot was originally stored. The decision as to which cache slot to select depends on how the cache is managed.
Generally, the selected cache slot is one that has not been referenced recently. A conventional approach for cache slot selection requires maintaining a linked list, in which each cache slot is linked to other cache slots by forward and backward pointers. In a system including multiple processors, accessing the linked list requires a lock mechanism to prevent simultaneous access to a cache slot, which decreases throughput and creates significant performance bottlenecks.
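The conventional linked-list approach can be sketched as follows, to make the locking cost concrete. Here Python's OrderedDict stands in for the doubly linked list, and a single lock guards it; the class and method names are hypothetical.

```python
import threading
from collections import OrderedDict


class LockedLRU:
    """Conventional least-recently-used list guarded by one lock (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.order = OrderedDict()    # least recently used entry comes first
        self.lock = threading.Lock()  # every access must take this lock

    def touch(self, slot_id, data):
        """Record an access; return the id of the evicted slot, or None."""
        with self.lock:
            evicted = None
            if slot_id in self.order:
                self.order.move_to_end(slot_id)  # now most recently used
            elif len(self.order) >= self.capacity:
                evicted, _ = self.order.popitem(last=False)
            self.order[slot_id] = data
            return evicted


lru = LockedLRU(capacity=2)
results = [lru.touch("a", 1), lru.touch("b", 2), lru.touch("a", 1), lru.touch("c", 3)]
```

Because every access, including a plain hit, reorders the list, every access serializes on the lock; this is the bottleneck the paragraph above refers to.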
The invention relates to managing a memory unit, e.g., a cache, accessible to a plurality of processors. The method selects one of a plurality of slots within the memory unit for removal. In a general aspect of the invention, the method includes: maintaining an age table containing a plurality of entries, each entry having an age value and corresponding to a slot in the memory unit; increasing the age value of the entry each time the entry is examined by one of the processors; storing, independently at each of the processors, a maturity age associated with the processor; and comparing the maturity age to the age value of each entry.
In another aspect of the invention, a system includes: processors; a memory unit including slots, the memory unit being accessible to the processors; an age table containing a plurality of entries, each entry having an age value and corresponding to a slot in the memory unit, the age value of each entry being increased each time the entry is examined by any one of the processors; and a maturity age independently stored at each of the processors, the maturity age being compared to the age value of each entry.
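A minimal single-threaded sketch of the age table and the per-processor maturity age follows. The class names, the circular scan with a per-processor cursor, and the choice to increment the age before comparing it are assumptions; the text above specifies only that examination increases the age value and that each processor compares the age value against its own maturity age.

```python
class AgeTable:
    """One age value per slot, shared by all processors (illustrative)."""

    def __init__(self, num_slots):
        self.ages = [0] * num_slots


class Processor:
    """Holds its own maturity age and its own scan position (assumed design)."""

    def __init__(self, table, maturity_age, start=0):
        self.table = table
        self.maturity_age = maturity_age
        self.cursor = start

    def find_removable(self):
        """Examine up to one full pass of entries; return a slot index or None."""
        n = len(self.table.ages)
        for _ in range(n):
            slot = self.cursor
            self.cursor = (self.cursor + 1) % n
            # Examining an entry increases its age. A real multiprocessor
            # system would need an atomic increment here; this sketch does not.
            self.table.ages[slot] += 1
            if self.table.ages[slot] > self.maturity_age:
                return slot
        return None


table = AgeTable(3)
p = Processor(table, maturity_age=1)
first_pass = p.find_removable()   # every age becomes 1, none exceeds 1
second_pass = p.find_removable()  # slot 0 reaches age 2
q = Processor(table, maturity_age=0, start=1)
q_result = q.find_removable()     # slot 1 goes from 1 to 2, above q's threshold
```

Since each processor keeps its own maturity age and cursor, no shared list has to be reordered on each access; the only shared state in this sketch is the array of age values.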
Embodiments of the above aspects of the invention may include one or more of the following features.
Each processor adjusts the maturity age dynamically according to an estimate of the number of slots that are candidates for removal. The candidates for removal include a slot that corresponds to an entry having an age value above the maturity age. According to the estimate, the processor calculates a percentage of the candidates relative to a total number of slots within the memory unit, and compares the percentage to a target percentage. The estimate is derived from an average number of slots that have been examined by the processor before the processor determines that one of the slots is removable. The processor also resets the age value of an entry to zero when the processor accesses the slot corresponding to the entry.
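One way the adjustment could work is sketched below. The text states only that the estimate is derived from the average number of slots examined before a removable one is found; taking the candidate fraction to be the reciprocal of that average, and moving the maturity age by a fixed step, are assumptions made for this illustration, as are the function and parameter names.

```python
def adjust_maturity_age(maturity_age, avg_examined, target_fraction, step=1):
    """Return a new maturity age for one processor (assumed heuristic)."""
    if avg_examined <= 0:
        return maturity_age  # no statistics collected yet
    # Assumption: if about avg_examined slots are examined per removable slot,
    # roughly 1 / avg_examined of all slots are candidates for removal.
    estimated_fraction = 1.0 / avg_examined
    if estimated_fraction < target_fraction:
        # Too few candidates: lower the threshold so more slots qualify.
        return max(0, maturity_age - step)
    if estimated_fraction > target_fraction:
        # Too many candidates: raise the threshold so fewer slots qualify.
        return maturity_age + step
    return maturity_age


lowered = adjust_maturity_age(10, avg_examined=50, target_fraction=0.05)
raised = adjust_maturity_age(10, avg_examined=4, target_fraction=0.05)
```

Because each processor applies the adjustment to its own maturity age using its own statistics, no coordination between processors is needed for this step.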
If a slot is a candidate for removal, the processor removes the slot from the memory unit. The slot to be removed can be the oldest slot among the slots examined by the processor, when a pre-determined time threshold expires. The slot to be removed can also be the oldest slot among the slots examined by the processor, when the maturity age at the processor is reduced. One of the processors assigns an age value to a nonstandard slot when converting the nonstandard slot to a standard slot, the assigned age value being based on a category of the nonstandard slot. The processors can simultaneously access the age table. The memory unit can be a cache.
One approach for accelerating the selection of a slot for removal has each of the processors run a low-priority demon to detect a prospective removable slot before the processor needs the removable slot. Each processor can also run a low-priority verifying demon to detect and correct errors in the age table.
The invention advantageously provides an efficient approach for managing a memory unit with an age table that allows simultaneous access, a procedure for determining the age values of standard and nonstandard slots, and a process for adjusting the number of removable slots according to statistics collected by each of the processors.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.