A data storage system is typically able to service “data write” or “data read” requests issued by a host computer. A host may be connected to the storage system's external controller or interfaces (IF), through various channels that transfer both data and control information (i.e. control signals). Physical non-volatile media in which data may be permanently or semi-permanently stored includes arrays of disk devices, magnetic or optical, which are relatively less expensive than semiconductor based volatile memory (e.g. Random Access Memory) but are relatively much slower in being accessed.
A cache memory is a high-speed buffer located between an IF and the disk device(s), which is meant to reduce the overall latency of Input/Output (I/O) activity between the storage system and a host accessing data on the storage system. Whenever a host requests data stored in the storage system, the request may be served with significantly lower latency if the requested data is already found in cache, since this data need not be brought from the disks. As of the year 2004, speeds of I/O transactions involving disk activity are typically on the order of 5-10 milliseconds, whereas I/O speeds involving cache (e.g. RAM memory) access are on the order of several nanoseconds.
The relatively high latency associated with disk activity derives from the mechanical nature of the disk devices. In order to retrieve requested data from a disk based device, a disk controller must first cause a disk reading arm to physically move to a track containing the requested data. Once the head of the arm has been placed at the beginning of a track containing the data, the time required to read the accessed data on the relevant track is relatively very short, on the order of several microseconds.
One criterion or parameter which is often used to measure the efficiency of a cache memory system or implementation is referred to as the hit ratio. The hit ratio of a specific implementation is the percentage of “data read” requests issued by the host that are already found in cache and that consequently do not require time-intensive retrieval from disk. An ideal cache system would be one reaching a 100% hit ratio.
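By way of illustration only, the hit ratio defined above may be computed as in the following minimal sketch. The function name `hit_ratio` and its parameters are hypothetical and are not taken from any of the cited references; returning 0% when no read requests were issued is an assumption made for the example.

```python
def hit_ratio(read_hits: int, read_requests: int) -> float:
    """Return the percentage of read requests that were served from cache."""
    if read_requests == 0:
        # Assumption for this sketch: report 0% when no reads were issued.
        return 0.0
    return 100.0 * read_hits / read_requests


# Example: 850 of 1000 read requests were found in cache.
ratio = hit_ratio(850, 1000)
```

An ideal cache, in the sense used above, would return 100.0 for any non-empty series of read requests.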
Intelligent cache algorithms aimed at raising the hit ratio to a maximum are often based on two complementary principles: (1) “prefetching”, namely, guessing in advance which portions of data in the disk devices will likely be requested by the host in order to retrieve these data portions into the cache in advance, in anticipation of a future read request from the host; and (2) replacement procedures that establish a “discard priority”. The discard priority relates to data portions which are already stored in the cache, and is responsible for managing the order by which each of the data portions that are currently stored in the cache are to be discarded.
The various approaches to the management of the discard priority queue attempt to anticipate which of the data portions currently stored in the cache are the least likely to be requested by the host, and set the priority queue accordingly. Thus, whenever cache space is required for fetching new data into the cache, the discard priority queue is consulted, and the data portions which have been determined to be the least likely to be requested by the host are discarded first in order to free up cache storage space which may be used for storing the newly fetched data. From another perspective, the various approaches to the management of the discard priority queue are intended to increase the likelihood that those portions of data that have been determined to be the most likely to be requested by the host will not be discarded prematurely in order to free up space or for any other purpose.
An illustration of a replacement queue is shown in FIG. 1A, to which reference is now made. FIG. 1A illustrates a virtual list that is representative of a replacement queue. The list illustrates the relationship between the various slots (slot 1000(0) through slot 1000(k)) in an exemplary replacement queue. Here, slot 1000(0) is the “head” of the replacement queue. In other words, slot 1000(0) is positioned at the top of the list of slots to be purged or replaced when there is a need for an available slot to store incoming data blocks or segment(s), for example, prefetched data segments arriving from one or more disk devices associated with the cache memory. Slot 1000(k) is the “tail” of the replacement queue and is currently at the bottom of the list of slots to be purged or replaced.
The various approaches towards managing a discard priority queue are also commonly referred to as “replacement procedures”. Replacement procedures known in the art can be implemented by creating a doubly linked list of data portions found in the cache. This list is referred to herein as a replacement queue. In a common replacement queue, the data portions are added to the replacement list in the order by which they are retrieved from the disk, and some indication or pointer is associated with the portion that was most recently added to the list. When a new data portion is to be added to the cache, the structure of the replacement queue in combination with the pointer dictates the portion in the cache that is to be discarded in order to free up storage space for the new data portion. Some implementations use both a “head” pointer and a “tail” pointer identifying, respectively, the beginning and the end of the replacement queue. In cache implementations in which certain data portions are defined as permanently residing in cache, these two pointers will not necessarily be adjacent to one another in the linked list. Having implemented a replacement queue with either one or two pointers, replacement schemes or procedures known in the art are applied to manage the replacement queue. Examples of some replacement schemes include: FIFO (First-In First-Out: replace the block that has been in cache for the longest time), LRU (Least Recently Used: replace the data portion that has not been used for the longest time), LFU (Least Frequently Used), MFU (Most Frequently Used), or Random Selection.
The basic approaches to implementing replacement procedures in cache memories may be further enhanced in several ways, by dynamically and differentially modifying the position of a data portion in the replacement queue. Relocating data sequences within a cache is referred to herein as a “promotion” if the relocation is intended to increase the amount of time the data portion remains in the cache. Relocation of data sequences in the cache may improve the efficiency of the basic cache algorithm. Each relocation of a data sequence in the cache involves some cache management operations and may therefore also incur a certain performance cost. It is therefore desirable to balance the benefit associated with the replacement activity with the performance cost associated with each data sequence relocation within the cache.
Thus, by way of example, in case an LRU scheme is implemented, the determining factor implemented by the LRU scheme is the period of time that has elapsed since a particular data portion was read into the cache or was accessed while in the cache. The basic LRU approach prescribes that whenever a data portion is accessed while in cache, it is promoted to the tail of the queue, since it has become the “most recently used” portion.
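By way of a non-limiting illustration, the basic LRU behavior described above may be sketched as follows. The class `ReplacementQueue`, its methods, and the `fetch_from_disk` callback are hypothetical names introduced only for this example. Python's `OrderedDict` is used here as a stand-in for the doubly linked list, and a capacity of at least one slot is assumed.

```python
from collections import OrderedDict


class ReplacementQueue:
    """Toy LRU replacement queue: the head is discarded first, the tail last."""

    def __init__(self, capacity: int):
        self.capacity = capacity      # assumed to be >= 1
        self.slots = OrderedDict()    # first key = head, last key = tail

    def read(self, portion_id, fetch_from_disk):
        """Return (data, was_hit) for the requested data portion."""
        if portion_id in self.slots:
            # Cache hit: promote the portion to the tail of the queue.
            self.slots.move_to_end(portion_id)
            return self.slots[portion_id], True
        if len(self.slots) >= self.capacity:
            # Cache full: discard the portion at the head of the queue.
            self.slots.popitem(last=False)
        data = fetch_from_disk(portion_id)
        self.slots[portion_id] = data  # new portions enter at the tail
        return data, False


queue = ReplacementQueue(capacity=2)
for portion in ["a", "b", "a", "c"]:
    queue.read(portion, fetch_from_disk=lambda pid: f"data-{pid}")
# Portion "b" has been discarded; "a" survived because it was promoted.
```

In this sketch every hit costs one relocation, which is the per-access management cost that the modified approaches discussed next attempt to reduce.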
In a slightly modified approach, if the elapsed time since the data portion has been accessed is less than a predetermined threshold, the data portion will not be relocated and will remain in the same location in the replacement queue, thus saving a number of cache management operations. The predetermined threshold can be established, by way of example, as the average fall through time (FTT) of prior data portions in the memory.
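The threshold test described above may be sketched, purely by way of example, as follows. The class `LazyLRU` and its members are hypothetical. As an assumption of this sketch, the time stamp kept for each portion records when the portion was last placed in the queue, which is one possible reading of the elapsed time referred to above, and the threshold is supplied by the caller rather than derived from a measured FTT.

```python
from collections import OrderedDict


class LazyLRU:
    """LRU variant that skips promotion for recently placed portions."""

    def __init__(self, threshold: float):
        self.threshold = threshold   # e.g. an average FTT, in seconds
        self.queue = OrderedDict()   # portion_id -> time of last placement
        self.promotions = 0          # counts relocations actually performed

    def insert(self, portion_id, now: float):
        """Add a newly fetched portion at the tail of the queue."""
        self.queue[portion_id] = now

    def access(self, portion_id, now: float) -> bool:
        """Handle a hit; return True only if the portion was relocated."""
        elapsed = now - self.queue[portion_id]
        if elapsed < self.threshold:
            return False             # position unchanged, no management cost
        self.queue.move_to_end(portion_id)
        self.queue[portion_id] = now
        self.promotions += 1
        return True


cache = LazyLRU(threshold=10.0)
cache.insert("a", now=0.0)
cache.insert("b", now=1.0)
skipped = cache.access("a", now=5.0)    # 5.0 < 10.0, so "a" stays in place
promoted = cache.access("a", now=12.0)  # 12.0 >= 10.0, so "a" moves to tail
```

The design choice illustrated is that a portion placed recently is in no immediate danger of reaching the head of the queue, so relocating it again yields little benefit.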
In a second modified LRU approach, the number of times a data portion has been accessed while in the memory is compared to a fixed number. If the number of times the data portion has been accessed is above the fixed number, the data portion is placed at the tail of the replacement queue, causing the data portion to remain in the cache for a longer period of time. U.S. Pat. No. 5,592,432 to Vishlitzky, et al., describes such cache management implementations. U.S. Pat. No. 6,715,039 to Michael, et al., describes a method for promoting a data portion within a cache, which takes into consideration the balance between the underlying performance costs and the associated gains. The method disclosed in the '039 Patent seeks to determine whether the probability of losing a cache hit is greater than the ratio between the performance cost associated with the promotion of the portion and the benefit associated with the cache hit.
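The two promotion tests described in this paragraph, namely the access-count comparison and the comparison described in its last sentence, may be expressed, by way of a simplified sketch only, as the following predicates. The function and parameter names are hypothetical, the cost and benefit are assumed to be expressed in the same units, and the benefit is assumed to be positive. The sketch does not reproduce how either cited patent actually estimates these quantities.

```python
def promote_by_count(access_count: int, fixed_number: int) -> bool:
    """Promote when the portion was accessed more than a fixed number of times."""
    return access_count > fixed_number


def promote_by_cost(p_lost_hit: float, promotion_cost: float,
                    hit_benefit: float) -> bool:
    """Promote when the probability of losing a hit exceeds cost / benefit."""
    # Assumption for this sketch: hit_benefit > 0, same units as promotion_cost.
    return p_lost_hit > promotion_cost / hit_benefit


# Example: a promotion costs 1 unit and a preserved hit is worth 5 units,
# so promotion pays off only if the chance of losing the hit exceeds 0.2.
worth_it = promote_by_cost(p_lost_hit=0.3, promotion_cost=1.0, hit_benefit=5.0)
```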
It should be noted that all of the above modifications to the basic cache memory replacement scheme require periodical calculations of global parameters and statistics in the system, such as the FTT or the probabilities of losing a cache hit as indicated, or, alternatively, the number of times that a data portion is accessed, all of which incur overhead costs.
A second aspect of cache management concerns the prefetch algorithms. Prefetch algorithms attempt to anticipate which portions of data stored on a disk device are likely to be requested by a host within a short period of time. The prefetch algorithms include one or more triggers which activate the retrieval of the data sequences from the disk devices into the cache when one or more criteria are met. The criteria implemented in the prefetch algorithms are usually defined in a manner so as to optimize the hit ratio achieved by the use of this particular algorithm, and are aimed at maximizing the number of retrieved sequences which are eventually requested by the host within a relatively short time, thereby enhancing system performance.
Prefetch algorithms known in the art commonly fall into one of two categories. The first category or group includes algorithms which are based on the identification of sequential streams of read requests. If the storage system is able to identify that the host is issuing requests for sequential streams, it may assume that this kind of activity will be maintained for a while and, accordingly, calculate which additional portions of data are likely to be requested by the host within a short period of time, and may send these additional portions to the cache in advance. U.S. Pat. No. 5,682,500 to Vishlitzky, et al., describes a prefetch method that follows this approach.
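A much simplified sketch of a sequential-stream trigger of the kind described above is given below. The function `sequential_prefetch`, the `run_length` and `prefetch_count` parameters and their default values are hypothetical and are chosen only for illustration. The sketch is not the method of the cited patent.

```python
def sequential_prefetch(recent_blocks, run_length=3, prefetch_count=4):
    """Return the block numbers to prefetch, or [] if no stream is detected.

    A stream is assumed to exist when the last `run_length` requested
    block numbers are consecutive.
    """
    if len(recent_blocks) < run_length:
        return []
    last = recent_blocks[-run_length:]
    is_sequential = all(b - a == 1 for a, b in zip(last, last[1:]))
    if not is_sequential:
        return []
    next_block = last[-1] + 1
    # Assume the stream continues and fetch the blocks that follow it.
    return list(range(next_block, next_block + prefetch_count))


to_fetch = sequential_prefetch([7, 100, 101, 102])  # blocks 103 through 106
```

Even in this reduced form, the storage system must retain a history of recent requests in order to recognize the stream.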
A second group of prefetch algorithms includes algorithms which are based on the identification of “hot zones” in the system. A statistical analysis of recent activity in the system may show that a certain area, defined in advance as a potential “hot zone”, is being intensely addressed by the host, and consequently, a mechanism is triggered to bring into the cache all the data contained in that zone. The underlying assumption is that such data portions tend to be addressed in their totality, or in their majority, whenever they are addressed over a certain threshold of focused activity. U.S. Pat. No. 5,765,213 to Ofer, describes a prefetch method that follows an approach similar to this.
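The hot-zone approach described above may be illustrated, under simplifying assumptions, by the following sketch. The zone size, the activity threshold and the function names are hypothetical. Zones are assumed here to be fixed-size, contiguous ranges of block numbers defined in advance, and "activity" is assumed to mean a simple count of recent requests falling in a zone.

```python
from collections import Counter


def hot_zones(recent_blocks, zone_size=64, threshold=3):
    """Return the zones that received at least `threshold` recent requests."""
    hits = Counter(block // zone_size for block in recent_blocks)
    return sorted(zone for zone, count in hits.items() if count >= threshold)


def blocks_in_zone(zone, zone_size=64):
    """Return every block number in a zone, i.e. the data to bring into cache."""
    return range(zone * zone_size, (zone + 1) * zone_size)


# Blocks 1, 10 and 63 fall in zone 0; blocks 64 and 200 fall in zones 1 and 3.
zones = hot_zones([1, 10, 63, 64, 200])
to_fetch = [block for zone in zones for block in blocks_in_zone(zone)]
```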
One may notice that these two kinds of approaches to prefetch algorithms require a considerable amount of resources in order to monitor current activity, decide on the prefetch, and then implement the prefetch in a coordinated manner across the system. U.S. patent application Ser. No. 10/914,746 entitled “System Method and Circuit For Retrieving Data into a Cache Memory From a Mass Data Storage Device”, filed on Aug. 9, 2004, describes a third group of prefetch algorithms, which require minimal overhead. The third group of prefetch algorithms includes algorithms which are based on fetching into the cache data portions when one or more relatively simple criteria which may be tested on the data portion in question are met. Thus, when implementing this third group of prefetch algorithms no global considerations or comparison with other data portions in the system are required. By way of example, one such criterion may be that in a certain read request or in a series of read requests, the percentage of blocks that are associated with a specific data portion in the disk exceeds a predefined threshold. Further by way of example, a second criterion may be that in a certain read request or in a series of read requests, the first n blocks in the specific data portion have been requested by the host.
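The two exemplary criteria given above may be sketched as follows. The function `should_prefetch_portion`, its parameters and their default values are hypothetical. As an assumption of this sketch, the first criterion is read as the fraction of the data portion's own blocks that appear in the request, and a data portion is assumed to be a non-empty contiguous range of block numbers containing at least `first_n` blocks.

```python
def should_prefetch_portion(requested, portion_start, portion_size,
                            min_fraction=0.5, first_n=4):
    """Decide locally whether to fetch an entire data portion into cache.

    Only the requested blocks and the portion itself are examined; no
    system-wide statistics are consulted.
    """
    portion = range(portion_start, portion_start + portion_size)
    in_portion = {block for block in requested if block in portion}
    # Criterion 1: enough of the portion's blocks were requested.
    fraction_met = len(in_portion) / portion_size > min_fraction
    # Criterion 2: the first n blocks of the portion were all requested.
    first_n_met = all(portion_start + i in in_portion for i in range(first_n))
    return fraction_met or first_n_met


# The first four blocks of a 16-block portion were requested.
decision = should_prefetch_portion([0, 1, 2, 3], portion_start=0, portion_size=16)
```

The test requires no history beyond the request or series of requests being examined, which reflects the minimal overhead attributed to this third group.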
The policies set forth by the prefetch algorithms may affect the replacement procedures and vice-versa. Thus, the implementation of prefetch algorithms may present additional factors which may influence the performance of the replacement procedures and that therefore should be taken into account when implementing a replacement procedure in the system. Indeed, if the prefetch algorithm in question is excessively aggressive, there is a danger that other data will be flushed from the cache by prefetched data portions. Even if the prefetch algorithm is very successful and the great majority of prefetched data will eventually be used by the host, the gain in terms of increased hit ratio associated with the intensive prefetch activity may not justify the “blockage” of relatively large portions of the available cache space, which may prevent the caching of other I/O activity in the system. This problem is addressed by the definition of a “microcache”, exclusively devoted to prefetched data. A maximum number of allowed prefetched data portions is predefined, and when this number is exceeded, the cache space corresponding to previously used data portions is reused or recycled to write new prefetched data. U.S. Pat. No. 5,381,539, to Yanai et al., describes a prefetch method that implements an idea of this kind. Moreover, depending on the current overall workload stress in the system, as measured by the FTT in the cache, it is possible to temporarily activate or deactivate the use of the microcache or to change its basic operation parameters. U.S. Pat. No. 5,706,467 to Vishlitzky, et al., describes a cache management scheme that follows this approach.
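The microcache limit described above may be sketched, by way of example only, as follows. The class `Microcache` and its members are hypothetical. As assumptions of this sketch, the portion recycled is always the oldest one held in the microcache, and the maximum number of portions is at least one. The sketch does not reproduce the method of the cited patents.

```python
from collections import deque


class Microcache:
    """Bounded area of the cache reserved for prefetched data portions."""

    def __init__(self, max_portions: int):
        self.max_portions = max_portions   # assumed to be >= 1
        self.portions = deque()            # oldest prefetched portion on the left

    def add_prefetched(self, portion_id):
        """Store a prefetched portion; return the recycled portion, if any."""
        recycled = None
        if len(self.portions) >= self.max_portions:
            # Limit reached: reuse the space of the oldest portion.
            recycled = self.portions.popleft()
        self.portions.append(portion_id)
        return recycled


micro = Microcache(max_portions=2)
recycled = [micro.add_prefetched(p) for p in ["p1", "p2", "p3"]]
# The third prefetch recycles the space of "p1"; the rest of the cache
# is never touched by prefetch activity.
```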
On the other hand, it is also desirable to ensure that data portions that were prefetched will not be replaced by other data prematurely, for example, before the host has had a chance to use that data. One way of achieving this goal may include, by way of example, defining a certain prefetch stop condition applying to read requests. Thus, a certain cache space may be exclusively allocated to store prefetched sequences, and may be configured to prevent additional prefetch activity once the cache space exclusively allocated to store the prefetched sequences has been fully populated by data portions not yet read by the host. This and similar algorithms are described in U.S. Pat. No. 5,983,324, to Ukai et al.
Yet another, related problem to be taken into consideration when implementing cache management algorithms concerns the handling of “write-pending data”. Write-pending data is data which has been written by the host into cache, and already acknowledged by the system, but which has not yet been destaged into the permanent media. In general, cache space allocated to data portions that are write-pending should preferably be returned to the main memory pool as quickly as possible in order to free up cache storage space for data portions that the host is requesting, and that typically are considered of a higher level of priority. On the other hand, it is possible that these write-pending data portions might themselves be candidates for being requested by the host within a short period of time, in which case it may be desirable to retain these write-pending data portions in memory until they are actually read. Based on statistics, such as the current FTT of the cache and the number of times that a given data portion in cache has been accessed over a period of time, for example, it is possible to define algorithms that are capable of efficiently managing the cache replacement procedure taking into consideration the specific factors associated with the replacement of write-pending portions which may affect the efficiency of these algorithms. U.S. Pat. No. 5,513,336 to Vishlitzky, et al., already mentioned above, describes cache management schemes that follow this approach.
The existing cache management procedures demand significant overhead operations and procedures in the storage system, which may cause the overall performance of the storage system to diminish. Such overhead activity or procedure may entail continuous operations which may be necessary to obtain and maintain system-wide statistics and to measure parameters associated with the behavior of individual data portions, and, in addition, the overhead activity and procedures may produce increased data traffic within the system communication lines. Moreover, cache storage devices that implement global lock operations as part of their design will incur increased applications of this lock, due to both the prefetch activity and the cache management routines. In cases where sequential prefetch operations result in the creation and management of a microcache, such as described above, additional overhead and resource use is added to the cache manager or cache controller.
There is therefore a need for a method, a system and a circuit for implementing a cache replacement procedure requiring relatively low overhead. There is a further need for a method, a system and a circuit for implementing a cache replacement procedure wherein analysis is performed locally and wherein the analysis requires only the examination of the requested data-portion, independently of its relations with other portions in the system. There is yet a further need for a method, a system and a circuit for implementing a cache replacement procedure, wherein at least a portion of the global statistics, comparison and internal information transmission are not necessary for the implementation of the cache replacement procedure.