1. Field of the Invention
The present invention relates, in general, to cache memory and methods for using cache memory, and, more particularly, to a method and system that uses extent-based cache memory management.
2. Relevant Background
Data processing systems rely on a variety of data storage mechanisms for storing data and program code. Each storage mechanism has an associated latency because of delay incurred in writing data to and reading data from the storage device. Storage mechanisms range from low latency mechanisms such as static random access memory (SRAM) located physically near data processing elements to magnetic, optical and remote storage mechanisms with latencies that are several orders of magnitude larger than SRAM. Mass storage devices tend to have greater latency than working memory located physically and logically close to a data processor.
There is a continuous need for techniques that can enhance performance without significantly increasing the cost and complication of a design. Caching is one technique implemented to improve performance of data storage systems. Cache technology hides latency associated with mass storage such as magnetic and optical disk storage devices. Cache technology involves providing a relatively low latency memory device between a relatively high latency memory storage device and a host device. The memory device, organized as cache, stores write and/or read data so that subsequent read/write commands might be satisfied with the data in the cache rather than the high latency storage device. Depending on the writeback policy in effect, as determined by hardware or software, write operations may or may not be cached. Moreover, cache management hardware/software may designate only portions of the high latency memory storage device to be cacheable while other portions are designated uncacheable. Transferring data to and from the relatively low latency buffer cache memory instead of the much slower, high latency storage device cuts down on transfer time and boosts the speed of the system.
The data transfer time savings of cache technology increase as a larger percentage of read/write requests are satisfied with just the data stored in the cache memory. A successful transfer that exclusively uses data from cache memory to satisfy a request is called a "hit". Conversely, a "miss" (also called a "cache miss") occurs when additional data from the relatively higher latency data storage device is required.
Principles that guide the design of cache memory include reducing cache misses and efficiently allocating cache memory for non-sequential commands. Reducing cache misses increases the ratio of hits to the total blocks of transferred data (known as the hit rate). The higher the hit rate, the more likely that an access request from a host device, such as a central processing unit (CPU), is filled by the low latency cache memory instead of the high latency data storage device (e.g., a hard disk).
Cache misses are classified into various types including compulsory misses, capacity misses, conflict misses and coherence misses. Compulsory misses occur when data is brought into the cache memory for the first time. Increasing the size of the post-request speculative read reduces compulsory misses by taking advantage of spatial locality properties of stored data. Capacity misses result from the fact that the cache memory is always smaller than the higher latency store being cached. Increasing the total cache size reduces capacity misses. Conflict misses occur when two or more backing locations map to the same place in the cache, requiring that old cache data be replaced and then brought back later. Increasing the associativity of the cache lines reduces conflict misses. Coherence misses are misses that would not otherwise occur except for invalidation to preserve multiprocessor cache consistency.
Write-back caching reports a write command as completed to the host issuing the write request when the write command data enters the buffer cache memory. The actual completion of the write command (i.e., committing data to the high latency data storage device) may be delayed until conditions are optimized for transferring the data from the buffer cache memory to the higher latency data storage device. In contrast, "write-through" caching automatically executes the write command while simultaneously copying data to the buffer cache memory. Write-through caching is often less efficient than write-back caching, but is more reliable in some applications.
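The difference between the two write policies can be illustrated with a minimal Python sketch. The `Backing` and `BufferCache` classes, their method names, and the explicit `flush` call are hypothetical stand-ins introduced only for illustration; a real controller would choose its own moment to commit deferred data.

```python
class Backing:
    """Hypothetical stand-in for the high latency data storage device."""

    def __init__(self):
        self.blocks = {}
        self.writes = 0  # number of commits that reached the device

    def write(self, lba, data):
        self.blocks[lba] = data
        self.writes += 1


class BufferCache:
    """Illustrative buffer cache supporting either write policy."""

    def __init__(self, backing, write_back):
        self.backing = backing
        self.write_back = write_back
        self.data = {}      # lba -> data held in the buffer cache
        self.dirty = set()  # lbas not yet committed to the backing device

    def write(self, lba, data):
        """Return once the write may be reported complete to the host."""
        self.data[lba] = data
        if self.write_back:
            self.dirty.add(lba)            # commit is deferred
        else:
            self.backing.write(lba, data)  # write-through: commit now

    def flush(self):
        """Commit all deferred writes to the backing device."""
        for lba in sorted(self.dirty):
            self.backing.write(lba, self.data[lba])
        self.dirty.clear()
```

In this sketch, two write-back writes to the same LBA followed by one `flush` reach the device as a single commit, while the same two writes under write-through cost two commits. Until `flush` runs, however, the write-back data exists only in the buffer cache.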
Yet another cache memory design principle considers the efficient processing of both sequential and non-sequential access patterns. In the context of disk drive caching, when two commands refer to a contiguous range of logical block addresses (LBAs), the disk drive designates one command to be sequential with the other command. For example, a first command that starts at logical block 512 and spans 512 contiguous logical blocks would be sequential with a second command that starts at logical block 1024, the block immediately following the first command's span. If the second command were not sequential with the first, it may be designated a non-sequential command. An efficient cache memory management system recognizes both types of commands and processes them in an order that maximizes the data transfer rate.
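The sequential test in the example above reduces to one comparison, sketched here in Python. The `Command` type, its field names, and the `is_sequential` helper are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Command(NamedTuple):
    """Illustrative access command covering a contiguous span of LBAs."""
    start_lba: int
    block_count: int

    @property
    def end_lba(self):
        """First LBA past the end of this command's span."""
        return self.start_lba + self.block_count


def is_sequential(first, second):
    """True when `second` begins at the block right after `first` ends."""
    return second.start_lba == first.end_lba
```

With the numbers from the text, `is_sequential(Command(512, 512), Command(1024, 256))` holds because 512 + 512 = 1024; a second command starting anywhere else would be treated as non-sequential by this sketch.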
Cache memory systems have traditionally taken one of two approaches: the first is called a tagged memory data structure and the second is called a segmented memory data structure. Tag-managed caches are typically found in systems that cache main memory of a processor. In contrast, segmented memory structures are often used in disk drive cache systems where cache performance has traditionally been less critical.
Tag cache memory systems employ a tag memory data structure to manage the contents of the cache. The cache comprises a plurality of cache lines. Each cache line is associated with a tag that points to the location of a logical block within the cache. In tag cache memory systems, part of the LBA is used to address the tag memory structure. Each tag line stores the rest of the LBA to indicate which blocks are stored in the cache. The size of the tag memory places a limit on how precisely logical blocks can be indexed in the cache memory.
Cache line tags are grouped into sets that are searched to locate a particular logical block or blocks. In direct mapped caches, the cache contains just a single cache set. Any particular LBA can map to only one particular cache line. If a logical block is not identified in one of the cache line tags, it will not be found in that set, and a cache miss occurs. Also, when two or more LBAs that are currently in use map to the same cache line, the cache will continuously evict and reload that line, a condition sometimes referred to as "thrashing". This increases the rate of conflict misses.
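A minimal Python sketch of a direct mapped tag lookup shows how thrashing arises. It assumes the low part of the LBA (here, the remainder modulo the number of lines) addresses the tag memory and the quotient is stored as the tag; the class name, this particular index/tag split, and the counters are illustrative assumptions.

```python
class DirectMappedCache:
    """Illustrative direct mapped tag memory: one candidate line per LBA."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # tag stored for each cache line
        self.hits = 0
        self.misses = 0

    def access(self, lba):
        """Look up one logical block; return True on a hit."""
        index = lba % self.num_lines  # part of the LBA addresses the tag memory
        tag = lba // self.num_lines   # the rest of the LBA is kept in the tag
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: evict whatever occupied the line and load the new block.
        self.tags[index] = tag
        self.misses += 1
        return False
```

With eight lines, LBAs 3 and 11 both select line 3. Alternating between them misses on every access in this sketch, whereas alternating between LBAs 3 and 4 misses only on the first touch of each block.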
At the other extreme, in a fully associative cache, each cache line is a set. Hence, each LBA can map to any available cache line. In this cache structure, searching the sets is guaranteed to produce a cache hit if the logical block is in the cache. In a fully associative cache, cache lines may be evicted based solely on objective criteria, such as least recently used algorithms, thereby minimizing conflict misses. However, searching each set is more complicated and possibly more time consuming than searching a single set with a single cache line tag.
Between the extremes of direct mapped caches and fully associative caches are set-associative caches. In set-associative caches, a plurality of cache lines are included in each set. Any given LBA can map to a cache line within any set. Hence, conflict misses are not minimized, but they are held at an acceptable level. Set-associative caches have fewer address clashes than direct-mapped caches, and require less complex and possibly less time consuming search operations than fully-associative caches.
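The trade-off among these organizations can be sketched in Python. The sketch assumes a common arrangement in which part of the LBA selects a small group of candidate cache lines and a least recently used policy picks the victim within that group. The class name, the `num_groups` and `lines_per_group` parameters, and the LRU choice are illustrative assumptions rather than details taken from the text.

```python
from collections import OrderedDict


class AssociativeCache:
    """Illustrative cache where each LBA has several candidate lines."""

    def __init__(self, num_groups, lines_per_group):
        self.num_groups = num_groups
        self.lines_per_group = lines_per_group
        # One ordered map per group; the least recently used tag comes first.
        self.groups = [OrderedDict() for _ in range(num_groups)]
        self.hits = 0
        self.misses = 0

    def access(self, lba):
        """Look up one logical block; return True on a hit."""
        group = self.groups[lba % self.num_groups]
        tag = lba // self.num_groups
        if tag in group:
            group.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        if len(group) >= self.lines_per_group:
            group.popitem(last=False)  # evict the least recently used line
        group[tag] = True
        self.misses += 1
        return False
```

In this sketch, `lines_per_group=1` behaves like a direct mapped cache and `num_groups=1` lets every LBA occupy any line. With eight groups of two lines each, LBAs 3 and 11 can reside in the cache together, so alternating between them misses only twice.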
Segmented buffer memory systems allocate the available memory to a set of segments. Each segment holds a span of contiguous logical blocks of data equal to or smaller than the size of the segment. The size of the segments may be statically or dynamically allocated, depending on the complexity of the cache memory management structure. Typically, the number of segments is fixed to the maximum number supported by the available cache memory hardware. Some systems, however, use complex software controls to dynamically change the number of segments in the cache memory.
Segments are circular queues for data buffering between a host and a relatively higher latency data storage device. In addition to the inherent rate matching function of segments, they contain the most recent access sequence and therefore perform a caching function as well. Some operations performed by segmented cache memory systems include re-allocation of segments for read/write access, and performing coherent read hit detection across an active set of segments.
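A segment that acts as a circular queue while retaining the most recent access sequence can be sketched in Python as follows. The `Segment` class, its slot indexing by `lba % capacity`, and the policy of restarting the segment on a non-contiguous block are simplifying assumptions made for illustration.

```python
class Segment:
    """Illustrative segment: a ring of the most recent contiguous blocks."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.start_lba = 0  # LBA of the oldest block still retained
        self.count = 0      # number of valid contiguous blocks

    def append(self, lba, data):
        """Buffer the next block; the oldest block is overwritten when full."""
        if self.count == 0 or lba != self.start_lba + self.count:
            # Empty, or not contiguous with the retained span: restart here.
            self.start_lba, self.count = lba, 0
        if self.count == self.capacity:
            self.start_lba += 1  # the oldest block falls out of the ring
            self.count -= 1
        self.slots[lba % self.capacity] = data
        self.count += 1

    def read(self, lba):
        """Return the buffered data on a read hit, or None on a miss."""
        if self.start_lba <= lba < self.start_lba + self.count:
            return self.slots[lba % self.capacity]
        return None
```

After LBAs 100 through 105 stream through a four-block segment, the sketch retains blocks 102 through 105, so a later read of block 103 hits while a read of block 101 misses.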
Unfortunately, traditional tagged and segmented memory structures are poorly suited for handling rapid, complex access request sequences from modern host devices. Traditional tagged cache memory structures rigidly define how data is mapped between cache and logical blocks in the high latency data devices. The restrictions on mapping create inefficiencies in the performance of the cache. Furthermore, segmented cache memory structures typically predetermine the size of the data set in the cache, regardless of the actual pattern of access requests from a host device. Consequently, the fixed sized segmented cache memory is rarely well matched to satisfy the dynamically changing access request pattern from the host device.
Briefly stated, the present invention includes a cache system and method for an extent-based cache memory design. A method of the present invention for caching data in a memory comprises the steps of: providing a storage device and a host device, where each device is in communication with the memory; creating an extent record associated with the memory; receiving a storage device access request from the host device; and changing at least one state field value in the extent record in response to the access request from the host device.
In a preferred aspect, the method of the present invention also comprises allocating an extent within the memory and associated with the extent record, where the size of the extent is allocated based on the access request plus any additional space for speculation and operations before and/or after the requested span of LBAs.
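The sizing rule in this aspect can be sketched in Python as the requested span widened by optional speculative space on either side. The `Extent` type, the `allocate_extent` function, and the `pre_blocks` and `post_blocks` parameters are hypothetical names; how much speculative space to add is left open by the text and is simply passed in here.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Illustrative extent: a contiguous span of LBAs held in the memory."""
    start_lba: int
    block_count: int


def allocate_extent(req_start, req_count, pre_blocks=0, post_blocks=0):
    """Size an extent from the request plus space before and/or after it."""
    start = max(0, req_start - pre_blocks)  # do not run below LBA 0
    end = req_start + req_count + post_blocks
    return Extent(start_lba=start, block_count=end - start)
```

For a 16-block request at LBA 1000 with 4 blocks of space before and 32 after, this sketch yields an extent of 52 blocks starting at LBA 996.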
In another preferred aspect, the changing of at least one state field value in the extent record in response to the access request comprises the steps of incrementing a hit count each time a target logical block is identified in the cache, and then decrementing the hit count after each target logical block is read from the memory to the host device, where the target logical blocks are not re-allocated in memory until the hit count is fully decremented. In this preferred aspect, target logical blocks are logical blocks identified in the cache as satisfying a read request from the host device.
In yet another preferred aspect, the changing of at least one state field value in the extent record in response to the access request comprises the steps of incrementing a dirty count after a logical block is written from the host device to the memory and then decrementing the dirty count after the logical block is written from the memory to the storage device, wherein the logical block is not re-allocated in the memory until the dirty count is fully decremented.
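The hit count and dirty count described in the two preceding aspects can be sketched in Python as state fields of an extent record. The `ExtentRecord` class, its method names, and the `can_reallocate` check are hypothetical; the sketch only models the counting rules and does not move any data.

```python
from dataclasses import dataclass


@dataclass
class ExtentRecord:
    """Illustrative extent record holding two state fields."""
    start_lba: int
    block_count: int
    hit_count: int = 0    # read-hit blocks not yet sent to the host
    dirty_count: int = 0  # host-written blocks not yet on the storage device

    def read_hit(self, blocks=1):
        """Target logical blocks were identified in the cache."""
        self.hit_count += blocks

    def sent_to_host(self, blocks=1):
        """Target logical blocks were read from the memory to the host."""
        if blocks > self.hit_count:
            raise ValueError("more blocks sent than were counted as hits")
        self.hit_count -= blocks

    def host_write(self, blocks=1):
        """Logical blocks were written from the host to the memory."""
        self.dirty_count += blocks

    def committed(self, blocks=1):
        """Logical blocks were written from the memory to the storage device."""
        if blocks > self.dirty_count:
            raise ValueError("more blocks committed than were dirty")
        self.dirty_count -= blocks

    def can_reallocate(self):
        """Blocks may be re-allocated only when both counts are back to zero."""
        return self.hit_count == 0 and self.dirty_count == 0
```

In this sketch an extent with pending read hits or uncommitted writes reports `can_reallocate()` as false, and becomes eligible again once every counted block has been delivered to the host or committed to the storage device.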
Another aspect of the present invention includes an extent record-managed cache memory that comprises a memory in communication with a storage device and a host device, an extent record associated with the memory, and at least one state field value in the extent record that changes in response to an access request from the host device.
Yet another aspect of the present invention includes a storage device that comprises a higher latency data storage component, a lower latency memory in communication with the higher latency data storage component and a host device, an extent record associated with the lower latency memory, and at least one state field value in the extent record that changes its value in response to an access request by the host device.
Still another aspect of the present invention includes a data storage system that comprises a higher latency data storage component, a host device, a lower latency memory that is in communication with the higher latency data storage component and the host device, an extent record associated with the lower latency memory, and at least one state field value in the extent record that changes its value in response to an access request from the host device.