The present disclosure relates in general to the field of computer systems, and, more particularly, to a system and method for caching data retrieved from one or more storage devices.
Computer networking environments such as Local Area Networks (LANs) and Wide Area Networks (WANs) permit many users, often at remote locations, to share communication, data, and resources. A storage area network (SAN) may be used to provide centralized data sharing, data backup, and storage management in these networked computer environments. A SAN is a high-speed subnetwork of shared storage devices. The combination of a LAN or WAN with a SAN may be referred to as a shared storage network. A storage device is any device that principally contains a single disk or multiple disks for storing data for a computer system or computer network. Because these storage devices are intended to serve several servers, these storage devices are typically capable of storing much more data than the hard drive of a desktop computer. The collection of storage devices is sometimes referred to as a storage pool. The storage devices in a SAN can be co-located, which allows for easier maintenance and easier expandability of the storage pool. The network architecture of most SANs is such that all of the storage devices in the storage pool are available to all the servers on the LAN or WAN that is coupled to the SAN. Additional storage devices can be easily added to the storage pool, and these new storage devices will also be accessible from any server in the larger network.
In a computer network that includes a SAN, the server can act as a pathway or transfer agent between the end user and the stored data. Because much of the stored data of the computer network resides in the SAN, rather than in the servers of the network, the processing power of the servers can be used for applications. Network servers can access a SAN using the Fibre Channel protocol, taking advantage of the ability of a Fibre Channel fabric to serve as a common physical layer for the transport of multiple upper layer protocols, such as SCSI, IP, and HIPPI, among other examples.
The storage devices in a SAN may be structured in a redundant array of independent disks (RAID) configuration. When a system administrator configures a shared data storage pool into a SAN, the storage devices may be grouped together into one or more RAID volumes, and each volume is assigned a SCSI logical unit number (LUN) address. If the storage devices are not grouped into RAID volumes, each storage device will typically be assigned its own LUN. The system administrator or the operating system for the network will assign a volume or storage device and its corresponding LUN to each server of the computer network. Each server will then have, from a memory management standpoint, logical ownership of a particular LUN and will store the data generated from that server in the volume or storage device corresponding to the LUN owned by the server.
A RAID controller board is the hardware element that serves as the backbone for the array of disks. The RAID controller relays the input/output (I/O) commands or read/write requests to specific storage devices in the array. The RAID controller provides the physical link to each of the storage devices so that the disks may be easily removed or replaced. In order to provide greater fault tolerance, the RAID controller also serves to monitor the integrity of each storage device in the array to anticipate the need to move data in the event of a faulty or failing disk drive.
RAID controllers may also cache data retrieved from the storage devices. RAID controller support for caching may improve the I/O performance of the disk subsystems of the SAN. RAID controllers generally use read caching, read-ahead caching or write caching, depending on the application programs used within the array. For a system using read-ahead caching, data specified by a read request is read, along with a portion of the succeeding or sequentially related data on the drive. This succeeding data is stored in cache memory on the RAID controller. If a subsequent read request uses the cached data, access to the drive is avoided and the data is retrieved at the speed of the system I/O bus. Read-ahead caching is ideal for applications that store data in large sequential records, such as video image processing. However, read-ahead caching is ill-suited for random-access applications, such as transactional or database applications. In random-access applications, read requests are usually not sequentially related to previous read requests. As a result, if most of the SAN storage applications are random-access applications, the data read for caching purposes rarely results in a cache hit.
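The read-ahead behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `ReadAheadCache`, the `READ_AHEAD` depth, and the model of a drive as a list of blocks are all hypothetical, and the cache is left unbounded for brevity.

```python
READ_AHEAD = 3  # hypothetical number of succeeding blocks fetched on a miss


class ReadAheadCache:
    """Toy read-ahead cache in front of a drive modeled as a list of blocks."""

    def __init__(self, drive):
        self.drive = drive
        self.cache = {}        # block number -> data (unbounded, for brevity)
        self.drive_reads = 0   # number of times the drive had to be accessed

    def read(self, block):
        if block in self.cache:
            return self.cache[block]  # served from cache, no drive access
        self.drive_reads += 1
        # Fetch the requested block plus the sequentially following blocks.
        end = min(block + 1 + READ_AHEAD, len(self.drive))
        for b in range(block, end):
            self.cache[b] = self.drive[b]
        return self.cache[block]


drive = list(range(100, 116))  # 16 blocks of dummy data

sequential = ReadAheadCache(drive)
for block in range(8):
    sequential.read(block)

scattered = ReadAheadCache(drive)
for block in (0, 9, 5, 14):
    scattered.read(block)
```

In this sketch the eight sequential reads are satisfied by two drive accesses, while each of the four scattered reads goes to the drive and the extra blocks it prefetches are never used.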
RAID controllers may also use write caching. Write-through caching and write-back caching are two distinct types of write caching. For systems using write-through caching, the RAID controller does not acknowledge the completion of the write operation until the data is written to the drive. In contrast, write-back caching does not copy modified data from the cache to the drive until absolutely necessary. The RAID controller signals that the write request is complete after the data is stored in the cache but before it is written to the drive. This caching method improves performance relative to write-through caching because the application program can resume while the data is being written to the drive. However, there is a risk associated with this caching method: if system power is interrupted, any information in the cache that has not yet been written to the drive may be lost.
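The difference between the two write policies comes down to when the write is acknowledged relative to when the drive is updated. The following Python sketch illustrates that ordering; the `CachedDrive` class and its `dirty` set are hypothetical names, and the drive is modeled as a dictionary rather than real hardware.

```python
class CachedDrive:
    """Toy model of a controller cache using write-through or write-back."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.cache = {}     # block number -> data held in controller memory
        self.drive = {}     # block number -> data actually on the drive
        self.dirty = set()  # blocks acknowledged but not yet on the drive

    def write(self, block, data):
        """Store the data and return once the write would be acknowledged."""
        self.cache[block] = data
        if self.write_back:
            self.dirty.add(block)     # acknowledged before reaching the drive
        else:
            self.drive[block] = data  # write-through: drive is updated first
        return "ack"

    def flush(self):
        """Copy all pending write-back data to the drive."""
        for block in self.dirty:
            self.drive[block] = self.cache[block]
        self.dirty.clear()


wt = CachedDrive(write_back=False)
wt.write(7, b"payload")   # data is already on the drive when this returns

wb = CachedDrive(write_back=True)
wb.write(7, b"payload")   # acknowledged; block 7 exists only in the cache
pending_before_flush = set(wb.dirty)
wb.flush()                # a power loss before this point would lose block 7
```

The contents of `dirty` between `write` and `flush` correspond to the data at risk if power is interrupted.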
To improve cache hit rates on random access workloads, RAID controllers typically use cache algorithms developed for processors, such as those used in desktop computers. Processor cache algorithms generally rely on the locality of reference of their applications and data to realize performance improvements. As data or program information is accessed by the computer system, this data is stored in cache in the hope that the information will be accessed again in a relatively short time. Once the cache is full, an algorithm is used to determine what data in cache should be replaced when new data that is not in cache is accessed. Generally, a least recently used (LRU) algorithm is used to make this determination. Because processor activities normally have a high degree of locality of reference, this algorithm works well for these applications. It is not unusual to observe processor cache hit rates of 90% or greater.
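For reference, a least recently used replacement policy can be sketched as follows. This is a minimal illustration rather than any particular controller's implementation; the `LRUCache` class and the `load` callback (which stands in for fetching data on a miss) are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # ordered from least to most recently used

    def access(self, key, load):
        """Return (value, hit). `load` fetches the value on a cache miss."""
        if key in self.lines:
            self.lines.move_to_end(key)      # mark as most recently used
            return self.lines[key], True
        value = load(key)
        self.lines[key] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used
        return value, False


cache = LRUCache(capacity=2)
cache.access("a", str.upper)   # miss
cache.access("b", str.upper)   # miss
cache.access("a", str.upper)   # hit, so "b" is now the least recently used
cache.access("c", str.upper)   # miss, evicts "b"
```

The policy keeps whatever was touched most recently, which works well only when recently used data is likely to be used again soon.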
However, secondary storage I/O activity rarely exhibits the degree of locality seen in accesses to processor memory. As a result, the effectiveness of processor-based caching algorithms can be very low for RAID controllers. A RAID controller cache that uses processor-based caching algorithms may actually degrade performance in random access applications because of the processing overhead incurred by caching data that will not be accessed from the cache before being replaced. Conventional caching methods are therefore not effective for storage applications. Some storage subsystem vendors increase the size of the cache in order to improve the cache hit rate. However, given the size of the associated SAN storage devices, increasing the size of the cache may not significantly improve cache hit rates. For example, in the case where a 64 MB cache is connected to twelve 32 GB drives (384 GB in total), the cache is only about 0.016% the size of the associated storage. Even if the cache size is doubled, the hit ratio will not increase significantly because the locality of reference for these systems is low.
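The cache-to-storage ratio in the example can be checked with a few lines of arithmetic. The sketch below assumes binary units (1 GB = 1024 MB), which the text does not specify. The result is on the order of a hundredth of a percent, and doubling the cache only doubles that very small fraction.

```python
cache_mb = 64
drive_count = 12
drive_gb = 32

# Assumption: 1 GB = 1024 MB; decimal units change the result only slightly.
storage_mb = drive_count * drive_gb * 1024

ratio_pct = 100 * cache_mb / storage_mb
doubled_pct = 100 * (2 * cache_mb) / storage_mb

print(f"cache is {ratio_pct:.4f}% of storage; doubled: {doubled_pct:.4f}%")
```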
As discussed above, many I/O access patterns for disk subsystems exhibit low levels of locality. However, while many applications exhibit what may be characterized as random I/O access patterns, very few applications truly have completely random access patterns. The majority of the data that most applications access is related and, as a result, certain areas of storage are accessed with relatively more frequency than other areas. The areas of storage that are more frequently accessed than other areas may be called "hot spots." FIG. 1 shows I/O access patterns as a function of disk address and time. For purposes of illustration, the disk is divided into five sections of disk addresses. When viewed for only a short period of time, from time t0 to t1 for example, I/O accesses 32 are basically random and do not exhibit a pattern that may be exploited for caching purposes. However, when viewed over a longer period of time, one may observe that I/O accesses are denser in certain areas of storage than in other areas. In this case, I/O accesses occur more frequently in the zone 34 corresponding to disk address section 1 during the time period from t0 to t6. Thus, section 1 may be considered a hot spot during this time period because data is being accessed more frequently in this area of storage in comparison to other areas. For example, index tables in database applications are generally more frequently accessed than the data store of the database. Thus, the storage areas associated with the index tables for database applications would be considered hot spots, and it would be desirable to maintain this data in cache. However, for storage I/O, hot spot references are usually interspersed with enough references to non-hot spot data such that conventional cache replacement algorithms, such as LRU algorithms, do not maintain the hot spot data in cache long enough to be re-referenced.
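The effect of interspersed references on LRU replacement can be illustrated with a small simulation. The trace below is hypothetical: a single hot block, labeled "H", is referenced every fifth access, with four one-off references to cold blocks in between. The function `lru_hits` is an illustrative helper, not part of any described system.

```python
from collections import OrderedDict


def lru_hits(trace, capacity):
    """Count cache hits for a reference trace under LRU replacement."""
    lines = OrderedDict()  # ordered from least to most recently used
    hits = 0
    for block in trace:
        if block in lines:
            hits += 1
            lines.move_to_end(block)
        else:
            lines[block] = True
            if len(lines) > capacity:
                lines.popitem(last=False)  # evict the least recently used
    return hits


# Hypothetical trace: hot block "H", then four cold blocks, repeated 10 times.
trace = []
for i in range(10):
    trace.append("H")
    trace.extend(f"cold{i}_{j}" for j in range(4))

hot_references = trace.count("H")
hits_small_cache = lru_hits(trace, capacity=3)
print(hot_references, hits_small_cache)
```

With a three-line cache, the hot block is always pushed out by the intervening cold references before it is referenced again, even though it accounts for one in five accesses.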
Because conventional caching algorithms used by RAID controllers do not attempt to identify hot spots, these algorithms are not effective for producing a large number of cache hits.
In accordance with teachings of the present disclosure, a system and method for replacing cached data retrieved from one or more storage devices in a computer system are disclosed that provide significant advantages over previously developed systems.
The storage devices are divided into a plurality of areas or bins. Each bin is preferably the same size. A Window Access Table (WAT) is an array stored in memory that contains all the time windows for each bin. Each time window holds a frequency value corresponding to the number of times the bin has been accessed during the time period corresponding to that time window. A hot spot algorithm is used to calculate a hot spot value hsf(x) for each bin based on its associated frequency values listed in the WAT. The hot spot algorithm uses scaling coefficients to weight the frequency values based on the time window. Thus, the current time window may be weighted more heavily than older time windows in determining the hot spot value hsf(x) for a particular bin. Each line in cache will therefore have an associated bin for which a hot spot value hsf(x) has been calculated. This data may be stored in a hot spot table. The hot spot table may be a separate table or stored in the WAT.
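One way to picture the WAT and the hot spot value is sketched below in Python. The exact form of the hot spot algorithm is not given here, so the sketch assumes a simple weighted sum of the per-window frequency values. The bin size, the number of windows, the coefficient values, and the function names are all hypothetical.

```python
BIN_SIZE = 1024             # blocks per bin (hypothetical)
NUM_BINS = 4                # number of bins (hypothetical)
COEFFS = [1.0, 0.5, 0.25]   # hypothetical scaling coefficients, newest first

# WAT: one row per bin, one frequency value per time window (newest first).
wat = [[0] * len(COEFFS) for _ in range(NUM_BINS)]


def record_access(block_address):
    """Increment the current-window frequency of the bin holding the block."""
    wat[block_address // BIN_SIZE][0] += 1


def advance_window():
    """Start a new time window and discard the oldest one for every bin."""
    for row in wat:
        row.pop()
        row.insert(0, 0)


def hsf(bin_index):
    """Assumed hot spot value: weighted sum of the bin's window frequencies."""
    return sum(c * f for c, f in zip(COEFFS, wat[bin_index]))


for address in (10, 20, 30):   # three accesses to bin 0 in the first window
    record_access(address)
advance_window()
record_access(40)              # one access to bin 0 in the current window
record_access(1500)            # two accesses to bin 1 in the current window
record_access(1600)
```

Under these assumed coefficients, bin 0 scores 1.0 × 1 + 0.5 × 3 = 2.5 and bin 1 scores 1.0 × 2 = 2.0, so the older burst of accesses to bin 0 still counts, but at half weight.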
When data is retrieved from a storage device in response to a cache miss, a memory controller, such as a processor or RAID controller, will compare the hot spot value hsf(a) of the bin associated with the new data to the lowest hot spot value hsf(z) in the hot spot table. If hsf(z) is greater than hsf(a), this indicates that bin (z), the bin with the lowest hot spot value, has a weighted access frequency greater than that of bin (a), the bin containing the retrieved data. Thus, if hsf(z) is greater than hsf(a), the cache line containing data from bin (z) will not be replaced. If hsf(a) is greater than hsf(z), the new data will replace the cached data from bin (z). The WAT is updated after the I/O access.
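The replacement decision can be sketched as a comparison against the coldest cached bin. In the Python sketch below, the hot spot table is modeled as a dictionary from bin identifier to hot spot value, and the function name `choose_victim` is hypothetical. The text does not say what happens when hsf(a) equals hsf(z); the sketch assumes the cached data is kept in that case.

```python
def choose_victim(hot_spot_table, new_hsf):
    """Return the cached bin to replace, or None to leave the cache as is.

    hot_spot_table maps each cached bin to its hot spot value hsf(x);
    new_hsf is hsf(a) for the bin holding the newly retrieved data.
    """
    coldest_bin = min(hot_spot_table, key=hot_spot_table.get)  # bin (z)
    if new_hsf > hot_spot_table[coldest_bin]:
        return coldest_bin   # hsf(a) > hsf(z): replace data from bin (z)
    return None              # hsf(z) >= hsf(a): keep it (tie is an assumption)


table = {"bin_y": 4.0, "bin_z": 1.5}      # hypothetical cached bins
hotter = choose_victim(table, new_hsf=2.0)
colder = choose_victim(table, new_hsf=1.0)
```

Here `hotter` names `bin_z` as the line to replace, while `colder` is `None` because the new data comes from a bin that is accessed less often than anything already cached.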
A technical advantage of the present invention is that the cache replacement algorithm is based on frequency of use and is able to track hot spot data longer than least recently used algorithms or similar cache replacement methods. As a result, the present invention is well suited for applications that exhibit low levels of locality, such as applications utilizing several large storage devices with random I/O access patterns. The present invention also eliminates stale data from cache while retaining cached data that has the potential to produce cache hits over a selected period of time. Other technical advantages should be apparent to one of ordinary skill in the art in view of the specification, claims, and drawings.