This invention relates generally to methods for improving performance of direct access storage devices (DASD) such as disk drives. More particularly, it relates to a disk drive cache management scheme that combines the cache and queuing algorithms according to a rotational positioning optimization (RPO) algorithm used in existing DASDs. Decisions are thus made with the goal of optimizing performance by considering both cache and queue structure.
Direct access storage devices (DASD) are commonly used in computers and network servers to store large quantities of digital information. Magnetic disk drives contain a number of flat, round, rotating disks, each of which has two surfaces coated with magnetic material organized into concentric tracks. Data is read from and written to the disks by a transducer or head connected to an actuator that moves the head to a desired track and maintains it over the track during read and write operations. In general, each surface is provided with its own read/write head, and all heads are connected to a common actuator. A typical disk surface 10 containing a number of concentric tracks 12 is illustrated in FIG. 1. Data is stored within a track in sectors 14 containing blocks of fixed size, usually 512 bytes, plus header and trailer information such as error correction data. The location of each sector is uniquely identified by its head, track (on a single surface) or cylinder (vertically aligned tracks of multiple surfaces), and sector. This geometric position is mapped to a logical block address (LBA), an indexing system for the drive. A read/write head 16 is formed on the end of an air-bearing slider 18 and suspended above surface 10 during data transfer. In response to a read or write command sent by a host computer, the actuator 20 moves the read/write head 16 to the proper track and sector defined by the logical block address.
Recently, both computer processor speeds and the volume of information capable of being stored on a hard drive have increased dramatically. As a result, the random input/output performance of disk drives, which has not increased at a comparably high rate, remains a limiting factor in many applications. A variety of metrics are used to describe the performance of disk drives. One important metric is the data access time, a measure of the time to position a read/write head over a particular track and locate the sector or sectors of interest within the track for reading or writing. Data access time is a measure of mechanical performance, i.e., the performance of mechanical functions that are controlled electronically; the actual time to transfer data between the head and the disk, which is determined by the data transfer rate, is typically neglected. Data access time, measured in milliseconds, is a combination of two factors: seek time and rotational latency.
Seek time denotes the actuator movement time required to reposition the read/write head over the track or cylinder containing the first sector requested by the command. Seek time is a nonlinear function of the number of tracks to be traversed. Average seek times, defined as the time to position the read/write heads between two randomly selected tracks, currently range from 4 to 6 milliseconds.
Once the head is positioned over the appropriate track, it must wait for the sector requested by the command to rotate under it before data transfer can begin. The elapsed time for rotation is known as the rotational latency, which depends upon the disk's rotational speed. In the worst case scenario, the head reaches the desired sector just after the sector rotates past the head location, in which case the head must wait almost a full rotation before the desired sector is accessed. On average (in a non-RPO environment), the disk must spin one-half rotation before the desired sector is under the head. Average rotational latencies vary from 8.3 milliseconds for a rotational speed of 3600 RPM to 2 milliseconds for a rotational speed of 15,000 RPM. Note that for non-random disk accesses, rotational latencies are significantly lower.
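By way of illustration only, the average latency figures quoted above follow from taking one-half of the time for a single revolution. The short sketch below reproduces that arithmetic; the function name is a hypothetical one chosen for this example.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds for random accesses.

    One revolution takes 60,000 / rpm milliseconds; on average the disk
    must spin one-half rotation before the desired sector arrives.
    """
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


if __name__ == "__main__":
    for rpm in (3600, 15_000):
        print(f"{rpm} RPM: {average_rotational_latency_ms(rpm):.1f} ms")
```

For 3600 RPM the result rounds to 8.3 milliseconds, and for 15,000 RPM it is 2 milliseconds, in agreement with the values stated above.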
A variety of methods have been employed to reduce the total data access time for a sequence of read and write commands. One method, known as command queue reordering, divides reception of the command sequence from the host controller and execution in the disk drive into two asynchronous processes. Commands are temporarily held in a command queue, where they can be reordered. Each command in the queue contains the instruction for the disk drive to read or write the data to a particular LBA on the disk. Commands consist of the operation type (read or write), starting LBA, and size of command in number of blocks. Commands are uniquely identified by a label, allowing them to be performed in a different order than the one in which they arrive at the controller. The idea behind command reordering is to reorder the commands in the queue to minimize the path length that the mechanical actuator must travel.
In the past, command reordering algorithms aimed to reduce only seek time. For example, the shortest seek time-first ordering algorithm examines all commands in the queue and selects the command with the shortest seek time, in either direction, from the end of the last sector of the currently executed command. The problem with this algorithm is that it completely ignores rotational latency, a significant portion of the data access time. The head might arrive at the next command track, only to find that the required sector had just spun past, requiring the head to stay in position until the next rotation of the disk.
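The shortest seek time-first selection described above may be sketched as follows. This is a simplified illustration rather than any particular drive's firmware: the `Command` record and its `track` field are hypothetical, and track distance is assumed to stand in for seek time (seek time being taken to grow with the number of tracks traversed).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Command:
    label: int       # unique label identifying the command
    op: str          # operation type: "read" or "write"
    start_lba: int   # starting logical block address
    blocks: int      # size of the command in blocks
    track: int       # assumed: track already derived from start_lba


def select_sstf(queue: Sequence[Command], current_track: int) -> Command:
    """Pick the queued command with the shortest seek, in either direction.

    Track distance is used as a proxy for seek time (an assumption).
    Rotational position is ignored entirely, which is the weakness
    of this ordering noted in the text.
    """
    if not queue:
        raise ValueError("command queue is empty")
    return min(queue, key=lambda c: abs(c.track - current_track))


if __name__ == "__main__":
    queue = [
        Command(1, "read", 80_000, 8, track=100),
        Command(2, "write", 32_000, 16, track=40),
        Command(3, "read", 44_000, 8, track=55),
    ]
    print(select_sstf(queue, current_track=50).label)
```

Because only the track distance enters the selection, the chosen command may still incur nearly a full rotation of latency on arrival.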
Current command reordering techniques follow a rotational positioning optimization (RPO) algorithm, described in U.S. Pat. No. 5,729,718, issued to Au, and U.S. Pat. No. 5,991,825, issued to Ng. The RPO algorithm takes into account both seek time and rotational latency in reordering the command queue. The total access time for each command in the queue is computed with respect to the ending LBA of the command currently being executed, and the command having the shortest access time is moved to the front of the queue. The access time is calculated as the maximum of the seek time and the head switch time, plus the rotational latency from the arrival point at the new command track to the desired sector location. The RPO algorithm therefore anticipates the above-described problem of arriving at the track just after the desired sector has spun past. RPO algorithms have been shown to increase the overall data throughput of the drive by about 20%.
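The access time calculation described above may be sketched as follows. This is a minimal illustration and not the algorithm of the cited patents: the revolution time, the number of sectors per track, the head switch time, and the `seek_ms` profile are all assumed values chosen for the example, and the `Target` record is hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

REV_MS = 6.0             # assumed revolution time (10,000 RPM)
SECTORS_PER_TRACK = 600  # assumed constant; real drives use zoned recording
HEAD_SWITCH_MS = 1.0     # assumed head switch time


@dataclass(frozen=True)
class Target:
    label: int   # unique command label
    track: int
    head: int
    sector: int  # first sector requested by the command


def seek_ms(distance: int) -> float:
    """Hypothetical nonlinear seek profile; not taken from any real drive."""
    return 0.0 if distance == 0 else 1.0 + 0.1 * distance ** 0.5


def access_time_ms(track: int, head: int, sector: float, t: Target) -> float:
    """Estimated access time from the end of the current command to t."""
    switch = HEAD_SWITCH_MS if t.head != head else 0.0
    positioning = max(seek_ms(abs(t.track - track)), switch)
    # The disk keeps spinning while the head is being repositioned.
    arrival = (sector + positioning * SECTORS_PER_TRACK / REV_MS) % SECTORS_PER_TRACK
    # Sectors that must still pass under the head after arrival.
    wait_sectors = (t.sector - arrival) % SECTORS_PER_TRACK
    return positioning + wait_sectors * REV_MS / SECTORS_PER_TRACK


def select_rpo(queue: Sequence[Target], track: int, head: int, sector: float) -> Target:
    """Return the queued command with the shortest estimated access time."""
    if not queue:
        raise ValueError("command queue is empty")
    return min(queue, key=lambda t: access_time_ms(track, head, sector, t))


if __name__ == "__main__":
    same_track = Target(label=1, track=100, head=0, sector=599)
    far_track = Target(label=2, track=500, head=0, sector=400)
    chosen = select_rpo([same_track, far_track], track=100, head=0, sector=0)
    print(chosen.label)
```

Under these assumed values, the command on the current track requires nearly a full rotation, so the command 400 tracks away is selected even though its seek is longer.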
Another way of improving disk drive performance is by employing a cache buffer memory array (“cache”) in the disk controller. The cache provides temporary and limited storage of data blocks in transit between the host computer and storage locations on the disks. The purpose of the cache is to reduce the relatively long access time (e.g., milliseconds) associated with obtaining data from the storage device by maintaining the data in a higher speed memory, which has microsecond access times. The advantage of cache arises from the tendency of applications to make repeated references to the same or adjacent data. A disk drive cache typically has a selectable number of cache slots that are dynamically allocated as either read cache or write cache. When a read data command is executed, the data is both read into the read cache and transferred to the host computer. Subsequent requests for the same data may be fulfilled by the cache, saving significant amounts of time. In the case of write caching, data is stored in the cache before being written to the storage device, allowing parallel host-to-cache and cache-to-disk transfers. When the host computer issues a write command, and the data can be written to the cache, it will be transferred immediately, and the command does not need to enter the command queue. Data accumulated in the cache are subsequently written to the disk in clusters rather than individually, requiring less actuator movement in order to write a number of data blocks. Because cache is of limited size, it eventually becomes full, and newly received data either cannot be added or must replace data currently in the cache. If data is removed from the cache, it must be written to the storage device immediately.
The performance of the disk cache is characterized by hit ratios. A read cache hit occurs when data requested in a read command is found in the cache, eliminating the need for a disk access. A write cache hit occurs when the write command can be satisfied by writing the data to a free location in the cache for later transfer to the disk. The cache hit ratio, which can be defined for the read and write cache separately or for the cache as a whole, is defined as the ratio of the number of cache hits to the total number of commands. Obviously, it is desirable to increase the hit ratio, thereby minimizing the number of disk accesses and overall access time.
Disk caches are managed using algorithms that determine which data to destage (write to the drive) and stage (read from the drive) in order to maximize the cache hit ratio. Efficient algorithms are needed because the disk cache is expensive and therefore typically relatively small, so that only a fraction of read/write data can be stored in cache at a given time. Two standard cache management algorithms are the least recently used (LRU) and most recently used (MRU) algorithms. The LRU algorithm destages data that was either read or written the least recently of all data in the cache. The MRU algorithm retains in cache data that was most recently accessed or added. These schemes, and their numerous variations available in the art, rely on the assumption that data that was recently accessed will be accessed again.
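A least recently used read cache, together with the hit ratio defined above, may be sketched as follows. The sketch is illustrative only and covers read caching; destaging of write data is omitted. The class name and the `read_from_disk` callback are hypothetical names introduced for this example.

```python
from collections import OrderedDict
from typing import Callable


class LRUBlockCache:
    """Read cache keyed by LBA that discards the least recently used block."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._blocks: "OrderedDict[int, bytes]" = OrderedDict()  # oldest first
        self.hits = 0
        self.lookups = 0

    def read(self, lba: int, read_from_disk: Callable[[int], bytes]) -> bytes:
        self.lookups += 1
        if lba in self._blocks:
            self.hits += 1
            self._blocks.move_to_end(lba)  # mark as most recently used
            return self._blocks[lba]
        data = read_from_disk(lba)         # cache miss: disk access needed
        self._blocks[lba] = data
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # drop least recently used
        return data

    def hit_ratio(self) -> float:
        """Number of cache hits divided by the total number of read commands."""
        return self.hits / self.lookups if self.lookups else 0.0


if __name__ == "__main__":
    cache = LRUBlockCache(capacity=2)
    for lba in (1, 2, 1, 3, 2):
        cache.read(lba, lambda n: bytes(512))
    print(cache.hit_ratio())
```

Note that the selection of the block to discard depends only on recency of access and takes no account of how long the block would take to retrieve from the disk.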
A drawback of most disk cache management algorithms is that they do not seek to maximize the quantity that in part motivates the use of a cache: the decrease in data access time. A caching method that retains in the cache data that is most expensive to retrieve from the DASD is disclosed in U.S. Pat. No. 5,845,318, issued to Rose et al. A value is placed on each piece of data in the cache corresponding to the seek time from the current position of the read/write head and whether or not the cached data has been changed since being read from the DASD. Cached data is replaced if it is relatively inexpensive to access, i.e., if it is close to the current head position and if it does not require an immediate DASD access. The method of Rose et al. has a number of drawbacks. First, it considers only seek time in estimating access time. Second, it does not consider the importance of keeping data in the cache, i.e., whether it will be accessed in the future, independently of its access time. Third, it only considers the current position of the head in estimating access time for cache data. The combination of these three deficiencies makes it likely that the method of Rose et al. will not make the correct cache management decisions in the majority of cases.
There is a need, therefore, for an improved cache management scheme that optimally takes advantage of decreased data access times provided by the disk cache.
Accordingly, it is a primary object of the present invention to provide a cache management method that takes into account the estimated access times of cached data, retaining in cache the data that is most costly to remove from cache.
It is a further object of the invention to provide a cooperative cache and command queue management method that improves both the cache hit ratio and the overall time to data for all commands in the command queue.
It is an additional object of the invention to provide a cache management method that uses a rotational positioning optimization (RPO) algorithm to estimate the access time of cached data.
It is another object of the present invention to provide a method that is flexible and can be adapted to different kinds of data storage systems and performance requirements.
These objects and advantages are attained by a cache management method that makes cache decisions based on the access time of the commands under consideration. If a data block has a high access time, then it is preferentially added to or kept in the cache, while data with a low access time is preferentially not stored in cache. Decisions are made by considering the current state of the command queue. As a result, the total time to data of all commands in the command queue is reduced.
The present invention provides a method for optimizing performance of a data storage system that includes a direct access storage device (DASD) such as a magnetic disk drive, a data cache, and a command queue of commands for accessing data stored on the DASD. The method has the following steps: receiving a data block, either from the DASD during execution of a read command or from a host computer and corresponding to a write command; calculating a cost function C for not storing the data block in the data cache; calculating analogous cost functions for cached data blocks; and replacing a selected cached data block with the received data block, if the selected cached data block has a cost function that is lower than the cost function of the received data block. Preferably, the replaced data block has the lowest cost function of all cached data blocks. If the selected cache data block corresponds to a write command, then it is written to the DASD. The received data block may be received during a read-ahead operation.
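The steps recited above may be sketched as follows. This is an illustrative sketch under stated assumptions and not a definitive implementation: the `Block` record, the `admit` function, and the `cost` and `write_to_dasd` callbacks are hypothetical names, the cost function is supplied by the caller, and the handling of a cache with a free slot is an assumption not recited in the steps. Duplicate LBAs are not handled.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Block:
    lba: int
    data: bytes
    dirty: bool = False  # True if the block holds write data not yet on the DASD


def admit(
    cache: Dict[int, Block],
    capacity: int,
    incoming: Block,
    cost: Callable[[Block], float],
    write_to_dasd: Callable[[Block], None],
) -> bool:
    """Try to place a received block in the cache; return True if it was stored."""
    if len(cache) < capacity:
        # Assumption: a free slot is simply used without any replacement.
        cache[incoming.lba] = incoming
        return True
    # Select the cached block with the lowest cost of not being cached.
    victim = min(cache.values(), key=cost)
    if cost(victim) >= cost(incoming):
        return False  # every cached block is at least as costly to lose
    if victim.dirty:
        write_to_dasd(victim)  # write data must reach the DASD before removal
    del cache[victim.lba]
    cache[incoming.lba] = incoming
    return True


if __name__ == "__main__":
    costs = {10: 5.0, 20: 1.0, 30: 3.0, 40: 0.5}  # made-up cost values
    written = []
    cache = {10: Block(10, b"a", dirty=True), 20: Block(20, b"b", dirty=True)}
    stored = admit(cache, 2, Block(30, b"c"), lambda b: costs[b.lba], written.append)
    print(stored, sorted(cache), [b.lba for b in written])
```

In this sketch a rejected block is simply reported to the caller, which would then decide how the corresponding command is to be handled.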
The cost function C is given by C = (Td − Tc)P, where Td is an access time for accessing the data block in the DASD, Tc is an access time for accessing the data block in the data cache, and P is an access probability for the data block. Preferably, Td is calculated according to a rotational positioning optimization algorithm and depends on the logical block address of commands in the command queue. For example, Td may be an average of access times between the data block and each command in the command queue. Preferably, P depends upon the cached data blocks, the relative cache size, performance requirements of the data storage system, or a cache hit ratio.
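The cost function may be sketched as follows, with Td taken as the average of the access times between the data block and each command in the command queue. The sketch is illustrative only: the function names are hypothetical, the access time estimator is supplied by the caller, the direction of each estimate (from a queued command to the data block) is an assumption, and the behavior for an empty command queue is not specified here and is treated as an error. The estimator used in the demonstration is a toy stand-in based on LBA distance, not a rotational positioning optimization calculation.

```python
from typing import Callable, Sequence


def dasd_access_time(
    block_lba: int,
    queued_lbas: Sequence[int],
    access_time: Callable[[int, int], float],
) -> float:
    """Td: average access time between each queued command and the block."""
    if not queued_lbas:
        raise ValueError("command queue is empty; Td is undefined in this sketch")
    total = sum(access_time(q, block_lba) for q in queued_lbas)
    return total / len(queued_lbas)


def cost(td: float, tc: float, p: float) -> float:
    """C = (Td - Tc) * P, the cost of not storing the block in the cache."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("access probability must lie in [0, 1]")
    return (td - tc) * p


if __name__ == "__main__":
    # Toy estimator (assumption): time in ms proportional to LBA distance.
    toy_access_time = lambda from_lba, to_lba: abs(from_lba - to_lba) * 0.01
    td = dasd_access_time(200, [100, 400], toy_access_time)
    print(td, cost(td, tc=0.5, p=0.5))
```

A block that lies far, in access time, from the commands in the queue and that is likely to be accessed again yields a high cost and is therefore preferentially kept in the cache.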
The present invention also provides a disk controller containing a data cache buffer, a command queue, and means for carrying out steps of the above method.