1. Technical Field
The present invention is directed to a data processing system. Particularly, the present invention provides a method and apparatus for streaming data from a storage device. More particularly, the present invention provides a method and apparatus for improving the performance of streaming data from mirrored disks.
2. Description of the Related Art
In order to ensure that important data is not lost, it has long been a necessity to maintain backup copies. While this originally meant making copies periodically, e.g., at the end of each week or the end of each working day, mirrored disk techniques make this a completely automatic process.
To implement mirrored disks, the storage controller is informed that two disks form a mirrored pair, which is presented to the operating system as a single disk, e.g., disk F: in FIG. 3. From that point on, when the operating system writes to disk F:, the information is written to both of the mirrored disks. However, when the operating system reads from disk F:, only one of the two disks needs to be read, because they hold identical data.
Because two copies of the data exist, it is theoretically possible to achieve the combined read performance of both disks when reading disk F: by dividing the reads between the two copies. The performance actually achieved depends on how efficiently the reads are distributed between the two disks.
A contemporary method to improve mirrored read performance is simply to alternate read requests to one and then the other disk in the mirrored pair. However, tests reveal that for groups of sequential reads (which is one of the standard performance tests for disks), the performance of the mirrored pair does not reach the combined read performance anticipated for both disks. To understand this better, it is helpful to know how data is stored on a disk.
FIGS. 1A-1C illustrate several aspects of a disk drive. The disk drive 100 is composed of a number of platters 110, 112, and 114, which are stacked on a spindle (not shown) with space between them. Each of platters 110, 112, and 114 has two sides, and each side has a metallic coating that stores data as the polarity of tiny magnetized zones in the coating. The polarity is set by a read/write head 120, which rides just above the surface of platter 110. Read/write head 120 is constructed so that when a current is passed through it, the head polarizes the magnetized zones in the metallic coating beneath it; when no current flows through the head, it senses the polarity of those zones. A separate read/write head, such as read/write head 120, is provided for each side of platters 110, 112, and 114. The heads are collectively mounted on an arm 130, so that all the heads move together from the outer rim of the platters toward the center and back. In this example, the three platters 110, 112, and 114 have six heads in total (only three are shown), but only one head can be active at any given time.
Each platter is divided into tracks 140, which are arranged as concentric bands on each platter, much like the annual rings of a tree. Each track 140 is further divided into sectors 150, each containing a fixed number of bytes, generally 512. The sectors are generally numbered sequentially beginning from either the outer or the inner track. Normally the outer tracks contain more sectors than the inner tracks.
Once a disk receives a request to read a particular sector or sequence of sectors, the time until the first sector is read depends not only on the rotational speed of the disk but also on where the platter and track containing the sector(s) are located relative to the current position of the head. Mechanical movements, such as moving the arm 130 and read/write heads 120 to another track, or turning the platter until the proper sector 150 passes under the head, take a long time: the head must settle and lock onto the track, and then wait for the starting sector to rotate under it. These mechanical delays, or latencies, are collectively referred to as disk seek time. Reading the sectors immediately following the first one accessed takes significantly less time than these preparatory mechanical latencies.
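The gap between the first-sector cost and the cost of each following sector can be made concrete with a rough service-time model. The sketch below assumes hypothetical drive figures (7,200 RPM, 8.5 ms average arm movement, 100 sectors per track) and models the rotational wait as half a revolution on average. Drive datasheets usually quote arm seek and rotational latency separately, whereas the text above groups both under seek time.

```python
# Rough single-disk service-time model. The drive figures used in the
# example (7,200 RPM, 8.5 ms average seek, 100 sectors per track) are
# hypothetical illustration values, not taken from any specific drive.

def revolution_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def first_sector_ms(avg_seek_ms: float, rpm: float, sectors_per_track: int) -> float:
    """Move the arm, wait on average half a revolution for the sector, then transfer it."""
    rev = revolution_ms(rpm)
    return avg_seek_ms + rev / 2 + rev / sectors_per_track


def next_sequential_sector_ms(rpm: float, sectors_per_track: int) -> float:
    """Head already on the track and the sector next in line: only transfer time remains."""
    return revolution_ms(rpm) / sectors_per_track


if __name__ == "__main__":
    first = first_sector_ms(8.5, 7200, 100)
    nxt = next_sequential_sector_ms(7200, 100)
    print(f"first sector: {first:.2f} ms, next sequential sector: {nxt:.3f} ms")
    # first sector: 12.75 ms, next sequential sector: 0.083 ms
```

Under these assumed figures the first sector costs more than 150 times as much as each sector that follows it on the same track.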
Given the lengthy latencies of these mechanical operations, disks are typically organized in cylinders. A cylinder is composed of all the tracks at the same radial position on every platter surface. For the example disk with platters 110, 112, and 114, tracks 161, 162, and 163 are the outermost tracks on the upper sides of their respective platters. Together with the three outermost tracks on the reverse sides of platters 110, 112, and 114, they form a cylinder. Since no movement of the arm 130 is needed to switch between surfaces within one cylinder, it is advantageous to write sequential data records so that they fill the current cylinder before moving on to the next one.
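How consecutive block numbers fill a cylinder before the arm has to move can be sketched with the classic cylinder/head/sector mapping. This is a minimal sketch assuming a uniform geometry: six heads as in the three-platter example, and a hypothetical 100 sectors on every track. It deliberately ignores the zoned layout mentioned above, in which outer tracks hold more sectors.

```python
from typing import NamedTuple


class CHS(NamedTuple):
    cylinder: int
    head: int    # which platter surface (one head per side)
    sector: int  # 0-based here; classic CHS addressing numbers sectors from 1


def lba_to_chs(lba: int, heads: int, sectors_per_track: int) -> CHS:
    """Map a linear block number to a cylinder/head/sector position.

    Assumes every track holds the same number of sectors (no zoned recording).
    """
    blocks_per_cylinder = heads * sectors_per_track
    cylinder, offset = divmod(lba, blocks_per_cylinder)
    head, sector = divmod(offset, sectors_per_track)
    return CHS(cylinder, head, sector)


def cylinder_changes(first_lba: int, count: int, heads: int, sectors_per_track: int) -> int:
    """Count the arm movements needed to read `count` consecutive blocks."""
    cylinders = [lba_to_chs(b, heads, sectors_per_track).cylinder
                 for b in range(first_lba, first_lba + count)]
    return sum(1 for a, b in zip(cylinders, cylinders[1:]) if a != b)


if __name__ == "__main__":
    print(lba_to_chs(599, 6, 100))            # CHS(cylinder=0, head=5, sector=99)
    print(lba_to_chs(600, 6, 100))            # CHS(cylinder=1, head=0, sector=0)
    print(cylinder_changes(0, 1200, 6, 100))  # 1
```

With this layout, 1,200 consecutive blocks need only a single arm movement. Switching between the six surfaces inside a cylinder is an electronic head selection rather than a mechanical seek.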
Modern disks typically employ two other latency-reducing techniques. First, modern disks are equipped with their own microprocessor and memory (not shown). One task of the code executing on the disk microprocessor is to optimize read requests by reordering them to minimize the number and distance of mechanical movements, thereby reducing total seek time. Second, most modern disks read one or more blocks beyond the requested data and store the extra data in a memory called a cache. This supplementary read operation is typically referred to as read-ahead. If the next read request sequentially follows the last, the needed data is already waiting in the cache, so the request is satisfied without another media access and read time is reduced significantly.
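The text does not say which reordering policy disk firmware uses. As an illustration only, the sketch below swaps in LOOK, a common textbook elevator policy, and compares its total arm travel with serving requests in arrival order. The queue of cylinder numbers and the starting arm position are hypothetical.

```python
from typing import Iterable, List


def fifo_order(pending: Iterable[int]) -> List[int]:
    """Serve requests in arrival order (no reordering)."""
    return list(pending)


def look_order(pending: Iterable[int], head: int) -> List[int]:
    """LOOK-style elevator: sweep toward higher cylinder numbers from the
    current arm position, then reverse and sweep back for the rest."""
    pending = list(pending)
    upward = sorted(c for c in pending if c >= head)
    downward = sorted((c for c in pending if c < head), reverse=True)
    return upward + downward


def total_seek(order: Iterable[int], head: int) -> int:
    """Total number of cylinders crossed by the arm when serving `order`."""
    distance = 0
    for cyl in order:
        distance += abs(cyl - head)
        head = cyl
    return distance


if __name__ == "__main__":
    queue = [98, 183, 37, 122, 14, 124, 65, 67]  # hypothetical pending cylinders
    start = 53                                   # hypothetical arm position
    print(total_seek(fifo_order(queue), start))         # 640
    print(total_seek(look_order(queue, start), start))  # 299
```

For this queue, reordering cuts arm travel by more than half. Real firmware typically also weighs rotational position, which this sketch ignores.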
For read requests to a single disk, data on the same or consecutive tracks, addressed by linearly increasing block number, has the lowest disk latencies and provides the highest throughput. This is because consecutive blocks are more likely to exist on the same cylinder, and because the data is organized on the disk so that after the first read, reads to consecutive blocks require no seek time, until the end of the cylinder is reached. Additionally, since the disk continuously performs read-ahead, much of the read data is already available in the disk cache when a read request is received. This increases disk throughput even further.
This brings us back to the reason why sending every other sequential read request to one of the two mirrored disks doesn't result in optimal mirrored disk throughput. When the disk is reading ahead two or more blocks, but receiving requests for every other block, only half of the read data stored in the cache memory is utilized. The result is that disk read-ahead optimization is undermined by issuing every other mirrored read request to one and then the other of the mirrored disks.
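The loss of read-ahead benefit can be reproduced with a toy cache model. This sketch rests on simplifying assumptions: each disk, on a cache miss, reads the requested block plus the next `read_ahead` blocks, and those prefetched blocks replace the cache; hits trigger no further read-ahead. A sequential stream is sent to the mirrored pair with a switch every `threshold` requests, so `threshold=1` is the simple alternation described above.

```python
class ReadAheadDisk:
    """Toy disk cache: on a miss, the disk reads the requested block plus the
    next `read_ahead` blocks, and the prefetched blocks replace the cache."""

    def __init__(self, read_ahead: int) -> None:
        self.read_ahead = read_ahead
        self.cache: set = set()
        self.hits = 0
        self.misses = 0

    def read(self, block: int) -> None:
        if block in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache = set(range(block + 1, block + 1 + self.read_ahead))


def simulate(num_blocks: int, read_ahead: int, threshold: int) -> tuple:
    """Send blocks 0..num_blocks-1 to a mirrored pair, switching disks after
    every `threshold` requests; returns (total hits, total misses)."""
    disks = [ReadAheadDisk(read_ahead), ReadAheadDisk(read_ahead)]
    for block in range(num_blocks):
        disks[(block // threshold) % 2].read(block)
    return sum(d.hits for d in disks), sum(d.misses for d in disks)


if __name__ == "__main__":
    print(simulate(32, 8, 1))  # (24, 8): requests alternate between the disks
    print(simulate(32, 8, 8))  # (28, 4): requests sent in groups of eight
```

With alternation, each miss prefetches eight blocks but only four are ever requested from that disk, which is the half-utilized cache described above. With groups of eight, seven of the eight prefetched blocks are used.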
Since performance of random reads is known to improve by alternating between the two disks, one current approach is to divert sequential reads to only one disk, and send random reads alternately to each of the disks. The best sequential read performance achieved with this method is equal to that of a single disk.
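The approach just described can be sketched as a small dispatcher. The class and its rule for detecting a sequential read (the request starts exactly where the previous one ended) are hypothetical. This sketch also interprets "only one disk" as keeping a sequential run on whichever disk served the previous request.

```python
class SequentialStickyDispatcher:
    """Hypothetical mirrored-read dispatcher: a request that starts where the
    previous one ended stays on the same disk; any other request is treated
    as random and sent to the other disk, so random reads alternate."""

    def __init__(self) -> None:
        self.next_expected = None  # block just past the previous request
        self.last_disk = 1         # so the very first request goes to disk 0

    def dispatch(self, start: int, length: int) -> int:
        if start == self.next_expected:
            disk = self.last_disk      # sequential: keep the stream on one disk
        else:
            disk = 1 - self.last_disk  # random: alternate between the mirrors
        self.last_disk = disk
        self.next_expected = start + length
        return disk


if __name__ == "__main__":
    d = SequentialStickyDispatcher()
    print([d.dispatch(s, 8) for s in (0, 8, 16, 24)])      # [0, 0, 0, 0]
    print([d.dispatch(s, 8) for s in (9000, 120, 5000)])   # [1, 0, 1]
```

Because a sequential run never leaves its disk, the mirror contributes nothing to it, which is why this method tops out at single-disk sequential performance.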
Another suggested performance optimization for mirrored disks is to use special disk commands, i.e., mode page commands. These commands give access to internal disk settings and can change the number of sectors the disk reads ahead or select a different caching algorithm. The problem with this solution is that mode page commands are difficult to use and are not supported consistently by all disks. It is therefore very desirable to have a simple algorithm for reading mirrored disks that enhances sequential read performance without disrupting random-read and read-ahead disk optimizations.
The present invention discloses an algorithm that can be used in controlling reads to mirrored disks in order to improve the performance of sequential reads without degrading random reads. In a first basic embodiment, read requests are directed to a first disk until a threshold number of requests has been sent; read requests are then directed to the other disk in the pair until the threshold is reached again. In one presently preferred embodiment, the threshold value is programmed in firmware on the storage controller card. While any threshold value greater than 1 improves performance over simple toggling, the currently preferred range is eight to sixteen requests, with eight being the presently preferred threshold.
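The first basic embodiment amounts to a counter and a disk selector. The following is a minimal sketch, not the controller firmware itself; the class and method names are hypothetical, and the default of eight follows the preferred threshold stated above.

```python
class ThresholdDispatcher:
    """Direct `threshold` consecutive read requests to one disk of a mirrored
    pair, then switch to the other disk for the next `threshold` requests."""

    def __init__(self, threshold: int = 8) -> None:  # eight: the preferred value above
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.current = 0  # disk receiving the current group of requests
        self.sent = 0     # requests sent to `current` in this group

    def dispatch(self) -> int:
        if self.sent == self.threshold:
            self.current ^= 1  # group complete: move to the mirror
            self.sent = 0
        self.sent += 1
        return self.current


if __name__ == "__main__":
    d = ThresholdDispatcher(3)
    print([d.dispatch() for _ in range(8)])  # [0, 0, 0, 1, 1, 1, 0, 0]
```

Setting `threshold=1` reproduces simple toggling, so the threshold is the only parameter separating the two policies.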
In another embodiment, several additional factors are added to the embodiment above. First, the controller snoops the IO requests to determine whether each is sequential, or within a specified number of blocks of the last request. If so, the controller continues grouping the requests as described above; if not, the controller switches to the other disk and begins sending requests to it. Second, if the controller determines that the current mix of sequential and random requests, the size of the requests, or both indicate that a different threshold would better serve the current workload, it dynamically changes the threshold to a more appropriate value.
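The enhanced embodiment can be sketched by adding request snooping and threshold retuning to the grouping counter. Several details here are hypothetical and not given in the text: the proximity window (16 blocks, measured in either direction from the end of the previous request), the length of the request history, and the retuning rule, which switches between thresholds of 16 and 8 depending on the share of near-sequential requests. This sketch considers only the request mix, not request size.

```python
from collections import deque


class AdaptiveDispatcher:
    """Threshold grouping plus request snooping and threshold retuning.

    `window`, `history`, and the rule in `_retune` are hypothetical choices
    for illustration only.
    """

    def __init__(self, threshold: int = 8, window: int = 16, history: int = 32) -> None:
        self.threshold = threshold
        self.window = window                 # "within a specified number of blocks"
        self.recent = deque(maxlen=history)  # True for each near-sequential request
        self.current = 0
        self.sent = 0
        self.last_end = None                 # block just past the previous request

    def dispatch(self, start: int, length: int) -> int:
        near = self.last_end is not None and abs(start - self.last_end) <= self.window
        if self.last_end is not None and (not near or self.sent >= self.threshold):
            # Random request, or the current group is full: switch mirrors.
            self.current ^= 1
            self.sent = 0
        self.sent += 1
        self.last_end = start + length
        self.recent.append(near)
        self._retune()
        return self.current

    def _retune(self) -> None:
        # Hypothetical rule: once the history is full, use larger groups for
        # mostly sequential traffic and smaller ones otherwise.
        if len(self.recent) == self.recent.maxlen:
            ratio = sum(self.recent) / len(self.recent)
            self.threshold = 16 if ratio >= 0.75 else 8


if __name__ == "__main__":
    d = AdaptiveDispatcher(threshold=4)
    print([d.dispatch(b, 8) for b in range(0, 64, 8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
    print(d.dispatch(1_000_000, 8))                      # 0: a random request switches disks
```

Sequential runs are still grouped as in the basic embodiment. A random request moves the controller to the other disk immediately instead of breaking into the read-ahead stream of the current one.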