The present invention pertains to a method for sorting seek operations in rotating disk drives. More specifically, the present invention relates to a computer program product for placing commands in a queue by grouping proximate commands, thus improving throughput by reducing drive latency and decreasing the number of iterations run by a scheduling algorithm.
Computer systems or other accessories, collectively referred to as “computer systems”, generally include data storage devices, such as hard disk drives. A hard disk drive is an electromechanical or an optical-mechanical device that reads from and writes to a hard disk that includes one or more disk platens. The main components of a disk drive are a spindle on which the platens are mounted, a drive motor for spinning the platens, one or more read/write heads, a seek mechanism for positioning the heads over the platens, and a controller which synchronizes read/write commands and transfers information to and from other components of the computer system.
In operation, the computer system issues logical instructions to its disk drive to read data from, or write data to, locations on the disk. Although the instructions typically include a logical address for the data, the data is not stored in logical format; rather, the data is stored at a physical address location. The controller typically translates the logical address into a physical address. Once the translation occurs, the controller directs the heads to the physical address location at which the desired data is to be written or from which it is to be read.
The amount of time from the start of the movement of the head arm until the start of the read or write phase of an I/O command is referred to as the “access time”. Access time comprises two components. The first component is the seek and settling time, which is the time required to move a disk drive's read/write head to a specific track or cylinder on a disk and settle it on the target track. The second component is the rotational latency time, which corresponds to the additional time required for the disk to rotate so that the desired physical address location is located underneath the properly positioned head.
The available rotational time of a command is calculated based on the rotational position of the command and the current position of the head. If there is no chance that the command could be accessed at that time because of the radial distance, this rotational time is repeatedly incremented by one revolution time, until there is a positive probability of a successful access.
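The incrementing rule just described can be sketched as follows. This is a minimal illustration, not the drive's actual firmware logic: it assumes that access has a positive probability of success exactly when the arrival time is at least a hypothetical `min_seek_time` (the shortest possible seek and settling time for the radial distance). The function and parameter names are illustrative.

```python
def available_rotational_time(rotational_delay, min_seek_time, revolution_time):
    """Earliest time at which the target sector is under the head AND the
    seek could possibly have completed.

    rotational_delay -- time until the target sector first passes under the head
    min_seek_time    -- assumed shortest possible seek-plus-settle time
    revolution_time  -- duration of one full disk revolution
    """
    if revolution_time <= 0:
        raise ValueError("revolution_time must be positive")
    t = rotational_delay
    # Add whole revolutions until a successful access is at least possible.
    while t < min_seek_time:
        t += revolution_time
    return t


# Sector arrives after 2 ms, but the seek needs at least 5 ms:
# one extra 6 ms revolution is required.
print(available_rotational_time(2.0, 5.0, 6.0))  # 8.0
```

With a longer radial distance the loop simply adds more revolutions, which is why commands far from the current head position tend to have larger rotational times.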
Each disk typically includes a plurality of concentric tracks, on one or both surfaces, from which information is read, or onto which information is written by a read/write element. In addition, each track is further divided into a plurality of sectors. A cylinder is formed by a plurality of tracks with the same radial coordinate on the stack of disks. In a disk drive, a disk rotates at a high speed while the read/write element “flies” over the surface of the rotating disk. The read/write element is positioned over specific areas or sectors of the disk in accordance with commands received from the computer system. The numerous commands of the computer system usually exceed the drive's ability to execute the commands immediately upon receipt, in which case a queue is formed. The set of commands available for execution by the disk drive is referred to as the “command queue”.
Traditionally, controllers have been developed to reorder the command queue according to a positional sequence. Examples include reducing the number of changes in the direction of the movement of the head, ordering according to the shortest calculated head movement regardless of direction, and more commonly ordering according to the shortest overall access time between successive commands.
Numerous methods of drive scheduling have been devised to minimize the average access time. The conventional rule used by scheduling algorithms has been to choose the next read/write command from its local queue by essentially executing the earliest executable command. There is, however, some uncertainty with regard to the actual time it would take from the end of the currently active command, that is the command being currently executed, until the onset of execution of the next command. In part, this uncertainty is due to the fact that the seek and settling times are not absolutely deterministic. In some cases, due to the variance of the seek and settling time, the head will not be ready to start executing even though the correct rotational position has been attained. Another problem is that even if there were no uncertainty, once the start and end positions are taken into account, still there would not be sufficient time to calculate the precise access time while the scheduling algorithm is scanning the queue of commands.
In the event the actual access time is underestimated, a complete revolution may be lost. A common solution has been to add a “safety” margin (sometimes called a “fudge” factor) to the seek and settling time and establish a safe estimate of the time at which execution can start for certain. By adding this safety margin, the scheduling algorithm sometimes bypasses or delays a command if this command is not certain to be executed during the first revolution. Such an approach could significantly and adversely affect the throughput of the disk drive.
Another disk scheduling method is illustrated in U.S. Pat. No. 5,570,332 to Heath et al., which describes a method to reduce rotational latency in a disk drive by dividing the disk into discrete angular regions. The command queue is then sorted according to commands addressing cylinders or tracks within the angular region having the shortest rotational latency. The sorting algorithm searches the queue for commands addressing physical addresses beginning with those in neighboring angular regions. With each repositioning of the read/write head, the rotational latency of the angular regions from the new head location is reevaluated. However, the time estimates are based on adding safety margins and hence are biased.
Yet another disk scheduling method is exemplified in U.S. Pat. No. 5,664,143 to Olbrich, which describes a method in which the rotational position queue is initially ordered. A first command is chosen and assigned the physical address of its last requested block. Each remaining command in the queue is assigned the physical address of its first requested block. The address differences between each remaining command and the first command are converted into a time difference. The time required for the head to be positioned, the seek time, is subtracted from each time difference. For subtractions resulting in times less than zero, an additional amount of time corresponding to a full revolution of latency is added. The commands are then sorted by the smallest time difference, such that the command with the shortest time difference becomes the next command. After the execution of the first command, the command with the shortest time difference is removed from the queue and the next command becomes the first command. The ordering algorithm is then repeated to determine a new next command. Though this scheduling algorithm may have met its objectives, there is nonetheless room for further optimization of expected access seek time by using probabilistic criteria to evaluate commands in the disk scheduling queue.
Still another disk scheduling method is illustrated in U.S. Pat. No. 5,854,941 to Ballard et al., which describes a disk scheduling queue for sorting pending disk I/O commands according to an estimated access time. The estimated access time is calculated from first and second rotational times that are derived from a rotational time table based on logical address and head movement time. Once the command is executed, the rotational positioning algorithm is repeated and the queue is re-sorted. However, the estimate results in a deterministic value rather than a weighted average that takes into account the probabilities of the possible values.
A more specific problem facing conventional scheduling algorithms relates to a parameter referred to as “file start delay” (FSD). The FSD time includes the time the scheduling algorithm takes to scan the entire queue between every two commands, for example several hundred microseconds (e.g. 500 µs), since the scheduling algorithm is expected to run between the end time of the current command and the start time of the candidate command. Thus, if the anticipated start time of a candidate command is earlier than the end time of the current command plus the FSD, then the anticipated start time of the candidate command is incremented by one revolution time, and this candidate command may no longer be considered a good candidate to be the next command. The effect of a long FSD is therefore a reduced drive throughput.
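The FSD rule can be illustrated with a short sketch. The function name and the numeric values (times in microseconds, a 6,000 µs revolution) are illustrative assumptions; only the 500 µs FSD figure comes from the example above.

```python
def anticipated_start(candidate_start, current_end, fsd, revolution_time):
    """Apply the file-start-delay rule to a candidate command.

    If the candidate would start before the scheduler has finished running
    (current_end + fsd), its start is pushed back by one revolution.
    All arguments are times in the same unit (microseconds here).
    """
    if candidate_start < current_end + fsd:
        return candidate_start + revolution_time
    return candidate_start


# Current command ends at t = 10,000 us; FSD = 500 us; assumed revolution = 6,000 us.
print(anticipated_start(10_300, 10_000, 500, 6_000))  # 16300 (one revolution lost)
print(anticipated_start(10_700, 10_000, 500, 6_000))  # 10700 (unaffected)
```

The first candidate is physically only 300 µs away, yet it is charged a full revolution because the scheduler cannot finish in time. This is the penalty that a long FSD imposes on proximate commands.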
In accordance with the present invention, a computer program product is provided as a scheduling algorithm for use in disk drives to place I/O commands in a rotational position queue. The scheduling strategy is implemented by selecting commands based on a probabilistic approach that minimizes the expected next command access time. Thus, the present scheduling algorithm allows data to be accessed in the shortest expected amount of time possible, maximizes the throughput of the drive and improves the overall performance of the computer system.
The scheduling algorithm of the present invention improves the disk I/O average access time by estimating the expected access time (EAT) for the queued commands, and by selecting these commands so that the command with the least EAT (LEAT) is executed first.
Whereas certain conventional scheduling algorithms rely on rotational latency or appended additional time to compensate for the uncertainty inherent in the seek and settling times, as described earlier, the probabilistic approach of the present invention does not postpone the execution of commands due to this uncertainty, but rather relies upon and incorporates such uncertainty as a useful criterion in the comparison of commands. An exemplary criterion used in a preferred embodiment of the present invention is the least expected access time.
The least expected access time is a concept which is introduced herein, and which is derived by having the disk scheduling algorithm sort pending disk I/O commands into a disk scheduling queue according to the expected time necessary to reach the target positions on the disk. The probabilistic algorithm weights the possible access times of commands sorted in the disk scheduling queue, and accounts for the probability of the drive executing a command during the first possible revolution as well as the probability of the drive executing the command in the second possible revolution. Both of these probabilities are taken into consideration in reaching a final determination as to the queue order of the commands. This would eliminate the rigid deterministic (e.g. duality of decision) approach followed by conventional scheduling algorithms and allow for taking calculated risks in scheduling commands so as to minimize the long-term average latency.
As an illustration, the scheduling algorithm assigns an Expected Access Time EAT(i) to an ith command as follows:
EAT(i) = (1 − p(i)) s(i) + p(i) (s(i) + r) = s(i) + r p(i),
where p(i) is the probability that a revolution will be missed, r is the one revolution time, and s(i) is the minimum time it would take to achieve the correct rotational position with nonzero probability of completing the seek and settling. The probability p(i) reflects various types of uncertainties, both intrinsic and resulting from the lack of computational resources. For simplicity, the possibility of missing more than one revolution is neglected here, though those skilled in the art could account for this factor without departing from the scope of the present invention.
According to one embodiment, the scheduling algorithm will assign an EAT to each of the commands in the queue. As a result, each of the queued commands will be provided with a single number rather than two numbers as explained above in connection with the conventional deterministic approach. The scheduling algorithm will then reorder the queue commands according to a desired LEAT scheme, for example according to ascending expected access times, so that the command with the LEAT will be executed next.
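As a minimal sketch of this embodiment, the following computes EAT(i) = s(i) + r p(i) for each queued command and orders the queue by ascending EAT. The `Command` type, the function names and the numeric values are illustrative assumptions; how p(i) is estimated in practice is not modeled here.

```python
from typing import List, NamedTuple


class Command(NamedTuple):
    name: str
    s: float  # minimum time to reach the rotational position (ms)
    p: float  # probability that a revolution is missed


def expected_access_time(cmd: Command, r: float) -> float:
    """EAT = (1 - p) * s + p * (s + r), which simplifies to s + r * p."""
    return cmd.s + r * cmd.p


def order_by_leat(queue: List[Command], r: float) -> List[Command]:
    """Return the queue sorted by ascending EAT; the first entry has the LEAT."""
    return sorted(queue, key=lambda c: expected_access_time(c, r))


queue = [
    Command("A", s=2.0, p=0.5),   # EAT = 2.0 + 6.0 * 0.5  = 5.0
    Command("B", s=3.0, p=0.0),   # EAT = 3.0 + 6.0 * 0.0  = 3.0
    Command("C", s=1.0, p=0.75),  # EAT = 1.0 + 6.0 * 0.75 = 5.5
]
print([c.name for c in order_by_leat(queue, r=6.0)])  # ['B', 'A', 'C']
```

Note that command C has the earliest possible start (s = 1.0) but is ranked last, because it is likely to miss its first revolution. A scheduler that looked only at s(i) would have chosen it first.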
According to an alternative embodiment, the probability p(i) does not have to be computed for every single command in the queue. Rather, depending on the current best candidate, if a command certainly cannot be accessed faster than the current best candidate, then this command will not be assigned an EAT.
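One way to read this pruning rule, offered here as an assumption rather than as the claimed method: since EAT(i) = s(i) + r p(i) is never smaller than s(i), any command whose s(i) is already at least the best EAT found so far cannot win, so its p(i) need not be computed. The names `select_leat` and `miss_probability` are hypothetical.

```python
def select_leat(queue, r, miss_probability):
    """Pick the command with the least EAT, skipping hopeless candidates.

    queue            -- list of (name, s) pairs
    r                -- one revolution time
    miss_probability -- callable returning p for a command name (assumed costly)
    Returns (best_name, best_eat, number_of_probabilities_computed).
    """
    best_name, best_eat = None, float("inf")
    evaluated = 0
    for name, s in queue:
        if s >= best_eat:
            continue  # EAT >= s >= best_eat, so this command cannot be better
        p = miss_probability(name)
        evaluated += 1
        eat = s + r * p
        if eat < best_eat:
            best_name, best_eat = name, eat
    return best_name, best_eat, evaluated


p_table = {"A": 0.5, "B": 0.0, "C": 0.0, "D": 0.25}
queue = [("A", 2.0), ("B", 3.0), ("C", 6.0), ("D", 5.5)]
print(select_leat(queue, 6.0, p_table.__getitem__))  # ('B', 3.0, 2)
```

In this example only two of the four probabilities are computed; C and D are rejected on their s values alone.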
In another embodiment, the scheduling algorithm improves the disk drive throughput, that is, the average number of commands per time unit. This is achieved by searching the rotational position queue and by identifying pairs of commands with short access times between them. Once identified, these commands are paired and executed in tandem. Executing a set of commands in tandem increases the drive throughput by reducing rotational latency and decreasing the number of iterations that the scheduling algorithm must run. This embodiment reduces, if not eliminates, the “file start delay” (FSD) between proximate commands, which commands would have otherwise been delayed.
Rotational latency is reduced because commands with proximate physical addresses can be executed without waiting for the disk to complete a full revolution (if the scheduling algorithm were run between them), and when they are chosen to be executed in tandem, the access time is relatively much shorter than usual. The present invention offers a departure from conventional scheduling algorithms that are run after the execution of each command. The pairing of proximate commands significantly reduces the total time to execute the queue of commands, and further increases the overall drive throughput. This improvement is expected to be significant when the workload is in a narrow range of the disk, such as the 100 MB test, where the frequency of occurrences of proximate commands is high.
When a new command x arrives at the queue, the tandem identification algorithm checks whether the command x can be made part of a tandem. If the queue includes a candidate command y that can be executed within a predetermined time after the command x, and that predetermined time is less than the FSD or time required to run the scheduling algorithm, then the two commands x and y are paired in tandem to execute y immediately after x without the need to run the scheduling algorithm when x becomes the active command. If the queue includes a command z such that the access time from z to x is sufficiently short, then z and x are paired in tandem to execute x immediately after z. In summary, the tandem identification algorithm allows for the execution of commands with shorter access times that otherwise could not be executed with such short access times and thereby increases the throughput of the drive.
The identification of tandem commands is achieved as follows. The tandem identification algorithm establishes a threshold time for declaring the commands tandem. This threshold time pertains to the physical proximity of the commands as weighted against the time to execute the scheduling algorithm. The tandem identification algorithm then calculates the access times to the new command x from each member z of the queue and from the new command to each member y of the queue. As soon as the algorithm discovers that any of those access times is less than the threshold time, the algorithm declares the respective commands z and x, or x and y to be tandem, and stops searching for any other tandem possibilities that involve the new command x. The tandem commands are subsequently formed into a single command for the purpose of execution, and possible involvement in yet a larger tandem that involves a third command.
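The search described above can be sketched as follows. This is an illustration under stated assumptions: `access_time(a, b)` is a hypothetical callable giving the access time from the end of command a to the start of command b, and the order in which the two directions are tested for each queue member is an arbitrary choice, since the text does not specify it.

```python
def find_tandem(new_cmd, queue, access_time, threshold):
    """Return the first (first_cmd, second_cmd) tandem involving new_cmd,
    or None if no access time falls below the threshold.

    The search stops at the first qualifying pair, as described in the text.
    """
    for other in queue:
        # Case z -> x: the new command can directly follow a queued command.
        if access_time(other, new_cmd) < threshold:
            return (other, new_cmd)
        # Case x -> y: a queued command can directly follow the new command.
        if access_time(new_cmd, other) < threshold:
            return (new_cmd, other)
    return None


# Illustrative access times in microseconds between command pairs.
times = {
    ("z1", "x"): 900, ("x", "z1"): 700,
    ("z2", "x"): 300, ("x", "z2"): 100,
}
lookup = lambda a, b: times[(a, b)]
print(find_tandem("x", ["z1", "z2"], lookup, threshold=500))  # ('z2', 'x')
```

Because the search stops at the first hit, the pair found is not necessarily the one with the shortest access time: here ('x', 'z2') at 100 µs is never examined. That matches the early-stopping behavior described, which trades optimality for a shorter search.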