1. Field of the Invention
The present invention relates to disk time-sharing apparatus and method for scheduling the use of a disk apparatus on the basis of a plurality of inputs and outputs and, more particularly, to disk time-sharing apparatus and method for scheduling the use of a disk apparatus so as to sequentially switch allocating time for inputs and outputs which compete.
2. Description of the Related Arts
Hitherto, in a storage system for managing data by using a disk apparatus such as a hard disk drive, the disk apparatus is constructed, for example, so as to have a RAID structure. The RAID apparatus is either connected subordinate to a disk control apparatus, thereby processing inputs/outputs from an upper host, or connected directly to a server, thereby processing inputs/outputs from a server OS. In such a storage system, in the case where a random access in which a guarantee of a response time is required and a sequential access in which importance is attached to an amount of processing per unit time must be performed for the same disk apparatus, the operation is performed in a time-sharing manner so that the random access and the sequential access do not compete. For example, OLTP (On Line Transaction Processing), in which the random access is mainly performed, is executed on a database of the disk apparatus in the daytime, and a backup of the database is performed at night after the processing.
(Resource distribution of random access and sequential access)
In the storage system, however, in association with the realization of non-stop processing, the OLTP processing of the random access system needs to be continued even at night, so that it is necessary to execute the backup as a sequential access during the OLTP processing of the random access system. In case of only the random access, an IOPS (Input Output Per Second) value, namely, the number of inputs/outputs per unit time which can satisfy a certain mean response time, can be estimated, for example, 100 IOPS at a mean response time of 30 milliseconds. In case of only the sequential access, a throughput such as 20 MB/sec can be estimated. However, when the random access and the sequential access are simultaneously performed, since received input/output requests are processed by a FIFO queue, there is no mechanism to guarantee a period of time during which the random access can use the disk apparatus and a period of time during which the sequential access can use the disk apparatus. For instance, even when the random access of 50 IOPS at a mean response time of 30 milliseconds and the sequential access of 5 MB/sec are desired, if the sequential access frequently occurs, the throughput of the sequential access rises from 5 MB/sec to 10 MB/sec although it doesn't need to rise. Conversely, the IOPS which satisfies the mean response time of 30 milliseconds in the random access deteriorates from 50 IOPS to 25 IOPS although the user doesn't want to reduce it.
(Resource distribution between logic volumes)
In the conventional storage system, by arranging data having different performance requirements in different disk apparatuses, their performance characteristics are drawn out. For example, data in which a guarantee of a response time is required in the random access of a small amount of data and data in which importance is attached to a processing amount per unit time in the sequential access of a large amount of data are arranged in different disk apparatuses. In association with the realization of a large capacity of the disk apparatus, however, the case where data having different performance requirements is arranged in the same disk apparatus is increasing. A problem similar to that mentioned above occurs when logic volumes having different performance requirements are arranged on the same disk. Hitherto, since the received inputs/outputs are scheduled by the FIFO, there is no mechanism for controlling a disk resource distribution between logic volumes. Therefore, when the input/output to/from a certain logic volume frequently occurs, the input/output performance for the other logic volumes deteriorates. For instance, in the case where a volume A in which it is desired to guarantee 10 IOPS and a volume B in which it is desired to guarantee 50 IOPS are arranged on the same disk, when the access to the volume A frequently occurs, the IOPS of the volume A rises from 10 IOPS to 20 IOPS though it doesn't need to rise. Conversely, the IOPS of the volume B deteriorates from 50 IOPS to 40 IOPS though such a deterioration is not desired.
(Resource distribution between normal process and backup/copying process)
A case where a plurality of logic volumes exist on the same disk apparatus in the conventional storage system and a backup or a copying operation is performed on each logic volume unit basis will now be considered. Hitherto, in order to suppress the influence on the normal input/output by the backup/copying process, a method of setting paces (intervals) of the backup/copying process at the time of executing the backup/copying process is used. However, if the copying operation is executed to the volume B on the same disk apparatus as that of the volume A while the volume A is being copied, the duplex copying process is operated simultaneously on the same disk apparatus, so that the influence on the normal input/output is doubled.
(Resource distribution between normal process and rebuilding)
In the RAID apparatus, by making data redundant across a plurality of disk drives, even if a failure occurs in one disk drive, the data can be recovered from the remaining disk drives. In the RAID apparatus, therefore, even if a failure occurs in a disk drive, the ordinary input/output can be continued. The data is recovered from the remaining disk drives to the exchanged disk drive. This recovering process is called "rebuilding". Since the rebuilding is accompanied by the input/output process for the disk drives constructing the RAID apparatus, the rebuilding and the normal input/output compete for the same disk drive. Consequently, the performance of the normal input/output is deteriorated by the rebuilding. For example, in case of RAID1 having a mirror construction, the rebuilding is a process for copying data from the disk drive which remains after the failure of the other disk drive to the exchanged new disk drive, and a read input/output occurs to the disk drive on the copying source side. The read input/output causes the normal input/output to wait, so that the performance of the normal input/output deteriorates. There are two conventional approaches to solve the problem. According to the first approach, a sufficiently small amount of data is copied at a sufficiently long interval so as not to exert an influence on the normal input/output. In this case, although the influence on the normal input/output can be reduced, the time that is required until the rebuilding is completed becomes long. For instance, in case of RAID1 constructed by disk drives of 9 GB, about 10 hours are needed. According to the second approach, the input/output of the rebuilding is scheduled when the disk drive is vacant, namely, when the disk drive is not used for the normal input/output. A problem in this case is that the time that is required until the completion of the rebuilding cannot be guaranteed. When the disk drive is hardly ever vacant, a long time is needed for the rebuilding.
(Guarantee of maximum response time)
In a mission critical processing, as requirements of the input/output performance, the maximum response time is important in addition to the mean response time. The recent disk apparatus has a re-ordering function for rearranging inputs/outputs for which the execution is waited so as to minimize the processing time. The re-ordering function is a function such that an input/output to minimize a positioning time that is defined by the sum of a seeking time and a rotation waiting time is selected as an input/output to be subsequently executed from the execution waiting inputs and outputs by the disk apparatus. When the input/output is requested to the disk apparatus, a simple task serving as a task designation indicating that it can be set as a target of the re-ordering is notified to the disk apparatus. In case of the inputs/outputs of the simple task designation, the disk apparatus schedules the inputs and outputs in order so as to minimize the positioning time. Consequently, the mean processing time at the time of the random access is reduced. For instance, the mean processing time of the random access is reduced from 9 milliseconds to 5 milliseconds by using the re-ordering function. Although the re-ordering function improves the throughput of the disk apparatus as mentioned above, there is a problem that the maximum response time increases. This is because since the input/output to minimize the positioning time is selected as a next input/output, a phenomenon such that a certain input/output is kept waiting for a long time without being scheduled occurs. To solve such a phenomenon, the disk apparatus has a function to designate an ordered task in addition to the simple task to designate that the input/output can be set as a target of the re-ordering. 
When the input/output is requested by the designation of the ordered task, the disk apparatus completes all of the inputs and outputs which have been received so far but are not completed yet and, after that, schedules the input/output of the ordered task. In this manner, by mixing the ordered task between the simple tasks, it is possible to suppress the extension of the maximum response time of the input/output. However, in case of considering the resource distributions between the random access and the sequential access, between the logic volumes, between the normal process and the backup/copying process, and between the normal process and the rebuilding process, in addition to the use of the simple task to improve the throughput (IOPS), the guarantee of the maximum response time in case of using the simple task becomes a problem.
According to the invention, there are provided disk time-sharing apparatus and method which can guarantee the minimum value of performance when a plurality of different kinds of inputs and outputs to/from a disk apparatus compete with each other.
A disk time-sharing apparatus according to the invention comprises: a disk apparatus having one or a plurality of disk drives; an input/output request unit for issuing an input/output request to the disk apparatus; and an input/output scheduling unit for forming input/output groups obtained by grouping input/output sources to the disk apparatus, defining a ratio of time during which each input/output group uses the disk apparatus, deciding a quantum (allocating time) during which each input/output group can continuously use the disk apparatus on the basis of the defined time ratio, and in the case where the input/output requests are received from a plurality of input/output groups to the disk apparatus, performing a time-sharing such that the disk apparatus is used by sequentially switching the quanta among the competing input/output groups. When there is an input/output request only from one input/output group, the input/output scheduling unit enables the disk apparatus to be continuously used for the input/output from one input/output group. As mentioned above, according to the disk time-sharing apparatus of the invention, the minimum value of the input/output performance can be guaranteed every input/output group which has previously been defined and, when the requests from a specific input/output group are concentrated for a certain time zone, the maximum performance can be guaranteed for the specific input/output group.
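The relation between the defined time ratio, the quantum of each input/output group, and the switching among competing groups can be illustrated by the following minimal Python sketch. It is only an illustration under stated assumptions: the fixed cycle length `cycle_ms`, the proportional split of that cycle, and the function names `decide_quanta` and `next_group` are hypothetical and are not taken from the description, which states only that the quantum is decided on the basis of the defined time ratio.

```python
def decide_quanta(time_ratios, cycle_ms):
    """Split one scheduling cycle into per-group quanta in proportion to the ratios.

    Assumption: a fixed cycle length (cycle_ms) is divided among the groups.
    """
    total = sum(time_ratios.values())
    return {group: cycle_ms * ratio / total for group, ratio in time_ratios.items()}


def next_group(order, current, pending):
    """Return the next group in rotation that has pending requests.

    If only `current` has pending requests, `current` is returned again, so a
    single active group keeps using the disk continuously. Returns None when
    no group has pending requests.
    """
    n = len(order)
    start = order.index(current)
    for step in range(1, n + 1):
        candidate = order[(start + step) % n]
        if pending.get(candidate, 0) > 0:
            return candidate
    return None


# Example: a 3:1 time ratio between random and sequential access, 100 ms cycle.
quanta = decide_quanta({"random": 3, "sequential": 1}, 100)
order = ["random", "sequential"]
competing = next_group(order, "random", {"random": 2, "sequential": 1})
alone = next_group(order, "random", {"random": 2, "sequential": 0})
```

In this sketch, `competing` is the sequential group because both groups have pending requests, whereas `alone` is again the random group because no other group competes for the disk apparatus.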
Specifically speaking, the input/output scheduling unit makes the input/output determined to be a sequential access correspond to a sequential access input/output group, makes the other inputs/outputs correspond to a random access input/output group, and performs the time-sharing of the disk apparatus by the sequential access and the random access. Therefore, no matter how many random access requests are generated, since the time during which the disk apparatus can be used by the inputs/outputs of the sequential access is guaranteed, the minimum value of the sequential access performance can be guaranteed. Since the time during which the disk apparatus can be used by the inputs/outputs of the random access is guaranteed, the minimum value guarantee of the random access performance can be performed. In case of only the sequential access request, since the disk apparatus can be successively used only by the input/output request of the sequential access, the maximum performance of the sequential access can be guaranteed. Further, in case of only the random access request, since the disk apparatus can be continuously used only by the input/output request of the random access, the maximum performance of the random access can be guaranteed.
The input/output scheduling unit makes a plurality of logic volumes in which performance requirements are the same correspond to one input/output group and performs a time-sharing of the disk apparatus among logic volume groups in which performance requirements are different. Therefore, since the time during which the disk apparatus can be used by the input/output of an access to a certain logic volume is guaranteed, no matter how many input/output requests for the other logic volumes are generated, the minimum value of the input/output performance of each logic volume can be guaranteed. In case of only the input/output request for a certain logic volume, since the disk apparatus can be used continuously only by the input/output request of such a volume, the maximum performance of the input/output to this volume can be guaranteed.
The input/output scheduling unit of the disk time-sharing apparatus makes the inputs/outputs of the copy and backup processes correspond to one input/output group and performs a time-sharing of the disk apparatus between the copy and backup processes and the other process. Thus, even if the copy/backup processes operate on the same disk apparatus in an arbitrary multiplexing state, since the disk using time during which the disk apparatus can be used by the input/output of an ordinary process is guaranteed, the minimum value of the input/output performance of the ordinary process (process other than the copy/backup) can be guaranteed. Since the using time of the disk apparatus which can be used by the copy/backup processes is guaranteed, the minimum value of the accessing performance of the whole copy/backup processes can be guaranteed. In case of only the inputs/outputs of the copy/backup processes, since the disk apparatus can be used continuously only by the inputs/outputs of the copy/backup processes, the maximum performance of the copy/backup inputs/outputs can be guaranteed.
If the disk apparatus has a RAID construction such that it has a plurality of disk drives and, even if one disk drive fails, data can be restored and rebuilt from another disk drive, the input/output scheduling unit makes the input/output of the rebuilding process of the disk apparatus having the RAID construction correspond to one input/output group and performs a time-sharing of the disk apparatus between the rebuilding process and the other process. Therefore, since the time during which the disk drive can be used by the input/output of the ordinary process is guaranteed, the input/output performance of the ordinary process during the rebuilding operation can be guaranteed. Since the time during which the disk drive can be used by the input/output of the rebuilding is guaranteed, the time that is required until the completion of the rebuilding can be guaranteed. Further, as compared with the conventional apparatus for performing the rebuilding operation at a predetermined interval, according to the disk time-sharing of the invention, since the rebuilding process can be executed whenever the ordinary input/output is not executed, the time that is required until the completion of the rebuilding can be reduced while the input/output performance of the ordinary process is guaranteed.
If the disk apparatus has a function such that, by a designation of a simple task (first task), a plurality of inputs/outputs are scheduled so as to minimize the positioning time and, by a designation of an ordered task (second task), an input/output is scheduled after completion of the inputs/outputs which have already been received, the input/output scheduling unit uses the designation of the simple task and the designation of the ordered task for different inputs/outputs when the time-sharing of the disk apparatus is performed. That is, when the time-sharing in which the disk apparatus is sequentially used among a plurality of input/output groups is performed, the input/output scheduling unit designates the ordered task for the first input/output just after the switching of the input/output group, so that the inputs/outputs of the group after the switching are scheduled after completion of the inputs/outputs of the group before the switching. For the subsequent inputs/outputs until the group is switched next, the input/output scheduling unit designates the simple task, so that a plurality of inputs/outputs are scheduled so as to minimize the positioning time. Therefore, in one quantum, the disk apparatus can be used continuously for the inputs/outputs of a certain input/output group.
The input/output scheduling unit predicts a processing time of the unprocessed inputs/outputs and calculates (predicts) a next quantum start time T0 when the quantum is switched. Each time an input/output request is received or a completion of an input/output request is responded, the input/output scheduling unit predicts a remaining time Tr on the basis of the processing time of the unprocessed inputs/outputs at that time and the quantum start time T0. When it is determined that there is remaining time (Tr > 0), an input/output request of the present quantum is issued to the disk apparatus. When it is decided that there is no remaining time (Tr ≤ 0), the present quantum is switched to the next quantum.
To benefit from the re-ordering, the input/output scheduling unit needs to create a condition in which many inputs/outputs are outstanding in the disk apparatus. Therefore, in case of using the simple task, a plurality of input/output requests are issued to the disk apparatus. Since the disk time-sharing of the invention intends to perform a time-divisional control of the input/output processing time in the disk apparatus, when an input/output request is issued to the disk apparatus, it is necessary to predict the time which the disk apparatus needs to process the plurality of issued requests and, after the present quantum is switched to the next quantum, to discriminate whether an input/output of the kind of the quantum after the switching can be supplied to the disk apparatus or not. Therefore, the remaining time Tr is calculated by the following equations in order to discriminate whether the requests outstanding in the disk drive at the time of quantum switching will be completed in the next quantum and a new input/output request can be issued or not.
Remaining time Tr = quantum start time T0 + quantum allocating time τ − unprocessed I/O processing time − present time
Quantum start time T0 = quantum switching time + unprocessed I/O processing time
Unprocessed I/O processing time = the number of unprocessed I/Os × I/O mean processing time
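The three equations above can be worked through numerically with the following minimal Python sketch. The function names and the numerical values in the example (a switching time of 1000 ms, a quantum of 50 ms, and an I/O mean processing time of 5 ms) are assumptions chosen only for illustration; all times are expressed in milliseconds.

```python
def unprocessed_io_time(num_unprocessed, mean_io_time):
    """Unprocessed I/O processing time = number of unprocessed I/Os x mean processing time."""
    return num_unprocessed * mean_io_time


def quantum_start_time(switch_time, num_unprocessed, mean_io_time):
    """T0 = quantum switching time + unprocessed I/O processing time."""
    return switch_time + unprocessed_io_time(num_unprocessed, mean_io_time)


def remaining_time(t0, quantum, num_unprocessed, mean_io_time, now):
    """Tr = T0 + quantum - unprocessed I/O processing time - present time."""
    return t0 + quantum - unprocessed_io_time(num_unprocessed, mean_io_time) - now


# Hypothetical values: the quantum is switched at 1000 ms while 2 requests of
# the previous group are still unprocessed, each taking 5 ms on average.
t0 = quantum_start_time(1000, 2, 5)              # 1000 + 2 * 5 = 1010

# At 1020 ms, with 3 unprocessed requests and a 50 ms quantum:
tr_early = remaining_time(t0, 50, 3, 5, 1020)    # 1010 + 50 - 15 - 1020 = 25

# At 1050 ms, with 3 unprocessed requests:
tr_late = remaining_time(t0, 50, 3, 5, 1050)     # 1010 + 50 - 15 - 1050 = -5

issue_now = tr_early > 0      # remaining time exists: issue a request of the present quantum
switch_now = tr_late <= 0     # no remaining time: switch to the next quantum
```

In this example, the first evaluation leaves a remaining time of 25 ms, so a further request of the present quantum may be issued, whereas the second evaluation yields −5 ms, so the present quantum is switched to the next quantum.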
In case of using the disk apparatus continuously for the inputs/outputs from one input/output group, the input/output scheduling unit designates the ordered task for the first input/output just after the allocating time is reset, so that the inputs/outputs after the resetting are scheduled after completion of the inputs/outputs before the resetting, and designates the simple task for the subsequent inputs/outputs until the next resetting, so that a plurality of inputs/outputs are scheduled so as to minimize the positioning time. Even in the case where the quanta of one input/output group continue as mentioned above, the quantum is reset so that the quantum start time is reset to the present time, and the first input/output issued to the disk apparatus after the resetting is requested by the ordered task, so that the extension of the response time caused by the re-ordering in the disk apparatus can be prevented.
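The designation rule described above, which applies both after a switching of the input/output group and after a resetting of the quantum of a single group, can be sketched as follows. This is a minimal illustration only: the function name `designate_tasks` and the string labels `"ORDERED"` and `"SIMPLE"` are hypothetical and do not represent an actual disk command interface.

```python
def designate_tasks(requests_in_quantum):
    """Pair each request issued in one quantum with a task designation.

    The first request after a group switch or a quantum reset is designated
    as an ordered task, so that it is scheduled only after the requests
    received before it are completed. The remaining requests are designated
    as simple tasks, so that the disk apparatus may re-order them to minimize
    the positioning time.
    """
    return [
        (request, "ORDERED" if index == 0 else "SIMPLE")
        for index, request in enumerate(requests_in_quantum)
    ]


# Hypothetical request identifiers issued within one quantum.
designations = designate_tasks(["io-1", "io-2", "io-3"])
```

Here `designations` pairs `io-1` with the ordered task and `io-2` and `io-3` with the simple task, so the re-ordering is confined to the requests of one quantum.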
According to the invention, there is provided a disk time-sharing method for an apparatus comprising: a disk apparatus having one or a plurality of disk drives; an input/output request unit for issuing an input/output request to the disk apparatus; and an input/output scheduling unit for scheduling the use of the disk apparatus on the basis of the input/output request. This disk time-sharing method comprises the steps of:
forming input/output groups obtained by grouping input/output sources to/from the disk apparatus and defining a ratio of time during which each input/output group uses the disk apparatus;
deciding a quantum τi (allocating time) during which each input/output group can use the disk apparatus continuously on the basis of the defined time ratio; and
when input/output requests are received from the plurality of input/output groups to the disk apparatus, performing a time-sharing such that the disk apparatus is used by sequentially switching the quanta τi among the competing input/output groups.
The details of the disk time-sharing method are fundamentally the same as those of an apparatus construction.
The above and other objects, features, and advantages of the present invention will become more apparent from the following detailed description with reference to the drawings.