1. Field of the Invention
The present invention relates to disk time-sharing apparatus and method for scheduling the use of a disk apparatus on the basis of a plurality of inputs and outputs and, more particularly, to disk time-sharing apparatus and method for scheduling the use of a disk apparatus so as to sequentially switch allocating times for inputs and outputs which compete.
2. Description of the Related Arts
Hitherto, in a storage system for managing data by using a disk apparatus such as a hard disk drive or the like, for example, the disk apparatus is constructed as an apparatus of a RAID structure, and either the RAID apparatus is connected under the control of a disk control apparatus and an input/output to/from an upper host is processed, or the RAID apparatus is directly connected to a server and an input/output to/from a server OS is processed. In such a storage system, when a random access in which a guarantee of a response time is requested and a sequential access in which importance is attached to a processing amount per unit time need to be performed on the same disk apparatus, the operation is performed in a time-sharing manner so that the random access and the sequential access do not compete. For instance, in the daytime, an OLTP (On Line Transaction Processing) operation in which the random access is mainly performed is executed on a database of the disk apparatus and, in the nighttime after completion of the OLTP operation, the database is backed up.
(Resource Distribution of Random Access and Sequential Access)
In the storage system, however, as non-stop (fault tolerant) operation is realized, the OLTP operation of the random access system needs to be continued even in the nighttime, so that the backup as a sequential access needs to be executed during the OLTP operation of the random access system. In case of only the random access, the IOPS (Inputs/Outputs Per Second), namely, the number of inputs/outputs per unit time which can satisfy a certain average response time, for example, 30 msec, can be estimated as, for example, 100 IOPS. In case of only the sequential access, a throughput, for example, 20 MB/sec, can be estimated. When the random access and the sequential access are simultaneously performed, however, received input/output requests are processed by a queue using a FIFO. Therefore, there is no mechanism to guarantee a period of time during which the random access can use the disk apparatus and a period of time during which the sequential access can use the disk apparatus. For example, even in the case where the random access of 50 IOPS at an average response time of 30 msec and the sequential access of 5 MB/sec are required, if the sequential access is frequently generated, the throughput of the sequential access rises from 5 MB/sec to 10 MB/sec although such a rise is unnecessary. Conversely, the IOPS which satisfies the average response time of 30 msec in the random access deteriorates from 50 IOPS to 25 IOPS although the user does not want it to be reduced.
(Resource Distribution Between Logic Volumes)
In the conventional storage system, by arranging data having different performance requirements on different disk apparatuses, the respective performance characteristics are obtained. For example, data for which a guarantee of a response time is requested in the random access of a small amount of data and data for which importance is attached to a processing amount per unit time in the sequential access of a large amount of data are arranged on different disk apparatuses. However, as the capacity of the disk apparatus increases, the number of cases of arranging data of different performance requirements on the same disk apparatus is increasing. Even when logic volumes of different performance requirements are arranged on the same disk as mentioned above, a similar problem occurs. Hitherto, the received inputs/outputs are scheduled by the FIFO, and there is no mechanism for controlling a disk resource distribution between the logic volumes. Therefore, when the input/output to/from one logic volume frequently occurs, input/output performance for the other logic volume deteriorates. For instance, in the case where a volume A for which it is desired to guarantee 10 IOPS and a volume B for which it is desired to guarantee 50 IOPS are arranged on the same disk, if the access to the volume A frequently occurs, the IOPS of the volume A rises from 10 IOPS to 20 IOPS although such a rise is unnecessary. Conversely, the IOPS of the volume B decreases from 50 IOPS to 40 IOPS although the user does not want it to be reduced.
(Resource Distribution Between Ordinary Process and Backup/copying Process)
A case where a plurality of logic volumes exist on the same disk apparatus and a backup or copying process is executed on a logic volume unit basis in the conventional storage system will now be considered. Hitherto, in order to suppress an influence on the ordinary input/output by the backup/copying process, a method whereby a pace (interval) of the backup/copying process is set at the time of executing the backup/copying process has been used. However, when the copying process is executed for the volume B on the same disk apparatus as that of the volume A during the copying process of the volume A, two copying processes operate simultaneously on the same disk apparatus, so that the influence on the ordinary input/output is doubled.
(Resource Distribution Between Ordinary Process and Rebuilding)
In the RAID apparatus, by making data redundant among a plurality of disk drives, even if a failure occurs in one disk drive, the data can be recovered from the remaining disk drives. Consequently, in the RAID apparatus, even when a failure occurs in any disk drive, the ordinary input/output can be continued. The data is recovered from the remaining disk drives to a replacement disk drive. This recovering process is called “rebuilding”. Since the rebuilding involves input/output processes for the disk drives constructing the RAID apparatus, the rebuilding and the ordinary input/output compete for the same disk drives. Consequently, the performance of the ordinary input/output deteriorates due to the rebuilding. For example, in case of RAID1 having a mirror structure, the rebuilding is a process for copying data from the disk drive which survives the failure to the new replacement disk drive, and a read input/output is generated to the disk drive on the copying source side. The read input/output causes the ordinary input/output to wait, so that the performance of the ordinary input/output deteriorates. Hitherto, there are two approaches to solve the above problem. According to the first approach, a sufficiently small amount of data is copied at a sufficiently long interval so as not to exert an influence on the ordinary input/output. In this case, although the influence on the ordinary input/output can be reduced, the time until the completion of the rebuilding becomes long. For instance, in case of RAID1 constructed by disk drives of 9 GB, a time of about 10 hours is needed. According to the second approach, the inputs/outputs of the rebuilding are scheduled when the disk drive is available, namely, when the disk drive is not used for the ordinary input/output. A problem in this case is that the time until the completion of the rebuilding cannot be guaranteed. When the disk drive is hardly ever available, the rebuilding takes a long time.
(Guarantee of Maximum Response Time)
In a mission critical operation, as requirements of the input/output performance, the maximum response time is important in addition to the average response time. A recent disk apparatus has a re-ordering function for rearranging execution waiting inputs and outputs so as to minimize the processing time. The re-ordering function is such a function that the disk apparatus selects, from the execution waiting inputs/outputs, the input/output which minimizes a positioning time, defined as the sum of a seeking time and a rotation waiting time, as the input/output to be executed next. When the input/output is requested to the disk apparatus, a simple task, which is a task designation indicating that the input/output can be set as a target of the re-ordering, is notified to the disk apparatus. In case of the inputs/outputs of the simple task designation, the disk apparatus schedules the inputs and outputs so as to minimize the positioning time. Consequently, the average processing time at the time of the random access is reduced. For example, the average processing time for the random access decreases from 9 msec to 5 msec by using the re-ordering function. Although the re-ordering function improves the throughput of the disk apparatus as mentioned above, there is a problem that the maximum response time increases. This is because the input/output which minimizes the positioning time is always selected as the next input/output, so that some input/output may wait for a long time without being scheduled. In order to solve such a phenomenon, the disk apparatus has a function for designating an ordered task in addition to the simple task for designating that the input/output can be set as a target of the re-ordering. If the input/output is requested by designating the ordered task, the disk apparatus completes all of the incomplete inputs/outputs which have been received so far and, after that, schedules the input/output of the ordered task.
By mixing the ordered task between the simple tasks as mentioned above, the extension of the maximum response time of the input/output can be suppressed. However, in case of considering the resource distributions between the random access and the sequential access, between the logic volumes, and between the ordinary process and the backup/copying process or rebuilding process, in addition to the use of the simple task to improve the throughput (IOPS), it is necessary to also consider the guarantee of the maximum response time in case of using the simple task.
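The interaction between simple and ordered tasks can be illustrated by a minimal sketch. It is not the firmware behavior of any particular disk apparatus: the function name `service_order`, the tuple layout, and the use of the head-to-block distance as a stand-in for the positioning time (the rotation waiting time is ignored) are all assumptions made for illustration.

```python
def service_order(requests, head=0):
    """Return the names of requests in the order a simplified disk serves them.

    requests: list of (name, position, kind) in arrival order, where kind is
    "simple" (may be re-ordered) or "ordered" (acts as a barrier).
    Assumption: positioning time is modeled as |position - head| only.
    """
    order = []
    pending = []  # simple tasks received but not yet executed

    def drain():
        # Serve all pending simple tasks, nearest position first.
        nonlocal head
        while pending:
            nearest = min(pending, key=lambda r: abs(r[1] - head))
            pending.remove(nearest)
            order.append(nearest[0])
            head = nearest[1]

    for req in requests:
        if req[2] == "ordered":
            drain()                 # complete everything received so far
            order.append(req[0])    # then execute the ordered task itself
            head = req[1]
        else:
            pending.append(req)
    drain()
    return order


all_simple = [("a", 10, "simple"), ("far", 900, "simple"),
              ("b", 20, "simple"), ("c", 30, "simple")]
with_ordered = [("a", 10, "simple"), ("far", 900, "ordered"),
                ("b", 20, "simple"), ("c", 30, "simple")]
print(service_order(all_simple))
print(service_order(with_ordered))
```

In the first list the distant request is served last because nearer requests keep being preferred; designating it as an ordered task bounds its wait to the requests received before it, at the cost of a longer seek.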
According to the present invention, there are provided disk time-sharing apparatus and method which can guarantee the minimum performance of inputs/outputs when a plurality of different kinds of inputs and outputs compete in a disk apparatus.
A disk time-sharing apparatus according to the invention comprises: a disk apparatus having one or a plurality of disk drives; an input/output request unit for issuing an input/output request to the disk apparatus; an input/output scheduling unit; and an allocating time control unit. Among them, the input/output scheduling unit forms input/output groups obtained by grouping inputs and outputs to/from the disk apparatus, defines a ratio of times during which each input/output group uses the disk apparatus, determines quanta τ1, τ2, and τ3 (allocating times) during which each input/output group can continuously use the disk apparatus on the basis of the defined time ratio, and performs such a time-sharing that when input/output requests are received from a plurality of input/output groups to the disk apparatus, the quanta τ1, τ2, and τ3 are sequentially switched among the input/output groups in which the input/output requests compete and the disk apparatus is used accordingly. Further, the allocating time control unit allows the allocating time to be dynamically fluctuated in accordance with a degree of jam of the input/output processing requests of the input/output groups. According to the disk time-sharing apparatus of the invention as mentioned above, in the case where inputs and outputs are divided into a plurality of input/output groups such as random access, sequential access, copy (sequential), and the like and quanta are allocated to them, when input/output requests of a specific input/output group are reduced and a surplus time is caused in its quantum, the surplus time is dynamically distributed to the quantum of the other input/output group which issues many input/output requests. Consequently, it is possible to efficiently access the disk apparatus without causing an idle time.
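As a rough illustration of the time-sharing described above, the following sketch derives quanta from a time ratio and switches the disk among the groups that have pending requests. The function names, the fixed cycle length, and the per-request service times are hypothetical values chosen for the example; the specification does not prescribe them.

```python
from collections import deque


def quanta_from_ratio(ratios, cycle_ms):
    """Split one scheduling cycle into per-group quanta by the defined time ratio."""
    total = sum(ratios.values())
    return {group: cycle_ms * share / total for group, share in ratios.items()}


def time_share(queues, quanta, service_ms):
    """Serve groups in turn; a group keeps the disk until its quantum is used
    up or its queue is empty. Groups without requests are skipped."""
    order = []
    while any(queues.values()):
        for group, queue in queues.items():
            used = 0.0
            while queue and used < quanta[group]:
                order.append((group, queue.popleft()))
                used += service_ms[group]  # assumed fixed cost per request
    return order


# Hypothetical ratio 3:2:1 over a 60 msec cycle gives 30, 20, and 10 msec.
quanta = quanta_from_ratio({"random": 3, "sequential": 2, "copy": 1}, cycle_ms=60)
queues = {
    "random": deque(["r1", "r2", "r3", "r4"]),
    "sequential": deque(["s1", "s2", "s3"]),
    "copy": deque(["c1", "c2"]),
}
service_ms = {"random": 10, "sequential": 10, "copy": 10}
print(quanta)
print(time_share(queues, quanta, service_ms))
```

The sketch keeps the quanta fixed; the dynamic adjustment by the allocating time control unit is a separate step that would modify `quanta` between cycles.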
The allocating time control unit distributes the surplus allocating time of the input/output group having the small number of processing requests by extending the allocating time of the input/output group having many processing requests. The allocating time control unit distributes the surplus allocating time of the input/output group having the small number of processing requests by increasing a frequency without changing its allocating time of the input/output group having the large number of processing requests.
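The first of the two variants, extending the allocating time of busy groups, might look like the sketch below. It assumes that the "degree of jam" of a group can be summarized as the disk time its pending requests would need (`demand_ms`), and that the surplus is shared among the busy groups in proportion to their original quanta; both are illustrative assumptions, and `redistribute_surplus` is a hypothetical name.

```python
def redistribute_surplus(quanta, demand_ms):
    """Shrink the quanta of lightly loaded groups to what they need and
    extend the quanta of overloaded groups with the time freed up."""
    adjusted = {}
    surplus = 0.0
    for group, quantum in quanta.items():
        used = min(quantum, demand_ms[group])
        surplus += quantum - used          # time this group cannot use
        adjusted[group] = float(used)
    # Groups whose demand exceeds their quantum receive the surplus.
    busy = [g for g in quanta if demand_ms[g] > quanta[g]]
    busy_total = sum(quanta[g] for g in busy)
    for group in busy:
        adjusted[group] += surplus * quanta[group] / busy_total
    return adjusted


quanta = {"random": 30, "sequential": 20, "copy": 10}
demand_ms = {"random": 60, "sequential": 5, "copy": 40}
print(redistribute_surplus(quanta, demand_ms))
```

The second variant would leave each quantum unchanged and instead let the busy groups take their turn more often within the same period; it is not shown here.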
According to the invention, when the surplus time of the quantum occurring in any one of a plurality of input/output groups is distributed to the quantum of the other input/output group having many input/output requests, it is fixedly distributed in accordance with the ratio which was initially set. In the case where the input/output groups are classified into groups for the random access, sequential access, copying access, and the like, however, when it is assumed that a surplus time is caused in the sequential access and the number of input/output requests of the copying access is accidentally large, the surplus time is distributed to the copying access. Consequently, there is a fear that the response time of the random access is sharply extended. According to the invention, the input/output scheduling unit forms two or more upper groups by combining a plurality of input/output groups and the allocating time control unit distributes the surplus time among the input/output groups only in each upper group. For example, the random access and the sequential access are inserted into a normal system group, the copying access is inserted into an abnormal system group, and the distribution of the surplus time between the upper groups is forbidden, so that the surplus time of, for example, the sequential access can be distributed to the random access in the upper groups and its response time can be guaranteed. The allocating time control unit can also distribute the surplus time between the upper groups. In this case, when each of the input/output groups which belong to a certain upper group has a surplus time, the allocating time control unit distributes the surplus time to the input/output group having many processing requests of the other upper group. 
In the case where a part of the input/output groups which belong to a certain upper group has a surplus time, the allocating time control unit distributes the surplus time to the input/output group having many processing requests of the other upper group. Further, when there is no input/output group which can completely use the surplus time in the upper group on the distributing destination side, the allocating time control unit does not distribute the surplus time to the other upper group. The input/output scheduling unit sets a priority for each upper group. The allocating time control unit sequentially checks a degree of jam of the input/output requests from the upper groups of a high priority and distributes the surplus time of the upper group to the other upper group having many input/output processing requests. The allocating time control unit also sequentially checks a degree of jam of the input/output requests from the upper group of a low priority and can distribute the surplus time of the upper group to the other upper group having many input/output processing requests. Further, the allocating time control unit distributes the surplus time from the upper group of a low priority only to the upper group of a high priority, so that the surplus time occurring by, for example, the copying access of the abnormal system is reflected to the random access or sequential access of the normal system and the response time or throughput can be guaranteed. The allocating time control unit can also distribute the surplus time of the upper group to the other previously designated arbitrary upper group. 
The allocating time control unit distributes a surplus allocating time of the input/output group which belongs to a certain upper group and has the small number of input/output processing requests by extending the allocating time of the input/output group which belongs to the other upper group and has the large number of input/output processing requests or by increasing a frequency without changing the allocating time of the input/output group.
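The case in which distribution between upper groups is forbidden can be sketched as follows. The upper-group names `normal` and `abnormal` follow the example in the text, while the function name, the representation of load as required disk time (`demand_ms`), and the proportional sharing rule are assumptions made for illustration only.

```python
def redistribute_within_upper_groups(quanta, demand_ms, upper_groups):
    """Distribute surplus time only among input/output groups that belong
    to the same upper group; nothing crosses an upper-group boundary."""
    adjusted = {}
    for members in upper_groups.values():
        surplus = 0.0
        for group in members:
            used = min(quanta[group], demand_ms[group])
            surplus += quanta[group] - used
            adjusted[group] = float(used)
        busy = [g for g in members if demand_ms[g] > quanta[g]]
        busy_total = sum(quanta[g] for g in busy)
        for group in busy:
            adjusted[group] += surplus * quanta[group] / busy_total
    return adjusted


upper_groups = {"normal": ["random", "sequential"], "abnormal": ["copy"]}
quanta = {"random": 30, "sequential": 20, "copy": 10}
demand_ms = {"random": 60, "sequential": 5, "copy": 40}
print(redistribute_within_upper_groups(quanta, demand_ms, upper_groups))
```

With these hypothetical numbers the 15 msec left over by the sequential access goes entirely to the random access, and the copying access stays at its original 10 msec even though it has many requests waiting. Priority-based distribution between upper groups would need an additional pass over the upper groups in priority order.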
According to the invention, there is provided a disk time-sharing method for a disk time-sharing apparatus comprising a disk apparatus having one or a plurality of disk drives, an input/output request unit for issuing an input/output request to the disk apparatus, and an input/output scheduling unit for scheduling the use of the disk apparatus on the basis of the input/output request.
According to the disk time-sharing method, input/output groups obtained by grouping inputs and outputs to/from the disk apparatus in accordance with their kinds are formed and an allocating time (quantum) during which each input/output group can successively use the disk apparatus is defined,
when input/output requests are received from a plurality of input/output groups to the disk apparatus, the disk apparatus is used by sequentially switching the allocating times among the input/output groups which compete with each other, and
the allocating time is fluctuated in accordance with a degree of jam of input/output processing requests of the input/output groups.
Also in this case, the surplus allocating time of the input/output group having the small number of processing requests is distributed by extending the allocating time of the input/output group having the large number of processing requests or by increasing a frequency without changing the allocating time of the input/output group having the large number of processing requests. Further, two or more upper groups are formed by combining a plurality of input/output groups and the surplus time is distributed among the input/output groups only in the upper groups or the surplus time is distributed among the upper groups. The other detailed construction is similar to those of the apparatus.
According to another embodiment of the invention, group queues are provided for the input/output scheduling unit in correspondence to input/output groups and a definition table to set a valid flag or an invalid flag of time-sharing is provided for every group queue. When the valid flag is recognized on the input/output request unit side with reference to the definition table, an input/output request is issued to the input/output scheduling unit. When the invalid flag is recognized, an input/output request is directly issued to the disk apparatus.
When the input/output requests in which the invalid flag is recognized with reference to the definition table continue, the input/output request unit directly issues the input/output request to the disk apparatus and, thereafter, directly issues the next input/output request to the disk apparatus after the elapse of a predetermined idle time. Thus, the user can freely decide the input/output group to which the time-sharing is performed and the input/output group to which the time-sharing is not performed.
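A minimal sketch of this routing is given below. The class name `RequestUnit`, the boolean encoding of the valid/invalid flag, and the decision to measure the idle time from the issue time of the previous direct request are assumptions; the text does not state whether the idle time starts at issue or at completion.

```python
from collections import deque


class RequestUnit:
    """Routes requests by the time-sharing flag in a definition table."""

    def __init__(self, definition_table, idle_ms):
        # definition_table: group -> True (time-sharing valid) / False (invalid)
        self.table = definition_table
        self.idle_ms = idle_ms
        self.group_queues = {g: deque() for g, valid in definition_table.items() if valid}
        self.direct_log = []          # (issue_time_ms, request) sent straight to the disk
        self._next_direct_ms = 0      # earliest time the next direct issue is allowed

    def issue(self, group, request, now_ms):
        if self.table[group]:
            # Valid flag: hand the request to the scheduling unit's group queue.
            self.group_queues[group].append(request)
            return "scheduled"
        # Invalid flag: bypass the scheduler, but space out consecutive
        # direct requests by the predetermined idle time.
        issue_at = max(now_ms, self._next_direct_ms)
        self.direct_log.append((issue_at, request))
        self._next_direct_ms = issue_at + self.idle_ms
        return "direct"


unit = RequestUnit({"random": True, "copy": False}, idle_ms=5)
unit.issue("copy", "c1", now_ms=0)
unit.issue("copy", "c2", now_ms=1)
unit.issue("random", "r1", now_ms=1)
print(unit.direct_log)
print(list(unit.group_queues["random"]))
```

Here the second copy request arrives 1 msec after the first but is held until the 5 msec idle time has elapsed, which leaves room on the disk for the time-shared groups.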
According to still another embodiment of the invention, group queues are provided for the input/output scheduling unit in correspondence to input/output groups and, in the case where an error end is received after the input/output request was extracted from the group queue and issued to the disk apparatus, the input/output request which ended in error is stored into the group queue again and processed. In this case, the input/output request which ended in error is stored again at the head or the end of the group queue, or the input/output request is directly issued to the disk apparatus again.
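The three retry options can be compared with a small sketch. The names `process` and `make_flaky_disk`, the policy strings, and the `max_retries` limit are hypothetical additions for illustration; the text does not specify a retry limit.

```python
from collections import deque


def make_flaky_disk(fail_once):
    """Return a stand-in disk that reports an error end the first time it
    sees each request in fail_once and succeeds otherwise."""
    failed = set()

    def disk_issue(request):
        if request in fail_once and request not in failed:
            failed.add(request)
            return False
        return True

    return disk_issue


def process(queue, disk_issue, policy="head", max_retries=1):
    """Drain one group queue, retrying requests that ended in error.

    policy: "head" or "tail" re-stores the request at that end of the
    queue; "direct" re-issues it to the disk immediately.
    """
    done = []
    retries = {}
    while queue:
        request = queue.popleft()
        if disk_issue(request):
            done.append(request)
            continue
        retries[request] = retries.get(request, 0) + 1
        if retries[request] > max_retries:   # assumed safety limit
            raise RuntimeError(f"request {request!r} failed repeatedly")
        if policy == "head":
            queue.appendleft(request)
        elif policy == "tail":
            queue.append(request)
        else:
            if not disk_issue(request):
                raise RuntimeError(f"direct retry of {request!r} failed")
            done.append(request)
    return done


print(process(deque(["a", "b", "c"]), make_flaky_disk({"b"}), policy="head"))
print(process(deque(["a", "b", "c"]), make_flaky_disk({"b"}), policy="tail"))
```

Re-storing at the head preserves the original order within the group, whereas re-storing at the end lets later requests proceed before the retry.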
The above and other objects, features, and advantages of the present invention will become more apparent from the following detailed description with reference to the drawings.