The present invention relates to a method of controlling disk access commands issued to a disk device that stores data of a client, and to a network storage system having a function of controlling the disk access commands.
With recent progress in network technologies, the amount of data to be stored by client computers has increased drastically. Along with this, the capacity of direct attached storage (DAS) has been growing, and storage systems connected directly to a network (network storage) are being introduced.
DAS is a storage device connected directly to a client computer through an I/O interface such as SCSI. Data stored in DAS can be accessed only through the client computer to which the DAS is directly connected.
Network storage, on the other hand, is a storage system connected to a network. Because data can be shared among a plurality of client computers, the clients can manage the shared data more efficiently than with DAS. Examples of network storage include SAN storage, which is connected through a storage area network (SAN) and provides block access; iSCSI storage, which is connected through an IP network, InfiniBand, or the like and provides block access; and network attached storage (NAS), which provides file access.
To store files in DAS or iSCSI storage, the client computer must first divide a physical storage device (physical disk) into one or more logical disks (partitions) and format them. Each partition is formatted with a file system supported by the operating system (OS) of the client computer. For example, if the OS of the client computer is Windows (registered trademark), the partition is generally formatted with NTFS or FAT; if the OS is UNIX (registered trademark), the partition is generally formatted with an ext2 or ext3 file system. If the OS also supports other file systems, the partition can be formatted with any of them.
Typically, a file system holds not only user data but also information that the system uses to manage the user data. For example, when a partition is formatted with NTFS, the OS reserves certain areas in advance. These areas record free-space (vacancy) information for each disk block in the partition and the addresses of the disk blocks to which the actual data of each file is written. The OS manages user files by using the information held in these areas. Therefore, in such a file system, whenever a user writes data to the disk device, the corresponding management data must also be written.
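The following Python sketch illustrates why a single user write also produces management-data writes. It models a hypothetical toy file system, not NTFS: block 0 holds a free-block bitmap and block 1 holds a file table of block addresses. The names `ToyFileSystem` and `write_file` and the block layout are assumptions made for illustration only.

```python
# Hypothetical toy layout (not NTFS): blocks 0 and 1 are reserved for
# management data; blocks from FIRST_DATA_BLOCK onward hold user data.
BITMAP_BLOCK = 0        # free-space ("vacancy") information per block
FILE_TABLE_BLOCK = 1    # file name -> addresses of its data blocks
FIRST_DATA_BLOCK = 2


class ToyFileSystem:
    def __init__(self, num_blocks):
        # True means the block is free; reserved management blocks start in use.
        self.free = [i >= FIRST_DATA_BLOCK for i in range(num_blocks)]
        self.file_table = {}
        self.writes = []    # physical block addresses written, in order

    def write_file(self, name, n_blocks):
        blocks = [i for i, is_free in enumerate(self.free) if is_free][:n_blocks]
        if len(blocks) < n_blocks:
            raise OSError("no free blocks left")
        for b in blocks:
            self.free[b] = False
            self.writes.append(b)             # user data
        self.file_table[name] = blocks
        self.writes.append(BITMAP_BLOCK)      # management data: updated free-space map
        self.writes.append(FILE_TABLE_BLOCK)  # management data: updated block addresses
        return blocks


fs = ToyFileSystem(num_blocks=16)
fs.write_file("a.txt", 3)
print(fs.writes)  # [2, 3, 4, 0, 1]
```

Writing three blocks of user data costs five block writes in this model, two of which go to the fixed, reserved management area rather than to the data blocks themselves.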
Among the various types of file systems, journaling file systems have come into frequent use in recent years. Whenever the OS or a user writes data to a disk device, a journaling file system records the write history in a log area reserved in advance. Even if the system goes down due to a failure, data loss is kept to a minimum because the system can be restarted normally by using the information recorded in the log. In a journaling file system, therefore, the OS must record a log entry whenever the user writes data, so that the system can be restored to its original state even if a failure occurs.
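The logging behavior described above can be sketched as a minimal redo journal. This is a hypothetical toy, not the on-disk format of any real journaling file system: each write is first recorded in a log area, then marked committed, and only then applied to its home block; on restart, `recover` replays committed entries and ignores uncommitted ones. The class name, the `crash_at` parameter used to simulate a failure, and the record layout are all assumptions.

```python
class JournaledDisk:
    """Toy redo journal (hypothetical; not any real file system's format)."""

    def __init__(self):
        self.blocks = {}      # home locations: block address -> contents
        self.log = []         # log area reserved in advance (append-only)
        self.next_txid = 0

    def write(self, updates, crash_at=None):
        """Log `updates`, commit, then apply. `crash_at` simulates a failure."""
        txid = self.next_txid
        self.next_txid += 1
        for addr, data in updates.items():
            self.log.append(("data", txid, addr, data))  # 1. record write history
        if crash_at == "before_commit":
            return
        self.log.append(("commit", txid))                # 2. mark transaction complete
        if crash_at == "after_commit":
            return
        self.blocks.update(updates)                      # 3. write to home locations

    def recover(self):
        """On restart, replay only committed transactions from the log."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in self.log:
            if rec[0] == "data" and rec[1] in committed:
                _, _, addr, data = rec
                self.blocks[addr] = data


disk = JournaledDisk()
disk.write({5: "A"})
disk.write({6: "B", 7: "C"}, crash_at="after_commit")  # system goes down
print(disk.blocks)  # {5: 'A'}
disk.recover()
print(disk.blocks)  # {5: 'A', 6: 'B', 7: 'C'}
```

A transaction interrupted with `crash_at="before_commit"` has no commit record, so `recover` discards it and the home blocks stay consistent. Note that every user write in this sketch also touches the log area, which is the extra management-data traffic discussed next.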
Typically, in such a file system, the disk block area that stores user data and the disk blocks that store the management data used by the OS are managed independently of each other. Therefore, in a journaling file system, in which management data including the log is accessed frequently, disk accesses to user data and disk accesses to the OS management data occur concurrently, and accesses to non-contiguous (discrete) disk blocks occur often. In general, a disk device can write data to contiguous disk blocks in a short time. When it writes to discrete disk blocks, however, the head must repeatedly move between distant locations, so the access time becomes long and performance degrades significantly.
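The cost of interleaving can be illustrated with a crude model that treats the absolute difference between consecutive block addresses as a proxy for head movement. This is an assumption that ignores rotational delay and real cylinder/head/sector geometry, and the block ranges below are hypothetical.

```python
def total_seek_distance(addresses, start):
    """Sum of |jumps| between consecutive block addresses (crude head-movement proxy)."""
    distance, head = 0, start
    for addr in addresses:
        distance += abs(addr - head)
        head = addr
    return distance


user_blocks = list(range(1000, 1004))   # hypothetical user-data area
log_blocks = list(range(10, 14))        # hypothetical log/management area

# Concurrent user and log writes interleave on the disk...
interleaved = [b for pair in zip(user_blocks, log_blocks) for b in pair]
# ...whereas grouping them keeps most accesses contiguous.
grouped = user_blocks + log_blocks

print(total_seek_distance(interleaved, start=1000))  # 6933
print(total_seek_distance(grouped, start=1000))      # 999
```

The same eight blocks cost roughly seven times as much head movement when user and log writes alternate, which is the kind of inefficiency that reordering disk access commands aims to reduce.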
As a method of optimizing such inefficient accesses to a disk device, techniques have been disclosed that control the order in which disk access commands are executed on the disk device by using information held in the disk device.
For example, Patent Document 1 (Japanese Patent Application Laid-open No. 5-27911) discloses a disk device that can accept a plurality of disk access commands in a multiplexed manner and hold them in a queue buffer. For the group of commands queued in the disk device, the movement time of the disk head and the disk rotation time from the current head position to the start sector of each command are calculated, a waiting time is calculated from these two times, and the command with the minimum waiting time is selected and executed, thereby reducing command processing time.
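The selection rule described for Patent Document 1 can be sketched as follows under an assumed drive model: seek time is a fixed settle time plus a linear per-cylinder cost, and the waiting time is the seek time plus the rotational delay remaining after the seek completes. All parameters and names (`Command`, `waiting_time`, `select_next`) are hypothetical, and Patent Document 1's actual calculation may differ. Times are integer microseconds to keep the arithmetic exact.

```python
from dataclasses import dataclass

# Hypothetical drive parameters (illustrative only), in microseconds.
ROTATION_US = 10_000                      # one revolution (6,000 rpm)
SECTORS_PER_TRACK = 100
US_PER_SECTOR = ROTATION_US // SECTORS_PER_TRACK
SEEK_SETTLE_US = 1_000                    # fixed cost of any non-zero seek
SEEK_PER_CYLINDER_US = 10                 # assumed linear seek model


@dataclass(frozen=True)
class Command:
    name: str
    cylinder: int
    start_sector: int


def waiting_time(cmd, head_cylinder, phase_us):
    """Seek time plus rotational delay until cmd.start_sector reaches the head.

    phase_us is the rotational position at time 0 (sector 0 under head = 0).
    """
    distance = abs(cmd.cylinder - head_cylinder)
    seek = 0 if distance == 0 else SEEK_SETTLE_US + SEEK_PER_CYLINDER_US * distance
    phase_after_seek = (phase_us + seek) % ROTATION_US
    rotation = (cmd.start_sector * US_PER_SECTOR - phase_after_seek) % ROTATION_US
    return seek + rotation


def select_next(queue, head_cylinder, phase_us):
    """Pick the queued command whose waiting time is smallest."""
    return min(queue, key=lambda c: waiting_time(c, head_cylinder, phase_us))


queue = [Command("A", 500, 90), Command("B", 520, 20), Command("C", 900, 50)]
print(select_next(queue, head_cylinder=500, phase_us=0).name)  # B
```

Command A needs no seek but must wait 9 ms for its start sector to rotate under the head, whereas B finishes a short seek just before its sector arrives (2 ms in total), so B is chosen even though A is on the current cylinder.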
Further, Patent Document 2 (Japanese Patent Laid-open No. 6-259198) discloses a technique for speeding up command processing in a disk device system constituted of a plurality of disk drives. In Patent Document 2, two command queues, a reordered queue and a standby queue, are prepared: the command to be executed is selected from the reordered queue, and newly incoming commands are placed in the standby queue. Accordingly, when a command uses a plurality of disk drives, as in a RAID, the other drives can still be accessed even if one of the disk drives cannot be accessed, which improves disk access performance.
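The two-queue mechanism described for Patent Document 2 can be sketched as below. The ordering key passed to the scheduler and the refill policy (reorder the whole standby batch once the reordered queue drains) are assumptions, since the description above does not specify them; the multi-drive (RAID) aspect is not modeled. `TwoQueueScheduler` is a hypothetical name.

```python
from collections import deque


class TwoQueueScheduler:
    """Hypothetical sketch: execute from a reordered queue, park arrivals in a standby queue."""

    def __init__(self, key):
        self.key = key              # ordering criterion used when reordering (assumed)
        self.reordered = deque()
        self.standby = []

    def submit(self, cmd):
        # Newly incoming commands never disturb the batch being executed.
        self.standby.append(cmd)

    def next_command(self):
        if not self.reordered:
            # Assumed policy: when the reordered queue drains, reorder the standby batch.
            self.reordered = deque(sorted(self.standby, key=self.key))
            self.standby = []
        return self.reordered.popleft() if self.reordered else None


s = TwoQueueScheduler(key=lambda lba: lba)   # commands represented by block addresses
for lba in [70, 10, 40]:
    s.submit(lba)
print(s.next_command())                      # 10
s.submit(5)                                  # arrives mid-batch, goes to standby
print([s.next_command() for _ in range(4)])  # [40, 70, 5, None]
```

The late-arriving command 5 is served only after the current batch finishes, even though it has the lowest address; this keeps the reordered batch stable at the cost of some added latency for new arrivals.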
Furthermore, Patent Document 3 (U.S. Pat. No. 6,574,676) discloses a scheduling algorithm for changing the execution order of I/O commands queued in a disk drive, and an apparatus applying the algorithm. In this method, to reduce the disk access time, which consists of a seek time and the time to move from a given head position to the target physical sector, the disk controller roughly calculates an expected access time for each command in the group of queued commands and selects and executes the command with the smallest expected access time, thereby improving the average access time of disk I/O operations.
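To illustrate how repeatedly choosing the command with the smallest expected access time can lower the average access time, the sketch below compares arrival-order (FIFO) execution with greedy selection over a fixed queue of target cylinders. The rough estimate used here, a linear seek model plus an average rotational latency of half a revolution, is an assumption and does not reproduce Patent Document 3's actual estimate; with a constant rotational term, the greedy rule reduces to shortest-seek-first. The estimate is also taken as the realized cost in this toy model.

```python
SEEK_SETTLE_US = 1_000
SEEK_PER_CYLINDER_US = 10
AVG_ROTATIONAL_US = 5_000     # half of an assumed 10 ms revolution


def expected_access_time(head, cylinder):
    """Rough estimate: linear seek model plus average rotational latency."""
    distance = abs(cylinder - head)
    seek = 0 if distance == 0 else SEEK_SETTLE_US + SEEK_PER_CYLINDER_US * distance
    return seek + AVG_ROTATIONAL_US


def average_access_time(queue, head, greedy):
    pending, total = list(queue), 0
    while pending:
        if greedy:
            nxt = min(pending, key=lambda c: expected_access_time(head, c))
        else:
            nxt = pending[0]                  # FIFO: arrival order
        total += expected_access_time(head, nxt)
        pending.remove(nxt)
        head = nxt
    return total / len(queue)


queue = [980, 20, 960, 40]                    # hypothetical target cylinders
print(average_access_time(queue, head=510, greedy=False))  # 14225.0
print(average_access_time(queue, head=510, greedy=True))   # 9575.0
```

Greedy selection visits 960, 980, 40 and 20 in turn, avoiding the back-and-forth sweeps of the arrival order and cutting the average estimated access time by about a third in this example.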