The present invention relates to a computer system, and more particularly to a disk file system capable of providing high input/output performance.
In present computer systems, data requested by a higher hierarchy such as a CPU is stored in a secondary storage. When it becomes necessary, the CPU reads data from, or writes data to, the secondary storage. A non-volatile storage medium such as a magnetic recording medium or an optical disk, typically a disk drive (hereinafter simply called a drive), is used as such a secondary storage.
In a computer system, a secondary storage of high performance has been desired because information processing technology has recently become highly sophisticated. As one solution, a disk array has been proposed which is constructed of a number of relatively small capacity drives.
Reports on the performance and reliability of disk arrays (levels 3 and 5) are presented in "A Case for Redundant Arrays of Inexpensive Disks (RAID)", by D. Patterson, G. Gibson, and R. H. Katz, at pp. 109-116, June, 1988. In a level 3 disk array, data is divided and processed in parallel, and in a level 5 disk array, data is distributed and processed independently. The disk arrays described in this paper are presently considered the most general disk arrays.
A disk array (level 5) will be described in which data is distributed and processed independently. In the level 5 disk array, data is not divided but is distributively stored in a number of relatively small capacity drives and processed independently. The secondary storage of a presently used mainframe system is generally a drive having a large capacity. It therefore frequently occurs that while the drive is used by one read/write request, another request must stand by until the first request is completely processed. Instead of the large capacity drive used as the secondary storage of a mainframe system, a level 5 disk array uses a number of relatively small capacity drives. It is therefore possible to deal with an increased number of read/write requests because the disk array has a number of drives, thereby shortening the wait time of each read/write request. However, the disk array has a number of disks and hence a number of components, so that the possibility of failure increases. To improve reliability, it becomes necessary to use parities.
Data stored in a failed drive can be rebuilt by using parities. A parity is generated from corresponding data stored in different drives of the disk array, and is itself stored in a further drive of the disk array.
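The parity mechanism can be sketched as follows (an illustrative example only, not part of the disclosed apparatus; the function names are hypothetical). The parity block is the bitwise exclusive OR of the corresponding data blocks, so the block of any single failed drive can be rebuilt by XOR-ing the surviving blocks with the parity.

```python
from functools import reduce

def make_parity(blocks):
    """Form the parity block as the bitwise XOR of corresponding data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild(surviving_blocks, parity_block):
    """Rebuild the block of a failed drive by XOR-ing the survivors with the parity."""
    return make_parity(surviving_blocks + [parity_block])

# Four "drives" each holding one 4-byte block, plus a parity drive.
data = [b"\x01\x00\xff\x10", b"\x02\x0f\x00\x20",
        b"\x04\xf0\x55\x40", b"\x08\x33\xaa\x80"]
parity = make_parity(data)

# If the drive holding data[1] fails, its block is recovered
# from the other drives and the parity.
recovered = rebuild([data[0], data[2], data[3]], parity)
assert recovered == data[1]
```

Because XOR is its own inverse, the same routine serves both to generate the parity and to rebuild a lost block.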
Like a presently used general mainframe system, the storage location (address) of data in a disk array used as the secondary storage is fixed, and the CPU accesses this fixed address for data read/write. International Patent WO 91/20076 discloses a method of dynamically translating a write address in units of tracks for the basic level 5 architecture, wherein a table of dynamically changeable addresses is provided and compressed data is written.
JP-A-4-230512 discloses a method of writing data and a correspondingly changed parity into different locations of a level 5 disk array. IBM Corp. has announced a level 5 disk array (9337) provided with a WAD (write assist device) (refer to "Nikkei Watcher, IBM Version", Sep. 14, 1992 issue, pp. 14-15).
In a presently used general mainframe system or other systems, the storage location (address) of data in a drive of a secondary storage is fixed and CPU accesses this fixed address for data read/write. A disk array also uses fixed addresses. Fixed addresses pose no problem in the case of a disk array (level 3) in which data is divided and processed in parallel. However, in the case of a disk array (level 5) in which data is distributed and processed independently, fixed addresses result in a large overhead of data write. This will be clarified in the following.
FIG. 11 is a schematic diagram explaining the structure of a level 5 RAID in which data is distributed and processed independently, and which has been proposed by D. Patterson et al. in the above cited paper. Data at each address is a unit processed by one read/write operation, and is independent of other data. In the RAID architecture, each address of data is fixed. As described earlier, it is essential for this system to use parities in order to improve the system reliability. In this system, a parity is formed from data at the same address of the respective drives. For example, a parity is formed from data at the address (1, 1) of the drives #1 to #4 and stored in a parity drive #5 at the address (1, 1). Like a presently used mainframe system, data is accessed from a corresponding drive of this system.
For example, in updating data in the drive #3 at the address (2, 2) of this disk array, the data before update in the drive #3 at the address (2, 2) and the corresponding parity in the drive #5 at the address (2, 2) are first read (indicated by (1)). A new parity is formed from an exclusive OR of the read old data, the read old parity, and the new update data (indicated by (2)). After the new parity is formed, the new update data is stored in the drive #3 at the address (2, 2) and the new parity is stored in the drive #5 at the address (2, 2).
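This update sequence amounts to the well-known read-modify-write parity computation. A minimal sketch (illustrative only; the function names are hypothetical): XOR-ing out the old data removes its contribution to the parity, and XOR-ing in the new data adds the new contribution.

```python
from functools import reduce

def xor_blocks(blocks):
    """Full parity: bitwise XOR of all data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def update_parity(old_data, old_parity, new_data):
    # new parity = (old data) XOR (old parity) XOR (new data)
    return bytes(d ^ p ^ n for d, p, n in zip(old_data, old_parity, new_data))

# Blocks at one address on drives #1 to #4; the parity is on drive #5.
blocks = [b"\x11\xa0", b"\x22\xb1", b"\x33\xc2", b"\x44\xd3"]
parity = xor_blocks(blocks)

# Update the block on drive #3 and compute the new parity incrementally.
new_block = b"\x5a\x5a"
new_parity = update_parity(blocks[2], parity, new_block)
blocks[2] = new_block
assert new_parity == xor_blocks(blocks)  # agrees with a full recomputation
```

The incremental form touches only two drives (the data drive and the parity drive), which is exactly why the old data and old parity must be read before the write can complete.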
As shown in FIG. 12A, the old data and parity are read from the corresponding drives of the level 5 disk array after waiting half a revolution of the drives on average, and then a new parity is calculated. Another full revolution is required to write this new parity, resulting in one and a half revolutions in total at a minimum for updating data. A wait time of one and a half revolutions is a very large overhead for drives. A method of dynamically translating a write address so as to reduce the data write overhead is disclosed in the above-cited WO 91/20076.
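To put the overhead in concrete terms, assume a rotational speed of 3600 RPM (an assumed figure typical of drives of that era; the text does not state one). The minimum of one and a half revolutions then works out as follows:

```python
# Assumed rotational speed; not specified in the text.
rpm = 3600
ms_per_rev = 60_000 / rpm           # one revolution takes about 16.7 ms

# Half a revolution (average) to read the old data and parity,
# plus one full revolution to write the new data and parity back.
update_latency = 1.5 * ms_per_rev
print(f"{update_latency:.1f} ms")   # prints "25.0 ms"
```

Roughly 25 ms per update, before any seek time, which is the rotational overhead the cited methods seek to avoid.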
Also disclosed in the above-cited JP-A-4-230512 is a method of writing data in a drive at an address different from the write address in order to reduce the data write overhead. Immediately after the write data is sent from the CPU, a parity is updated and written in a drive. As compared to data read, the overhead of generating and writing a parity is very large. As a result, if the CPU issues a large number of read/write requests, the large overhead of dealing with the requests becomes a main factor in lowering the system performance.
It is an object of the present invention to reduce a data write overhead and improve the performance of a level 5 disk array system.
It is another object of the present invention to use drive resources effectively by improving the system performance with a spare drive which rebuilds the data of a failed drive.
According to the present invention, a logical group is constituted by the drives of a parity group and duplicated fields (space fields). By using the space fields efficiently, the parity update process in the write process can be delayed, and the parity can be generated later when the number of read/write requests by CPU decreases, while maintaining high reliability.
Specifically, in the write process, data to be written (new data) is duplicately stored in the space fields of the SCSI drives 12 constituting a logical group 10. At this time, a tentative write completion is reported to CPU.
Generating a parity and writing it in a SCSI drive 12 is performed at a timing independent of the timing of writing the new data into the SCSI drives 12. Specifically, MP120 of ADC 2 counts the number of read/write requests issued by CPU to the logical group 10. If the number is smaller than a threshold preset by a user or a system manager, and if no read/write request is presently issued to the SCSI drive 12, the parity is generated and written in the SCSI drive 12.
In another method of writing a parity, it may be written in response to an interrupt process issued at a predetermined time interval. Alternatively, the times of day, or the days of a month, during which the number of read/write requests by CPU becomes small may be pre-scheduled.
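The threshold-based deferral described above can be modeled in a short sketch (an illustration only; the class and method names are hypothetical and do not appear in the disclosure). Parity updates are queued at write time and flushed only when the observed request rate falls below the preset threshold.

```python
from collections import deque

class DeferredParityManager:
    """Illustrative model of deferred parity writing (names are hypothetical)."""

    def __init__(self, threshold, write_parity):
        self.threshold = threshold        # preset by a user or system manager
        self.pending = deque()            # addresses whose parity is not yet written
        self.recent_requests = 0          # CPU read/write requests in current window
        self.write_parity = write_parity  # callback that generates and writes a parity

    def on_host_request(self):
        self.recent_requests += 1         # count each CPU read/write request

    def queue_parity_update(self, address):
        # Called at write time: the new data (and its duplicate) are already
        # stored in the space fields, so the parity write can be deferred.
        self.pending.append(address)

    def tick(self):
        # Called periodically, e.g. by an interrupt at a fixed interval.
        if self.recent_requests < self.threshold:
            while self.pending:           # quiet enough: flush deferred parities
                self.write_parity(self.pending.popleft())
        self.recent_requests = 0          # start a new observation window
```

A busy window leaves the queue untouched; a quiet window drains it, so parity generation is pushed to periods of low load.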
Suppose that a failure occurs at one of the SCSI drives 12 of the logical group 10 before the parity has been generated and written in a SCSI drive 12, so that the data in the failed SCSI drive 12 cannot be read. If the failed SCSI drive 12 stores data other than the duplicated data, this data can be rebuilt from the old parity and the data in the other SCSI drives 12. If the failed SCSI drive 12 stores one of the new duplicated data, this data can be recovered by using the other of the new duplicated data in another SCSI drive 12.
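The two-branch recovery rule can be summarized in a short sketch (illustrative only; the data structures and names are hypothetical): if a mirror copy of the lost block exists in a space field, read it back directly; otherwise XOR the other drives' data with the old parity.

```python
from functools import reduce

def rebuild_block(failed_drive, address, drives, duplicate_of):
    """Rebuild the block of a failed drive at a given address.

    drives: mapping of surviving drive id -> {address: block}, parity included.
    duplicate_of: mapping (drive_id, address) -> (drive_id, address) giving the
        mirror copy in a space field, for data whose parity is still pending.
    """
    key = (failed_drive, address)
    if key in duplicate_of:
        # The failed drive held one copy of newly duplicated data:
        # recover it directly from the surviving copy.
        mirror_drive, mirror_addr = duplicate_of[key]
        return drives[mirror_drive][mirror_addr]
    # Otherwise rebuild from the old parity and the other drives' data.
    columns = zip(*(d[address] for d in drives.values()))
    return bytes(reduce(lambda a, b: a ^ b, col) for col in columns)

# Drive #3 failed; drives #1, #2, #4 survive and drive #5 holds the parity
# (0x01 ^ 0x02 ^ 0x04 ^ 0x08 = 0x0f, so the lost block was 0x04).
drives = {1: {0: b"\x01"}, 2: {0: b"\x02"}, 4: {0: b"\x08"}, 5: {0: b"\x0f"}}
assert rebuild_block(3, 0, drives, {}) == b"\x04"
```

The duplicated-data branch needs no parity at all, which is what keeps the array recoverable during the window in which the parity write is still pending.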