1. Field of the Invention
This invention relates to disk array devices and more specifically, to a disk array device in which multiple disks (typically, magnetic disks or optical disks) construct a disk array capable of storing a large volume of data, transferring data at high speed, and further providing higher system reliability.
2. Description of the Background Art
Typical disk array devices include a RAID (Redundant Array of Inexpensive Disks). The RAID is discussed in detail in "A Case for Redundant Arrays of Inexpensive Disks", by David A. Patterson, Garth Gibson, Randy H. Katz, University of California Berkeley, December 1987, and others. Six basic architectures of the RAID from levels 0 to 5 have been defined. Described below is how a RAID adopting the level 3 architecture (hereinafter referred to as RAID-3) controls input/output of data. FIG. 69 is a block diagram showing the typical structure of the RAID-3. In FIG. 69, the RAID includes a controller 6901, and five disk drives 6902A, 6902B, 6902C, 6902D, and 6902P. A host device is connected to the controller 6901, and makes read/write requests for data to the RAID. When receiving data to be written, the controller 6901 divides the data into data blocks. The controller 6901 generates redundant data using these data blocks. After creation of the redundant data, the data blocks are written into the disk drives 6902A to 6902D, one data block per disk drive. The redundant data is written into the disk drive 6902P.
Described next is the procedure of creating redundant data with reference to FIGS. 70a and 70b. Data to be written arrives at the controller 6901 by a unit of a predetermined size (2048 bytes, in this description). Here, as shown in FIG. 70a, currently-arrived data is called D-1. The data D-1 is divided into four by the controller 6901, and thereby four data blocks D-A1, D-B1, D-C1, and D-D1 are created. Each data block has a data length of 512 bytes.
The controller 6901 then creates redundant data D-P1 using the data blocks D-A1, D-B1, D-C1, and D-D1 by executing a calculation given by:
D-P1_i = D-A1_i xor D-B1_i xor D-C1_i xor D-D1_i    (1)
Here, since each of the data blocks D-A1, D-B1, D-C1, and D-D1 and the redundant data D-P1 has a data length of 512 bytes, i takes on natural numbers from 1 to 512. For example, when i=1, the controller 6901 calculates the redundant data D-P1_1 using the first bytes (D-A1_1, D-B1_1, D-C1_1, and D-D1_1) of the data blocks D-A1, D-B1, D-C1, and D-D1. Here, D-P1_1 is the first byte of the redundant data. When i=2, the controller 6901 calculates the redundant data D-P1_2 using the second bytes (D-A1_2, D-B1_2, D-C1_2, and D-D1_2) of the data blocks D-A1, D-B1, D-C1, and D-D1. Thereafter, the controller 6901 repeats the calculation given by the equation (1) up to the last byte (512th byte) of the data blocks D-A1, D-B1, D-C1, and D-D1 to calculate the redundant data D-P1_1, D-P1_2, . . . D-P1_512. The controller 6901 sequentially arranges the calculated redundant data D-P1_1, D-P1_2, . . . D-P1_512 to generate the redundant data D-P1. As is clear from the above, the redundant data D-P1 is the parity of the data blocks D-A1, D-B1, D-C1, and D-D1.
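As a non-limiting illustration, the division of the data D-1 and the byte-wise calculation given by the equation (1) may be sketched in Python as follows. The names split_into_blocks and make_parity are hypothetical and do not appear in the above description. The division of the data into four consecutive 512-byte ranges is an assumption made only for this sketch, since the description does not specify how bytes are assigned to the data blocks.

```python
from functools import reduce
from operator import xor

BLOCK_SIZE = 512   # bytes per data block, as in the description
DATA_BLOCKS = 4    # D-A1, D-B1, D-C1, and D-D1


def split_into_blocks(data: bytes) -> list[bytes]:
    """Divide one 2048-byte unit into four 512-byte data blocks.

    Assumption: consecutive byte ranges are used; other layouts are possible.
    """
    if len(data) != BLOCK_SIZE * DATA_BLOCKS:
        raise ValueError("expected exactly 2048 bytes")
    return [data[k * BLOCK_SIZE:(k + 1) * BLOCK_SIZE] for k in range(DATA_BLOCKS)]


def make_parity(blocks: list[bytes]) -> bytes:
    """Apply equation (1) to every byte position i of the data blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Illustrative input only; the byte values carry no meaning.
data = bytes((7 * k + 3) % 256 for k in range(2048))
blocks = split_into_blocks(data)
parity = make_parity(blocks)
```

In this sketch, parity corresponds to the redundant data D-P1 and blocks to the data blocks D-A1 to D-D1.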
The controller 6901 stores the created data blocks D-A1, D-B1, D-C1, and D-D1 in the disk drives 6902A, 6902B, 6902C, and 6902D, respectively, and stores the generated redundant data D-P1 in the disk drive 6902P, as shown in FIG. 70b.
The controller 6901 further controls reading of data. Here, assume that the controller 6901 is requested to read the data D-1 by the host device. In this case, when each of the disk drives 6902A, 6902B, 6902C, and 6902D operates normally, the controller 6901 reads the data blocks D-A1, D-B1, D-C1, and D-D1 from the disk drives 6902A, 6902B, 6902C, and 6902D, respectively. The controller 6901 assembles the read data blocks D-A1, D-B1, D-C1, and D-D1 to compose the data D-1 of 2048 bytes. The controller 6901 transmits the composed data D-1 to the host device.
There is a possibility that a failure or fault may occur in any of the disk drives. Here, assume that the disk drive 6902C has failed and the host device has sent a read request for the data D-1. In this case, the controller 6901 first tries to read the data blocks D-A1, D-B1, D-C1, and D-D1 from the disk drives 6902A, 6902B, 6902C, and 6902D, respectively. However, since the disk drive 6902C has failed, the data block D-C1 is not read therefrom. Assume herein, however, that the data blocks D-A1, D-B1, and D-D1 are read from the disk drives 6902A, 6902B, and 6902D normally. When recognizing that the data block D-C1 cannot be read, the controller 6901 reads the redundant data D-P1 from the disk drive 6902P.
The controller 6901 then recovers the data block D-C1 by executing a calculation given by the following equation (2) using the data blocks D-A1, D-B1, and D-D1 and the redundant data D-P1.
D-C1_i = D-A1_i xor D-B1_i xor D-D1_i xor D-P1_i    (2)
Here, since each of the data blocks D-A1, D-B1, and D-D1, and the redundant data D-P1 has a data length of 512 bytes, i takes on natural numbers from 1 to 512. The controller 6901 calculates the bytes D-C1_1, D-C1_2, . . . D-C1_512 of the lost data block by repeatedly executing the calculation given by the equation (2) from the first byte to the 512th byte. The controller 6901 recovers the data block D-C1 based on these calculation results. As a result, all of the data blocks D-A1 to D-D1 are held in the controller 6901. The controller 6901 assembles the data blocks D-A1 to D-D1 to compose the data D-1 of 2048 bytes. The controller 6901 transmits the composed data D-1 to the host device.
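By way of example only, the recovery given by the equation (2) may be sketched in Python as follows. The names xor_blocks and recover_block and the block contents are hypothetical values chosen for illustration. The sketch relies only on the fact that XOR-ing the three readable data blocks with the parity cancels them out and leaves the unread data block.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)


def recover_block(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Equation (2): XOR of the readable data blocks and the redundant data."""
    return xor_blocks(surviving_blocks + [parity])


# Arbitrary 512-byte contents (illustrative values only).
d_a1, d_b1, d_c1, d_d1 = (bytes([seed]) * 512 for seed in (0x11, 0x22, 0x44, 0x88))
d_p1 = xor_blocks([d_a1, d_b1, d_c1, d_d1])          # equation (1)
recovered = recover_block([d_a1, d_b1, d_d1], d_p1)  # D-C1 is unreadable
```

Here recovered is byte-for-byte identical to d_c1, which corresponds to the recovered data block D-C1.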
As described above, there is a possibility that the RAID in FIG. 69 cannot read the requested data block from a faulty disk drive (any one of the disk drives 6902A to 6902D). The RAID, however, executes the calculation of parity given by the equation (2) using the data blocks and the redundant data read from the other four normal disk drives. The calculation of parity allows the RAID to recover the data block stored in the faulty disk drive.
In recent years, the RAID architecture, as an example of a disk array, is often implemented also in video servers which provide video upon a user's request. In video servers, data to be stored in the disk drives 6902A to 6902D of the RAID includes two types: video data and computer data (typically, video title and total playing time). Since video data and computer data have different characteristics, requirements of the RAID system are different in reading video data and computer data.
More specifically, computer data is required to be reliably transmitted to the host device. That is, when a data block of computer data cannot be read, the RAID has to recover the data block by executing calculation of parity. For this purpose, the RAID may take some time to transmit the computer data to the host device. On the other hand, video data is replayed as video at the host device. When part of the video data arrives late at the host device, the video being replayed at the host device is interrupted. More specifically, video data in general is far larger than the 2048 bytes read at one time, and is composed of a number of 2048-byte units of data. Therefore, when requesting the video data to be replayed, the host device has to make a read request for 2048 bytes of data repeatedly. Meanwhile, the RAID has to read the video data from the disk drives 6902A to 6902D within a predetermined time from the arrival of each read request. If reading of the 2048-byte data is delayed even once, the video being replayed at the host device is interrupted. Therefore, the RAID is required to sequentially transmit the 2048-byte units of data composing the video data to the host device. Described below are RAID systems disclosed in Japanese Patent Laying-Open No. 2-81123 and No. 9-69027, which satisfy such requirements.
A first RAID disclosed in Japanese Patent Laying-Open No. 2-81123 is now described. The first RAID includes a disk drive group composed of a plurality of disk drives. The disk drive group includes a plurality of disk drives for storing data (hereinafter referred to as data-drives) and a disk drive for storing redundant data created from the data (hereinafter referred to as parity-drive). When reading data from the plurality of data-drives, the first RAID checks whether reading from one of the data-drives is delayed for more than a predetermined time after the reading from the other data-drives starts. The first RAID determines that the data-drive in which reading is delayed for more than the predetermined time is a faulty drive. After detecting the faulty drive, the first RAID recovers the data to be read from the faulty drive, using data in the other data-drives and redundant data in the parity-drive.
As shown in FIG. 71a, the first RAID determines that the data-drive D has failed when the data-drive D does not start reading after the lapse of the predetermined time from the start of a fourth reading (data-drive B). To recover the data block of the data-drive D, the first RAID operates calculation of parity. In general disk drives, however, the time from a start to an end of reading is not constant. Some disks may complete reading in a short period of time, while others may take a long time to complete reading after several failures. Therefore, in the first RAID, as shown in FIG. 71b, even though the parity-drive P starts reading earlier than the data-drive B which starts reading fourth, the data-drive B may complete its reading earlier than the parity-drive P. In this case, even after the lapse of the predetermined time after the data-drive B starts reading, the redundant data has not been read from the parity-drive P. Therefore, the first RAID cannot recover the data-block of the data-drive D. As a result, transmission of the data composing the video data being read is delayed, and the video being replayed at the host device might be interrupted.
A second RAID disclosed in Japanese Patent Laying-Open No. 9-69027 is now described. The second RAID also includes a plurality of data-drives for storing data, and a parity-drive for storing redundant data created from the data. The second RAID does not read the redundant data from the parity-drive under normal conditions. That is, when a read request arrives, the second RAID tries to read the data blocks from the plurality of data-drives. The second RAID stores in advance a time (hereinafter referred to as predetermined time) by which the plurality of data-drives have to have completed reading. In some cases, the second RAID detects a data-drive which has not completed reading after the lapse of the predetermined time from the time of transmission of a read request to each data-drive. In this case, the second RAID reads the redundant data from the parity-drive to recover the data block which has not yet been completely read.
However, reading of the redundant data starts only after the lapse of the predetermined time (after timeout) from the time of transmission of the read request for the data block. Therefore, as shown in FIG. 72a, it disadvantageously takes much time to recover the unread data block. Furthermore, in some cases, the second RAID successfully reads a data block immediately after timeout as shown in FIG. 72b. In this case, the second RAID could transmit the data faster by using the data block read immediately after the timeout. Once reading of the redundant data has started, however, the second RAID does not use the data block read immediately after the timeout, and as a result, data transmission to the host device may be delayed. This delay may cause interruption of video being replayed at the host device.
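The timing problem described above may be illustrated with a simplified model, sketched in Python below. The function name delivery_time, the time values, and the modeling assumptions (at most one late data-drive, and a negligible time for the calculation of parity itself) are hypothetical and are not taken from the cited publication.

```python
def delivery_time(data_read_times: list[float],
                  parity_read_time: float,
                  timeout: float) -> float:
    """Time, measured from issuing the read requests, at which the data
    can be assembled in a timeout-based scheme.

    Simplified model: at most one data-drive is late, and the time for
    the calculation of parity itself is ignored.
    """
    slowest = max(data_read_times)
    if slowest <= timeout:
        return slowest                      # every data block arrived in time
    others = sorted(data_read_times)[:-1]   # data blocks that arrived in time
    # Reading of the redundant data starts only at the timeout, and the late
    # data block is not used even if it arrives before the redundant data.
    return max(timeout + parity_read_time, max(others))


# One data block completes just after a timeout of 30 (arbitrary time units).
late_case = delivery_time([10, 12, 11, 31], parity_read_time=15, timeout=30)
normal_case = delivery_time([10, 12, 11, 20], parity_read_time=15, timeout=30)
```

In this model, late_case evaluates to 45 although the late data block was available at time 31, which corresponds to the situation of FIG. 72b.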
In most cases, in the disk drive where reading of the data block is delayed, read requests subsequent to the read request currently being processed wait for a read operation. Therefore, when the disk drive fails to read the data block and retries reading of the data block, processing of the subsequent read requests is delayed. As evident from above, in the conventional disk array device including the above first and second RAID, a read failure may affect subsequent reading.
Referring back to FIG. 69, the controller 6901 stores the four data blocks D-A1 to D-D1 and the redundant data D-P1 in the disk drives 6902A to 6902D and 6902P, respectively. The four data blocks D-A1 to D-D1 and the redundant data D-P1 are generated from the same data D-1 of 2048 bytes. Thus, a set of data blocks and redundant data generated based on the same data received from a host device is herein called a parity group. Also, a set of a plurality of disk drives in which data blocks and redundant data of the same parity group are written is herein called a disk group.
In a disk array device such as a RAID, a failure may occur in any disk drive therein. The disk array device, however, can recover the data block of the faulty disk drive by executing calculation of parity using the other data blocks and the redundant data of the same parity group. In the above description, the disk array device assembles data to be transmitted to the host device using the recovered data block. If the faulty disk drive is left as it is, calculation of parity is executed whenever an attempt is made to read the data block from the faulty disk drive, which takes much time. As a result, data transmission to the host device is delayed, and video being replayed at the host device is interrupted. Therefore, some disk array devices execute reconstruction processing. In the reconstruction processing, the data block or the redundant data in the faulty disk drive is recovered, and the recovered data block or redundant data is rewritten in another disk drive or a normal area in the faulty disk drive.
However, when another failure occurs in another disk drive of the same parity group while the defective disk drive is left as it is, reconstruction cannot be executed. Therefore, reconstruction is required to be executed as early as possible. An example of such reconstruction is disclosed in Japanese Patent Laying-Open No. 5-127839. A disk array device disclosed in this publication (hereinafter referred to as first disk array device) includes a disk array composed of a plurality of disk drives, and a disk controller for controlling the disk array. The disk controller monitors states of operation of the disk array. When reconstruction is required, the disk controller selects and executes one of three types of reconstruction methods according to the state of operation of the disk array. In a first method, reconstruction occurs during idle time of the array. In a second method, reconstruction is interleaved between current data area accessing operations of the array at a rate which is inversely proportional to an activity level of the array. In a third method, the data are reconstructed when a data area being accessed is a data area needing reconstruction.
As described above, in some cases, both computer data and video data are written in each disk drive of the disk array device. Therefore, both read requests for reading the computer data and those for reading the video data arrive at the disk array device from the host device. When a large number of read requests for the computer data arrive, the disk array device has to execute reading of the computer data repeatedly, and as a result, reading of the video data may be delayed. This delay may cause interruption of the video being replayed at the host device.
The first disk array device executes reconstruction on the faulty disk drive while processing read requests being transmitted from the host device. Such reconstruction is executed on the entire disk drives of the same disk group with one operation. That is, reconstruction cannot be executed unless the entire disk drives of the same disk group are in an idle state.
In RAID-4 or RAID-5, each disk drive operates independently, and therefore if any one of the disk drives is in an idle state, the other disk drives of the same disk group may be under load conditions. As a result, the first disk array device cannot take sufficient time to execute reconstruction, and thus efficient reconstruction cannot be made.
Further, the conventional disk array device may execute reassignment. The structure of a disk array device that executes reassignment is similar to that shown in FIG. 69. Reassignment processing is now described in detail. Each disk drive composing a disk array has recording areas in which a defect may occur due to various reasons. Since the disk drive cannot read/write a data block or redundant data from/in a defective area, an alternate recording area is assigned to the defective recording area. In the alternate recording area, the data block or redundant data stored in the defective recording area or to be written in the defective area is stored. Two types of such reassignment have been known.
One reassignment is so-called auto-reassign executed by each disk drive composing the disk array. Each disk drive previously reserves part of its recording areas as alternate areas. When the data block or redundant data cannot be read/written from/in the recording area specified by the controller, the disk drive assumes that the specified area is defective. When detecting the defective area, the disk drive selects one of the reserved alternate areas, and assigns the selected alternate area to the detected defective area.
The other reassignment is executed by the controller. The controller previously reserves part of its recording areas as alternate areas, and manages information for specifying the alternate areas. When the disk drive cannot access the recording area specified by the controller, the disk drive notifies the controller that the recording area is defective. When receiving the notification of the defective area, the controller selects one of the alternate areas from the managed information, and reassigns the selected alternate area to the defective area.
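As a minimal sketch of the controller-managed reassignment described above, the bookkeeping of alternate areas may be expressed in Python as follows. The class name AlternateAreaManager, its methods, and the use of integer addresses are hypothetical; they are introduced only for illustration and do not describe any particular conventional device.

```python
class AlternateAreaManager:
    """Controller-side bookkeeping for reassignment (hypothetical structure)."""

    def __init__(self, alternate_areas: list[int]) -> None:
        self._free = list(alternate_areas)     # reserved, not yet assigned
        self._assigned: dict[int, int] = {}    # defective area -> alternate area

    def reassign(self, defective_area: int) -> int:
        """Assign an alternate area to a recording area reported as defective."""
        if defective_area in self._assigned:
            return self._assigned[defective_area]
        if not self._free:
            raise RuntimeError("no alternate area left")
        alternate = self._free.pop(0)
        self._assigned[defective_area] = alternate
        return alternate

    def resolve(self, area: int) -> int:
        """Return the area that should actually be accessed."""
        return self._assigned.get(area, area)


manager = AlternateAreaManager([9000, 9001])
first = manager.reassign(120)   # the disk drive reported area 120 as defective
```

After the reassignment, manager.resolve(120) returns the alternate area, while addresses that were never reported as defective are returned unchanged.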
In some recording areas, reading or writing may be eventually successful if the disk drive repeats access to these recording areas (that is, if the disk drive takes much time to access thereto). In the above two types of reassignment, however, the alternate area cannot be assigned to the recording area to which the disk drive takes much time to access because reading/writing will eventually succeed even though much time is required. When the data block composing the video data is stored in such a recording area, however, it takes much time to read the data block. As a result, video being replayed at the host device may be interrupted.
Therefore, an object of the present invention is to provide a disk array device capable of reading data (data block or redundant data) from a disk array to transmit the same to a host device and writing data from the host device in the disk array in a short period of time.
The present invention has the following features to solve the problem above.
A first aspect of the present invention is directed to a disk array device executing read operation for reading data recorded therein in response to a first read request transmitted thereto, the disk array device with data blocks generated by dividing the data and redundant data generated from the data blocks recorded therein, comprising:
m disk drives across which the data blocks and the redundant data are distributed; and
a control part controlling the read operation;
the control part
issuing second read requests to read the data blocks and the redundant data from the m disk drives in response to the first read request sent thereto;
detecting, from among the m disk drives, the disk drive from which reading of the data block or the redundant data is no longer necessary; and
issuing a read termination command to terminate the detected disk drive.
As described above, in the first aspect, when it is determined that reading of one of the data blocks or the redundant data is not necessary, this reading is terminated. Therefore, the disk drive in which this reading was terminated can proceed to the next reading. Thus, it is possible to provide the disk array device in which, if reading of one disk drive is delayed, this delay does not affect other reading.
According to a second aspect, in the first aspect, when (m−1) of the disk drives complete reading, the control part:
determines that reading being executed in one remaining disk drive is no longer necessary; and
issues a read termination command to the remaining disk drive.
As described above, in the second aspect, also when reading of one disk drive takes too much time, this reading is terminated. Therefore, it is possible to provide the disk array device in which, if reading of one disk drive is delayed, this delay does not affect other reading.
According to a third aspect, in the first aspect, when detecting that two or more of the disk drives cannot complete reading, the control part:
determines that reading being executed in other disk drives is no longer necessary; and
issues a read termination command to the determined disk drive.
In the third aspect, when calculation of parity cannot be executed, reading presently being executed can be terminated. Therefore, since unnecessary reading is not continued, it is possible to provide the disk array device in which unnecessary reading does not affect other reading.
According to a fourth aspect, in the first aspect, when (m−1) of the disk drives complete reading, the control part:
determines that reading not yet being executed in one remaining disk drive is no longer necessary; and
issues a read termination command to the remaining disk drive.
In the fourth aspect, since unnecessary reading is not started, it is possible to provide the disk array device in which unnecessary reading does not affect other reading.
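The determinations of the second to fourth aspects may be sketched, purely as an illustration, by the following Python function. The ReadState enumeration, the function reads_to_terminate, and the drive labels are hypothetical names, and the sketch assumes that the control part tracks one state per disk drive for each first read request.

```python
from enum import Enum


class ReadState(Enum):
    QUEUED = "queued"      # second read request issued, reading not yet started
    READING = "reading"    # reading in progress
    DONE = "done"          # data block or redundant data has been read
    FAILED = "failed"      # the disk drive cannot complete reading


def reads_to_terminate(states: dict[str, ReadState]) -> set[str]:
    """Return the disk drives to which a read termination command is issued."""
    m = len(states)
    done = {d for d, s in states.items() if s is ReadState.DONE}
    failed = {d for d, s in states.items() if s is ReadState.FAILED}
    pending = {d for d, s in states.items()
               if s in (ReadState.QUEUED, ReadState.READING)}
    if len(failed) >= 2:
        return pending   # third aspect: calculation of parity is impossible
    if len(done) == m - 1:
        return pending   # second and fourth aspects: one remaining disk drive
    return set()


states = {"A": ReadState.DONE, "B": ReadState.DONE, "C": ReadState.READING,
          "D": ReadState.DONE, "P": ReadState.DONE}
to_stop = reads_to_terminate(states)
```

With five disk drives of which four have completed reading, to_stop contains only the remaining disk drive "C".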
A fifth aspect of the present invention is directed to a disk array device executing read operation for reading data recorded therein in response to a first read request from a host device, the disk array device with data blocks generated by dividing the data and redundant data generated from the data blocks recorded therein, comprising:
m disk drives across which the data blocks and the redundant data are distributed;
a parity calculation part operating calculation of parity from (m−2) of the data blocks and the redundant data to recover one remaining data block; and
a control part controlling the read operation;
the control part:
issuing second read requests to read the data blocks and the redundant data from the m disk drives in response to the first read request sent thereto;
when (m−1) of the disk drives complete reading, detecting whether a set of the data blocks and the redundant data has been read from the (m−1) disk drives;
when detecting that the set of the data blocks and the redundant data has been read, issuing a recovery instruction to the parity calculation part to recover the data block not read from the one remaining disk drive after waiting for a predetermined time period from a time of detection; and
when the one remaining data block is recovered by the calculation of parity in the parity calculation part, executing operation for transmitting the data to the host device, wherein the predetermined time period is selected so as to ensure data transmission to the host device without delay.
In the fifth aspect, after a set of the data blocks and redundant data is read from (m−1) disk drives, the controller waits for a predetermined time until the remaining one data block is read. If the remaining one data block has been read by the predetermined time, calculation of parity is not required. Thus, it is possible to reduce the number of times calculation of parity is executed.
According to a sixth aspect, in the fifth aspect, when detecting that the set of the data blocks and the redundant data has not been read, the control part transmits the data to the host device without waiting for the predetermined time period from the time of detection.
In the sixth aspect, if only the data blocks are read from the (m−1) disk drives, the controller does not wait for the predetermined time period but transmits the data to the host device. Therefore, it is possible to achieve the disk array device capable of reading a larger volume of data per unit of time.
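The decision made by the control part in the fifth and sixth aspects may be sketched in Python as follows. This is only an illustration under stated assumptions: the function name next_action, the returned strings, and the representation of disk drives by labels are hypothetical, and the sketch assumes a single parity drive per parity group.

```python
def next_action(completed: set[str], data_drives: set[str], parity_drive: str,
                elapsed_since_detection: float, wait_period: float) -> str:
    """Return 'transmit', 'wait', or 'recover' for one first read request."""
    m = len(data_drives) + 1
    if data_drives <= completed:
        # Every data block has been read, so the redundant data is not needed
        # (sixth aspect, or the late data block arrived during the wait).
        return "transmit"
    if len(completed) < m - 1:
        return "wait"        # fewer than (m-1) disk drives have completed
    # (m-1) drives completed, including the parity drive: one data block is missing.
    if elapsed_since_detection < wait_period:
        return "wait"        # fifth aspect: give the remaining drive more time
    return "recover"         # issue the recovery instruction


data_drives = {"A", "B", "C", "D"}
early = next_action({"A", "B", "D", "P"}, data_drives, "P", 0.0, 5.0)
late = next_action({"A", "B", "D", "P"}, data_drives, "P", 5.0, 5.0)
```

Here early is "wait" and late is "recover", so that calculation of parity is requested only if the remaining data block has still not been read when the predetermined time period expires.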
According to a seventh aspect, in the fifth aspect, the predetermined time period is selected based on a start of reading in each of the disk drives and a probability of completing the reading.
In the seventh aspect, in most cases, the remaining one data block is read. Therefore, it is possible to reduce the number of operation of calculation of parity.
An eighth aspect of the present invention is directed to a disk array device executing read operation for reading data recorded therein in response to a first read request from a host device, the disk array device with data blocks generated by dividing the data and redundant data generated from the data blocks recorded therein, comprising:
m disk drives across which the data blocks and the redundant data are distributed;
a parity calculation part operating calculation of parity from (m−2) of the data blocks and the redundant data to recover one remaining data block; and
a control part controlling the read operation; the control part:
issuing second read requests to read the data blocks and the redundant data from the m disk drives in response to the first read request sent thereto;
when (m−1) of the disk drives complete reading, detecting whether a set of the data blocks and the redundant data has been read from the (m−1) disk drives;
when detecting that the set of the data blocks and the redundant data has been read, issuing a recovery instruction to the parity calculation part to recover the data block not read from the one remaining disk drive after waiting for a predetermined time period from a time of detection; and
when the one remaining block is recovered by the calculation of parity in the parity calculation part, executing operation for transmitting the data to the host device, wherein the recovery instruction is issued while the parity calculation part is not operating calculation of parity.
In the eighth aspect, the controller reliably issues a recovery instruction only when calculation of parity is not executed. This prevents a needless load on the parity calculator, achieving effective use of the parity calculator.
According to a ninth aspect, in the eighth aspect, the disk array device further comprises:
a table including a time period during which the parity calculation part can operate calculation of parity, wherein the control part further issues the recovery instruction when the parity calculation part does not operate calculation of parity by referring to the time period included in the table.
A tenth aspect of the present invention is directed to a disk array device executing read operation for reading data recorded therein in response to a first read request from a host device, the disk array device with data blocks generated by dividing the data and redundant data generated from the data blocks recorded therein, comprising:
m disk drives across which the data blocks and the redundant data are distributed;
a parity calculation part operating calculation of parity from (m−2) of the data blocks and the redundant data to recover one remaining data block; and
a control part controlling the read operation, the control part:
in response to the first read request received thereto, determining whether (m−1) of the disk drives have previously failed to read each data block or not;
when determining that the (m−1) disk drives have not previously failed to read each of the data blocks, issuing second read requests to the (m−1) disk drives to read only the data blocks; and
when the data blocks are read from the (m−1) disk drives, executing operation for transmitting the data to the host device.
In the tenth aspect, in some cases, a second read request may not be issued for the redundant data. That is, when the redundant data is not required, such unnecessary redundant data is not read. As a result, it is possible to increase a volume of data which can be read per unit of time.
According to an eleventh aspect, in the tenth aspect, the control part:
when determining that the (m−1) disk drives have previously failed to read each of the data blocks, issues second read requests to the m disk drives to read (m−1) of the data blocks and the redundant data;
when the (m−1) disk drives complete reading, detects whether a set of the data blocks and the redundant data has been read from the (m−1) disk drives or not;
when detecting that the set of the data blocks and the redundant data has been read, issues a recovery instruction to the parity calculation part to recover the data block not read from one remaining disk drive; and
when the one remaining data block is recovered by the calculation of parity in the parity calculation part, executes operation for transmitting the data to the host device.
In the eleventh aspect, a second read request is issued for reading the redundant data when required. Therefore, it is possible to immediately operate calculation of parity.
According to a twelfth aspect, in the eleventh aspect, the disk array device further comprises:
a table registering therein recording areas of the data blocks which the disk drives have previously failed to read, wherein the control part determines, by referring to the table, whether to issue the second read requests to the (m−1) disk drives or to the m disk drives.
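The selection between the (m−1) disk drives and the m disk drives in the tenth to twelfth aspects may be sketched in Python as follows. The function name drives_to_read, the representation of the table as a set of (drive, recording area) pairs, and the example addresses are hypothetical and serve only to illustrate one possible form of the determination.

```python
def drives_to_read(block_locations: dict[str, int], parity_drive: str,
                   failed_areas: set[tuple[str, int]]) -> list[str]:
    """Choose the disk drives to which second read requests are issued.

    block_locations maps each data drive to the recording area of its block.
    failed_areas is the table of (drive, area) pairs that previously failed.
    """
    previously_failed = any((drive, area) in failed_areas
                            for drive, area in block_locations.items())
    drives = list(block_locations)
    if previously_failed:
        drives.append(parity_drive)   # eleventh aspect: read from all m drives
    return drives                     # otherwise only the (m-1) data drives


table = {("C", 120)}                                   # area 120 of drive C failed before
clean = drives_to_read({"A": 40, "B": 40, "C": 40, "D": 40}, "P", table)
risky = drives_to_read({"A": 120, "B": 120, "C": 120, "D": 120}, "P", table)
```

In this example, clean lists only the four data drives, whereas risky additionally lists the parity drive so that calculation of parity can be executed immediately if required.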
According to a thirteenth aspect, in the twelfth aspect, the disk array device further comprises:
a reassignment part, when a defect occurs in a recording area of the data block or redundant data in the m disk drives, executing reassign processing for assigning an alternate recording area to the defective recording area, wherein when the reassignment part assigns the alternate recording area to the defective recording area of the data block registered in the table by the reassignment part, the control part deletes the defective recording area of the data block from the table.
In the thirteenth aspect, an alternate recording area is assigned to the defective recording area, and the data block or redundant data is rewritten in this alternate area. Therefore, in the table, the number of data blocks which require long time in read operation can be reduced. Therefore, it is possible to provide the disk array device capable of reading a larger volume of data per unit of time.
According to a fourteenth aspect, in the thirteenth aspect, the disk array device further comprises:
a first table storage part storing a first table in which an address of the alternate recording area previously reserved in each of the m disk drives can be registered as alternate recording area information; and
a second table storage part storing a second table in which address information of the alternate recording area assigned to the defective recording area can be registered, wherein the reassignment part:
when the second read requests are transmitted from the control part to the m disk drives, measures a delay time in each of the disk drives;
determines whether each of the recording areas of the data blocks or the redundant data to be read by each second read request is defective or not based on the measured delay time;
when determining that the recording area is defective, assigns the alternate recording area to the defective recording area based on the alternate recording area information registered in the first table of the first table storage part; and
registers the address information of the assigned alternate recording area in the second table of the second table storage part;
the control part issues the second read requests based on the address information registered in the second table of the second table storage part; and
the delay time is a time period calculated from a predetermined process start time.
In the fourteenth aspect, the reassignment part determines whether the recording area is defective or not based on an elapsed time calculated from a predetermined process start time. When a delay in the response returned from the disk drive is large, the reassignment part determines that the recording area being accessed for reading is defective, assigning an alternate recording area. This allows the disk array device to read and transmit the data to the host device, while suppressing occurrence of a delay in response.
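The interplay of the delay measurement and the two tables can be sketched as follows. This is only an illustrative sketch under stated assumptions: the class name `Reassigner`, the threshold `limit`, and the use of plain integers as recording-area addresses are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Reassigner:
    """Hypothetical reassignment part for one disk drive."""
    limit: float       # assumed delay threshold, in seconds
    spare_areas: list  # first table: addresses of reserved alternate areas
    remap: dict = field(default_factory=dict)  # second table: defective -> alternate

    def resolve(self, address):
        """Return the address the control part should use in a read request."""
        return self.remap.get(address, address)

    def report(self, address, start_time, now):
        """Judge one read by its delay; return True if an alternate area was assigned."""
        delay = now - start_time  # elapsed time since the process start time
        if delay <= self.limit or address in self.remap:
            return False          # fast enough, or already reassigned
        if not self.spare_areas:
            raise RuntimeError("no alternate recording area left")
        self.remap[address] = self.spare_areas.pop(0)
        return True
```

For example, with `Reassigner(limit=0.05, spare_areas=[9000, 9001])`, a read of address 120 that takes 0.2 seconds is judged defective, and later requests for address 120 are resolved to 9000.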
According to a fifteenth aspect, in the first aspect, the disk array device further comprises:
a reassignment part, when a defect occurs in a recording area of the data block or redundant data in the m disk drives, executing reassign processing for assigning an alternate recording area to the defective recording area.
According to a sixteenth aspect, in the fifteenth aspect, the disk array device further comprises:
a first table storage part storing a first table in which an address of the alternate recording area previously reserved in each of the m disk drives can be registered as alternate recording area information; and
a second table storage part storing a second table in which address information of the alternate recording area assigned to the defective recording area can be registered, wherein the reassignment part:
when the second read requests are transmitted from the control part to the m disk drives, measures a delay time in each of the disk drives;
determines whether each of the recording areas of the data blocks or the redundant data to be read by each second read request is defective or not based on the measured delay time;
when determining that the recording area is defective, assigns the alternate recording area to the defective recording area based on the alternate recording area information registered in the first table of the first table storage part; and
registers the address information of the assigned alternate recording area in the second table of the second table storage part;
the control part issues the second read requests based on the address information registered in the second table of the second table storage part; and
the delay time is a time period calculated from a predetermined process start time.
According to a seventeenth aspect, in the sixteenth aspect, the reassignment part assigns the alternate recording area to the defective recording area only when determining successively a predetermined number of times that the recording area is defective.
In the seventeenth aspect, when the reassignment part determines successively for a predetermined number of times that the recording area may be defective, it assigns an alternate recording area to that recording area. Therefore, if the reassignment part sporadically and wrongly determines that the recording area is defective, the alternate recording area is not assigned to that recording area. Therefore, it is possible to provide the disk array device which assigns an alternate recording area only to a truly defective area.
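The successive-determination rule can be sketched as a per-area counter that resets whenever a read completes in time. The class name `SuccessiveDefectCounter` and the parameter `required` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


class SuccessiveDefectCounter:
    """Signals reassignment only after `required` defective results in a row."""

    def __init__(self, required):
        self.required = required        # the predetermined number of times
        self.counts = defaultdict(int)  # recording-area address -> current streak

    def observe(self, address, looks_defective):
        """Record one determination; return True when reassignment is due."""
        if not looks_defective:
            self.counts.pop(address, None)  # a normal read breaks the streak
            return False
        self.counts[address] += 1
        if self.counts[address] >= self.required:
            del self.counts[address]        # start fresh after reassignment
            return True
        return False
```

With `required=3`, the sequence defective, defective, normal, defective, defective, defective triggers reassignment only on the last observation, so an isolated slow read is ignored.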
According to an eighteenth aspect, in the sixteenth aspect, the predetermined process start time is a time when each of the second read requests is transmitted to each of the m disk drives.
According to a nineteenth aspect, in the sixteenth aspect, the predetermined process start time is a time when the m disk drives start reading based on the second read requests.
A twentieth aspect of the present invention is directed to a data input/output method used for a disk array device comprising a disk array constructed of recording mediums for recording redundant data and an array controller for controlling the disk array according to an access request transmitted from a host device, the method comprising:
generating, by the array controller, a read or write request to the disk array with predetermined priority based on the received access request;
enqueuing, by the array controller, the generated read or write request to a queue included therein according to the predetermined priority;
selecting, by the array controller, the read or write request to be processed by the disk array from among the read or write requests enqueued to the queue according to the predetermined priority; and
processing, by the disk array, the selected read or write request.
In the twentieth aspect, the array controller converts the received access request to a read or write request with predetermined priority. The disk array processes the read or write request selected by the array controller according to priority. Therefore, in the disk array device including the disk array in which redundant data is recorded, it is possible to generate a read or write request with relatively high priority for the access request required to be processed in real time, while a read or write request with relatively low priority for the access request not required to be processed in real time. Thus, the disk array device can distinguish the access request from the host device according to the requirement of real-time processing. Consequently, the access request required to be processed in real time is processed in the disk array device without being affected by the access request not required to be processed in real time.
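A minimal sketch of enqueuing and selecting requests by priority is given below. The two priority levels and the names `RequestQueue`, `REAL_TIME`, and `BEST_EFFORT` are assumptions made for illustration; the specification does not fix the number of levels or the queue implementation.

```python
import heapq
import itertools

REAL_TIME, BEST_EFFORT = 0, 1  # assumed levels; the lower value is served first


class RequestQueue:
    """Hypothetical queue held by the array controller."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # keeps arrival order within one level

    def enqueue(self, priority, request):
        heapq.heappush(self._heap, (priority, next(self._seq), request))

    def select(self):
        """Return the next request to be processed by the disk array."""
        _, _, request = heapq.heappop(self._heap)
        return request

    def __len__(self):
        return len(self._heap)
```

If a best-effort write is enqueued before two real-time reads, `select()` still returns the two reads first, in their arrival order, and the write last.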
According to a twenty-first aspect, in the twentieth aspect, the array controller includes queues therein corresponding to the priority; and
the generated read request or write request is enqueued to the queue corresponding to the predetermined priority.
In the twenty-first aspect, since the queue is provided for each level of priority, it is possible to distinguish the access request from the host device according to the requirement of real-time processing, and various processing in the disk array device is effectively processed.
According to a twenty-second aspect, in the twentieth aspect, the array controller includes queues therein corresponding to the predetermined priority for each of the recording mediums;
the array controller generates the read or write request with the predetermined priority for each of the recording mediums based on the received access request; and
the array controller enqueues the read or write request generated for each of the recording mediums to the queue in the corresponding recording medium according to the predetermined priority.
In the twenty-second aspect, since the queue is provided for each recording medium and each level of priority, it is possible to distinguish the access request from the host device for each recording medium according to the requirement of real-time processing, and various processing in the disk array device is further effectively processed.
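One possible arrangement of a queue per recording medium and per priority level is sketched below. The class name `MediumQueues` and the convention that level 0 is the highest priority are assumptions for illustration only.

```python
from collections import deque


class MediumQueues:
    """Hypothetical set of FIFO queues: one per recording medium and priority level."""

    def __init__(self, mediums, levels):
        # Level 0 is assumed to be the highest priority.
        self._queues = {m: [deque() for _ in range(levels)] for m in mediums}

    def enqueue(self, medium, level, request):
        self._queues[medium][level].append(request)

    def select(self, medium):
        """Return the oldest request of the highest non-empty level, or None."""
        for queue in self._queues[medium]:
            if queue:
                return queue.popleft()
        return None
```

Because each medium is scanned independently, a backlog of low-priority requests on one medium does not hold back a high-priority request addressed to another.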
According to a twenty-third aspect, in the twentieth aspect, the predetermined priority is set based on whether processing in the disk array is executed in real time or not.
In the twenty-third aspect, the predetermined priority is set based on the requirement of real-time processing. Consequently, the access request required to be processed in real time is processed in the disk array device without being affected by the access request not required to be processed in real time.
According to a twenty-fourth aspect, in the twentieth aspect, when an I/O interface between the information recording device and the host device conforms to SCSI, the predetermined priority is previously set in a LUN or LBA field of the access request.
In the twenty-fourth aspect, the predetermined priority is previously set in the access request. Therefore, the host device can notify the disk array device of the level of priority of the read or write request, that is, with how much priority the read or write request is required to be processed.
A twenty-fifth aspect of the present invention is directed to a disk array device including a disk array constructed of recording mediums for recording redundant data and controlling the disk array according to an access request transmitted from a host device, comprising:
a control part generating a read or write request to the disk array with predetermined priority based on the received access request;
a queue managing part enqueuing the read request or write request generated by the control part to a queue included therein according to the predetermined priority; and
a selection part selecting the read or write request to be processed by the disk array from among the read or write requests enqueued to the queue, wherein the disk array processes the read request or write request selected by the selection part.
In the twenty-fifth aspect, the received access request is converted into a read or write request with predetermined priority. The disk array processes the read or write request selected by the selection part according to the level of priority. Therefore, in the disk array device including the disk array in which redundant data is recorded, it is possible to generate a read or write request with relatively high priority for the access request required to be processed in real time, while a read or write request with relatively low priority for the access request not required to be processed in real time. Thus, the disk array device can distinguish the access request from the host device according to the requirement of real-time processing. Consequently, the access request required to be processed in real time is processed in the disk array device without being affected by the access request not required to be processed in real time.
According to a twenty-sixth aspect, in the twenty-fifth aspect, the queue managing part includes queues therein corresponding to the priority, and the read or write request generated by the control part is enqueued to the queue corresponding to the predetermined priority.
In the twenty-sixth aspect, since the queue is provided for each level of priority, it is possible to distinguish the access request from the host device according to the requirement of real-time processing, and various processing in the disk array device is effectively processed.
According to a twenty-seventh aspect, in the twenty-fifth aspect, the queue managing part includes queues therein corresponding to the predetermined priority for each of the recording mediums;
the queue managing part generates the read or write request with the predetermined priority for each of the recording mediums based on the received access request; and
the queue managing part enqueues the read or write request generated for each of the recording mediums to the queue in the corresponding recording medium according to the predetermined priority.
In the twenty-seventh aspect, since the queue is provided for each recording medium and each level of priority, it is possible to distinguish the access request from the host device for each recording medium according to the requirement of real-time processing, and various processing in the disk array device is further effectively processed.
A twenty-eighth aspect of the present invention is directed to, in an information recording device comprising a disk array constructed of recording mediums for recording redundant data and an array controller for controlling the disk array according to an access request transmitted from a host device, a data reconstruction method for recovering data recorded on a faulty recording medium in the disk array and reconstructing the data, the method comprising:
generating, by the array controller, a read or write request required for data reconstruction to the disk array with predetermined priority;
enqueuing, by the array controller, the generated read or write request to a queue included therein according to the predetermined priority;
selecting, by the array controller, the read or write request to be processed from among the read or write requests enqueued to the queue according to the predetermined priority;
processing, by the disk array, the selected read or write request; and
executing, by the array controller, data reconstruction based on processing results of the read or write request by the disk array.
In the twenty-eighth aspect, the array controller generates a read or write request for data reconstruction. The generated read or write request has predetermined priority. The disk array processes the read or write request selected by the array controller according to the level of priority. Therefore, when the disk array device which executes reconstruction processing provides relatively low priority for the read or write request for data reconstruction, the read or write request is processed without affecting other real-time processing. On the other hand, when the disk array device provides relatively high priority, the read or write request is processed with priority, ensuring the end time of data reconstruction.
According to a twenty-ninth aspect, in the twenty-eighth aspect, the array controller includes queues therein corresponding to the predetermined priority for each of the recording mediums;
the array controller generates the read or write request required for data reconstruction with the predetermined priority for each recording medium; and
the array controller enqueues the generated read or write request to the queue in the corresponding recording medium according to the predetermined priority.
In the twenty-ninth aspect, since the queue is provided for each recording medium and each level of priority, and further, since the array controller generates a read or write request with predetermined priority for each recording medium, it is possible to distinguish the access request from the host device for each recording medium according to the requirement of real-time processing, and various processing in the disk array device is further effectively processed.
According to a thirtieth aspect, in the twenty-eighth aspect, the read and write requests generated by the array controller are given lower priority to be processed in the disk array.
In the thirtieth aspect, since having relatively lower priority, the read or write request is processed without affecting other real-time processing.
According to a thirty-first aspect, in the twenty-eighth aspect, the read and write requests generated by the array controller are given higher priority to be processed in the disk array.
In the thirty-first aspect, since having relatively higher priority, the read or write request is processed with priority, ensuring the end time of data reconstruction.
A thirty-second aspect of the present invention is directed to a data input/output method used in an information recording device comprising a disk array constructed of recording mediums for recording redundant data and an array controller for controlling the disk array according to an access request transmitted from a host device, recovering the data recorded on the recording medium which has a failure in the disk array, and reconstructing the data in a spare recording medium;
when the access request for data to be reconstructed in the spare recording medium is transmitted from the host device to the information recording device, the method comprises:
the array controller:
reading data for recovery required for recovering the data recorded in the failed recording medium from the disk array;
recovering data recorded in the failed recording medium by executing a predetermined calculation with the data for recovery read from the disk array;
generating a write request with predetermined priority to write the recovered data in the spare recording medium;
enqueuing the generated write request to a queue therein according to the predetermined priority; and
selecting the generated write request as the write request to be processed by the disk array according to the predetermined priority; and
the disk array:
processing the write request selected by the array controller, and writing the recovered data in the spare recording medium, wherein the write request is given relatively lower priority.
In the thirty-second aspect, when the host device transmits an access request for data to be reconstructed in the spare recording medium, the array controller recovers the data to write in the spare recording medium. Therefore, next time the disk array device executes data reconstruction, it is not required to recover the data requested to be accessed. The time required for data reconstruction is thus shortened.
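The recovery step can be illustrated with a short sketch. The text above only speaks of "a predetermined calculation"; the sketch assumes bytewise exclusive OR parity, the usual choice in RAID-3, and the function name `recover_block` is hypothetical.

```python
from functools import reduce


def recover_block(surviving_blocks):
    """XOR equal-length blocks byte by byte (assumed parity calculation).

    Passing the surviving data blocks together with the redundant data
    yields the block that was stored on the failed recording medium.
    """
    return bytes(
        reduce(lambda a, b: a ^ b, column)  # XOR of one byte position across blocks
        for column in zip(*surviving_blocks)
    )
```

Under this assumption the same function produces the redundant data from the data blocks and recovers a lost block from the remaining blocks plus the redundant data. The recovered block would then be handed to the queue as a write request with relatively low priority for the spare recording medium.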
A thirty-third aspect of the present invention is directed to a disk array device which reassigns an alternate recording area to a defective recording area of data, comprising:
a read/write control part for specifying a recording area of data, and producing an I/O request to request read or write operation;
a disk drive, when receiving the I/O request transmitted from the read/write control part, accessing the recording area specified by the I/O request to read or write the data; and
a reassignment part when receiving the I/O request transmitted from the read/write control part, calculating an elapsed time from a predetermined process start time as a delay time and determining whether the recording area specified by the I/O request is defective or not based on the delay time, wherein when determining that the recording area of the data is defective, the reassignment part instructs the disk drive to assign the alternate recording area to the defective recording area.
In the thirty-third aspect, the reassignment part determines whether the recording area of the data specified by the received I/O request is defective or not based on a delay time calculated from a predetermined process start time. The reassignment part can determine the length of a delay in response from the disk drive based on the delay time. When determining that the recording area is defective, the reassignment part instructs the disk drive to assign an alternate recording area. That is, when the process time for one recording area in the disk drive is long, the reassignment part determines that the recording area is defective, instructing the disk drive to perform reassign processing. The disk array device thus suppresses occurrence of a long delay in response, allowing data input/output in real time.
According to a thirty-fourth aspect, in the thirty-third aspect, the reassignment part assigns the alternate recording area to the defective recording area only when determining successively a predetermined number of times that the recording area is defective.
In the thirty-fourth aspect, when the reassignment part determines successively for a predetermined number of times that one recording area is defective, an alternate recording area is assigned to that recording area. Therefore, the reassignment part can suppress a sporadic determination error due to thermal asperity in the disk drive and the like. Therefore, the reassignment part can instruct the disk drive to assign an alternate recording area only to a truly defective area.
According to a thirty-fifth aspect, in the thirty-third aspect, the predetermined process start time is a time when the I/O request is transmitted from the read/write control part.
According to a thirty-sixth aspect, in the thirty-third aspect, the predetermined process start time is a time when the I/O request transmitted from the read/write control part is started to be processed in the disk drive.
In the thirty-fifth or thirty-sixth aspect, the predetermined process start time is the time when the I/O request is transmitted to the disk drive or the time when the I/O request is started to be processed. Therefore, the reassignment part can recognize the delay time correctly.
According to a thirty-seventh aspect, in the thirty-third aspect, the reassignment part further instructs the disk drive to terminate the read or write operation requested by the I/O request when the recording area of the data is defective.
In the thirty-seventh aspect, the reassignment part instructs the disk drive to terminate processing of the I/O request specifying the recording area which is now determined to be defective. When the reassignment part determines that the recording area is defective, the disk drive can terminate processing the I/O request for that defective area, suppressing occurrence of an additional delay in response.
A thirty-eighth aspect of the present invention is directed to a disk array device which reassigns an alternate recording area to a defective recording area of data, comprising:
a read/write control part specifying a recording area of the data, and producing an I/O request to request read or write operation;
a disk drive, when receiving the I/O request from the read/write control part, accessing the recording area specified by the I/O request to read or write the data; and
a reassignment part, when the recording area specified by the I/O request from the read/write control part is defective, instructing the disk drive to reassign the alternate recording area to the defective recording area, wherein when instructed to reassign by the reassignment part, the disk drive assigns a recording area in which time required for the read or write operation is within a predetermined range, as the alternate recording area.
In the thirty-eighth aspect, the disk drive takes the recording area in which the time required for read or write operation is within a predetermined range as the alternate recording area. Therefore, the disk array device can suppress occurrence of a large delay in response, allowing input/output of data in real time.
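The selection of an alternate recording area by access time can be sketched as follows. How the disk drive estimates the time required for each candidate area is not specified here, so the sketch simply assumes that an estimate is available; the function name `pick_alternate` and the tuple layout are hypothetical.

```python
def pick_alternate(candidates, low, high):
    """Return the first candidate address whose access time lies in [low, high].

    candidates: iterable of (address, estimated_access_time) pairs, where the
    estimate is an assumed input supplied by the disk drive.
    Returns None when no candidate falls within the predetermined range.
    """
    for address, access_time in candidates:
        if low <= access_time <= high:
            return address
    return None
```

For instance, given candidates at 30 ms and 12 ms and a range of 0 to 15 ms, only the 12 ms candidate qualifies as the alternate recording area.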
According to a thirty-ninth aspect, in the thirty-eighth aspect, the predetermined range is selected based on overhead in the disk array device.
In the thirty-ninth aspect, the predetermined range is easily selected based on overhead, which is a known parameter. Therefore, the design of the disk array device can be more simplified.
According to a fortieth aspect, in the thirty-eighth aspect, when part or all of the recording areas of the data are defective, the reassignment part assumes that the whole recording areas are defective.
In the fortieth aspect, in the disk array device, the alternate recording area is assigned not by a fixed-block unit, which is a managing unit in the disk drive. Therefore, the disk array device can prevent data fragmentation, further suppressing occurrence of a large delay in response.
According to a forty-first aspect, in the thirty-eighth aspect, the reassignment part transmits a reassign block specifying a logical address block of the defective recording area to the disk drive for reassignment; and
the disk drive assigns a physical address with which the time required for read or write operation is within the predetermined range to a logical address specified by the reassign block transmitted from the reassignment part as the alternate recording area.
In the forty-first aspect, the disk drive assigns a physical address in which the time required for read or write operation is within a predetermined range as the alternate recording area to the physical address on which reassign processing is to be performed. Therefore, the disk array device can suppress occurrence of a large delay in response, allowing input/output of data in real time.
According to a forty-second aspect, in the thirty-eighth aspect, when the read/write control part requests the disk drive to read the data, and the recording area of the data is defective, the data recorded in the defective recording area is recovered based on predetermined parity and other data; and
the read/write control part specifies the assigned alternate recording area, and requests the disk drive to write the recovered data.
According to a forty-third aspect, in the thirty-eighth aspect, when the read/write control part requests the disk drive to write data and the recording area of the data is defective, the read/write control part specifies the assigned alternate recording area, and again requests the disk drive to write the data.
When the disk drive assigns an alternate recording area to one recording area, the data recorded thereon might be impaired. Therefore, in the forty-second or forty-third aspect, the read/write control part requests the disk array to write the data recovered based on the parity or other data, or specifies the alternate recording area to request again the disk array to write the data. Therefore, the disk array device can maintain consistency before and after assignment of the alternate recording area.
A forty-fourth aspect of the present invention is directed to a reassignment method of assigning an alternate area to a defective recording area of data, comprising:
transmitting an I/O request for requesting a disk drive to perform a read or write operation by specifying a recording area of the data according to a request from outside; and
when the I/O request is transmitted in the transmission step, calculating an elapsed time from a predetermined time as a delay time and determining whether the recording area specified by the I/O request is defective or not based on the delay time, wherein when the recording area is defective in the determination step, the disk drive is instructed to assign the alternate recording area to the defective recording area.
A forty-fifth aspect of the present invention is directed to a reassignment method of assigning an alternate recording area to a defective recording area of data, comprising:
transmitting an I/O request for requesting a disk drive to perform a read or write operation by specifying a recording area of the data according to a request from outside; and
when the recording area specified by the I/O request transmitted in the transmission step is defective, instructing the disk drive to assign the alternate recording area to the defective recording area, wherein in the instructing step, the disk drive is instructed to assign the recording area with which time required for read or write operation is within a predetermined range as the alternate recording area.
A forty-sixth aspect of the present invention is directed to a disk array device which assigns an alternate recording area to a defective recording area of data, comprising:
a read/write control part for transmitting an I/O request for requesting read or write operation by specifying a recording area of the data according to a request from outside;
a disk drive, when receiving the I/O request from the read/write control part, accessing the recording area specified by the I/O request and reading or writing the data;
a reassignment part, when receiving the I/O request from the read/write control part, calculating an elapsed time from a predetermined process start time as a delay time, and determining whether the recording area specified by the I/O request is defective or not based on the delay time;
a first storage part storing an address of the alternate recording area previously reserved in the disk drive as alternate recording area information; and
a second storage part storing address information of the alternate recording area assigned to the defective recording area, wherein when determining that the specified recording area is defective, the reassignment part assigns the alternate recording area to the defective recording area based on the alternate recording area information stored in the first storage part, and stores the address information on the assigned alternate recording area in the second storage part, and the read/write control part generates the I/O request based on the address information stored in the second storage part.
In the forty-sixth aspect, the reassignment part determines whether the recording area is defective or not based on the delay time calculated from a predetermined process start time. Therefore, when a delay in the response returned from the disk drive is large, the reassignment part determines that the recording area being accessed for reading is defective, assigning an alternate recording area. This allows the disk array device to input and output data in real time, while suppressing occurrence of a large delay in response.
According to a forty-seventh aspect, in the forty-sixth aspect, the reassignment part assigns the alternate recording area to the defective recording area only when determining successively a predetermined number of times that the recording area is defective.
According to a forty-eighth aspect, in the forty-sixth aspect, the predetermined process start time is a time when the I/O request is transmitted from the read/write control part.
According to a forty-ninth aspect, in the forty-sixth aspect, the predetermined process start time is a time when the I/O request transmitted from the read/write control part is started to be processed in the disk drive.
According to a fiftieth aspect, in the forty-sixth aspect, the reassignment part further instructs the disk drive to terminate the read or write operation requested by the I/O request when detecting that the recording area of the data is defective.
According to a fifty-first aspect, in the forty-sixth aspect, the first storage part stores a recording area with which overhead in the disk drive is within a predetermined range as the alternate recording area.
In the fifty-first aspect, the first storage part manages the alternate recording areas in which the time required for read or write operation in the disk drive is within a predetermined range. Therefore, the data recorded on the alternate recording area assigned by the reassignment part is inputted/outputted always with a short delay in response. The disk array device thus can input and output data in real time, while suppressing occurrence of a large delay in response. Furthermore, the predetermined range is easily selected based on overhead, which is a known parameter. Therefore, the design of the disk array device can be more simplified.
According to a fifty-second aspect, in the fifty-first aspect, the first storage part further stores the alternate recording area by a unit of a size of the data requested by the I/O request.
In the fifty-second aspect, since the first storage part manages the alternate recording areas in a unit of the requested data, the alternate recording area to be assigned is equal to the requested data in size. Therefore, the reassignment part can instruct reassignment with simple processing of selecting an alternate recording area from the first storage part.
According to a fifty-third aspect, in the fifty-second aspect, whether the overhead is within the predetermined range or not is determined for the recording areas other than the alternate recording area by the unit, and the reassignment part assigns the alternate area to the recording area in which the overhead is not within the predetermined range.
In the fifty-third aspect, the reassignment part instructs assignment of an alternate recording area to the defective recording area at the timing other than that determined based on the delay time. The disk array device thus can input and output data more effectively in real time, while suppressing occurrence of a large delay in response. Furthermore, the predetermined range is easily selected based on overhead, which is a known parameter. Therefore, the design of the disk array device can be more simplified.
According to a fifty-fourth aspect, in the forty-sixth aspect, the address information stored in the second storage part is recorded in the disk drive.
In the fifty-fourth aspect, with the address managing information recorded on the disk drive, the second storage part is not required to manage the address information when the power to the disk array device is off. That is, the second storage part is not required to be constructed by a non-volatile storage device, which is expensive, but can be constructed by a volatile storage device at a low cost.
According to a fifty-fifth aspect, in the fifty-fourth aspect, the disk array device further comprises:
a non-volatile storage device storing an address of a recording area of the address information in the disk drive.
In the fifty-fifth aspect, since the non-volatile storage device stores the address information, even when a defect occurs in the storage area of the address information in the disk drive, the address information is secured. It is thus possible to provide a disk array device with a high level of security.
According to a fifty-sixth aspect, in the forty-sixth aspect, the disk array device further comprises:
a plurality of disk drives including data recording disk drives and a spare disk drive; and
a count part counting a used amount or remaining amount of the alternate recording areas, wherein the reassignment part determines whether to copy the data recorded in a data recording disk drive to the spare disk drive based on a count value in the count part, thereby allowing the spare disk drive to be used instead of that data recording disk drive.
In the fifty-sixth aspect, when there are shortages of alternate recording areas in the disk drive for recording data, a spare disk drive is used. Therefore, there occurs no shortage of alternate recording areas for reassignment at any time. The disk array device thus can input and output data more effectively in real time, while suppressing occurrence of a large delay in response.
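The decision made from the count value can be sketched as follows. The sketch assumes the count part tracks the remaining amount and that copying to the spare is triggered when the remaining amount falls to a fixed threshold; the class name CountPart and the threshold parameter are hypothetical.

```python
class CountPart:
    """Hypothetical count part tracking remaining alternate recording areas."""

    def __init__(self, total_alternates, spare_threshold):
        self.remaining = total_alternates
        self.spare_threshold = spare_threshold  # assumed trigger level

    def consume(self):
        """Record that one alternate area has been assigned."""
        if self.remaining == 0:
            raise RuntimeError("no alternate recording areas left")
        self.remaining -= 1

    def should_switch_to_spare(self):
        """True once the remaining amount has fallen to the threshold."""
        return self.remaining <= self.spare_threshold


counter = CountPart(total_alternates=3, spare_threshold=1)
counter.consume()
after_one = counter.should_switch_to_spare()   # 2 areas remain
counter.consume()
after_two = counter.should_switch_to_spare()   # 1 area remains
```

Triggering the copy before the count reaches zero leaves at least one alternate area available while the data is being copied to the spare disk drive.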
A fifty-seventh aspect of the present invention is directed to a reassignment method of assigning an alternate recording area to a defective recording area of data, comprising:
transmitting an I/O request for requesting read or write operation by specifying a recording area of the data; and
when the recording area specified by the I/O request transmitted in the transmission step is defective, assigning the alternate recording area to the defective recording area, wherein in the assign step, when the specified recording area is defective, the alternate recording area is selected for the defective recording area by referring to alternate recording area information for managing an address of the alternate recording area previously reserved in the disk drive, the selected alternate recording area is assigned to the defective recording area, and further address information for managing an address of the assigned alternate recording area is created; and
in the transmission step, the I/O request is generated based on the address information created in the assign step.
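The steps of the method can be sketched as below. This is a simplified illustration, not the claimed implementation: the alternate recording area information is modeled as a list of free addresses, the address information as a dictionary, and the class and method names are hypothetical.

```python
class Reassigner:
    """Hypothetical model of the assign and transmission steps."""

    def __init__(self, alternate_areas):
        self.free = list(alternate_areas)   # alternate recording area information
        self.address_info = {}              # defective address -> alternate address

    def assign(self, defective_addr):
        """Assign step: select an alternate area and record the mapping."""
        if defective_addr in self.address_info:
            return self.address_info[defective_addr]
        if not self.free:
            raise RuntimeError("no alternate recording areas left")
        alternate = self.free.pop(0)
        self.address_info[defective_addr] = alternate
        return alternate

    def make_request(self, op, addr):
        """Transmission step: build the I/O request using the address information."""
        return (op, self.address_info.get(addr, addr))


r = Reassigner(alternate_areas=[9000, 9004])
before = r.make_request("read", 128)
assigned = r.assign(128)
after = r.make_request("read", 128)
```

Once the mapping exists, later requests for the defective recording area are directed to the alternate recording area without further action by the requester.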
According to a fifty-eighth aspect, in the fifty-seventh aspect, in the assign step, when the I/O request is transmitted, an elapsed time from a predetermined process start time is calculated as a delay time, and it is determined whether the recording area specified by the I/O request is defective or not based on the delay time.
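A minimal sketch of the delay-based determination follows. It assumes the delay time is compared against a single limit value; the function name, the parameter limit_s, and the example times are hypothetical, and times are passed in explicitly so that the example is deterministic.

```python
def is_defective(process_start_s, now_s, limit_s):
    """Treat the recording area as defective when the delay exceeds the limit.

    process_start_s: the predetermined process start time, in seconds
    now_s: the current time, in seconds
    limit_s: assumed threshold on the delay time
    """
    delay_s = now_s - process_start_s
    return delay_s > limit_s


fast = is_defective(process_start_s=10.0, now_s=10.05, limit_s=0.2)
slow = is_defective(process_start_s=10.0, now_s=10.5, limit_s=0.2)
```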
These and other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.