A storage system provided with a plurality of magnetic disk drives is now used for storing a huge volume of data.
FIG. 7 is a view schematically illustrating an entire configuration of a conventional storage system. A storage system 300 illustrated in FIG. 7 implements a RAID (Redundant Array of Independent Disks) configuration across its magnetic disk drives to increase data redundancy and thereby provide desirable performance characteristics to a host 200.
A description will be given of functions of respective modules constituting the storage system 300. A CA (Channel Adapter) 2 controls the interface with the host 200. Upon reception of a data write/read operation request from the host 200, the CA 2 notifies a CM (Centralized Module) 400 of a processing request. The CA 2 directly accesses a cache memory on the CM 400 to perform data transfer between the storage system 300 and the host 200.
The CM 400 serves as a core of all the modules constituting the storage system 300. The CM 400 performs resource management (manages the resources of each module and executes effective control management). Further, the CM 400 performs cache memory management (manages the allocation of the memory area in the CM 400 and executes general control). The CM 400 retains maintenance software and uses the maintenance software to provide various services. The CM 400 is provided with a DI (Device Interface) 11 which is a module for performing communication with a DE group 500 (details of which will be described later) composed of a plurality of magnetic disk drives. The DI 11 is connected to the DE group 500 by a fiber channel interface (hereinafter, fiber channel is abbreviated as “FC”). Through the DI 11, the CM 400 performs control of the FC interface communicating with the DE group 500, I/O control of the magnetic disk drive, control of RAID, and the like.
For redundancy, the storage system 300 has two CAs 2 and two CMs 400.
FIG. 8 is a view schematically illustrating a multi-initiator connection between the CMs 400 (DIs 11) and the magnetic disk drives. The DIs 11, which serve as initiators, and the magnetic disk drives are each provided with an FC port. The respective FC ports are connected to one another to form an arbitrated loop.
Some of the terms used in the following description will be defined.
Arbitrated Loop (Hereinafter, Referred to Merely as “Loop”)
The arbitrated loop is one of fiber channel topologies, in which a plurality of FC ports are connected in a loop so as to allow communication to be performed between a pair of the ports. The arbitrated loop supports up to 127 devices.
AL_PA and Loop ID
An address to be uniquely used in the arbitrated loop is assigned to each port in the loop. This address is referred to as the AL_PA. The valid values of the AL_PA are not consecutive numbers, and the AL_PA is therefore sometimes difficult to handle. For this reason, consecutive numbers are assigned to the respective AL_PA values. Each of the consecutive numbers is referred to as a Loop_ID. The value of the AL_PA can be fixed in a device in a hardware manner. However, in the case where the same value is assigned to two AL_PAs by accident, the AL_PA setting of the device nearer to the Loop Master is prioritized according to the priority set in the loop.
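The correspondence between the non-consecutive AL_PA values and the consecutive Loop_IDs can be sketched as follows (a minimal Python illustration; the AL_PA values shown are a hypothetical subset, not the actual address table defined by the FC-AL specification):

```python
# Hypothetical subset of valid AL_PA values; the real table defined by
# the FC-AL specification is larger and likewise non-consecutive.
VALID_AL_PAS = [0x01, 0x02, 0x04, 0x08, 0x0F, 0x10, 0x17]

def loop_id_of(al_pa):
    """Map a non-consecutive AL_PA value to its consecutive Loop_ID."""
    return VALID_AL_PAS.index(al_pa)
```

Because the Loop_ID is simply the position of the AL_PA in the ordered table, the Loop_IDs form a consecutive range starting from 0 and are easier to handle than the raw AL_PA values.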
Loop Master
The Loop Master is the port that leads loop initialization, and it is determined at the time of execution of loop initialization. If the loop includes a fabric port, the fabric port becomes the Loop Master. If the loop does not include a fabric port, the port having the smallest WWN (World Wide Name: a worldwide unique name; in the present invention, the WWPN (World Wide Port Name) is referred to as the WWN) value in the loop is selected as the Loop Master.
Loop Initialization
The loop initialization is a necessary process for recognizing a device connected to the loop so as to make the device operational. The loop initialization is executed when a LIP (Loop Initialization Primitive) is issued from the Loop Master onto the loop.
Login
The login is a procedure that exchanges information (WWN, etc.) of a target port before data transfer so as to allow the port to be accessed.
Each FC port carries out the loop initialization when performing communication. The AL_PA (identification information) of each FC port is determined by the loop initialization. The setting of the AL_PA is made in the connection order of the devices starting from the Loop Master. A value to be set to the AL_PA can be specified in a hardware manner (hard assignment), and the specified value is directly set to the AL_PA in most cases. However, in the case where the specified value already exists as an AL_PA on the loop, the specified AL_PA value is not set; instead, an AL_PA value is set in a software manner (soft assignment) in the subsequent sequence in a predetermined order. The order (ascending order or descending order) of the soft assignment is determined based on a predetermined setting. It goes without saying that the same AL_PA value as one that has already been assigned to an FC port by hard assignment is not permitted.
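The hard/soft assignment rule described above can be sketched as follows (a simplified Python illustration, under the assumption that the free pool is already sorted in the predetermined ascending or descending order; the function and variable names are hypothetical):

```python
def assign_al_pa(requested, taken, free_pool):
    """Hard assignment when the requested AL_PA is still free on the
    loop; otherwise soft assignment from the pool in predetermined order."""
    if requested is not None and requested not in taken:
        taken.add(requested)            # hard assignment succeeds
        return requested
    for candidate in free_pool:         # soft assignment, in pool order
        if candidate not in taken:
            taken.add(candidate)
            return candidate
    raise RuntimeError("no free AL_PA on the loop")
```

As the text notes, a hard-assigned value that is already present on the loop is simply skipped, and the port falls back to the next free value in the soft-assignment order.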
The login procedure is executed after the loop initialization, and after the completion of the login procedure, communication with the magnetic disk drive becomes possible.
FIG. 9 is a view schematically illustrating a relationship between the DEs and magnetic disk drives. An apparatus in which the magnetic disk drive is mounted is called DE (Drive Enclosure). A plurality of magnetic disk drives can be mounted in one DE. The DE according to the present invention can be connected to the arbitrated loop and has a HUB function that allows the magnetic disk drive mounted therein to participate in the loop.
The DE can be connected to the loop by an FCC (Fiber Channel Controller) incorporated therein and can participate in the loop by the AL_PA assigned thereto. By using the DE to constitute a loop and by incorporating the magnetic disk drive in the DE connected to the loop, the magnetic disk drive can easily participate in the loop.
A configuration of the conventional DE group 500 will be described with reference to FIG. 10. The DE group 500 is constituted by a plurality of DEs 3. The DE 3 is provided with two FCCs 31 which are individually connected to the respective magnetic disk drives (Disk 32A, Disk 32B, . . . , Disk 32N, which are sometimes collectively referred to as Disk 32), so that the two FCCs 31 can refer to the Disk 32 in the same way. With the above configuration, two access paths can be ensured to each Disk 32. For example, even if one of the two FCCs 31 malfunctions and affects one of the two loops in the DE 3, the CM 400 can access the Disk 32 using the loop on the other side. The FCC 31 in one DE 3 is cascade-connected to the FCC 31 in another DE 3, and thereby the plurality of DEs 3 in the DE group 500 are connected to each other.
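The dual-path arrangement can be modeled roughly as follows (a Python sketch; the class and attribute names are hypothetical and only illustrate that both FCCs refer to the same set of disks):

```python
class DE:
    """Minimal model of a DE: two FCCs referring to the same disks."""
    def __init__(self, disks):
        self.disks = list(disks)
        # FCC#0 and FCC#1 both refer to the same disk list,
        # giving two access paths to each disk.
        self.fccs = [self.disks, self.disks]

    def reachable(self, fcc_index, disk_name):
        """True when the disk can be reached via the given FCC."""
        return disk_name in self.fccs[fcc_index]
```

Because both FCCs see an identical disk list, losing one FCC (one loop) still leaves every disk reachable through the other.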
A configuration table retained in the CM 400 will be described with reference to FIG. 11. The configuration table is a list for storing information concerning the modules constituting the storage system 300 and represents a correspondence between the name of a module and status of the module. When given processing is executed in the storage system 300, the content of the configuration table is referred to.
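The correspondence held in the configuration table can be modeled as a simple mapping from module name to status (a Python sketch; the module names and status strings are hypothetical):

```python
# Minimal model of the configuration table: module name -> status.
config_table = {
    "CM#0":    "normal",
    "CM#1":    "normal",
    "Disk32A": "normal",
    "Disk32N": "normal",
}

def status_of(module):
    """Look up the recorded status of a module."""
    return config_table[module]
```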
FIG. 12 is a view schematically illustrating I/O processing of the storage system 300. The CM 400 checks the status of the magnetic disk drive (Disk 32N) to be accessed by referring to the configuration table before executing I/O processing. When confirming that the magnetic disk drive to be accessed is in normal status on the configuration table, the CM 400 executes the I/O processing for the magnetic disk drive.
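The pre-I/O status check can be sketched as follows (a Python illustration under the assumption that the configuration table is a mapping from module name to status; the names and return values are hypothetical):

```python
def try_io(config_table, target):
    """Issue I/O only when the target drive is marked normal in the
    configuration table; otherwise inhibit the I/O."""
    if config_table.get(target) != "normal":
        return "io-inhibited"
    return "io-issued"
```

This check is only as current as the table itself, which is the root of the time-lag problem described later in this section.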
The configuration table is updated every time the status of the module in the storage system 300 is changed. For example, as illustrated in FIG. 13, in the case where the magnetic disk drive becomes abnormal, the status thereof is updated from normal to abnormal.
Configuration table update processing will be described with reference to FIG. 14. For example, when the magnetic disk drive becomes abnormal, the DI 11 notifies the CM 400 of the module status change. The CM 400 receives the notification of the module status change and then updates the status of the module on the configuration table. Examples of cases where the module status is to be changed include a case where the CM 400 malfunctions, a case where the DE 3 is removed from a predetermined position in the storage system 300, a case where the magnetic disk drive has failed and cannot be used, a case where a new magnetic disk drive is added, and the like.
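The notification-driven update flow can be sketched as follows (a Python illustration; the class names are hypothetical, and the gap between the DI-side update and the CM-side update corresponds to the time lag described next):

```python
class CM:
    """CM-side configuration table, updated on notification from the DI."""
    def __init__(self):
        self.table = {}

    def notify(self, module, status):
        self.table[module] = status


class DI:
    """DI-side configuration table; detection updates it first,
    then the CM is notified."""
    def __init__(self, cm):
        self.table = {}
        self.cm = cm

    def detect(self, module, status):
        self.table[module] = status      # DI-side table updated first
        self.cm.notify(module, status)   # then the CM catches up
```

In this sketch the notification is synchronous, but in the real system the interval between the two table updates is nonzero, and I/O issued during that interval is checked against a stale CM-side table.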
Although, in this manner, the CM 400 updates the configuration table when the module status is changed, there exists a time lag between the time at which the DI 11 detects the failure of the magnetic disk drive and the time at which the configuration table on the CM 400 is updated. During this time lag, the CM 400 cannot detect the failure of the magnetic disk drive. It follows that the CM 400 cannot inhibit issuance of I/O during the time lag and issues I/O even though the magnetic disk drive has failed.
A further description will be given of the time lag with reference to FIG. 15. Note that the DI 11 retains a different configuration table from that provided in the CM 400, which represents the statuses of the magnetic disk drives. FIG. 15 is a time chart illustrating operations of the host 200, CM 400, and DI 11 in the configuration table update processing based on the configuration tables retained in the CM 400 and DI 11. In FIG. 15, the statuses are differentiated by the width of an arrow, and the time flows downward.
In the initial state, all the Disk 32 are in normal status and there is no problem. In the case where the Disk 32N has failed at a given timing, the DI 11 detects the failure of the Disk 32N and updates the status of the Disk 32N on the configuration table retained therein from normal to abnormal.
Thereafter, the DI 11 notifies the CM 400 that the DI 11 has updated the configuration table retained therein. The CM 400 recognizes that the configuration table of the DI 11 has been updated by receiving the notification from the DI 11 and correspondingly updates the configuration table retained in the CM 400. The time lag occurs between the time at which the configuration table retained in the DI 11 is updated and the time at which the update of the configuration table is notified to the CM 400, and during this time lag, the CM 400 cannot detect the abnormality of the Disk 32N.
There is disclosed, as a conventional art, a disk array device capable of preventing any operation leading to the degeneration of an originally normal disk device. Further, there is disclosed a control method capable of preventing occurrence of a malfunction in an exchanged hard disk drive (HDD) device and detecting a connection miss of the HDD device when an HDD device is exchanged during the operation of a disk array device.
Further, there is disclosed, as a conventional art, a data storage system capable of reducing the load of a data bus connecting a host system and the data storage system. Further, there is disclosed a RAID device capable of, even in the case where a failure which may induce a loop abnormality occurs in a RAID using disks with an FC interface, correcting the loop abnormality so as to prevent data loss.
Patent Document 1: Japanese Laid-Open Patent Publication No. 2002-108573
Patent Document 2: Japanese Laid-Open Patent Publication No. 11-85412
Patent Document 3: Japanese Laid-Open Patent Publication No. 8-263225
Patent Document 4: Japanese Laid-Open Patent Publication No. 2003-162380
When detecting that access to the magnetic disk drive cannot be made, the storage system 300 provided with conventional DEs 3 determines that the magnetic disk drive has failed and logically separates the relevant magnetic disk drive from the storage system 300. The access disabled state can be caused by a failure of the magnetic disk drive itself or by power failure of the DE 3.
When a power failure of the DE 3 occurs, the AL_PA of the magnetic disk drive provided in the DE 3 disappears from the loop, and therefore the magnetic disk drive in the DE 3 becomes invisible to the CM 400. The CM 400 records, in the configuration table, information indicating that the magnetic disk drive in the DE 3 has become invisible to the CM 400.
When the information indicating that the magnetic disk drive in the DE 3 has become invisible to the CM 400 exists on the configuration table of the CM 400, I/O is never executed. However, as described above, a time lag exists until the information of the magnetic disk drive that has become invisible is registered in the configuration table. Thus, a timing at which I/O is issued to the invisible magnetic disk drive may exist.
FIGS. 16A to 16C and FIGS. 17A to 17C are views illustrating conventional processing executed in the case where I/O is issued, during the time lag, to the magnetic disk drive (magnetic disk drive in the DE 3 in which a power failure has occurred) provided in the DE 3 which is in a disabled state due to occurrence of a power failure. In FIGS. 16A to 16C and FIGS. 17A to 17C, A and B each denote a loop. That is, the storage system 300 has two paths to the DE 3.
FIG. 16A illustrates a normal state, and FIG. 16B illustrates a state where a power failure has occurred in the DE 3. When a power failure occurs in the DE 3, the AL_PA of the magnetic disk drive provided in the DE 3 disappears, and therefore the magnetic disk drive temporarily disappears (during the power failure) from the loop. The reason that the AL_PA disappears is that loop initialization is activated when a power failure has occurred in the DE 3 and, in this case, the value of the AL_PA is not determined.
When an access request to the magnetic disk drive is generated in this state, the CM 400 tries to access the magnetic disk drive of the DE 3 in which a power failure has occurred through one path (in this case, path A), but the access fails (see FIG. 16C).
The CM 400 then tries to access the magnetic disk drive through the other path (in this case, path B), but the access fails (see FIG. 17A). The CM 400 recognizes that the access operation to the magnetic disk drive using all the access paths has failed and determines that the magnetic disk drive to be accessed has failed. At this timing, the CM 400 logically separates the magnetic disk drive from the storage system 300.
When the power supply to the DE 3 is resumed, the DE 3 is recovered. At this time, the AL_PA of the magnetic disk drive provided in the DE 3 appears and, therefore, the magnetic disk drive becomes visible on the loop (see FIG. 17B).
When the DE 3 is recovered after the resumption of power supply, the magnetic disk drive is accordingly recovered. However, the magnetic disk drive that was determined to be abnormal earlier is still separated from the storage system 300 as one in an access disabled state (see FIG. 17C). As described above, since the power failure of the DE 3 cannot be detected, the CM 400 has no choice but to determine that the disappearance of the magnetic disk drive in the DE 3 from the loop is caused by a failure of the magnetic disk drive even in the case where a power failure has actually occurred in the DE 3.
When it has been once determined that the magnetic disk drive has failed as described above, the magnetic disk drive is still recognized as a failed one even after the resumption of power supply. In order to recover the storage system 300 to a normal state afterward, it is necessary to replace the magnetic disk drive that has been determined to have failed, although the magnetic disk drive is actually normal.
A further description will be given of the above processing with reference to the flowchart of FIG. 18.
A power failure of the DE 3 occurs and the AL_PA of the magnetic disk drive in the DE 3 in which the power failure has occurred disappears from the loop (step S101). The storage system 300 cannot detect the power failure.
In the case where an I/O access has been issued to the magnetic disk drive (step S102), the CM 400 uses a first access path to determine whether the AL_PA of the access target magnetic disk drive exists on the loop (step S103: first determination).
In the case where the AL_PA of the access target magnetic disk drive does not exist on the loop (No in step S103), the I/O processing for the magnetic disk drive fails. The CM 400 then uses a second access path (step S104) to determine whether the AL_PA of the access target magnetic disk drive exists on the loop (step S105: second determination).
In the case where the existence of the AL_PA of the access target magnetic disk drive cannot be detected in the second determination either (No in step S105), the magnetic disk drive does not exist on the loop on either path, so the CM 400 determines that an abnormality has occurred in the magnetic disk drive and updates the configuration table retained therein. More specifically, the CM 400 updates the status of the magnetic disk drive on the configuration table from normal to abnormal and determines that the failure of the I/O processing for the magnetic disk drive is due to a failure in the magnetic disk drive.
In the case where the existence of the AL_PA of the access target magnetic disk drive can be detected in the first determination (Yes in step S103), the access target magnetic disk drive is a magnetic disk drive unrelated to the power failure of the DE 3, so the I/O processing is performed without problems. Further, in the case where the existence of the AL_PA of the access target magnetic disk drive can be detected in the second determination (Yes in step S105), the CM 400 uses the second access path to perform the I/O processing for the magnetic disk drive without problems (step S107).
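The determination flow of FIG. 18 can be summarized by the following sketch (Python; the two loops are modeled as sets of AL_PA holders, and all names and return values are hypothetical):

```python
def issue_io(loop_a, loop_b, target, config_table):
    """Try the first path, then the second; when the target's AL_PA is
    absent on both loops, mark the drive abnormal and separate it."""
    if target in loop_a:                  # first determination (step S103)
        return "io-done-via-path-a"
    if target in loop_b:                  # second determination
        return "io-done-via-path-b"       # step S107
    config_table[target] = "abnormal"     # both determinations failed
    return "drive-separated"
```

As the sketch makes clear, the logic cannot distinguish "drive has failed" from "drive's DE has lost power": both produce the same two failed determinations, so both end in the drive being marked abnormal.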
Considering the above, the following problems exist in the conventional storage system 300.
The storage system 300 cannot detect the power failure of the DE 3. Accordingly, when a power failure occurs in the DE 3, the storage system 300 recognizes the situation as if the magnetic disk drive provided in the DE 3 had suddenly disappeared from the loop and, as a result, determines that a failure has occurred in the magnetic disk drive.
Further, in the case where I/O processing is executed, during the time lag, for the magnetic disk drive in the DE 3 in which a power failure has occurred, the I/O processing fails. Since the magnetic disk drive does not exist on the loop, the CM 400 determines that a failure has occurred in the magnetic disk drive and separates the magnetic disk drive from the storage system 300. After the DE 3 is recovered from the power failure, the magnetic disk drive, which has not actually failed, appears on the loop again. However, since the magnetic disk drive was determined to have failed at the time of occurrence of the power failure, it is not incorporated into the storage system 300 but is treated as a failed drive and must be replaced with a new one.