1. Field of the Invention
The present invention relates to a data storage system for use as an external storage apparatus of a computer, a data storage control apparatus, and a fault location diagnosis method therefor, and more particularly to a data storage system in which a large number of disk devices and control units are connected through transmission paths, a data storage control apparatus, and a fault location diagnosis method therefor.
2. Description of the Related Art
In recent years, as a variety of electronic data have come to be handled in computers, data storage apparatuses (external storage apparatuses) capable of storing a large amount of data with high reliability, independently of a host computer executing data processing, have become increasingly important.
As such a data storage apparatus, a disk array apparatus constituted of a large number of disk devices (for example, magnetic disk devices and optical disk devices) and disk controllers for controlling the disk devices is in use. The disk array apparatus can simultaneously receive disk access requests from a plurality of host computers and control the large number of disks.
Such a disk array apparatus incorporates a memory which plays the role of a disk cache. With this, the data access time can be reduced when a read request or a write request is received from a host computer, and thus high performance can be obtained.
In general, the disk array apparatus is constituted of a plurality of major units, namely, a channel adaptor provided as a connecting portion to the host computer, a disk adaptor as a connecting portion to a disk drive, a cache memory, a cache controller taking charge of controlling the cache memory, and a large number of disk drives.
In such a complicated system, when a fault occurs in any unit, it is necessary to identify the fault location.
FIG. 10 is an explanatory diagram of the prior art. A disk array apparatus 110 shown in FIG. 10 includes a pair of control units 112, 114. Each of the control units 112, 114 includes a cache manager (cache memory and cache controller) 122, and a channel adaptor 120 and a disk adaptor 124 which are connected to the cache manager 122.
The two cache managers 122 are also directly connected to each other so as to enable communication therebetween. The channel adaptor 120 is connected to a host computer 100 by means of Fiber Channel or Ethernet (registered trademark). The disk adaptor 124 is connected to each of the disk drives 130-1 to 130-4 in a disk enclosure by means of, for example, FC loops 140, 142 of the Fiber Channel.
Namely, the disk adaptor 124 in the first control unit 112 accesses each of the disk drives 130-1 to 130-4 via the first FC loop 140, while the disk adaptor 124 in the second control unit 114 accesses each of the disk drives 130-1 to 130-4 via the second FC loop 142. With this, both the control units and the connection paths are duplicated.
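For illustration only, the duplicated configuration described above can be modeled as a small lookup of access paths. The following Python sketch is not part of the prior art disclosure; the class and function names (ControlUnit, access_paths) are hypothetical, and only the reference numerals are taken from FIG. 10.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlUnit:
    """Hypothetical model of one control unit in FIG. 10."""
    name: str     # reference numeral of the control unit, e.g. "112"
    fc_loop: str  # reference numeral of the FC loop used by its disk adaptor


# Reference numerals taken from the description of FIG. 10.
DRIVES = ["130-1", "130-2", "130-3", "130-4"]
CONTROL_UNITS = [ControlUnit("112", "140"), ControlUnit("114", "142")]


def access_paths(drive):
    """Return every (control unit, FC loop) pair that can reach the drive."""
    if drive not in DRIVES:
        raise ValueError(f"unknown drive: {drive}")
    # Every drive is attached to both loops, so each control unit has a path.
    return [(cu.name, cu.fc_loop) for cu in CONTROL_UNITS]


paths = access_paths("130-3")
```

In this model, every disk drive is reachable through two independent paths, one per control unit.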
In such a configuration, based on a request received from the host computer 100 via the channel adaptor 120, the cache manager 122 in the control unit 112 performs a read access or a write access to the disk drive 130-3 through the disk adaptor 124 and the transmission path 140, such as the Fiber Channel.
At this time, when an error (for example, a CRC error) is detected in the disk drive 130-3 or the disk adaptor 124, conventionally, a disk drive on the FC loop 140 is regarded as faulty, and diagnosis is started. Namely, by successively repeating connection and disconnection between the FC loop 140 and each disk drive, the faulty disk drive is identified (for example, Japanese Unexamined Patent Publication No. 2001-306262, FIG. 2).
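The conventional diagnosis described above can be sketched, under stated assumptions, as a loop that disconnects one disk drive at a time and checks whether the error persists. The sketch below is a simplified simulation and not the procedure of the cited publication; the function diagnose_by_bypass and the probe loop_has_error are hypothetical names introduced here.

```python
def diagnose_by_bypass(drives, loop_has_error):
    """Return the drive whose disconnection clears the loop error, or None.

    loop_has_error(connected) is a hypothetical probe: it receives the set of
    drives currently connected to the loop and returns True if the error is
    still observed.
    """
    for candidate in drives:
        connected = set(drives) - {candidate}  # disconnect one drive
        if not loop_has_error(connected):
            return candidate  # error cleared, so the candidate is the suspect
        # The candidate is reconnected implicitly: the next iteration starts
        # again from the full set of drives.
    return None


DRIVES = ["130-1", "130-2", "130-3", "130-4"]


def drive_fault(connected):
    # Simulated case 1 (assumption): drive 130-3 is the source of the error.
    return "130-3" in connected


def path_fault(connected):
    # Simulated case 2 (assumption): the path itself is faulty, so the error
    # is observed regardless of which drives are connected.
    return True


result_drive = diagnose_by_bypass(DRIVES, drive_fault)
result_path = diagnose_by_bypass(DRIVES, path_fault)
```

In the first simulated case the loop singles out drive 130-3. In the second, no drive is singled out, because disconnecting drives cannot clear an error that originates in the path.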
However, a recent storage system is required not only to provide redundancy but also to continue operation even when a fault occurs in any portion. According to the above prior art, it is difficult to determine whether the disk drive 130-3 or the path on the FC loop 140 (including the disk adaptor 124) is defective.
Accordingly, it is not possible to take immediate action to cope with the problem, for example, accessing the disk drive 130-3 from the other controller (control unit) 114 via the FC loop 142 when the FC loop 140 is defective. As a result, it is difficult to continue the operation.