In recent years, data processing systems have come to handle increasing amounts of data, and the storage systems in which the data is stored have come to have increasing capacities. A data center is equipped with servers which run various processes using data, storage systems in which the data is stored, and so forth. High-data-rate fiber channel switches are widely used as devices which connect the storage systems with the servers.
The number of fiber channel switches in use increases as the number of servers and other devices installed in a data center increases. Further, a cascade connection method, in which a plurality of fiber channel switches are connected with one another, is used as well.
If a device malfunctions or a failure occurs on a fiber channel switch in an environment where many devices are connected with one another in this way, a server detects the error in the fiber channel switch and notifies a CE (Customer Engineer) of the detected error. The CE collects a log from the fiber channel switch on which the error occurred and analyzes the error. The CE then identifies the parts to be replaced and replaces them. Maintenance work is performed in this way on a fiber channel switch which connects many devices.
Further, a maintenance device, a remote monitoring device and so forth which perform maintenance work other than that described above are used as well. When a failure occurs, for example, the maintenance device identifies a similar past failure and its recovery procedure from data on past failures and recovery procedures, and recovers the failed device in accordance with the identified recovery procedure. The remote monitoring device receives monitoring data from the switch, identifies maintenance data on the basis of the received monitoring data, notifies an administrator of the identified maintenance data by email, and displays the identified maintenance data on a Web screen at a website or the like.
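The similar-failure lookup performed by such a maintenance device can be illustrated with a minimal sketch. The source does not say how similarity between failures is judged, so the sketch assumes that each past failure is recorded as a set of symptom labels and that similarity is measured by Jaccard overlap. The names `FailureRecord`, `find_recovery_procedure`, the symptom labels, the recovery procedures and the threshold value are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRecord:
    """One past failure and the procedure that recovered from it (hypothetical schema)."""
    symptoms: frozenset
    recovery_procedure: str


def jaccard(a, b):
    """Jaccard similarity of two symptom sets; 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def find_recovery_procedure(observed, history, threshold=0.5):
    """Return the recovery procedure of the most similar past failure.

    Returns None when no past failure reaches the similarity threshold.
    The threshold of 0.5 is an assumed value, not one taken from the source.
    """
    best = None
    best_score = 0.0
    for record in history:
        score = jaccard(observed, record.symptoms)
        if score > best_score:
            best, best_score = record, score
    if best is None or best_score < threshold:
        return None
    return best.recovery_procedure


# Illustrative history; the entries are made up for this sketch.
HISTORY = [
    FailureRecord(frozenset({"port_down", "crc_errors"}), "replace SFP module"),
    FailureRecord(frozenset({"fan_alarm", "over_temperature"}), "replace fan unit"),
]

if __name__ == "__main__":
    observed = frozenset({"port_down", "crc_errors", "link_reset"})
    print(find_recovery_procedure(observed, HISTORY))
```

A lookup of this kind only helps when a sufficiently similar failure has been recorded before. For a failure with no close match the function returns None, and the cause still has to be identified by analyzing logs.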
The ordinary technologies have a problem, however, in that it takes time to identify the detailed cause of a malfunction which has occurred on a device connected to many other devices as described above, and the device may therefore not be recovered quickly from the malfunction.
For example, in order to identify the detailed cause of a malfunction which has occurred on a monitored device according to the ordinary technologies, logs are collected from other devices. If the system is formed by many devices, it takes a long time to collect the logs. Further, as the collected logs are enormous and take a long time to analyze, it takes a long time to finally identify the cause of the malfunction and, as a result, to recover from it. The longer the recovery takes, the more likely it is that another malfunction will occur and that the damage caused by the malfunction will spread.
Japanese Laid-open Patent Publications Nos. 2001-34509, 2002-55850 and 2006-107080 are examples of the related art.