This invention relates to a storage system, and more particularly, to a technique of identifying a failure component.
A timeout error of a host computer has to be prevented in an information system that has a storage system. This is because a timeout error causes the operating system of the host computer to panic, with the result that the entire information system is shut down.
When a storage system is to have such features as high reliability and high availability, it is therefore necessary to avoid timeout errors of the host computer and to minimize the retry count, as well as to prevent data loss by enhancing the redundancy of data and components.
An example of a technique for avoiding a host computer timeout error is disclosed in JP 2002-358170 A. According to the technique disclosed in JP 2002-358170 A, a storage system and a host computer operate in conjunction with each other to avoid a timeout error of the host computer.
JP 2001-256003 A discloses a storage system that has redundant components. The storage system disclosed in JP 2001-256003 A duplicates, for redundancy, components including a hard disk drive (HDD), which stores data, a cache memory, and an access path. When a failure occurs in the storage system disclosed in JP 2001-256003 A, a switch is made from a regular component to its substitute component, so that data is accessed while bypassing the failure. This enables a host computer to continue processing without timing out.
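The switch from a regular component to its substitute can be illustrated with a minimal sketch. It is not taken from JP 2001-256003 A; the names ComponentFailure, read_with_failover, failed_path, and healthy_path are hypothetical, and each component is modeled as a plain function.

```python
class ComponentFailure(Exception):
    """Raised by a component that cannot serve an access (hypothetical)."""


def read_with_failover(regular, substitute, block):
    """Try the regular component first; on failure, switch to its substitute."""
    try:
        return regular(block)
    except ComponentFailure:
        # Bypass the failed component so the host request still completes.
        return substitute(block)


def failed_path(block):
    """Stand-in for an access path that has failed."""
    raise ComponentFailure("regular access path is down")


def healthy_path(block):
    """Stand-in for a working access path."""
    return f"data@{block}"


result = read_with_failover(failed_path, healthy_path, 7)
```

In this sketch the host-side caller receives the data from the substitute path and never sees the failure of the regular path.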
There are two types of failure: persistent failure and intermittent failure. A component experiencing persistent failure responds to the same access in the same incorrect way every time. Intermittent failure, on the other hand, is a failure that precedes persistent failure. A component undergoing intermittent failure responds to the same access correctly at some times and incorrectly at others.
The behavior of a component undergoing intermittent failure is thus inconsistent in response to the access that is made to identify a failure component. As a result, a component where a failure has occurred (failure component) is not always identified successfully in the case of intermittent failure.
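The difference between the two failure types can be made concrete with a small sketch. It assumes a hypothetical probe function that repeats the same access and returns True on a correct response; the names classify and make_probe, the attempt count, and the fixed response patterns are illustrative only.

```python
from itertools import cycle


def classify(probe, attempts=6):
    """Repeat the same access and classify the component by its outcomes."""
    outcomes = [probe() for _ in range(attempts)]
    if all(outcomes):
        # No failure observed within these attempts.
        return "healthy"
    if not any(outcomes):
        # The same access fails every time.
        return "persistent"
    # The same access sometimes succeeds and sometimes fails.
    return "intermittent"


def make_probe(pattern):
    """Return a probe that replays a fixed success/failure pattern."""
    responses = cycle(pattern)
    return lambda: next(responses)


persistent = classify(make_probe([False]))
intermittent = classify(make_probe([True, True, False]))

# A single diagnostic access to the intermittent component can succeed,
# which is why one access is not enough to identify it.
single_probe_ok = make_probe([True, True, False])()
```

Note that a component classified as "healthy" here may still be undergoing intermittent failure if none of the repeated accesses happened to fail.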
JP 2001-94584 A discloses a technique of identifying an intermittent failure component. According to the technique disclosed in JP 2001-94584 A, failure information is collected and a failure component is identified based on the collected failure information.
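The general idea of identifying a failure component from collected failure information can be sketched as follows. This is not the method of JP 2001-94584 A, whose details are not given here; it is only an assumed illustration in which each failure record lists the components that a failed access passed through, and the component named most often is treated as the suspect. The function name most_suspect and all component names are hypothetical.

```python
from collections import Counter


def most_suspect(failure_log):
    """Return the component(s) that appear in the most failure records."""
    counts = Counter(
        component for record in failure_log for component in record
    )
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(name for name, n in counts.items() if n == highest)


# Each record: components on the route of one failed access (made-up data).
failure_log = [
    ("path_A", "cache_0", "hdd_3"),
    ("path_B", "cache_0", "hdd_1"),
    ("path_A", "cache_1", "hdd_3"),
    ("path_B", "cache_1", "hdd_3"),
]

suspects = most_suspect(failure_log)
```

With the made-up log above, hdd_3 appears in three records while every other component appears in at most two, so it is the only suspect returned.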