1. Technical Field
The present invention relates to an improved data processing system and, in particular, to a method and apparatus for managing a computer network.
2. Description of Related Art
A Storage Area Network (SAN) is an “open system” storage architecture that allows multiple host computers to share multiple storage peripherals, and in particular, to share storage peripherals via a Fibre Channel (FC) network switch. The FC switch, host systems, and storage peripherals may be manufactured by different vendors and contain different operating environments.
Currently, there is no end-to-end problem determination capability or specification for an FC SAN. A complex configuration of multi-vendor systems, network switches, and peripherals makes problem determination significantly more difficult in a SAN environment than in existing point-to-point storage configurations. As a result, failures in a SAN environment increase both system downtime and the cost of system maintenance.
It would be advantageous to have a method and apparatus that defines an “open system”, real-time, end-to-end, error detection architecture that incorporates fault isolation algorithms to identify failing systems and/or components connected to a SAN.
A method and system for problem determination and fault isolation in a storage area network (SAN) are provided. A complex configuration of multi-vendor host systems, FC switches, and storage peripherals is connected in a SAN via a communications architecture (CA). A communications architecture element (CAE) is a network-connected device that has successfully registered with a communications architecture manager (CAM) on a host computer via a network service protocol. The CAM contains problem determination (PD) functionality for the SAN and maintains a SAN PD information table (SPDIT). The CA comprises all network-connected elements capable of communicating information stored in the SPDIT. The CAM uses a SAN topology map and the SPDIT to create a SAN diagnostic table (SDT). A failing component in a particular device may generate errors that cause devices along the same network connection path to generate errors. As the CAM receives error packets or error messages, the errors are stored in the SDT, and each error is analyzed by temporally and spatially comparing the error with other errors in the SDT. If a CAE is determined to be a candidate for generating the error, then the CAE is reported for replacement if possible.
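The temporal and spatial comparison of errors in the SDT can be illustrated with a minimal Python sketch. It is not the claimed fault isolation algorithm. The names `ErrorEntry`, `shares_path`, and `candidate_devices`, the representation of the topology map as a list of connection paths, the 5-second window, and the rule that the earliest reporter in a correlated group is the candidate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEntry:
    """One row of a hypothetical SDT: which CAE reported an error, and when."""
    device: str
    timestamp: float  # seconds, as recorded when the CAM received the error


def shares_path(paths, a, b):
    """Spatial test (assumed): two devices are related if they are the same
    device or appear together on at least one connection path."""
    return a == b or any(a in p and b in p for p in paths)


def candidate_devices(sdt, paths, window=5.0):
    """Group errors that are close in time and share a connection path.

    Assumption: within each group, the device that reported first is treated
    as the candidate for having generated the error.
    """
    candidates = []
    remaining = sorted(sdt, key=lambda e: e.timestamp)
    while remaining:
        first = remaining[0]
        # Temporal test: within `window` seconds of the earliest error.
        # Spatial test: on a common path with the earliest reporter.
        group = [
            e for e in remaining
            if e.timestamp - first.timestamp <= window
            and shares_path(paths, first.device, e.device)
        ]
        candidates.append(first.device)
        remaining = [e for e in remaining if e not in group]
    return candidates


# Hypothetical topology: two hosts reach two disks through one FC switch.
paths = [
    ["host1", "switch1", "disk1"],
    ["host2", "switch1", "disk2"],
]
sdt = [
    ErrorEntry("switch1", 100.5),
    ErrorEntry("disk1", 100.0),
    ErrorEntry("host1", 101.0),
    ErrorEntry("disk2", 300.0),
]
result = candidate_devices(sdt, paths)
```

In this example the errors from `disk1`, `switch1`, and `host1` fall within the same window and lie on one path, so they collapse into a single group attributed to `disk1`, while the much later `disk2` error forms its own group.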