The invention relates generally to computer systems, and deals more particularly with a technique to identify a failed component which prevents an application server from accessing storage via a switch fabric and a storage server.
It is known to allow application servers to access storage via a switch fabric, storage servers and respective disk drives. The storage servers can be IBM Enterprise Storage Servers (“ESS”), and the switch fabric can be an IBM Storage Area Network (“SAN”) switch fabric. The known switch fabric comprises a first set of switch ports to connect to respective application servers and a second set of switch ports to connect to respective storage servers. The known switch fabric also comprises internal switches to interconnect switch ports of the first set to switch ports of the second set, and control circuitry to determine which switch ports of each set to interconnect to forward an application request to the desired storage server. In the IBM SAN switch fabric, there are fiber channel (protocol) adapter cards plugged into the application servers, the switch fabric and the storage servers, and the first and second sets of switch ports are fiber channel ports. The storage servers manage disk drives and associated disk storage, and manage requests from the application servers to read from or write to the storage. Applications within the application servers address locations in the storage by a “virtual path” and then an address range. Each application server includes configuration data which maps each virtual path to a respective hard disk or other real address in the storage. Each application server includes other system configuration data which specifies which hard disks can be accessed via which fiber channel adapter card port of the application server. Consequently, when an application specifies a virtual path (and address range) from which to read data or to which to write data, the operating system in the application server translates the virtual path to a hard disk and identifies the fiber channel adapter card port of the application server through which to send the request en route to a storage server which manages the hard disk. 
Then, the operating system sends the request through this fiber channel adapter card port to the switch fabric. The switch fabric then determines from its own data which switch port of the second set is connected to the storage server which manages the hard disk specified in the application request, and interconnects the switch port of the first set connected to the application server to that switch port of the second set. Then, the switch fabric forwards the request to the storage server via this switch port of the second set. In some environments such as the IBM ESS environment, the applications read and write data in units called “Logical Units” (“LUNs”), so the switch fabric converts the address range of the application request to logical units. After handling the application request, the storage server sends a response back to the application via the switch fabric using the same two switch ports.
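By way of illustration only, the lookups described above can be sketched as a chain of table lookups. The following Python sketch is not taken from any actual application server or switch fabric; every table, identifier and value in it (for example "vpath0", "hdisk4", "fcs0", "A1", "B1" and "ess1") is a hypothetical assumption chosen to show how a virtual path might be resolved to a hard disk, an application server adapter port, and the two switch ports that carry the request and its response.

```python
from dataclasses import dataclass

# All tables and identifiers below are hypothetical examples.

# Application server configuration data: virtual path -> hard disk.
VIRTUAL_PATH_TO_DISK = {"vpath0": "hdisk4", "vpath1": "hdisk7"}

# Application server system configuration data: hard disk -> adapter card port.
DISK_TO_ADAPTER_PORT = {"hdisk4": "fcs0", "hdisk7": "fcs1"}

# Switch fabric data: which first-set switch port each adapter port is cabled to,
# which storage server manages each hard disk, and which second-set switch port
# is connected to each storage server.
ADAPTER_PORT_TO_FIRST_SET_PORT = {"fcs0": "A1", "fcs1": "A2"}
DISK_TO_STORAGE_SERVER = {"hdisk4": "ess1", "hdisk7": "ess2"}
STORAGE_SERVER_TO_SECOND_SET_PORT = {"ess1": "B1", "ess2": "B2"}


@dataclass(frozen=True)
class Route:
    """The components a request traverses from application to storage."""
    hard_disk: str
    adapter_port: str
    first_set_port: str
    second_set_port: str
    storage_server: str


def lookup(table: dict, key: str, description: str) -> str:
    """Look up one mapping and report which mapping is missing on failure."""
    try:
        return table[key]
    except KeyError:
        raise LookupError(f"no {description} configured for {key!r}") from None


def resolve_route(virtual_path: str) -> Route:
    """Resolve a virtual path to the route its requests would take."""
    # Steps performed by the operating system in the application server.
    disk = lookup(VIRTUAL_PATH_TO_DISK, virtual_path, "hard disk")
    adapter_port = lookup(DISK_TO_ADAPTER_PORT, disk, "adapter port")
    # Steps performed by the switch fabric from its own data.
    first_port = lookup(ADAPTER_PORT_TO_FIRST_SET_PORT, adapter_port,
                        "first-set switch port")
    server = lookup(DISK_TO_STORAGE_SERVER, disk, "storage server")
    second_port = lookup(STORAGE_SERVER_TO_SECOND_SET_PORT, server,
                         "second-set switch port")
    return Route(disk, adapter_port, first_port, second_port, server)


def response_ports(route: Route) -> tuple:
    """The response returns through the same two switch ports, in reverse."""
    return (route.second_set_port, route.first_set_port)


if __name__ == "__main__":
    route = resolve_route("vpath0")
    print(route)
    print(response_ports(route))
```

In this sketch each lookup corresponds to one mapping described above, so a missing entry identifies which mapping could not be resolved. The conversion of an address range to logical units is omitted because the text does not specify how it is performed.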
Occasionally, there is a failure or other problem with one of the application servers, application server ports, cables leading from the application server to the switch fabric, switch fabric switch ports of the first set, internal switch fabric switches, switch fabric switch ports of the second set, cables leading from the switch fabric to the storage server, storage server ports, storage servers, disk drives or disk storage. Consequently, when an application makes a request to read data from or write data to storage via the switch fabric and a storage server, and a failure occurs with the application server or the storage server or the communication link between the application server and storage server, an error code or nothing is returned. Then, an administrator attempts to determine the failed component from the large amount of data that is available. Normally each component that could possibly have failed is analyzed separately to determine if it has failed. This may require access to the storage servers and switch fabric switches.
An object of the present invention is to determine the cause of a failure when an application in an application server attempts to read from or write to storage via a switch fabric and storage server.
Another object of the present invention is to determine the cause of failure without requiring a special program agent on the switch fabric or storage server.