Host computers (including host servers) may be connected to a mass storage system in various manners.
FIG. 1 illustrates a prior art SAN (Storage Area Network) environment 8, where host servers communicate with a mass storage system 10 via a network for accessing one or more logical volumes provided by the mass storage system. The communication via the SAN is facilitated by switches such as fibre channel switches.
The host servers can be grouped into a cluster of host servers. For example, Structured Query Language (SQL) servers 21 and 22 can be clustered into an SQL cluster 20 that provides an SQL service to applications, other computers or application servers. Another type of cluster is a virtual machine cluster.
To guarantee data availability and accessibility in case of equipment failures, elements in the network are redundant. Each host server can be connected to the mass storage network via two or more ports, also known as initiator ports or host ports. Each logical volume in the mass storage system can be exposed to hosts via more than one port of the mass storage system, also known as target ports. Generally, two switches facilitate the communication between the host servers and the mass storage system. More than one host server may exist in the cluster that provides a certain service (e.g., hosting, SQL) to applications.
In FIG. 1, SQL server 21 is illustrated as being connected to the SAN via HBA (host bus adapter) ports H1a 23 and H1b 24, which in turn are respectively connected to network switches 31 and 32. SQL server 22 is connected to the SAN via HBA ports H2a 25 and H2b 26, which in turn are respectively connected to network switches 31 and 32. SQL cluster 20 provides SQL services to applications or other computers (not shown), by load balancing the workload between the two SQL servers 21 and 22. In case one of the SQL servers fails, the other SQL server can still provide the service, though without computing redundancy. If one of the ports connecting the SQL server to the network fails, the other port can still provide full connectivity of the SQL server.
The configuration of each switch defines the storage port or ports to which incoming data from a given host port can be forwarded. The definition of which host port can access which target port is known as a “zone”. The zone configuration of switch 31 is represented by zone information 41, which defines that port H2a 25 is allowed to forward data to ports S1 15 and S3 17 and that port H1a 23 is also allowed to forward data to ports S1 15 and S3 17. Likewise, switch 32 is configured according to zone information 42, which defines that port H2b 26 is allowed to forward data to ports S2 16 and S4 18 and that port H1b 24 is also allowed to forward data to ports S2 16 and S4 18.
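The zone information described above can be pictured as a lookup from host port to permitted target ports, kept per switch. The following is only an illustrative sketch: the dictionary layout, the switch keys and the function names `may_forward` and `reachable_targets` are assumptions made for this example and do not represent any switch vendor's configuration format.

```python
# Hypothetical model of zone information 41 (switch 31) and 42 (switch 32).
# Port labels follow FIG. 1; the data layout itself is an assumption.
ZONES = {
    "switch31": {"H1a": {"S1", "S3"}, "H2a": {"S1", "S3"}},
    "switch32": {"H1b": {"S2", "S4"}, "H2b": {"S2", "S4"}},
}


def may_forward(switch, host_port, target_port):
    """Return True if the zone of `switch` lets host_port reach target_port."""
    return target_port in ZONES.get(switch, {}).get(host_port, set())


def reachable_targets(host_port):
    """Collect every target port the host port can reach through any switch."""
    targets = set()
    for zone in ZONES.values():
        targets |= zone.get(host_port, set())
    return targets
```

In this sketch, `may_forward("switch31", "H1a", "S2")` is false because zone information 41 only lists S1 and S3 for port H1a.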
Mass storage system 10 includes multiple physical storage devices that are mapped to logical volumes. The logical volumes may be exposed to cluster 20 as LUNs (Logical Unit Numbers), such as LUN 11, which may be accessible through any of target ports S1 15, S2 16, S3 17 and S4 18.
Mass storage system 10 stores a LUN mapping that maps each LUN allocated to a cluster to the host ports of that cluster. For example, LUN 11 is mapped to all of the host ports of cluster 20, i.e., host ports H1a 23, H1b 24, H2a 25 and H2b 26. Each accessible LUN in the mass storage system should have such a LUN mapping, which allows a plurality of host ports to initiate input/output (I/O) requests towards the logical volume identified by the LUN.
The mass storage system 10 may be aware of which host ports are connected to and communicating with the mass storage system, by receiving a port login message each time a new host port is added to the cluster. The login message includes the identifier of the host port, which is typically a WWPN (World Wide Port Name), a unique identifier in the network.
Once the connected host ports are recognized by the mass storage system, the LUN mapping, in the mass storage system, can be established between the connected host ports and requested LUNs.
Suppose SQL cluster 20 is to be given access to LUNs 11-14. Each of host ports H1a 23, H1b 24, H2a 25 and H2b 26 is then required to be mapped to each of the four LUNs. To avoid an individual mapping definition for each port of SQL cluster 20, some mass storage systems enable a configuration for the entire cluster: all host ports are associated with the entity ‘SQL cluster’ and all the required LUNs are also associated with the ‘SQL cluster’, as illustrated in cluster mapping 44. Upon adding a new host port, the host port is added to the list of host ports associated with the cluster, and there is no need to explicitly associate the new host port with each of the accessible LUNs.
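The difference between per-port mapping and a cluster-level mapping such as cluster mapping 44 can be sketched as follows. This is a minimal illustration under stated assumptions: the class `ClusterMapping`, its methods and the port label `H3a` are hypothetical names introduced for the example, not part of any storage system's interface.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterMapping:
    """Hypothetical cluster-level mapping: ports and LUNs share one entity."""
    name: str
    host_ports: set = field(default_factory=set)
    luns: set = field(default_factory=set)

    def add_host_port(self, port):
        # A single update; no per-LUN association is needed.
        self.host_ports.add(port)

    def can_access(self, port, lun):
        # Access follows from membership of both the port and the LUN.
        return port in self.host_ports and lun in self.luns

    def per_port_entries(self):
        # Number of entries an individual port-to-LUN mapping would require.
        return len(self.host_ports) * len(self.luns)


sql_cluster = ClusterMapping(
    "SQL cluster",
    host_ports={"H1a", "H1b", "H2a", "H2b"},
    luns={11, 12, 13, 14},
)
```

With four host ports and four LUNs, individual mapping would need sixteen entries, whereas adding a hypothetical fifth port `H3a` to the cluster entity is one operation that grants it access to all four LUNs.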
The mass storage system is further aware of host ports that get disconnected, by receiving notifications of the disconnected host ports from network switches 31 and 32.
Although the mass storage system 10 is aware of host port disconnections, it is unaware of whether one or more disconnected ports cause a minor problem (for example, one host computer lost one of its host ports), a major problem (for example, a host computer lost all its host ports) or a critical problem (for example, an entire cluster got disconnected).
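The three severity levels can be made concrete with a short sketch. It assumes that the association between host computers and their host ports is known, which is precisely the information the mass storage system described above lacks. The table `HOST_PORTS`, the host labels and the function `classify` are hypothetical and serve only to illustrate the distinction.

```python
# Hypothetical host-to-port table for cluster 20 of FIG. 1.
# The mass storage system described above does not hold this information.
HOST_PORTS = {
    "server21": {"H1a", "H1b"},
    "server22": {"H2a", "H2b"},
}


def classify(disconnected):
    """Classify a set of disconnected host ports by severity."""
    known = set().union(*HOST_PORTS.values())
    down = set(disconnected) & known
    if not down:
        return "none"
    # Hosts whose every port is disconnected.
    lost_hosts = [h for h, ports in HOST_PORTS.items() if ports <= down]
    if len(lost_hosts) == len(HOST_PORTS):
        return "critical"  # the entire cluster is disconnected
    if lost_hosts:
        return "major"     # at least one host lost all its ports
    return "minor"         # every host still has a working port
```

Note that losing H1a and H2a together (for example, when a single switch fails) is classified as minor in this sketch, because both servers remain reachable through their other ports.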
Some vendors provide external platforms for SAN analysis that monitor the multi-vendor equipment (including servers, network switches and storage equipment) in the SAN for evaluating availability of end-to-end service. Such a platform is connected to all the equipment of all vendors in the SAN and gathers information from the different devices. Examples of such monitoring platforms include: NetApp® OnCommand™ Insight, EMC Smarts and HP SAN Visibility.
There is a growing need to provide the mass storage system with means for evaluating the state of the cluster without contacting the equipment of all vendors.