1. Field of the Invention
This invention relates to networked storage devices in a computer system. Particularly, this invention relates to managing such storage devices in a storage area network (SAN).
2. Description of the Related Art
A typical storage area network (SAN) comprises multiple entities such as hosts, storage subsystems, host bus adapters, storage volumes, and ports. In addition, a connection between two devices is a relationship that may be tracked. The value of such tracking is demonstrated in the case where three objects exist in a SAN and two of those objects can see the third, but cannot see each other; tracking relationships reveals this, whereas tracking only entities would not. The state of these various entities and relationships may often be in flux. At any moment, entities may be created, deleted, or become unreachable. In addition, the attributes of the entities may change, e.g., in response to events occurring in the SAN.
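The three-object case can be illustrated with a minimal sketch. The element names (`host_a`, `host_b`, `volume_c`) and the helper functions are hypothetical and chosen only for illustration; they do not come from any particular SAN product.

```python
from itertools import combinations

# Hypothetical entities: knowing only this set says nothing about who sees whom.
entities = {"host_a", "host_b", "volume_c"}

# Relationships are tracked separately as unordered pairs of entities.
relationships = {
    frozenset(("host_a", "volume_c")),
    frozenset(("host_b", "volume_c")),
}


def can_see(x, y, links=relationships):
    """Return True if a relationship between x and y is recorded."""
    return frozenset((x, y)) in links


def unlinked_pairs(nodes, links):
    """List every pair of entities with no recorded relationship."""
    return sorted(
        tuple(sorted(pair))
        for pair in combinations(nodes, 2)
        if frozenset(pair) not in links
    )
```

Here `unlinked_pairs(entities, relationships)` reports the one pair that cannot see each other, a fact that the entity set alone cannot express.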
Storage infrastructure management software may be used to manage such complex SANs comprising equipment from multiple vendors. One important aspect of such management software is the process of running discovery tasks, or probes, to discover the entities in the SAN and the relationships (associations) between those entities. (The various entities and relationships within a SAN may be collectively referenced as “elements” hereafter.) Another key aspect of such management software is the use of a configuration database to store information about the discovered entities and relationships.
One feature such a storage infrastructure manager (SIM) must provide is health and fault management. One aspect of such health and fault management is to alert the administrator (e.g., via standardized events such as SNMP traps, messages to a pager, or highlighting of the element in the SIM GUI) when elements in the SAN appear to be “missing”. An element may be declared to be missing under a variety of scenarios. For example, an element is missing if it is not visible to the SIM through any of its SAN probing mechanisms. An element is also considered missing if the SIM is informed, via the mechanisms it has in place to listen for events from SAN elements (such as SNMP alerts or CIM indications), that an entity (e.g., a storage volume) or relationship (e.g., a link between two ports) has been removed. An element is also missing if it is actually removed via the control interface or application programming interface (API) of the SIM itself.
Once a baseline discovery process of all elements in a SAN has been completed and the configuration database has been populated, it is relatively easy to process the missing element scenarios which either involve the SIM being informed about the event, or the removal occurring via use of the SIM's control interfaces. For example, when an event is received indicating that some element is no longer present in the SAN (e.g., a CIM indication informs the SIM that a storage volume has been deleted), or when the deletion occurs through the SIM's user interface or API itself, the configuration database can be updated in response, and an event can be generated to alert the administrator at the same time.
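The event-driven case can be sketched as follows. This is a minimal illustration under stated assumptions: the `ConfigDatabase` class, its method names, and the status strings are hypothetical and are not the interface of any actual SIM.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigDatabase:
    """Hypothetical in-memory stand-in for a SIM configuration database."""

    status: dict = field(default_factory=dict)  # element id -> "present" or "missing"
    alerts: list = field(default_factory=list)  # messages for the administrator

    def record_discovered(self, element_id):
        """Store an element found by the baseline discovery."""
        self.status[element_id] = "present"

    def handle_removal(self, element_id, source):
        """Mark an element missing and raise one alert, in the same step.

        `source` names how the removal was learned, e.g. an event from the
        SAN or a deletion through the SIM's own interface (assumed labels).
        """
        if self.status.get(element_id) != "present":
            return False  # unknown or already missing: nothing to update
        self.status[element_id] = "missing"
        self.alerts.append(f"{element_id} removed (reported by {source})")
        return True


db = ConfigDatabase()
db.record_discovered("volume-17")
db.handle_removal("volume-17", source="cim_indication")
```

Because the SIM is told about the removal explicitly, a single update suffices; no comparison across information sources is needed in this case.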
However, the scenario where an element simply becomes invisible to the SIM is more complex because there can be multiple sources of information that assist the SIM's discovery process and some types of information sources can provide duplicate or contradictory information about whether an element is visible to the SIM or not. Such informational inconsistencies must be reconciled by the SIM software before a conclusion about the visibility of the element can be reached. This problem is particularly significant because the disappearing device may not send a notice that it is going to disappear. This problem may be referred to as the “detectability problem.”
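To make the reconciliation step concrete, the following sketch applies one simple policy, taken from the earlier statement that an element is missing if it is not visible through any probing mechanism. The policy choice, the function name `reconcile`, and the source and element names are assumptions for illustration only; a real SIM may weigh its sources differently.

```python
def reconcile(known, reports):
    """Combine per-source probe results into one visibility decision.

    known:   set of element ids currently in the configuration database
    reports: dict mapping a source name to the set of element ids that
             source saw in its latest probe

    Assumed policy: an element counts as visible if at least one source
    saw it, so duplicate sightings collapse and a lone "not seen" from
    one source is overridden by a sighting from another.
    """
    seen = set().union(*reports.values())
    return {
        "visible": known & seen,
        "missing": known - seen,
        "new": seen - known,
    }


known = {"vol1", "vol2", "port1"}
reports = {
    "agent_a": {"vol1", "port1"},  # does not see vol2
    "agent_b": {"vol1", "vol3"},   # duplicates vol1, contradicts agent_a on port1
}
result = reconcile(known, reports)
```

Under this assumed policy only `vol2` is declared missing, because no source reported it, while `port1` remains visible despite the two agents disagreeing about it.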
Ideally, a SIM, once installed, would perform a full discovery of all elements on the SAN, populate its configuration database, and from then on keep the database up to date regarding SAN elements that are no longer visible to it, either by being informed directly or because the elements are removed through the SIM API. However, two practical issues cause less than ideal conditions. First, not all changes to SAN elements pass through the SIM. Some operations (e.g., deleting a storage volume) can be performed through other means, such as individual element managers (i.e., management software specialized for a specific storage subsystem), bypassing the SIM. In addition, some kinds of physical (re)configuration, such as removing the cable between a host's fibre channel port and that of a switch, cannot be performed by management software at all. If all systems on a SAN diligently generated events for all configuration changes that occur (e.g., including physically connecting two ports together or removing such a link), then the SIM software could listen for such events and update its database to reflect detectability status in real time. But this is not always the case, which leads to the second practical issue: not all systems on a SAN currently generate such events correctly. This is especially true in scenarios where the event subscriber (i.e., the SIM software) is not operational when the event is generated. Support for features such as persistent CIM indications would help in this area, but such implementations are not widely available, and equivalent robust event delivery mechanisms do not typically exist in the SNMP world. In view of this, it is evident that there is a need to address the detectability problem using additional mechanisms.
A subset of the detectability problem has been addressed before in some prior art, namely in products such as the Tivoli SAN Manager and the IBM TotalStorage Productivity Center v2.x. However, the sources of information performing discovery of SAN elements on behalf of the SIM were presumed to be nonauthoritative, and thus the logic for inferring detectability was focused on achieving consensus between distinct sources of information that could provide either duplicate or contradictory information. Other systems and methods have also been developed to manage networked storage devices.
However, there is still a need in the art for systems and methods to detect missing elements in a storage area network with multiple sources of information. Particularly, there is a need in the art for such systems and methods to efficiently resolve duplicate or contradictory information from distinct sources. These and other needs are met by the present invention as detailed hereafter.