Host processor systems may store and retrieve data using storage devices (also referred to as storage arrays) containing a plurality of host interface units (host adapters), disk drives, and disk interface units (disk adapters). Such storage devices are provided, for example, by EMC Corporation of Hopkinton, Mass. and disclosed in U.S. Pat. No. 5,206,939 to Yanai et al., U.S. Pat. No. 5,778,394 to Galtzur et al., U.S. Pat. No. 5,845,147 to Vishlitzky et al., and U.S. Pat. No. 5,857,208 to Ofek, which are incorporated herein by reference. The host systems access the storage device through a plurality of channels provided therewith. Host systems provide data and access control information through the channels of the storage device and the storage device provides data to the host systems also through the channels. The host systems do not address the disk drives of the storage device directly, but rather, access what appears to the host systems as a plurality of logical volumes. Different sections of the logical volumes may or may not correspond to the actual disk drives. The hosts, storage devices and/or other elements, such as switches and/or array components, may be provided as part of a storage area network (SAN).
Performance characteristics of the storage devices and/or other elements of the SAN may be monitored according to different performance statistics and measures. Performance characteristics may include, for example, performance data, capacity data, and/or discovery data, including configuration data and/or topology data, among other characteristics. As an example, performance characteristics of input/output (I/O) data paths among storage devices and components may be measured and may include I/O operations (e.g., measured in I/Os per second and Mbs per second) initiated by a host that will result in corresponding activity in SAN fabric links, storage array ports and adapters, and storage volumes. Other characteristics may similarly be measured. Such characteristics may be significant factors in managing storage system performance, for example, in analyzing the use of lower access performance disk drives versus more expensive higher performance disk drives in a SAN, or in expanding the number of SAN channels or the channel capacity. Users may balance performance, capacity and costs when considering how and whether to replace and/or modify one or more storage devices or components.
Known techniques and systems for performing root cause and impact analysis of events occurring in a system may provide automated processes for correlating the events with their root causes. Such automation techniques address the problem of an outage causing a flood of alarms in a complex distributed system comprising many (e.g., thousands of) interconnected devices. Reference is made, for example, to: U.S. Pat. No. 7,529,181 to Yardeni et al., entitled “Method and Apparatus for Adaptive Monitoring and Management of Distributed Systems,” which discloses a system for providing adaptive monitoring of detected events in a distributed system; U.S. Pat. No. 7,003,433 to Yemini et al., entitled “Apparatus and Method for Event Correlation and Problem Reporting,” which discloses a system for determining the source of a problem in a complex system of managed components based upon symptoms; and U.S. Pat. No. 6,965,845 to Ohsie et al., entitled “Method and Apparatus for System Management Using Codebook Correlation with Symptom Exclusion,” which discloses a system for correlating events in a system and provides a mapping between each of a plurality of groups of possible symptoms and one of a plurality of likely problems in the system, all of which are assigned to EMC Corporation and are incorporated herein by reference. However, it is noted that such known techniques and systems may involve maintaining a large hierarchical relationship structure of alerts, which may cause undesirable performance bottlenecks in some circumstances, such as in connection with the processing of updates in the distributed system.
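By way of illustration only, the general notion of mapping groups of possible symptoms to likely problems may be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of any of the above-referenced patents: the problem names, the symptom names, the `CODEBOOK` table and the `correlate` function are hypothetical, and the nearest-match rule (smallest number of mismatched symptoms) is a simplification chosen for the example.

```python
# Hypothetical codebook: each likely problem maps to the group of symptoms
# it is assumed to produce. All names are illustrative only.
CODEBOOK = {
    "switch_port_failure": {"link_down", "host_io_timeout", "array_port_idle"},
    "disk_adapter_fault": {"volume_latency_high", "host_io_timeout"},
    "fabric_congestion": {"link_utilization_high", "volume_latency_high"},
}


def correlate(observed, codebook=CODEBOOK):
    """Return (problem, mismatches) for the closest symptom group.

    The mismatch count is the size of the symmetric difference between
    the observed symptoms and a problem's expected symptoms, so missing
    and unexpected symptoms are penalized equally (an assumption).
    """
    observed = set(observed)
    best = None
    # Iterate in sorted order so that ties resolve deterministically.
    for problem, signature in sorted(codebook.items()):
        mismatches = len(signature ^ observed)
        if best is None or mismatches < best[1]:
            best = (problem, mismatches)
    return best


if __name__ == "__main__":
    # One expected symptom (array_port_idle) has not been observed.
    print(correlate({"link_down", "host_io_timeout"}))
```

In this sketch each lookup scans a flat table; the hierarchical relationship structures noted above may be considerably larger and may need to be maintained as the distributed system is updated.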
Accordingly, it would be desirable to provide a system that may be advantageously and efficiently used to determine alert relationships in a SAN, including relationships among root causes, symptoms and impacts of events on various components of the SAN.