A storage area network comprises host application servers connected to storage devices via a network fabric. Storage area networks decouple storage from application service and allow storage to be expanded and managed independently of the application servers. To assist system administrators in the task of managing storage, a storage area network resource management system allows the administrator to collect information from the nodes in the storage area network, generate reports about the performance and attributes of the storage area network, and analyze historical data about the storage area network to reveal trends.
A key factor in the performance of a storage area network resource management system is the efficient collection of information from the nodes in the storage area network. The storage area network resource management system places a data collection agent in every application server, and the role of each data collection agent is to obtain information about the nodes.
A simplistic approach would require every data collection agent to obtain information about every visible node in the storage area network. However, this approach is inefficient since the nodes are shared between application servers, and data collection agents would be required to collect significant redundant information. Such redundant information could consume critical network bandwidth in the storage area network and overwhelm the storage area network resource management server.
An efficient storage area network resource management system should provide an assignment of data collection agents to storage area network nodes. This assignment would allow data collection agents to collect information only from the assigned storage area network nodes and relay the information to the storage area network resource management server. Consequently, the amount of collected information would be minimized and the bandwidth and processing costs reduced.
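The saving described above can be illustrated with a small count. The following is a minimal sketch, assuming a hypothetical visibility map from each data collection agent to the storage area network nodes it can reach; the agent and node names are invented for illustration only.

```python
# Hypothetical visibility map (an assumption for illustration):
# each data collection agent maps to the SAN nodes it can reach.
visibility = {
    "agent_a": {"disk1", "disk2", "switch1"},
    "agent_b": {"disk2", "disk3", "switch1"},
    "agent_c": {"disk1", "disk3", "switch1"},
}


def naive_collections(visibility):
    """Simplistic approach: every agent reports on every node it can see."""
    return sum(len(nodes) for nodes in visibility.values())


def assigned_collections(visibility):
    """Assignment approach: each visible node is reported on exactly once."""
    return len(set().union(*visibility.values()))


naive = naive_collections(visibility)        # reports per collection cycle
assigned = assigned_collections(visibility)  # reports per collection cycle
redundant = naive - assigned                 # reports the assignment avoids

print(naive, assigned, redundant)
```

In this small example the simplistic approach produces more than twice as many reports per collection cycle as an assignment that covers each node once.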
Furthermore, the assignment of data collection agents to storage area network nodes should be load-balanced. If the load is unevenly distributed among the data collection agents, the efficiency of the storage area network resource management system will be limited by the performance of the data collection agents with the largest number of assignments. Consequently, the assignment of data collection agents to storage area network nodes should be equitably distributed.
However, the assignment of data collection agents to storage area network nodes should also consider the consequences of failure of both data collection agents and storage area network nodes. For example, if a data collection agent fails, a backup data collection agent should be available to collect information from the storage area network nodes assigned to the failed data collection agent. Similarly, if a data collection agent fails to collect information from a storage area network node, a second data collection agent should be able to confirm the failure of the storage area network node.
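Both fail-over cases above presuppose that a storage area network node is reachable by at least two data collection agents. The following is a minimal sketch of that feasibility check, assuming a hypothetical visibility map; the function names and the sample data are illustrative assumptions and are not taken from any particular system.

```python
def agents_per_node(visibility):
    """Invert the agent -> nodes map into a node -> agents map."""
    reach = {}
    for agent, nodes in visibility.items():
        for node in nodes:
            reach.setdefault(node, set()).add(agent)
    return reach


def nodes_without_backup(visibility):
    """Return nodes seen by fewer than two agents, for which no backup
    agent could take over or confirm a node failure."""
    reach = agents_per_node(visibility)
    return sorted(node for node, agents in reach.items() if len(agents) < 2)


# Hypothetical example data (assumption for illustration).
visibility = {
    "agent_a": {"disk1", "disk2"},
    "agent_b": {"disk2", "disk3"},
    "agent_c": {"disk3"},
}

unprotected = nodes_without_backup(visibility)
print(unprotected)
```

Any node reported by this check can be monitored, but neither a backup agent nor a second confirming agent exists for it.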
The problem of assigning data collection agents to storage area network nodes can be reduced to the maximal set cover problem in graphs. The storage area network can be represented as a graph whose vertices are the storage area network nodes and the data collection agents. The connectivity between the storage area network nodes and the data collection agents determines the edges in the graph. The goal is to find a maximal collection of data collection agent vertices that can cover all the storage area network node vertices in the graph. However, the maximal set cover problem is computationally difficult to solve exactly, and thus an approximation should be used. Moreover, the constraints of load balancing and fail-over add further complexity to any approximation algorithm used.
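The graph representation and one possible approximation can be sketched as follows. This is a minimal sketch under stated assumptions: the visibility map is hypothetical, and the approximation shown is the classic greedy heuristic for set cover, which repeatedly selects the agent covering the most still-uncovered nodes. It is offered only as one common approximation, it is not necessarily the approximation contemplated here, and it ignores the load-balancing and fail-over constraints.

```python
def build_edges(visibility):
    """Edges of the bipartite graph: one (agent, node) pair per connection."""
    return sorted((agent, node)
                  for agent, nodes in visibility.items()
                  for node in nodes)


def greedy_cover(visibility):
    """Greedy set cover heuristic (an assumed, illustrative approximation).

    Repeatedly picks the agent that covers the most uncovered nodes.
    Ties are broken alphabetically so the result is deterministic.
    """
    uncovered = set().union(*visibility.values())
    chosen = []
    while uncovered:
        best = max(sorted(visibility),
                   key=lambda agent: len(visibility[agent] & uncovered))
        chosen.append(best)
        uncovered -= visibility[best]
    return chosen


# Hypothetical example data (assumption for illustration).
visibility = {
    "agent_a": {"disk1", "disk2", "disk3"},
    "agent_b": {"disk3", "disk4"},
    "agent_c": {"disk4", "disk5"},
    "agent_d": {"disk1"},
}

edges = build_edges(visibility)
cover = greedy_cover(visibility)
print(len(edges), cover)
```

Because every uncovered node is visible to at least one agent by construction, each iteration covers at least one new node and the loop terminates.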
One approach to this problem is to assign a storage area network node to the data collection agent with the lowest number of assignments and then continue the process until all assignments are completed. Since the first set of assignments has no guarantee of a load balance, this algorithm would employ successive iterations with a convergence criterion until an acceptable solution is found. However, this approach does not provide a solution to the fail-over requirement and may be polynomial in complexity if the convergence for load balancing is not sub-linear.
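The first pass of this approach can be sketched as follows. This is a minimal sketch, assuming a hypothetical visibility map and an arbitrary (alphabetical) processing order; the successive rebalancing iterations and the convergence criterion are not specified above and are therefore omitted.

```python
def least_loaded_assignment(visibility):
    """Assign each node to the visible agent with the fewest assignments.

    Nodes are processed in alphabetical order and ties between agents are
    broken alphabetically; both choices are illustrative assumptions.
    """
    reach = {}  # node -> list of agents that can see it
    for agent, nodes in visibility.items():
        for node in nodes:
            reach.setdefault(node, []).append(agent)

    load = {agent: 0 for agent in visibility}
    assignment = {}
    for node in sorted(reach):
        agent = min(sorted(reach[node]), key=lambda a: load[a])
        assignment[node] = agent
        load[agent] += 1
    return assignment, load


# Hypothetical example data (assumption for illustration).
visibility = {
    "agent_a": {"disk1", "disk2", "disk3"},
    "agent_b": {"disk1"},
}

assignment, load = least_loaded_assignment(visibility)
print(assignment, load)
```

In this example the first pass gives all three nodes to agent_a, although assigning disk1 to agent_b would yield a more even split. This illustrates why the first set of assignments carries no guarantee of load balance, and the sketch assigns only a single agent per node, so it does not address fail-over.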
What is therefore needed is a system and an associated method for assigning data collection agents to storage area network nodes that will ensure load balancing and that will handle failure of both data collection agents and storage area network nodes. The need for such a system and method has heretofore remained unsatisfied.