In modern-day computer systems, availability and scalability are important concerns. In particular, many users cannot tolerate the downtime that results from faults and scheduled maintenance. Further, many users want computer systems that can expand as their businesses grow, so that their initial and subsequent investments in the systems are protected.
One conventional approach to both of these problems is to provide a computer system with “clusters.” A cluster is a group of computers connected in such a way that they work as a single, continuously available system. Generally, a cluster includes multiple servers, called “nodes,” all connected to a set of network interfaces and to a set of storage resources. In a cluster, resources are made more available by redundancy. Typically, clusters use redundant servers, redundant interconnects, redundant networking, redundant storage and even redundant controllers and adaptors. The redundant resources permit processing to continue transparently if one or more hardware or software components of a cluster fail. In a failure situation, processing can continue uninterrupted by automatically switching processes to redundant working components in a process called “failover.”
In addition to fault tolerance, planned downtime required for system maintenance can be greatly reduced in a cluster by transparently moving work from the node that needs maintenance to another node. Once the maintenance is performed, the work can be moved back to the original node. Clusters also provide a way to add capacity (performance and storage) to servers. The extra nodes help increase the server's throughput and storage capacity, so that growing demands can be met.
A typical networked system using clusters is illustrated schematically in FIG. 1. System 100 comprises many workstations, of which workstations 102 and 104 are illustrated. Workstations 102 and 104 are connected by a conventional network 106 to a number of servers. Server 108 is a standalone server and may be connected to the network 106 by a redundant connection 110. Servers 112 and 114 are clustered.
In particular, server 112 comprises two nodes 116 and 118 that are interconnected as illustrated schematically by connection 120. Each of nodes 116 and 118 is connected to storage resources 126 as indicated schematically by connection 122 for node 116 and connection 124 for node 118.
Similarly, server 114 comprises three nodes 130, 132 and 134 that are interconnected as illustrated schematically by connections 136, 138 and 140. Each of nodes 130, 132 and 134 is connected to storage resources 148 as indicated schematically by connection 142 for node 130, connection 146 for node 132 and connection 144 for node 134.
Software applications that run on a cluster are integrated with the cluster by means of “agents” (also called “data services”). An agent is a program written for a software application that can start, stop and monitor the health of that application. Agents are, in turn, controlled by resource group managers that monitor the state of the cluster. If a node fails, a resource group manager can stop an application running on that node and restart the application on another node.
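The agent and resource group manager roles described above can be sketched as follows. This is a minimal illustrative sketch only; the class names, methods and failover policy are assumptions for exposition, not the interface of any actual cluster framework.

```python
class Agent:
    """Hypothetical agent (data service) for one software application:
    it can start, stop and monitor the health of that application."""

    def __init__(self, app_name):
        self.app_name = app_name
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def is_healthy(self):
        # Real agents would probe the application; here health
        # simply tracks whether the application was started.
        return self.running


class ResourceGroupManager:
    """Hypothetical resource group manager: monitors cluster state and,
    when the active node fails, restarts the application's agent on a
    surviving node (failover)."""

    def __init__(self, nodes, agent):
        self.nodes = list(nodes)   # surviving nodes, e.g. ["n1", "n2"]
        self.agent = agent
        self.active = self.nodes[0]
        self.agent.start()

    def handle_node_failure(self, failed_node):
        self.nodes.remove(failed_node)
        if failed_node == self.active:
            # Stop the application on the failed node and restart it
            # on another node; a real manager would apply placement
            # policies rather than picking the first survivor.
            self.agent.stop()
            self.active = self.nodes[0]
            self.agent.start()
        return self.active
```

In this sketch, failing the active node moves the application to the next surviving node and the agent reports it healthy again, mirroring the transparent failover described above.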
While clustering provides advantages with regard to resource availability and scalability, it can complicate resource management. For example, in order to manage storage resources and storage capacity, it is necessary to generate an accurate inventory of all storage resources, such as disks, file systems, volumes, and volume groups, and their usage. In a clustered system, this can be done by scanning each cluster for resources associated with its nodes.
However, storage resources are often visible and available to multiple clusters. Conversely, storage resources that are part of a cluster may be visible only to the nodes in that cluster. In addition, some resources, such as volume groups, aggregate, abstract and hide details of the resources contained within them, making those details unavailable to any system, including any cluster, from which they are visible. Other resources, such as disks, may be visible from a particular system, but not available to that system due to disk fencing techniques. Such a resource can be termed visible, but unavailable.
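The distinction between visibility and availability can be expressed as a simple classification. The sketch below is purely illustrative; the resource fields (`visible_to`, `fenced_from`) are hypothetical stand-ins for whatever visibility and fencing state a real system would query.

```python
def classify(resource, system):
    """Classify a storage resource as seen from one system.

    resource: dict with assumed fields:
      'visible_to'  - set of systems that can see the resource
      'fenced_from' - set of systems fenced off from it
    """
    if system not in resource["visible_to"]:
        return "invisible"
    if system in resource.get("fenced_from", set()):
        # e.g. a disk fenced off from this node: it shows up in a
        # scan, but the system cannot actually use it.
        return "visible, but unavailable"
    return "visible and available"
```

A disk fenced from one node would thus be reported as “visible, but unavailable” on that node while remaining fully available elsewhere.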
In order to inventory storage resources and determine usage and unused capacity, management or scanner software scans the clusters to detect storage resources in two contexts: collecting resource information for resources shared within the cluster and collecting resource information for resources that are private to the cluster nodes. Depending on the context, the scan must collect only the relevant resources and count each resource only once. However, because the same resources may be visible from several clusters, counting the resources seen by all systems at all resource levels produces inaccurate data, typically overcounting resources in the inventory and overstating storage capacity and consumption. Thus, it may be difficult to generate an accurate inventory of storage resources from any particular system or cluster.
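The two-context scan and the count-each-resource-once requirement can be sketched as below. This is a hedged illustration, not an actual scanner implementation: the cluster and resource structures, the `scope` values and the use of a globally unique resource `id` are all assumptions made for the example.

```python
def inventory(clusters, context):
    """Collect resources of one context ('shared' or 'private')
    across clusters, counting each resource only once.

    Each cluster is assumed to be a dict with a 'resources' list;
    each resource a dict with a unique 'id' and a 'scope' field.
    """
    seen_ids = set()
    result = []
    for cluster in clusters:
        for res in cluster["resources"]:
            if res["scope"] != context:
                continue          # wrong context for this scan
            if res["id"] in seen_ids:
                continue          # already counted via another cluster
            seen_ids.add(res["id"])
            result.append(res)
    return result
```

Deduplicating on a unique identifier is what prevents a disk visible from several clusters from being counted once per cluster, which would overstate capacity and consumption as described above.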