Management of clusters of nodes requires the measurement of aggregate cluster properties specific to the particular application under consideration. Programmable collection of metrics from resources, together with their aggregation, supports the computation of custom metrics for monitoring system performance for purposes such as accounting and load management. This requires that metrics be collected from the instrumentation of the nodes of a cluster and aggregated into metrics that serve as input to the automated or manual management functions of the cluster. A system that performs this aggregation is a measurement system.
Writing a measurement system from scratch requires considerable effort. Each time a management function changes, the measurement system must be rewritten. Furthermore, when the cluster configuration changes (e.g., a node is added or deleted), the measurement system must again be rewritten.
Existing measurement systems are either custom-built or configured by a simple declarative specification of the metrics to be computed; such specifications cannot accommodate dynamic clusters. An example of a conventional measurement system that uses its own specification language is WSLA (Web Service Level Agreement).
WSLA uses metrics definitions that describe how complex metrics are to be computed from low-level metrics exposed by the instrumentation of a system. While the system is in use, a measurement system can read the low-level metrics, compute the complex metrics by aggregating the low-level metrics as specified in the metrics definitions, and make them available to interested systems.
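The general shape of such a metrics definition can be sketched in Python. This is not WSLA's own notation; the MetricDefinition class, the "node/metric" naming scheme, and the readings below are hypothetical, chosen only to show a complex metric computed from explicitly named low-level metrics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical low-level readings keyed by "node/metric" names, standing in
# for values exposed by node instrumentation.
Readings = Dict[str, float]


@dataclass
class MetricDefinition:
    """Hypothetical complex metric computed from an explicit list of inputs."""
    name: str
    inputs: List[str]                         # fixed, named low-level metrics
    function: Callable[[List[float]], float]  # aggregation function

    def evaluate(self, readings: Readings) -> float:
        # Raises KeyError if a named input is missing from the readings.
        values = [readings[metric] for metric in self.inputs]
        return self.function(values)


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Average CPU utilization (percent) over three specific, hard-coded nodes.
avg_cpu = MetricDefinition(
    name="avg_cpu",
    inputs=["node1/cpu", "node2/cpu", "node3/cpu"],
    function=mean,
)

readings = {"node1/cpu": 40.0, "node2/cpu": 60.0, "node3/cpu": 80.0}
print(avg_cpu.evaluate(readings))  # prints 60.0
```

Because the inputs are named explicitly, the definition is tied to one particular set of nodes.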
However, such existing measurement systems and their corresponding languages cannot aggregate sets of equivalent metrics from a variable number of nodes. For example, a cluster of computers may be used for multiple applications, with computers assigned to different applications depending on the current demand for each. In such a dynamic environment, the set of computers assigned to a particular application changes constantly.
The metrics language of WSLA requires that the metrics definitions used to compute complex metrics be rewritten whenever a node is added or deleted. This approach is too labor-intensive and slow for dynamically changing clusters, and hence unsuitable for automated real-time management.
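The following Python sketch, using hypothetical node names and readings, illustrates why a definition with a hard-coded node list breaks down when cluster membership changes. The function static_avg_cpu mimics a fixed metrics definition; membership_avg_cpu is only a generic illustration of aggregating over the current membership set, not the method of any particular system.

```python
from statistics import mean
from typing import Dict, List, Set

# Hypothetical readings: node name -> CPU utilization (percent).
cpu_by_node: Dict[str, float] = {"node1": 40.0, "node2": 60.0, "node3": 80.0}

# Static definition: the node list is written into the metric itself.
STATIC_NODES: List[str] = ["node1", "node2", "node3"]


def static_avg_cpu(readings: Dict[str, float]) -> float:
    # Only the hard-coded nodes contribute; any other node is ignored.
    return mean(readings[node] for node in STATIC_NODES)


def membership_avg_cpu(readings: Dict[str, float], members: Set[str]) -> float:
    # Hypothetical alternative: aggregate over whichever nodes are
    # currently assigned to the application.
    return mean(readings[node] for node in members)


# node4 joins the application and node1 is reassigned to another one.
cpu_by_node["node4"] = 20.0
members = {"node2", "node3", "node4"}

# The static metric still includes node1 and omits node4 until rewritten.
print(static_avg_cpu(cpu_by_node))  # prints 60.0
print(membership_avg_cpu(cpu_by_node, members))
```

Keeping the static version correct would require editing STATIC_NODES on every reassignment, which is the manual rewrite described above.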
Thus, there is a need for a measurement system and method which can efficiently compute complex metrics for a dynamic system having a constantly varying number of nodes.