Performance metrics may be collected from network elements in a network for a variety of reasons. For example, performance metrics may be collected and processed to determine whether a network provider is providing a certain level of service, such as a level stated in a Service Level Agreement (SLA).
FIG. 1 illustrates an exemplary existing system 100 including a network 102, a collector/validator 104 and network elements 106-1, 106-2, 106-3 (collectively referred to as network elements 106) connected to network 102. Collector/validator 104 may request performance metrics from network elements 106. Network elements 106 may be network devices including, for example, host computers, routers, and network nodes. Collector/validator 104 may use the well-known Simple Network Management Protocol (SNMP) to request and receive the metrics from the network elements 106.
System 100 of FIG. 1 is exemplary and may include more or fewer items than illustrated. For example, system 100 may include multiple collector/validators 104, each collecting performance metrics from a subset of the group of network elements 106.
In addition to collecting data, such as performance metrics, collector/validator 104 may be responsible for performing other functions, such as validating a configuration change and reestablishing contact with network elements 106. While collecting performance metrics, if collector/validator 104 cannot establish contact with a network element 106, collector/validator 104 may repeatedly attempt to reestablish contact until contact is established. Because the collection functions, configuration validation functions and contact reestablishment functions of collector/validator 104 share processing resources, the configuration validation functions and contact reestablishment functions may, in a large network, have an adverse effect on the collection functions. Thus, in a large network with many configuration changes and frequent loss of contact with network elements 106, uncollected performance metrics may accumulate at network elements 106. When collector/validator 104 is unable to collect the performance metrics from a network element 106, whether due to an inability to contact the network element 106 or due to time spent performing other functions, the network element 106 may have only limited storage space or memory in which to store the accumulating performance metrics. Consequently, the longer the time period in which performance metrics remain uncollected from a network element 106, the greater the probability of losing performance metric data accumulating in that network element 106.
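The effect of shared processing resources described above may be illustrated with a minimal sketch. The sketch models a single collector/validator that visits network elements one at a time, so that time spent retrying an unreachable element delays collection from every element after it. The function name `pass_time_ms`, the retry timeout, and the bounded retry count are hypothetical values chosen for illustration only; they are not taken from system 100.

```python
def pass_time_ms(reachable_flags, collect_ms=100, retry_timeout_ms=1000, max_retries=3):
    """Estimate the duration of one sequential collection pass.

    reachable_flags: one boolean per network element (True = contact succeeds).
    collect_ms: time to collect from a reachable element (about 100 ms in the text).
    retry_timeout_ms, max_retries: ASSUMED values, not from the source; they bound
    the time spent trying to reestablish contact with an unreachable element.

    Returns (total_time_ms, number_of_elements_left_uncollected).
    """
    total_ms = 0
    uncollected = 0
    for reachable in reachable_flags:
        if reachable:
            total_ms += collect_ms
        else:
            # Retries use the same processing resources as collection,
            # so every later element in the pass waits for them to finish.
            total_ms += retry_timeout_ms * max_retries
            uncollected += 1
    return total_ms, uncollected


all_up = pass_time_ms([True] * 100)
ten_down = pass_time_ms([True] * 90 + [False] * 10)
print(all_up)    # (10000, 0)
print(ten_down)  # (39000, 10)
```

Under these assumed values, losing contact with 10 of 100 network elements nearly quadruples the duration of a pass, during which performance metrics continue to accumulate at every network element.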
When collector/validator 104 is in a successful steady state and is collecting performance metrics from network elements 106 using a protocol such as SNMP, collector/validator 104 may spend approximately 100 milliseconds (ms) collecting the performance metrics from each network element 106. Of that 100 ms, collector/validator 104 may spend at least 95% requesting the performance metrics. In small networks, the overhead associated with a relatively small number of network elements 106 may be negligible. However, in a large network, for example, a network with at least approximately 10,000 nodes, the above-mentioned problems make it necessary to include multiple collector/validators 104 in the network. A more efficient method of collecting performance metrics is needed to decrease the impact of configuration changes and of an inability to contact network elements 106, and to decrease the amount of resources, for example, the number of collector/validators 104, needed to collect the performance metrics from network elements 106 in a large network.
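The scale of the problem may be made concrete with a short calculation based on the figures above. The sketch assumes that a single collector/validator polls network elements strictly one after another and that each network element must be polled once per polling interval. The 300-second polling interval and the function names are hypothetical and are used only for illustration.

```python
def sequential_pass_ms(num_elements, per_element_ms=100):
    """Time for one collector to poll every element in turn, in milliseconds."""
    return num_elements * per_element_ms


def collectors_needed(num_elements, polling_interval_s, per_element_ms=100):
    """Minimum number of sequential collectors needed to poll every element
    once per polling interval (ASSUMED model: no retries, no other work)."""
    total_ms = sequential_pass_ms(num_elements, per_element_ms)
    interval_ms = polling_interval_s * 1000
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_ms // interval_ms)


print(sequential_pass_ms(10_000) / 1000)   # 1000.0 seconds for one full pass
print(collectors_needed(10_000, 300))      # 4 collectors for a 300 s interval
```

With approximately 10,000 nodes at approximately 100 ms each, one sequential pass takes about 1,000 seconds, or roughly 16.7 minutes, before any time is spent on configuration validation or contact reestablishment. Under the assumed 300-second interval, at least four collector/validators would be needed even in the steady state.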