Computer networks have grown increasingly complex with the use of distributed client/server applications, mixed platforms and multiple protocols, all on a single physical backbone. The control of traffic on networks is likewise moving from centralized information systems departments to distributed work groups. The growing utilization of computer networks is not only causing a move to new, high-speed technologies but is, at the same time, making the operation of computer networks more critical to day-to-day business operations. Furthermore, as computer systems become more distributed, the operation of related systems may be controlled by different entities. For example, web hosting or web applications may be provided for companies by a web hosting or web application service company.
Service level agreements (SLAs) are becoming increasingly common in networks, such as networks supporting Internet protocol (IP) communications. SLAs may include one or more service level objectives (SLOs), which specify measurable criteria against which the performance of a resource is compared to determine if the criteria are met. A resource may be any component in an information technology infrastructure, the performance of which may be measured against a criterion. For example, a server, a router, an application on a server, a network connection and the like all may be resources.
As an example, a service level agreement (SLA) with a customer of a web hosting company may include one or more service level objectives (SLOs) that specify availability and/or performance criteria that are to be met by the web hosting company. For example, an SLO may specify that a resource, such as a particular website or application, be available for 99.9% of each month. SLOs of the SLA could also specify a minimum throughput, number of transactions supported or response time of the resource or of a different resource. The resource's availability is monitored and, if the resource is not available for at least 99.9% of a month, a violation of the SLO has occurred. Similarly, if the required throughput, number of transactions or response time is not met for the resource, a violation of the corresponding SLO of the SLA could also be noted.
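The availability check described above can be illustrated with a minimal sketch. It assumes that availability is measured as the fraction of minutes in the month during which the resource was up; the names AvailabilitySLO, availability_pct, is_violated and downtime_budget_minutes are hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilitySLO:
    """Hypothetical record for one availability objective of an SLA."""
    resource: str                # e.g. a particular website or application
    min_availability_pct: float  # e.g. 99.9


def availability_pct(downtime_minutes: float, period_minutes: float) -> float:
    """Percentage of the period during which the resource was available."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def is_violated(slo: AvailabilitySLO, downtime_minutes: float,
                period_minutes: float) -> bool:
    """True if measured availability falls below the SLO's criterion."""
    return availability_pct(downtime_minutes, period_minutes) < slo.min_availability_pct


def downtime_budget_minutes(slo: AvailabilitySLO, period_minutes: float) -> float:
    """Downtime the resource may accumulate before the SLO is violated."""
    return period_minutes * (100.0 - slo.min_availability_pct) / 100.0


# Assumption: a 30-day month, measured in minutes.
MONTH_MINUTES = 30 * 24 * 60

slo = AvailabilitySLO(resource="customer-website", min_availability_pct=99.9)
print(is_violated(slo, downtime_minutes=30, period_minutes=MONTH_MINUTES))
print(is_violated(slo, downtime_minutes=60, period_minutes=MONTH_MINUTES))
```

Under these assumptions, a 99.9% objective over a 30-day month leaves a downtime budget of roughly 43 minutes, so 30 minutes of downtime meets the objective while 60 minutes violates it.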
One difficulty presented in monitoring compliance with SLAs is that a common resource may be related to multiple SLAs from one or more customers. As the number of SLAs increases, the complexity of monitoring and/or assuring SLA compliance may also increase. For example, if a series of servers hosting multiple applications for different customers goes down, the unavailability of the servers may be monitored against different SLOs of the respective SLAs of the customers. The availability criteria may be different for the different customers. However, each SLO violation may generate a notification that the violation has occurred, and each notification may appear to have the same priority to a network or system administrator receiving the notifications. Furthermore, the notifications are typically generated only after the failure to meet the SLO. Thus, there may be no ability to resolve the problem in a manner that would avoid non-compliance with the SLA.
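The difficulty described above can be illustrated with a short sketch in which a single outage of a shared resource is evaluated against the SLOs of several customers. The SLO record, the customer and server names, and the function violations_for_outage are hypothetical, and the availability measure (percentage of minutes in a 30-day month) is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SLO:
    """Hypothetical availability objective tied to one customer's SLA."""
    customer: str
    resource: str
    min_availability_pct: float


def violations_for_outage(slos: List[SLO], resource: str,
                          downtime_minutes: float,
                          period_minutes: float) -> List[str]:
    """Return one notification per SLO that the outage has already violated.

    Each notification is produced only after the criterion is missed and
    carries no indication of relative priority.
    """
    availability = 100.0 * (period_minutes - downtime_minutes) / period_minutes
    return [
        f"SLO violated: {slo.customer} / {slo.resource}"
        for slo in slos
        if slo.resource == resource and availability < slo.min_availability_pct
    ]


MONTH_MINUTES = 30 * 24 * 60  # assumption: 30-day month

slos = [
    SLO("customer_a", "server-1", 99.9),
    SLO("customer_b", "server-1", 99.0),  # looser criterion, same server
    SLO("customer_c", "server-2", 99.9),  # unaffected resource
]

for note in violations_for_outage(slos, "server-1", 500, MONTH_MINUTES):
    print(note)
```

In this sketch, a shorter outage of the same server would violate only the stricter objective, yet the resulting notifications look identical and arrive only once the objective has already been missed.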