When any computer network is put into service, the network operator and the network users have their own expectations as to the level of performance to be provided by the network. Where the network operator and the network users work for the same organization, the expectations may be formalized in written memoranda or may exist only in the minds of the network users and (hopefully) the network operator.
Where the network operator and the network users work for different organizations, the expectations may be formalized in a service level agreement. A service level agreement (SLA) is an agreement or contract between a service provider (the network operator) and a customer (the network user). Under a service level agreement, the customer pays a service fee in return for an assurance that it will receive network service that conforms to requirements defined by the service level agreement. If the service provider then fails to provide the agreed-to service, it ordinarily becomes subject to penalties under the agreement, such as being required to rebate at least some previously received service fees or being required to reduce fees due for future services.
While an almost unlimited variety of service level agreements, both technical and non-technical in nature, is possible, the present invention generally relates to the management of network performance where performance requirements have been defined, either informally or in formal service level agreements.
Network performance requirements, whether formal or informal, should reflect the type of network service being provided and the customer's specific requirements when it uses that service. A customer with high reliability requirements may, for example, expect or even obligate the service provider to keep the network in operation for no less than a specified percentage of time. Similarly, a customer for whom network response time is critical may expect or obligate the service provider to maintain average network transit times on critical routes at or below a defined threshold.
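The two kinds of requirement mentioned above, a minimum percentage of time in operation and a maximum average transit time, can be expressed as simple checks. The following is a minimal sketch only; the names `PerformanceRequirements`, `availability_pct`, `average_transit_ms` and `find_violations`, and the numeric values in the usage lines, are illustrative assumptions and do not come from any particular agreement or product.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PerformanceRequirements:
    """Hypothetical container for the two example requirements."""
    min_availability_pct: float  # e.g. 99.9 means "in operation at least 99.9% of the time"
    max_avg_transit_ms: float    # ceiling on average transit time for a critical route


def availability_pct(up_seconds: float, total_seconds: float) -> float:
    """Percentage of the measurement period during which the network was in operation."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return 100.0 * up_seconds / total_seconds


def average_transit_ms(samples_ms: Sequence[float]) -> float:
    """Arithmetic mean of the measured transit times, in milliseconds."""
    if not samples_ms:
        raise ValueError("at least one transit-time sample is required")
    return sum(samples_ms) / len(samples_ms)


def find_violations(req: PerformanceRequirements,
                    up_seconds: float,
                    total_seconds: float,
                    samples_ms: Sequence[float]) -> List[str]:
    """Return a description of each requirement that was not met."""
    violations = []
    avail = availability_pct(up_seconds, total_seconds)
    if avail < req.min_availability_pct:
        violations.append(
            f"availability {avail:.3f}% is below {req.min_availability_pct}%")
    avg = average_transit_ms(samples_ms)
    if avg > req.max_avg_transit_ms:
        violations.append(
            f"average transit time {avg:.1f} ms exceeds {req.max_avg_transit_ms} ms")
    return violations


# Usage with assumed values: one day of operation with ten minutes of downtime.
req = PerformanceRequirements(min_availability_pct=99.9, max_avg_transit_ms=50.0)
print(find_violations(req, up_seconds=86400 - 600, total_seconds=86400,
                      samples_ms=[40.0, 70.0, 70.0]))
```

In this sketch both requirements are evaluated over the same measurement period; an actual agreement might define different periods or different averaging rules for each.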
To verify that transit time requirements are being met, the service provider can regularly have a source network station “ping” (query) a destination network station to determine round trip transit time; that is, how long it takes for the query to reach the destination and for an acknowledgment to be returned from the destination to the source.
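A round trip measurement of this kind can be sketched as follows. A true ping uses ICMP echo messages, which typically require elevated privileges, so this sketch substitutes the time taken to complete a TCP connection handshake as a rough stand-in for the query-and-acknowledgment exchange. The function name `measure_round_trip_ms`, the timeout value, and the host and port in the usage comment are assumptions made for illustration.

```python
import socket
import time
from typing import Optional


def measure_round_trip_ms(host: str, port: int,
                          timeout_s: float = 2.0) -> Optional[float]:
    """Approximate round trip time to (host, port) in milliseconds.

    Times a TCP connection handshake rather than an ICMP echo, so the
    result is only an approximation of what a real ping would report.
    Returns None if the destination cannot be reached within timeout_s.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            elapsed_s = time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return None
    return elapsed_s * 1000.0


# Hypothetical usage; the destination below is a placeholder, not a real station:
# rtt = measure_round_trip_ms("destination.example", 443)
```

Repeating the measurement at regular intervals and averaging the results would yield the kind of average transit time figure that can be compared against a defined threshold.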
The actual performance of the network is usually monitored by a network management application, which generates a message or alert when a performance violation occurs. That alert is sent at least to the service provider to enable the service provider to take steps to restore conforming network operation. This approach, while common, has significant drawbacks for both the network user and the service provider. From the network user's perspective, the performance violation may already have disrupted significant tasks or processes by the time the network user first learns of it. Even if the service provider responds promptly to a violation alert, the recovery time, that is, the time required to return to conforming network operation, is necessarily prolonged because the service provider cannot begin to fix a problem until the problem is known to exist. From the service provider's perspective, the service provider may already be subject to penalties under an existing service level agreement by the time it first learns of the penalty-inducing violation. Even where no formal service level agreement exists, the service provider can expect to lose customer good will for having failed to live up to the customer's expectations.
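The conventional, reactive behavior described above can be illustrated with a short sketch. It is not taken from any actual network management application; the function name `reactive_alerts`, the sample format and the threshold value are assumptions chosen only to show that an alert appears no earlier than the first violating measurement.

```python
from typing import Iterable, Iterator, Tuple


def reactive_alerts(samples: Iterable[Tuple[float, float]],
                    max_transit_ms: float) -> Iterator[str]:
    """Yield an alert for each sample whose transit time exceeds the threshold.

    Each sample is a (timestamp in seconds, transit time in milliseconds)
    pair. Nothing is reported while measurements are merely approaching
    the threshold, which is the limitation of the reactive approach.
    """
    for timestamp_s, transit_ms in samples:
        if transit_ms > max_transit_ms:
            yield (f"t={timestamp_s:.0f}s: transit time {transit_ms:.1f} ms "
                   f"exceeds {max_transit_ms:.1f} ms")


# Assumed measurements: transit time climbs steadily toward a 50 ms threshold.
measurements = [(0, 40.0), (60, 48.0), (120, 55.0)]
for alert in reactive_alerts(measurements, max_transit_ms=50.0):
    print(alert)
```

With these assumed measurements, the rise from 40 ms to 48 ms produces no message at all; the only alert corresponds to the sample taken at 120 seconds, by which point the violation has already occurred.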