The present disclosed method relates to techniques for evaluating the performance and/or reliability of communications networks, or of any system that can be modeled as a communications network.
As communications networks have grown in size and complexity, the evaluation of their performance and reliability has become more critical. Network service providers usually guarantee a certain level of service, in terms of downtime, restoration delay, packet delay, etc., to their enterprise customers. Violating these agreements results in a penalty to the service providers. In turn, the enterprise customers incur business losses if their networks do not perform as expected. There are even some legal requirements on U.S. businesses to be aware of their networks' “risk profile” (e.g., the U.S. Sarbanes-Oxley Act of 2002).
Some aspects of the service level have to do purely with reliability, such as downtime, while others are pure performance measures, such as packet delay. Each can be evaluated or predicted in isolation by an appropriate model and technique, but as networks are becoming larger and more complex, it is becoming less realistic to consider only pure reliability measures or to evaluate performance measures as if the network were always in its perfect (no failure) state. Even though the complexity of real networks often requires that performance and reliability evaluations be treated separately from each other, a combined, simultaneous evaluation, known as performability analysis, is very informative.
Performability analysis consists in characterizing failures probabilistically and evaluating a performance measure (or measures) over a large set of network states. There is one “perfect” network state in which all of the network components are assumed to be operating properly. Every other network state is characterized by the failure of one or more of the network components. Among the various network states that would be evaluated are a) states in which a particular component is assumed to have failed (in combination with the failure of zero or more other components) and b) other states in which that particular component is assumed to have not failed. The probability of various kinds of components failing is known, based on experience, information from manufacturers, and statistical sampling techniques. That probability is a factor in the probability of occurrence of each state in which the component in question is assumed to have failed and, ultimately, in the computation of the overall performability characteristic in question.
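By way of a non-limiting illustration, the relationship between component failure probabilities and network state probabilities described above can be sketched as follows. The sketch assumes that components fail independently of one another, which the text above does not state, and the component names and probability values are hypothetical.

```python
from itertools import product

# Hypothetical per-component failure probabilities (illustrative values only).
failure_prob = {"fiber_A": 0.01, "router_B": 0.002, "fiber_C": 0.005}


def state_probability(failed, failure_prob):
    """Probability of the state in which exactly the components in `failed` are down.

    ASSUMPTION: components fail independently of one another.
    """
    p = 1.0
    for name, q in failure_prob.items():
        p *= q if name in failed else (1.0 - q)
    return p


def enumerate_states(failure_prob):
    """Yield (failed_set, probability) for every combination of failures."""
    names = sorted(failure_prob)
    for mask in product([False, True], repeat=len(names)):
        failed = frozenset(n for n, down in zip(names, mask) if down)
        yield failed, state_probability(failed, failure_prob)


states = dict(enumerate_states(failure_prob))
perfect_state_probability = states[frozenset()]  # no component has failed
```

Exhaustive enumeration of this kind grows as 2^n in the number of components, so it is practical only for very small examples.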
An individual network component may actually have more than one failure mode. For example, a link that normally has bandwidth X may fail in such a way that the available bandwidth is X/2, X/3, etc. Another failure mode is one in which the link has failed completely and thus no bandwidth is available. Any one of these possibilities can occur in some network states. A typical performance measure is packet delay, as mentioned above. Another is the percentage of offered traffic that cannot be delivered due to the particular failures represented by the network state, referred to herein as “percent traffic lost.” Such performance measures are computed based on an assumed traffic matrix, i.e., an assumed amount of traffic demand between each origin/destination node pair in the network and, in a typical application, are computed after a network restoration algorithm has been applied to the network in the assumed state in order to reroute as much of the traffic as possible.
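As a minimal sketch of the “percent traffic lost” measure, assume that the traffic matrix and the traffic actually delivered after restoration are each given as a mapping from origin/destination pairs to traffic volumes. The function name, the data layout, and the numerical values below are hypothetical and chosen only for illustration.

```python
def percent_traffic_lost(offered, delivered):
    """Percentage of offered traffic that is not delivered in a given network state.

    `offered` is the assumed traffic matrix and `delivered` is the traffic carried
    after restoration; both map (origin, destination) pairs to traffic volumes.
    """
    total = sum(offered.values())
    if total == 0:
        return 0.0
    # A pair missing from `delivered` is treated as carrying no traffic.
    lost = sum(max(demand - delivered.get(pair, 0.0), 0.0)
               for pair, demand in offered.items())
    return 100.0 * lost / total


# Hypothetical traffic matrix and post-restoration result for one network state.
offered = {("A", "B"): 10.0, ("A", "C"): 30.0, ("B", "C"): 60.0}
delivered = {("A", "B"): 10.0, ("A", "C"): 15.0, ("B", "C"): 60.0}
lost_pct = percent_traffic_lost(offered, delivered)  # 15 of 100 units lost
```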
The performance measure value computed for each network state is then multiplied by the probability of occurrence of that state (which is, in turn, a function of the probability of failure of the components assumed to have failed for that state) and the resultant products are used to derive a performability characteristic. A simple performability characteristic is what is called the “expectation” of a particular performance measure, given by the sum of the products just described. A more sophisticated performability characteristic that can be derived from this analysis is a performability guarantee of the form: “with X % probability, at most Y % of the network traffic will have no path.”
A whole set of performability characteristics can be developed in this way by considering multiple performance measures for each network state.
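The expectation and the performability guarantee described above can be sketched as follows, assuming that each network state has already been reduced to a pair consisting of its probability and its performance measure value. The function names and the sample values are hypothetical, and the guarantee is computed here as the smallest Y for which the cumulative probability of states with a measure of at most Y reaches X, which is one reasonable reading of the stated form.

```python
def expectation(states):
    """Sum over states of (state probability) * (performance measure value)."""
    return sum(p * m for p, m in states)


def guarantee(states, x):
    """Smallest Y such that, with probability at least x, the measure is at most Y."""
    cumulative = 0.0
    for p, m in sorted(states, key=lambda s: s[1]):
        cumulative += p
        if cumulative >= x - 1e-12:  # small tolerance for floating-point rounding
            return m
    raise ValueError("state probabilities sum to less than x")


# Hypothetical (probability, percent of traffic with no path) pairs.
states = [(0.90, 0.0), (0.07, 5.0), (0.02, 20.0), (0.01, 100.0)]

expected_loss = expectation(states)  # 0.90*0 + 0.07*5 + 0.02*20 + 0.01*100
y_95 = guarantee(states, 0.95)       # bound that holds with 95 % probability
y_99 = guarantee(states, 0.99)       # bound that holds with 99 % probability
```

With these sample values, the expectation alone hides the rare state in which all traffic is lost, whereas the guarantee makes the tail of the distribution explicit.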
For each network state, there is a characterization of the network at the component level. In order to carry out the analysis just described, a logical level characterization is generated for each network state. The logical level characterization characterizes the network in terms of nodes and edges that interconnect them. The nodes at the logical level correspond to packet routers or other similar components, whereas the edges represent routes for traffic through the network which may traverse one or more physical links. A failure of one or more components at the component level (including optical fiber spans, routers, repeaters or other physical components) may affect the capacity of one or more edges at the logical level, capacity being a measure of the amount of traffic that can traverse an edge as reflected by, for example, the bandwidth of the edge or the bit rate or packet rate that it can support. Given an assumed traffic matrix as mentioned above, one can then use known network analysis techniques to compute the desired performance measure (such as, again, packet delay or percent traffic lost) after any restoration algorithm used in the network has been run on the network state under investigation.
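The mapping from component-level failures to logical-level edge capacities can be illustrated with the following simplified sketch. It assumes that a logical edge loses all of its capacity when any physical component it depends on fails, and it substitutes a plain reachability check for an actual restoration algorithm and capacity-aware routing. The edge table, component names, and traffic values are hypothetical.

```python
from collections import deque

# Hypothetical logical edges: (node, node) -> (capacity, physical components used).
edges = {
    ("A", "B"): (10.0, {"span1"}),
    ("B", "C"): (10.0, {"span2", "repeater1"}),
    ("A", "C"): (10.0, {"span3"}),
}


def edge_capacities(edges, failed):
    """ASSUMPTION: an edge's capacity drops to zero if any component it uses has failed."""
    return {e: (0.0 if deps & failed else cap) for e, (cap, deps) in edges.items()}


def reachable(capacity, src):
    """Nodes reachable from `src` over edges with nonzero capacity (breadth-first search)."""
    adj = {}
    for (u, v), cap in capacity.items():
        if cap > 0:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def percent_traffic_without_path(edges, failed, traffic):
    """Percentage of demand whose origin and destination are disconnected."""
    capacity = edge_capacities(edges, failed)
    total = sum(traffic.values())
    lost = sum(d for (s, t), d in traffic.items() if t not in reachable(capacity, s))
    return 100.0 * lost / total if total else 0.0


traffic = {("A", "B"): 20.0, ("A", "C"): 30.0, ("B", "C"): 50.0}
single_failure = percent_traffic_without_path(edges, {"span1"}, traffic)
double_failure = percent_traffic_without_path(edges, {"span1", "span2"}, traffic)
```

In this example the single failure leaves an alternate path from A to B through C, while the double failure isolates node B so that all traffic to or from B has no path.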
One can then compute the desired performability characteristic(s) such as those mentioned above, i.e., the performance measure's expectation or a performability guarantee.