Capacity planning is an important function in designing and provisioning communication networks. While network link and node capacities have been estimated for years, there has been relatively little study of availability, especially for large mesh networks. Large mesh networks with multiple nodes and links, and with arbitrary topology, are not very amenable to exact analysis, especially for multiple failures. The multiple failure case means that, in a typically large span of control, by the time another failure occurs, repair processes for at least one previous failure have not completed, so that there may be more than one failure to deal with at any one time. Simple structured point-to-point or ring networks, for example, may have 1—1 or ring protection mechanisms for single failures, e.g., a single fiber cut at a time. The single failure case means that, in a typically small span of control, by the time a second failure occurs, repair processes for the first failure have completed, so that there is no more than one failure to deal with at any one time. In networks of this kind, which are typically route-constrained or geographically constrained, analytical and approximate techniques can give insight into service availability for each possible single failure. If, however, the network is unstructured like a mesh, the number of nodes is large, and multiple failures are considered, the calculations, even if approximate, quickly become very complicated.
An article entitled “Computational and Design Studies on the Unavailability of Mesh-restorable Networks” by Matthieu Clouqueur and Wayne D. Grover, in Proceedings of DRCN 2000, April 2000, Munich, describes computational techniques for determining the unavailability of a mesh network for single and multiple (mainly two) failures.
As mentioned in the above article, network availability generally refers to the availability of specific paths (also called connections) and not that of a whole network. Networks as a whole are never entirely up or entirely down. “Network availability” can be defined as the average availability of all connections in a network, but this gives less insight and comparative value than working with individual paths, or perhaps a selection of characteristic reference paths. Therefore, service availability between source and sink nodes is more meaningful to communications users who pay for such services.
For a quantitative study of network availability, FIG. 1 illustrates service on a specific path as down (unavailable) in durations U1, U2, U3, . . . Un along the time axis. On the vertical axis (U=unavailability), ‘u’ indicates the service as unavailable, and ‘a’ as available. Service availability over a period T is the fraction of this period during which the service is up. Therefore, service availability and unavailability are defined as follows:

Availability = lim (T→∞) {(T − ΣUi)/T} = MTTF/(MTTR + MTTF)

Unavailability = 1 − Availability = MTTR/(MTTR + MTTF)

where MTTR is the mean time to recover or repair, and MTTF is the mean time to failure. Recovery is by relatively fast means of network protection (in tens of milliseconds) or restoration (perhaps within a second) capabilities, whereas repair is much longer (typically hours).
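As an illustration of the definitions above, the following minimal sketch computes availability for one path in two ways: from a list of outage durations over an observation period, and from MTTF and MTTR. The function names, the one-year period, and the outage values are hypothetical and chosen only for illustration; they do not come from FIG. 1.

```python
def availability_from_outages(outage_hours, period_hours):
    """Fraction of the period T during which service is up: (T - sum(Ui)) / T."""
    downtime = sum(outage_hours)
    if period_hours <= 0 or downtime > period_hours:
        raise ValueError("period must be positive and at least the total downtime")
    return (period_hours - downtime) / period_hours


def availability_from_means(mttf_hours, mttr_hours):
    """Availability = MTTF / (MTTR + MTTF)."""
    return mttf_hours / (mttr_hours + mttf_hours)


# Hypothetical data: three outages (U1, U2, U3) observed over one year.
outages = [4.0, 2.5, 6.0]      # hours, illustrative values only
period = 8760.0                # hours in one year

a_direct = availability_from_outages(outages, period)

# Mean times estimated from the same observations.
mttr = sum(outages) / len(outages)
mttf = (period - sum(outages)) / len(outages)
a_means = availability_from_means(mttf, mttr)

print(f"availability (from outages): {a_direct:.6f}")
print(f"availability (from MTTF/MTTR): {a_means:.6f}")
print(f"unavailability: {1.0 - a_direct:.6f}")
```

When the mean times are estimated from the same observation period, both expressions give the same value, which is a convenient consistency check on measured data.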
The above referenced article discusses computational approaches for analyzing availability under a two-failure scenario. Such approaches are quite complex.
There is need for faster and easier techniques to determine service availability, especially in large mesh networks. Simulation provides tractability for large networks, and is also a good check on the accuracy of simple, approximate or analytical methods. Thus, the time simulation technique is a relatively easier and faster process that complements more insightful analytical approaches to availability.
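To make the idea of checking an analytical result by simulation concrete, the following is a minimal sketch of a time simulation for a single unprotected path. It is not the technique of any particular reference: the function name is hypothetical, and the exponentially distributed times to failure and to repair are assumptions made only for this example. The path alternates between up and down periods, and the simulated availability is compared with MTTF/(MTTR + MTTF).

```python
import random


def simulate_path_availability(mttf_hours, mttr_hours, horizon_hours, seed=0):
    """Estimate availability of one path by stepping through up/down periods.

    Assumption (not from the source): up and down durations are drawn from
    exponential distributions with means MTTF and MTTR respectively.
    """
    rng = random.Random(seed)
    clock = 0.0       # simulated time, in hours
    downtime = 0.0    # accumulated unavailable time within the horizon
    while clock < horizon_hours:
        # Path is up until the next failure.
        clock += rng.expovariate(1.0 / mttf_hours)
        if clock >= horizon_hours:
            break
        # Path is down until repair completes; count only time inside the horizon.
        repair = rng.expovariate(1.0 / mttr_hours)
        downtime += min(repair, horizon_hours - clock)
        clock += repair
    return (horizon_hours - downtime) / horizon_hours


# Illustrative parameters only.
mttf, mttr = 1000.0, 10.0
simulated = simulate_path_availability(mttf, mttr, horizon_hours=1_000_000.0)
analytical = mttf / (mttr + mttf)
print(f"simulated availability:  {simulated:.5f}")
print(f"analytical availability: {analytical:.5f}")
```

A longer simulated horizon narrows the gap between the two figures. Extending the sketch to a mesh network would require modeling many links, overlapping failures, and restoration routing, none of which is attempted here.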