As enterprises accelerate their migration from traditional circuit-switched telephony services to Internet Protocol (IP) Telephony solutions, a major concern is the reliability of the proposed IP voice services compared with that of their current infrastructure. Indeed, in many call center environments, the potential cost of downtime is often greater than the implied benefits of migrating to IP Telephony. As such, it is often crucial for an enterprise system to meet a certain level of availability (e.g., 99.99% available) for the entire enterprise system or for one or more sites or subsystems of the system.
The availability of a switching system is traditionally defined as the probability that the system is operational at a given time (i.e., the percentage of up time). The availability is generally calculated from an end-user's perspective and does not necessarily reflect the frequency of individual component failures or required maintenance where they do not affect the availability of the overall system to an end-user. The telecommunications industry requires a high degree of rigor and structure in the methodologies used for determining whether a device or service is operational. The availability of critical components (i.e., critical to call processing) of an IP Telephony system, such as supporting hardware, software and the underlying data network infrastructure, is analyzed first. Then, the total system availability is calculated based on the availability of all the components.
The Telcordia (Bellcore) GR-512 Reliability Model requirements for telecommunications equipment, for example, provide one industry standard for determining critical outages and downtime. In this model, the data required for predicting system availability is limited to unplanned outage frequency and downtime experienced by service interruption. Potential outages include Reportable Outages, Outage Downtime Performance Measure and Downtime Measure for Partial Outages. A Reportable Outage comprises an event that includes total loss of origination and termination capability in all switch terminations for at least a 30-second period (uninterrupted duration). An Outage Downtime Performance Measure comprises “the expected long-term average sum, over one operating year, of the time durations of events that prevent a user from requesting or receiving services. A failure that causes service interruption contributes to the Outage Downtime of that service. Outage Downtime is usually expressed in terms of minutes of outage per year.” A Downtime Measure for Partial Outages is a Weighted Downtime Performance Measure. “The actual time duration of a partial outage is weighted by the fraction of switch terminations affected by the outage condition.”
Thus, the availability of a critical component or subsystem (i.e., critical to call processing) is typically described by the following formula:
$$\text{Availability} = \frac{\text{MTBF} - \text{MTTR}}{\text{MTBF}}$$
where MTBF represents a Mean Time Between Failure and MTTR represents Mean Time To Recovery/Repair, which corresponds to the time to diagnose, respond and restore service. This equation is also presented in industry literature as the following:
$$\text{Availability} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$$
where MTTF is defined as a Mean Time to Failure, and equates to (MTBF − MTTR).
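The equivalence of the two forms can be checked numerically. The following is a minimal sketch; the function names and the sample MTBF and MTTR values are assumptions chosen for illustration and do not come from any standard.

```python
def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = (MTBF - MTTR) / MTBF."""
    return (mtbf_hours - mttr_hours) / mtbf_hours


def availability_from_mttf(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical component: one failure every 10,000 hours, 4 hours to restore.
mtbf = 10_000.0
mttr = 4.0
mttf = mtbf - mttr  # MTTF equates to (MTBF - MTTR)

a_mtbf = availability_from_mtbf(mtbf, mttr)
a_mttf = availability_from_mttf(mttf, mttr)
print(a_mtbf, a_mttf)
```

Because MTTF + MTTR equals MTBF, both functions divide the same numerator by the same denominator, so the two results agree.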
Using these formulas, the estimated average annual minutes of downtime experienced due to a critical component or a subsystem failure can be expressed as the following:
$$\text{Annual Downtime Minutes} = (1 - \text{Availability}) \times (525960\ \text{minutes/year}),$$
where the 525960 minutes per year is based upon assuming 365.25 days per year.
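As a worked illustration of this conversion, the sketch below turns an availability figure into expected annual downtime. The function name and the 99.99% input are assumptions used only as an example.

```python
# 365.25 days/year * 24 hours/day * 60 minutes/hour = 525960 minutes/year
MINUTES_PER_YEAR = 365.25 * 24 * 60


def annual_downtime_minutes(availability: float) -> float:
    """Annual Downtime Minutes = (1 - Availability) * 525960."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# A component that is 99.99% available ("four nines").
downtime = annual_downtime_minutes(0.9999)
print(f"{downtime:.3f} minutes/year")
```

Under these assumptions, a 99.99% available component accounts for roughly 52.6 minutes of downtime per year.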
For projecting an enterprise's total system availability, the annual downtime from each of the subsystems or individual critical components (i.e., those components critical to call processing) is summed, and the system availability is estimated from this sum. Thus, the Total System Availability can be estimated by the following formula:
$$\text{Total System Availability} = 1 - \frac{\sum_{i} \left[\, i\text{th Subsystem Annual Minute Downtime} \,\right]}{525960\ \text{Minutes/Year}}$$
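The summation can be sketched as follows. The subsystem names and downtime figures are hypothetical values chosen for illustration, and the function name is likewise an assumption.

```python
MINUTES_PER_YEAR = 525960  # assumes 365.25 days per year


def total_system_availability(subsystem_downtimes_min) -> float:
    """1 - (sum of subsystem annual downtime minutes) / 525960."""
    return 1.0 - sum(subsystem_downtimes_min) / MINUTES_PER_YEAR


# Hypothetical annual downtime (minutes) for each critical subsystem.
downtimes = {
    "call server": 5.26,
    "media gateway": 26.3,
    "data network": 52.6,
}

total = total_system_availability(downtimes.values())
print(f"Total system availability: {total:.6f}")
```

Note that this estimate simply adds the subsystem downtimes, so it treats every outage as affecting the whole system and does not account for overlapping outages.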
Where downtime affects only a portion of an enterprise's system, the downtime is weighted by the portion of the system that is affected by the outage. As described above, the calculation of the Downtime Measure for Partial Outages involves weighting the actual time duration of the outage by the fraction of the total switch terminations affected by the outage condition. Thus, this calculation assumes an equal distribution of traffic across the enterprise system's switch terminations. In reality, however, traffic patterns in a telecommunications system can vary widely. A call center, for example, may handle traffic levels orders of magnitude higher than the traffic of another site, such as a branch site of the telecommunications system. In such a network, the assumption of an equal distribution of traffic fails to accurately represent the actual distribution of traffic on the system and thus fails to accurately assess the system availability.
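The effect of the equal-traffic assumption can be illustrated with a small sketch. All site sizes, traffic shares and the outage duration below are hypothetical. The traffic-share figure is computed only to show the size of the gap and is not a weighting method defined in the model described above.

```python
def termination_weighted_downtime(duration_min: float,
                                  affected_terminations: int,
                                  total_terminations: int) -> float:
    """Weight outage duration by the fraction of switch terminations affected."""
    return duration_min * affected_terminations / total_terminations


# Hypothetical enterprise: a call center with 500 of 5000 terminations
# that nevertheless carries 80% of the traffic (assumed figure).
outage_minutes = 60.0
call_center_terminations = 500
total_terminations = 5000
call_center_traffic_share = 0.80  # assumption, for comparison only

by_terminations = termination_weighted_downtime(
    outage_minutes, call_center_terminations, total_terminations
)
# Illustrative comparison: the same outage scaled by share of traffic.
by_traffic_share = outage_minutes * call_center_traffic_share

print(by_terminations, by_traffic_share)
```

With these assumed numbers, a 60-minute call center outage counts as 6 weighted minutes by terminations, although it interrupts most of the enterprise's traffic for the full hour.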
Thus, in predicting the availability of a telecommunications system, it would be desirable to account for the traffic distribution across that system.