Information technology (“IT”) professionals are increasingly being asked to demonstrate the level of availability of the computing resources they manage. For instance, an IT manager may be asked by company management to demonstrate the level of availability of the company's mail servers, file stores, world wide web (“WWW” or “web”) servers, gateway servers, application programs, or other computing resources. The level of availability for a computing resource refers to the portion of each day, or other period of time, during which the computing resource is operating and available for use.
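As a rough illustration of this definition, the sketch below computes availability as the percentage of an observation period during which a resource was up. The function name `availability_percent` and the representation of outages as (start, end) datetime pairs are assumptions made purely for illustration; they are not taken from the system described here.

```python
from datetime import datetime, timedelta


def availability_percent(period_start, period_end, outages):
    """Percentage of the period during which the resource was up.

    `outages` is a hypothetical list of (start, end) datetime pairs,
    assumed not to overlap one another.
    """
    total = (period_end - period_start).total_seconds()
    if total <= 0:
        raise ValueError("period_end must be after period_start")
    down = 0.0
    for start, end in outages:
        # Clip each outage to the observation period before counting it.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (total - down) / total


# Example: a single 36-minute outage during one 24-hour day.
day_start = datetime(2024, 1, 1)
day_end = day_start + timedelta(days=1)
outages = [(datetime(2024, 1, 1, 3, 0), datetime(2024, 1, 1, 3, 36))]
one_day = availability_percent(day_start, day_end, outages)
print(f"{one_day:.2f}%")  # 97.50%
```

In this example, 36 minutes of downtime out of 1,440 minutes in the day yields an availability of 97.5%.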
Being able to demonstrate the level of availability for computing resources is becoming more important for a variety of reasons. For one, computing resources are now, more than ever, expected to be readily available to users. For this reason, IT managers are being asked with greater regularity to keep the computing resources they manage available 99.999% of the time (referred to in the IT industry as achieving “five 9's”). Without accurate statistics regarding the level of availability being achieved, it is difficult for an IT manager to achieve five 9's.
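To give a sense of how demanding a five 9's target is, the following sketch converts an availability percentage into the downtime it permits over a period. The helper name `downtime_budget_minutes` and the 365-day and 30-day periods are illustrative assumptions, not part of any described system.

```python
def downtime_budget_minutes(target_percent, period_days):
    """Minutes of downtime allowed in `period_days` while meeting `target_percent`."""
    if not 0.0 <= target_percent <= 100.0:
        raise ValueError("target_percent must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    return period_minutes * (100.0 - target_percent) / 100.0


yearly = downtime_budget_minutes(99.999, 365)   # assumed 365-day year
monthly = downtime_budget_minutes(99.999, 30)   # assumed 30-day month
print(f"Per year:  about {yearly:.2f} minutes")
print(f"Per month: about {monthly * 60:.1f} seconds")
```

Under these assumptions, five 9's leaves roughly 5.26 minutes of downtime per year, or about 26 seconds per month, which is why accurate measurement matters when such a target is set.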
Another reason IT managers are being asked to demonstrate the level of availability for the systems they manage stems from the increased popularity of electronic mail (“e-mail”) and messaging service hosting providers. Service hosting providers own and manage the computing resources necessary to provide a computing service to users, such as e-mail, and charge users for the provision of the service. As the customers of hosting providers become more sophisticated, they are more commonly interested in having detailed information regarding the level of service they are receiving from their provider. This information may be used to set service level requirements in a service level agreement (“SLA”) between the hosting provider and the customer, and to determine whether the specified service levels are actually being met. Additionally, some customers want to include financial penalties in the SLA for the provider's failure to achieve specified availability levels. Because it is currently difficult for service providers to generate the necessary system availability level metrics, the inclusion of these metrics in an SLA and the prospect of financial penalties for failing to meet the metrics are challenging propositions for both the hosting service provider and the customer. In many cases, decisions are being made in this regard based upon a perception of system availability level rather than on actual data.
In the past, system-level availability metrics have generally been calculated manually using spreadsheet application programs, custom-built spreadsheets, and information from various non-standard sources. While calculating these metrics manually can provide some useful information regarding system availability, calculating availability metrics in this manner suffers from a number of potentially serious drawbacks. For instance, availability calculations are often custom created within each organization, without a standard mechanism for deriving system-level availability. As a result, it is frequently difficult to understand whether the calculations are correct, to understand exactly what the calculated results mean, and to meaningfully compare availability calculations generated within different organizations. Additionally, manually calculating availability metrics can be a time-intensive task, literally taking hours each month. This can be expensive and particularly frustrating for a time-strapped IT manager. Moreover, the manual calculation of availability metrics is prone to error and is likely to generate incorrect results. Incorrect system availability level metrics can result in erroneous and inconsistent reporting, incorrect data for setting service levels, penalties for failing to meet the service levels specified in an SLA, resources being allocated to the wrong areas, and a poor perception of system performance, among other problems.
It is with respect to these considerations and others that aspects of a computing system for determining the availability of a computing resource are described below.