Recently, various fundamental computer application programs, such as word processing programs, database programs, intranet email and other application programs, have been converted for offer and use on on-line service platforms. For instance, Microsoft® Office 2003 provides, through an Internet-accessible web site, several “suite” services that have traditionally been offered as fixed media software. On-line services or applications are hosted at a server-side central data center which is communicatively linked to remote client-side terminals. Principal goals of such on-line services include presenting, sorting and distributing documents universally across a network of client terminals and servers with 24 hour, 7 day/week accessibility.
On-line service software demands a different level of quality, stability and resilience to failure than previous client-side software. Instead of a failure affecting the quality of service on one machine or on one network only, a failure of an on-line service may impose catastrophic effects on users on a widespread level. Large numbers of individual or business customers may be adversely affected by a single software-based problem. As such, the reputation and goodwill of the software vendor are at stake, in addition to future business revenues. Also at stake are financial resources potentially lost to haphazard debugging efforts and development time lost for future projects.
Given the critical need for on-line service reliability, numerous event generation and logging mechanisms are available to developers. For instance, Microsoft® Windows NT® provides the NT event log for monitoring selected events. Another event monitor is the Microsoft® Windows NT® PerfMon counter. These event generation and logging mechanisms operate to monitor events, report deficiencies and enhance debugging capabilities for on-line service code. Although the NT event log and PerfMon are useful by themselves, they are limited in terms of log storage capacity. These storage limitations affect the types of events that can be captured and monitored for troubleshooting.
The different design considerations, security levels and technologies that may be used to create an “integrated” on-line service create discrepancies in the quality and depth of instrumentation capabilities within and between interrelated software code. Developers may be confused as to which instrumentation technologies to use and may, therefore, instrument their software code in an inconsistent manner or, in some cases, not at all. Ultimately, the resulting gaps in monitoring, maintenance and debugging capabilities make it difficult to provide a consistently reliable on-line service.
Therefore, consistent, in-depth instrumentation is critical for a commercially practical on-line service. Developers should be able to instrument integrated software code for an on-line service by using one unified logging service. Consistency, at least in basic logging standards, will encourage uniformity in the use of instrumentation and eliminate confusion about what kinds of events to log and how to log them. The instrumentation should impose minimal overhead on substantive software code and should be universal enough to replace all existing logging and tracing technologies. To accomplish this, the instrumentation has to support event generation from a variety of code sources, event levels and event categories. In addition, the instrumentation has to meet the monitoring and reporting needs of the technical support developers who maintain the operation of on-line services.
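By way of illustration only, the following minimal sketch shows one way a single logging entry point could accept events from a variety of code sources, event levels and event categories. All names used here (`Level`, `Event`, `UnifiedLogger`, `add_sink`, `log`) are hypothetical and are assumed for this example; they do not correspond to any existing logging technology or to a particular implementation.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List


class Level(IntEnum):
    """Hypothetical event levels, ordered by severity."""
    INFO = 1
    WARNING = 2
    ERROR = 3


@dataclass(frozen=True)
class Event:
    """One generated event, tagged with its source, level and category."""
    source: str      # code source that generated the event (assumed name)
    level: Level
    category: str    # free-form category label (assumed)
    message: str


class UnifiedLogger:
    """Single entry point that forwards events to every registered sink."""

    def __init__(self, min_level: Level = Level.INFO) -> None:
        self.min_level = min_level
        self._sinks: List[Callable[[Event], None]] = []

    def add_sink(self, sink: Callable[[Event], None]) -> None:
        """Register a consumer, e.g. a log writer or a monitoring component."""
        self._sinks.append(sink)

    def log(self, source: str, level: Level, category: str, message: str) -> bool:
        """Emit an event; return False if it was filtered out by level."""
        if level < self.min_level:
            # Filtering before the event is built keeps overhead low.
            return False
        event = Event(source, level, category, message)
        for sink in self._sinks:
            sink(event)
        return True
```

In this sketch, every component calls the same `log` function regardless of which code source it belongs to, so the choice of how and where events are stored or monitored is made once, by registering sinks, rather than separately in each piece of code.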
For instance, real-time monitoring of events by technical support is required to maintain a commercially viable quality of service for an on-line service. Real-time monitoring would notify operators almost immediately when a problem occurs, providing a first line of rapid troubleshooting. In order to implement a real-time monitoring system, outputted events would have to be analyzed and prioritized based on time-critical diagnosis criteria. A rule-based monitoring method that watches for occurrences of events within a certain time interval according to specified thresholds is one way to distinguish time-critical events, which warrant real-time monitoring, from less critical events.
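The rule-based approach described above can be sketched, under stated assumptions, as a sliding-window counter. The class name `ThresholdRule`, its fields, and the `observe` method are hypothetical and chosen for this example only. The sketch further assumes that timestamps are given in seconds and arrive in non-decreasing order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ThresholdRule:
    """Fires when `threshold` matching events fall within `window_seconds`."""
    event_id: str            # hypothetical identifier of the watched event
    threshold: int           # number of occurrences that triggers the rule
    window_seconds: float    # length of the time interval being watched
    _timestamps: deque = field(default_factory=deque)

    def observe(self, event_id: str, timestamp: float) -> bool:
        """Record one event; return True if the threshold is now met."""
        if event_id != self.event_id:
            return False
        self._timestamps.append(timestamp)
        # Discard occurrences that are older than the time window.
        while (self._timestamps
               and timestamp - self._timestamps[0] > self.window_seconds):
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold
```

For example, a rule constructed as `ThresholdRule("db_timeout", threshold=3, window_seconds=60)` would report a time-critical condition on the third `"db_timeout"` event observed within any 60 second span, while isolated occurrences would simply be left for later review.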
There is a need, therefore, in the industry for a system, including apparatuses and methods, for instrumenting on-line service software code to generate events, for monitoring generated events, for alerting appropriate personnel upon the occurrence of certain generated events, and for logging generated events for subsequent use in troubleshooting and debugging on-line service software code.