A variety of systems have evolved to accommodate software objects in diverse information processing scenarios. For example, a server application running software objects on a host or server computer in a distributed network can provide services or functions for client applications running on the network's terminal or workstation computers, which are operated by a multitude of users. Common examples of such server applications include software for processing class registrations at a university, travel reservations, money transfers at a bank, and sales at a retail business. In these examples, the processing services provided by the server application may update databases of class schedules, hotel reservations, account balances, product shipments, payments, or inventory for actions initiated by the individual users at their respective stations. A common way to implement these applications is by exchanging data through a web site hosted on the server.
As organizations become more dependent on their information systems, successful business operation is increasingly tied to application software availability. Thus, certain applications need to be available at all times; any interruption in service results in lost customers or missed deadlines. Applications playing an integral part in business operations are sometimes called “mission critical” or “24×7” applications. For example, if an order center is open twenty-four hours a day to accept customer information requests and orders, inferior performance or failure at any time impairs business operation. To avoid service interruption, an organization assigns the task of monitoring application performance and availability to a team of information technology professionals known as system administrators.
The system administrators strive to ensure the server applications provide consistent, quality service. However, maintaining service is an ongoing battle against a variety of factors. Inevitably, an application becomes overloaded with requests for service, or software anomalies crash the application altogether, leading to inferior or interrupted performance and loss of mission critical functions. If the system administrators wait for customer complaints before taking action, some users have already experienced poor service. Also, if the system administrators wait until a server fails completely (or “crashes”), they must expend considerable time and effort to restore service. And, as the number of applications and servers grows into an enterprise-wide system, inferior performance may go unnoticed. Finally, the system administrators typically find themselves chasing down urgent failures rather than focusing on improving application performance. Ideally, then, system administrators should monitor application performance to avoid problems instead of reacting to user complaints.
To achieve this end, system administrators turn to management software to provide an indication of how each system is performing and whether the system has failed. In this way, the system administrators avoid service outages and can see that a particular system needs attention because its performance is degrading.
Two techniques for gathering information about a system's operation (sometimes called “operational management information”) have developed for management software: non-intrusive and intrusive. Non-intrusive techniques require little or no modification to existing applications but provide limited information. For example, non-intrusive management software may monitor free disk space or sniff network packets. Additional features include an alert system: the system administrator can specify criteria (e.g., free disk space falls to under 1 percent) that will trigger an alert (e.g., page the administrator). However, non-intrusive techniques are of limited use because they typically monitor the underlying system rather than a particular application. Thus, a non-intrusive technique typically cannot pinpoint what application functionality is causing trouble. In the above example, the alert does not explain why disk usage has increased or which application is responsible for the increase.
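The non-intrusive approach described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical notification hook (`page_administrator`) and an illustrative 1 percent threshold; it is not modeled on any particular management product.

```python
import shutil

# Illustrative alert threshold: trigger when free space falls under 1 percent.
FREE_SPACE_THRESHOLD = 0.01

def page_administrator(message):
    # Placeholder for a real notification channel (pager, e-mail, etc.).
    print("ALERT:", message)

def check_disk(path="/"):
    # Non-intrusive check: inspects the underlying system (the disk),
    # requiring no modification to any application running on it.
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    if free_fraction < FREE_SPACE_THRESHOLD:
        page_administrator(
            "free space on %s is %.2f%% of capacity" % (path, free_fraction * 100)
        )
    return free_fraction
```

Note that the check observes only the system resource: even when the alert fires, nothing in this code can say which application consumed the space, which is precisely the limitation described above.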
Intrusive techniques offer additional information not provided by non-intrusive techniques. In one intrusive technique, a process called instrumentation is applied to each application. To instrument an application, programming instructions are added throughout the application to send information to management software. The instructions may relay information indicating a location within the application, allowing the management software to determine what portions of the application are responsible for generating error conditions or triggering alarms.
For example, code could be placed in a customer order application to send a notification to the management software when a customer order is received and another notification when processing for the order is completed. In this way, the management software can provide information about the number of orders received and the number of orders completed per minute. If the number of orders completed per minute drops to zero while the number of orders received per minute remains constant, it is likely that some portion of the system has failed; further it appears the problem is with processing orders, not receiving them. Thus, an alarm set to inform the administrator when the orders completed rate drops below 20% of the orders received rate indicates both that there is a problem and that the administrator should investigate why orders are not being completed.
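The customer order example above can be sketched as follows. This is a minimal illustration in which a hypothetical `ManagementSoftware` object stands in for the management software's notification endpoint; the class and method names are assumptions for illustration only.

```python
# Minimal sketch of intrusive instrumentation. The management software is
# represented by simple counters; a real system would relay notifications
# over a network and compute per-minute rates.
class ManagementSoftware:
    def __init__(self):
        self.received = 0
        self.completed = 0

    def notify_order_received(self):
        self.received += 1

    def notify_order_completed(self):
        self.completed += 1

    def check_alarm(self):
        # Alarm when the completion rate drops below 20% of the
        # received rate over the measurement interval.
        return bool(self.received) and self.completed < 0.2 * self.received

def process_order(mgmt, order, fail=False):
    mgmt.notify_order_received()    # instrumentation point: order arrived
    if fail:
        return None                 # processing failed or stalled
    result = "processed:" + order   # the application's actual work
    mgmt.notify_order_completed()   # instrumentation point: order done
    return result
```

Because the two notifications bracket the application's own processing step, the alarm not only signals that something is wrong but also localizes the problem to order processing rather than order receipt.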
However, intrusive management techniques suffer from various problems. First, the instrumentation process requires an application developer either to include extra code at development time or to retrofit a current application with instrumentation code. And, during the instrumentation process, the developer must determine how much instrumentation is sufficient. There are numerous degrees of instrumentation, and it is not always clear at application development time how much instrumentation is desired. Excess instrumentation can degrade performance, but too little might not provide sufficient information to adequately manage the application. If the wrong decisions are made, the application must be modified yet again.
Thus, instrumentation requires exercise of seasoned judgment and care on the part of the application developer, who may consult with the system administrators to incorporate their experience into the instrumentation process. As a result, instrumentation requires expertise in high demand, and the process drains resources from the primary tasks of developing, improving, and maintaining the application. In addition, since instrumentation itself can introduce new problems, the instrumented version of the software must be tested to detect newly introduced software bugs.
Second, instrumentation can be implemented according to one of a variety of instrumentation standards, and an application instrumented according to one standard may not work with management software expecting a different instrumentation standard. Thus, if two departments using different standards are combined, two different application management systems must be run in parallel unless the software is re-instrumented.
Thus, system administrators are forced to choose between non-intrusive monitoring, which provides no information at the application level, and instrumentation, which requires an experienced software developer to modify an application to accommodate specific management software.