1. Field of the Invention
The present invention is directed to a system for monitoring and recovery of software-based application processes. More particularly, the invention is directed to a system that automatically restarts software applications, provides failure notification for automated business processes, traces the performance and availability of the applications, and supports service level management.
2. Description of Related Art
The popularity of the Internet has fueled great demand for business-to-business and business-to-consumer Internet applications. Many organizations have established Web-based distributed applications for dissemination or collection of information, or to extend remote access and command capabilities of processes through Web-based interfaces. For example, a merchant's web system allows consumers to purchase items online and pay with a credit card. The credit card transactions are processed by communicating with an outside system belonging to a third party.
The tremendous expansion of the Internet has also changed the paradigm of implementation and deployment of software applications, and has expanded the number and features of software applications. For example, Application Service Providers (ASPs) operate and maintain software applications at remote web sites and, as part of their product offerings, make those software applications available to users via the Internet.
For distributed network systems, several protocols exist in which one computer system (a “host” system) receives and processes messages from a number of other computer systems (“client” systems). In the example of the World Wide Web (“WWW”), in the simplest network configuration, one server would be the host system while each personal computer would be a client system. If a web site is very popular, or otherwise has a large volume of traffic, the host operations may fail due to a system overload. To address this problem, techniques have been developed by which several servers are arranged in parallel and work is distributed among them. This distribution of work, in which each received message is allocated to a particular host computer, is often referred to as load directions or load balancing.
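By way of illustration only, the allocation of received messages among parallel host computers may be sketched as follows. The sketch assumes a simple round-robin policy, which is only one of many possible load balancing policies and is not taken from any particular prior art system; the class name RoundRobinBalancer and the host names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Allocate each incoming message to the next host in a fixed rotation.

    Hypothetical illustration: round-robin is an assumed policy, not one
    described in the text above.
    """

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("at least one host is required")
        self._hosts = list(hosts)
        # cycle() repeats the host list endlessly, in order.
        self._rotation = cycle(self._hosts)

    def allocate(self, message):
        """Return the (host, message) pair chosen for this message."""
        return next(self._rotation), message


# Three hypothetical hosts arranged in parallel, receiving five messages.
balancer = RoundRobinBalancer(["host-a", "host-b", "host-c"])
assignments = [balancer.allocate(f"request-{i}")[0] for i in range(5)]
print(assignments)  # ['host-a', 'host-b', 'host-c', 'host-a', 'host-b']
```

In this sketch the policy takes no account of the current load or health of each host, so a failed host would continue to receive its share of messages.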
Other prior art systems remotely monitor network systems to provide failure notification based on certain triggering events, or to avoid system overload. However, developers building software systems have limited insight into the performance of individual components, and must often rely on sorting through log files or devising tests to determine various levels of functionality. Further, when the system is “live” and in use, most of the currently available monitoring tools report only basic hardware performance metrics such as CPU usage, or whether a response is received from “ping-ing” a port on a given machine. There is no way to monitor the level of performance of the actual business logic of an application, or the details of its interactions with other external applications. None of the prior art systems performs failure notification and automatic recovery based on logical evaluation of the monitoring data it receives.
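By way of illustration only, a basic availability check of the kind described above may be sketched as follows. The sketch assumes that “ping-ing” a port amounts to attempting a TCP connection to it; the function name port_is_open and the default timeout are hypothetical.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical illustration: this reports only that something is
    listening on the port, not that the application is working.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or
        # name-resolution failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Such a check returns True whenever the port accepts a connection, even if the application behind it is returning errors or has stalled partway through a transaction, which is the limitation noted above.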
Accordingly, there is a need for a system and method for remotely monitoring the network, which avoid these and other problems of known systems and methods.