1. Field of the Invention
The invention relates to computing systems and, more particularly, to detecting problems in network communications.
2. Description of the Related Art
In recent years, organizations have become increasingly dependent upon the proper operation of both computer hardware and software. Many business organizations and governmental entities rely upon applications that access large amounts of data, often a terabyte or more, for mission-critical purposes. Many of these applications require near-continuous access to data. For example, systems such as retail processing databases, airline reservation databases, and financial institution databases must be available to perform transaction processing 24 hours a day. Because organizations depend upon the proper operation of these applications, downtime associated with such systems can be disastrous. Accordingly, it is critical that the cause of an application failure or a system aberration be quickly identified and remedied.
While efforts may be made to quickly identify problems with a particular application, diagnosing the causes of such problems can be very difficult. Typically, a system administrator must study logs to reconstruct the history of the system and then perform a root cause analysis (RCA). Such an approach can be very time consuming and relatively inefficient.
Because organizations typically use a wide variety of applications and components provided by third parties, they frequently depend upon support from those third parties. Third-party vendors frequently employ support personnel, reachable by telephone, who are trained to assist customers with problems. While such customer support benefits customers, vendor support personnel may find that much of their time is consumed by matters that do not involve the vendor's products.
It is often difficult to ascertain where a problem within a system originates. Consequently, what appears to be a problem with a particular product may turn out to be a problem with another product or component elsewhere in the system. For example, in a system that includes network communication, a customer may believe there is a problem with a locally installed application when in fact the problem lies in the network communications. Believing the application to be at fault, the customer may call the application's vendor to obtain technical support. The vendor's support personnel may then spend a significant amount of time trying to determine the cause of the customer's problem. Eventually, it may be determined that the cause of the problem is not the vendor's product and that the customer should seek support elsewhere. Unfortunately, by then the vendor has already spent time and money on a problem unrelated to its product, and resolution of the customer's problem has been delayed.
In addition, when analyzing network performance, traditional approaches typically rely on network management or diagnostic tools that take measurements using passive or active network elements. For example, existing connections may be observed from the middle of the network, “sniffing” may be used to measure the health of a particular connection, or dedicated diagnostic tests may be run end-to-end. However, each of these approaches may degrade system performance, may require modification of operating system or application code, or may otherwise be unsuitable for use in real-time support situations.
Accordingly, an effective method and mechanism for diagnosing system problems is desired.