1. Field of the Invention
The present invention generally relates to a problem determination method, system and program product. Specifically, the present invention allows problem determination probes to be inserted into program classes of a running object-oriented runtime environment under the direction of a dynamic work flow derived from a collection of on-line knowledge bases.
2. Background Art
In the production of software, problem determination is the process of identifying the cause of either a system failure or a system not behaving as expected. Typically, problem determination results in the finding of a configuration error, an improper use of an application programming interface, a product defect, or some other root cause. There have been numerous advances in problem determination where the software is still being run on a platform under the control of the software producer. In these environments there are many successful approaches for diagnosing failures, and these approaches typically rely on test cases and debugging tools to isolate problems. This type of problem determination is commonly known as “debugging.”
Unfortunately, very little progress has been made in problem determination once a shipped software product has been installed in a production environment at a customer's site. Problem determination in this situation (commonly known as “troubleshooting”) becomes especially difficult when a failure occurs in a customer business process that involves multiple products. This difficulty exists even if several of the products come from the same software provider. In this environment, the customer's personnel (e.g., an administrator) with access to the failing production platform (which may be multiple computers in a network running various inter-working software products) generally attempt to address the failure. However, such administrators traditionally have a poor communications channel with the product support personnel, as well as an ill-defined process for eliminating the failure.
One relevant characteristic of “troubleshooting” is that a majority of product support requests from customers are resolved without identifying a product defect. In actuality, failures more often result from misleading documentation, improper configuration, improper installation, unidentified dependencies, or the flow of work between products. Another relevant characteristic is the disjointed flow of diagnostic information between the suspected failing component and the respective service personnel by way of the customer administrator. Specifically, “troubleshooting” is often accompanied by several rounds of “telephone tag” intermixed with overnight shipments of large traces and dumps of data. Moreover, the service personnel are typically limited to using the problem determination capabilities built into the product by its development team. These capabilities are often limited to tracing, with only a few levels of generic control. This can lead to the generation of large volumes of output and the consumption of so many resources that the customer must schedule capture of the requested information during non-prime-time hours.
In the past, some attempts have been made, particularly at the hardware/microcode level, to directly connect a failing machine with its manufacturer's service personnel. This arrangement has met with much resistance from customers, who view this capability as a security problem. Specifically, many customers are concerned that they are unable to control the flow of information and fear that business information may be unnecessarily disclosed during the diagnostic process. To this extent, it is not unusual for today's product service personnel to receive dumps/traces of data from a customer's administrator in printed format with certain business information blacked out, or to find that the information was generated on a non-production system using non-confidential test data. Such practices not only lengthen the time to resolution, but also often mask the problem.
In view of the foregoing, there exists a need for an improved problem determination method, system and program product. To this extent, a need exists for an automated “troubleshooting” process that smooths the flow of diagnostic information and allows the knowledge accumulated by the product service group from previous support engagements to be used in the automation scheme. A further need exists for such a problem determination scheme to be implemented while the subject computer system(s) remains running.