1. Field of the Invention
The invention relates generally to error recovery procedures in managed devices and more specifically relates to automated methods and systems to apply probabilistic calculations to dynamically select error recovery and diagnostic procedures for resolving error conditions in a managed device.
2. Description of Related Art
As computing systems of various types have grown in complexity, so too has grown the complexity of diagnosing problems with such complex computing systems. Even users of home personal computer systems are familiar with the frustrations of trying to diagnose difficult problems on such relatively simple home systems. Such frustrations are magnified manyfold in larger, more complex computing systems and large peripheral subsystems.
In general, diagnosing such problems entails selecting from among a plurality of diagnostic and test procedures (also referred to herein as recovery procedures) to determine the nature and underlying cause of an error condition and to suggest an appropriate solution thereto. Manual procedures for such diagnostics involve a human reviewing such a plurality of diagnostic procedures based upon his or her observation of the nature of the problem. The user chooses a first option from such a plurality of diagnostic procedures and, based upon the results of performing this procedure, determines whether other procedures are necessary and appropriate. The process then continues in an iterative manner until the error condition is eventually resolved.
In general, the device in which an error condition has been sensed is referred to herein as a managed device. A management device is a device coupled to the managed device on which a management client process interacts with a user to help resolve the recognized error condition. It is also generally known that such a management client process interfaces with a user to direct diagnostic procedures in conjunction with a corresponding management server operable within the managed device.
The managed device may be, for example, another computer system in which an error condition requires further diagnosis or may be any of a variety of devices including computer peripheral storage subsystems such as disk array storage systems.
It is generally known in such computing systems to provide automated assistance to a user in performing such diagnostic error recovery procedures. Presently known automated systems are generally one of two types. Help systems provide suggestions and prompts in response to user input describing the nature and symptoms of the error condition. Exemplary of such help systems are the help systems associated with personal computing products such as Microsoft Windows and Microsoft Office. The user enters a textual explanation of the problem including one or more keywords identifying the nature and symptoms of the problem. The help system searches its database for potentially relevant information and presents a list of such potentially relevant information to the user. The user then selects from the list the information that appears to be most relevant and attempts to correct the problem using recovery procedures in the supplied information. If the supplied information successfully corrects the error condition, the user proceeds with normal operation. If not, the user may select another recovery procedure from the list of potentially relevant information in hopes of further diagnosing and resolving the identified problem.
A second type of automated diagnosis system adds some degree of artificial intelligence or expert knowledge processing to help the user identify the most likely relevant recovery procedures from its database of potential diagnostic information. Such systems are often referred to as knowledge bases as distinct from mere databases. Such "intelligent" systems utilize a number of heuristic techniques and artificial intelligence techniques in an attempt to determine more accurately the nature of the problem from the user's description and to thereby refine the list of potential diagnostic procedures to those most likely to resolve the user's problem.
A first problem with all such presently known systems arises in the need for manual user interaction in describing the nature of the problem. Such manual procedures are prone to error both in terms of mischaracterizing the nature of the problem and in terms of misunderstanding detailed technical aspects of the problem. Present help systems and knowledge bases generally rely on the user to provide input (often in response to prompts from the help system) to describe the nature of the problem to be resolved. Such human input may misrepresent the nature of the error condition because the user fails to recognize the existence or significance of particular aspects of the error.
A second problem with current help or diagnosis systems arises from their static nature. The databases or knowledge bases searched by present help or diagnosis systems are not dynamically adapted to particular states of the system being diagnosed. A recovery procedure for a particular error condition may be useful in one state of the managed device but less useful in another state of the managed device with the same error condition.
In view of the above discussion, it is evident that a need exists for improved management systems that provide additional automation to reduce human error in diagnosing error conditions. It is further desirable that improved management systems include a dynamic aspect to adapt their operation to the dynamic status of the managed device to be diagnosed.
The present invention solves the above and other problems, thereby advancing the state of the useful arts, by providing improved methods and systems for diagnosing problems in a managed device. In particular, a management client is operable to interact with the user and with a management server. The management server is operable within the managed device. The management server is operable to detect the present status of the managed device and to therefrom compute which recovery procedures are most likely to resolve the presently recognized problem in the managed device. Specifically, the server calculates a probability value associated with each possible recovery procedure. A list of recovery procedures is supplied to the user's management client. A user then selects a recovery procedure to be performed from the listed recovery procedures with associated computed probability indices in hopes of resolving the detected error condition. A feedback mechanism provides feedback to the management server such that subsequent calculations to select recovery procedures will recognize success or failure (or partial success or failure) of the performed recovery procedure as applied to the present status and error condition of the managed device.
A first aspect of the present invention is therefore its application of probabilistic or "fuzzy" techniques to aid the user in diagnosing and resolving problems in a managed device.
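By way of illustration only, the server-side computation of a probability value for each recovery procedure may be sketched as follows. The text above does not prescribe a particular formula; the additive scoring rule, the status flag names, and the identifiers `Procedure` and `rank_procedures` are all assumptions introduced for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Procedure:
    """A recovery procedure with hypothetical relevance weights.

    `weights` maps a status flag of the managed device to how strongly
    that flag suggests this procedure (an assumed scoring scheme).
    """
    name: str
    base: float = 1.0
    weights: dict = field(default_factory=dict)


def rank_procedures(status, procedures):
    """Return (name, probability) pairs, most likely procedure first.

    `status` is the set of flags currently detected in the managed device.
    Scores are normalized so the probabilities sum to 1.
    """
    scores = {}
    for proc in procedures:
        score = proc.base + sum(
            w for flag, w in proc.weights.items() if flag in status
        )
        scores[proc.name] = max(score, 0.0)

    total = sum(scores.values())
    if total == 0.0:
        # No evidence favors any procedure: fall back to equal probabilities.
        share = 1.0 / len(scores) if scores else 0.0
        return [(name, share) for name in scores]

    ranked = [(name, s / total) for name, s in scores.items()]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

In such a sketch the ranked list would be what the management server supplies to the management client, with the user remaining free to select any listed procedure.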
A second aspect of the present invention is its use of dynamic information regarding the present status of the managed device for selecting among the plurality of recovery procedures. Further, this status information is applied within the managed device to determine probabilities associated with each known recovery procedure.
Still another aspect of the present invention is its use of a feedback loop to adapt future probabilistic computations in recognition of success or failure (or partial success or failure) of a particular selected recovery procedure as applied to a particular error condition in a particular state of the managed device.
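One possible form of such a feedback loop is sketched below, purely as an illustration. It assumes that each outcome is reported as a number between 0 (failure) and 1 (success), with intermediate values standing for partial success, and that the success estimate is a smoothed running average; the class name `FeedbackModel` and the smoothing rule are assumptions and are not taken from the description above.

```python
from collections import defaultdict


class FeedbackModel:
    """Tracks outcomes per (error condition, device state, procedure)."""

    def __init__(self):
        self._credit = defaultdict(float)  # accumulated outcome values
        self._trials = defaultdict(int)    # number of recorded attempts

    def record(self, error, state, procedure, outcome):
        """Record one attempt; outcome is in [0, 1], 0.5 meaning partial success."""
        if not 0.0 <= outcome <= 1.0:
            raise ValueError("outcome must be between 0 and 1")
        key = (error, state, procedure)
        self._credit[key] += outcome
        self._trials[key] += 1

    def estimate(self, error, state, procedure):
        """Smoothed success estimate; 0.5 when nothing has been recorded yet."""
        key = (error, state, procedure)
        credit = self._credit.get(key, 0.0)
        trials = self._trials.get(key, 0)
        # Add-one smoothing (an assumed choice) keeps early estimates moderate.
        return (credit + 1.0) / (trials + 2.0)
```

Because the key includes the device state, feedback gathered in one state does not alter the estimate for the same procedure and error condition in a different state.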
It is therefore an object of the present invention to provide improved methods and associated systems for managing recovery procedures associated with a managed device.
It is another object of the present invention to provide methods and associated systems for resolving error conditions in a managed device using an iterative feedback loop process.
It is yet another object of the present invention to provide methods and associated systems for dynamically determining preferred recovery procedures for resolving error conditions in a managed device.
It is a further object of the present invention to provide methods and associated systems to apply probabilistic techniques to identify recovery procedures most likely to successfully recover from an error condition in a managed device.
It is still a further object of the present invention to provide methods and associated systems for dynamically selecting preferred recovery procedures for resolution of a problem in a managed device by computing a probability of success for each of a plurality of recovery procedures.
The above and other objects, aspects, features, and advantages of the present invention will become apparent from the following detailed description and the attached drawings.