The present invention pertains to support of printers and pertains particularly to the automated diagnosis of printer systems using Bayesian Networks.
Currently, it is highly expensive for printer manufacturers to diagnose the systems of their customers. Typically, a customer calls a printer call agent at the manufacturer. This call agent guides the customer through a troubleshooting sequence that leads to resolution of the problem or identification of the cause. This method requires the intervention of a call agent which results in a high cost.
When using call agents the printer manufacturer hires many call-agents which the customer in turn can call when he experiences problems with his printer system. The call-agent attempts to gather as much information as possible by interviewing the customer over the phone. When he reaches the conclusion, he will either have solved the problem, identified the cause, or had to dispatch a field agent that will attempt to resolve the problem at the customer site.
One drawback of using call-agents is the expense. In addition, there can be problems with consistency in the order and types of troubleshooting steps used by different call agents. Customers with similar problems who are not given approximately the same troubleshooting steps in the same order may feel confused. Also, the call agent solution allows only limited logging of information, has only limited integration of programmatic data-collectors, and has very limited integration of multi-media presentations. Use of call-agents, however, does provide the benefit of human-to-human communication between the call agent and the customer, as the call agent will be able to detect soft information that a computer-based system cannot easily detect, e.g., whether the customer is irritated with some line of questioning, the level of experience of the customer, and so on.
Decision trees can be used to provide automated diagnosis of printer systems. The decision-tree approach specifies the possible troubleshooting sequences as a so-called decision tree. At each branching of the tree, one of the branches will be chosen based on the information provided by the customer at the last step. However, decision trees are static in the sense that, for practical reasons, they only allow a limited number of possible sequences of troubleshooting steps. With decision trees, all sequences that should be available to the customer have to be encoded, and as the size of the decision tree is exponential in the number of these, it is only possible to encode a limited number of them. On average, this causes the decision tree to provide longer troubleshooting sequences with a lower probability of actually diagnosing the problem, as it is not possible to take all possible scenarios into account.
Case-based reasoning can also be used to provide automated diagnosis of printer systems. The case-based approach gathers a large number of descriptive cases from troubleshooting scenarios where various problems are seen. Based on information about the current situation, the case-based reasoning engine can then select the cases that are most similar. The most similar cases are then investigated to find the next action or question that has the highest likelihood of ruling out as many cases as possible. This continues until the single case that best matches the current situation is determined.
Case-based systems gather cases that are descriptive of the troubleshooting domain and use these cases to suggest actions and questions that narrow the scope down to a single case as quickly as possible. The quality of a case-based system hinges on its case database, which has to be very large to adequately describe a printer system domain. The number of possible configurations/cases in a printer system can be at least $2^{1000}$, if all the variables are binary. A subset of cases out of these would have to be extremely large to be sufficiently descriptive to be useful to a case-based system. Thus, it is doubtful that case-based systems will be successful in representing the printing system with its many variables to an optimal level of accuracy.
Rule-based systems can also be used to provide automated diagnosis of printer systems. Rule-based systems can be perceived as a subset of Bayesian networks, as they can be represented with Bayesian networks. They have a subset of the modeling capabilities of Bayesian networks, and their belief updating methods are not guaranteed to be correct, as those of Bayesian networks are.
Rule-based systems, however, have updating methods that are not optimal when there are loops in the model. Loops are very common in models of real-world systems (e.g., common causes, troubleshooting steps that fix several causes, etc.).
One troubleshooter based on Bayesian networks is described by Heckerman, D., Breese, J., and Rommelse, K. (1995), Decision-theoretic Troubleshooting, Communications of the ACM, 38:49-57 (herein "Heckerman et al. 1995").
A Bayesian network is a directed acyclic graph representing the causal relationships between variables, which associates conditional probability distributions with variables given their parents. Efficient methods for exact updating of probabilities in Bayesian networks have been developed. See, for example, Lauritzen, S. L., and Spiegelhalter, D. J., Local Computations with Probabilities on Graphical Structures and their Applications to Expert Systems, Journal of the Royal Statistical Society, Series B, 50(2):157-224 (1988), and Jensen, F. V., Lauritzen, S. L., and Olesen, K. G., Bayesian Updating in Causal Probabilistic Networks by Local Computations, Computational Statistics Quarterly, 4:269-282 (1990). Efficient methods for exact updating of probabilities in Bayesian networks have been implemented in the HUGIN expert system. See Andersen, S. K., Olesen, K. G., Jensen, F. V. and Jensen, F., HUGIN - a Shell for Building Bayesian Belief Universes for Expert Systems, Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (1989).
Bayesian networks provide a way to model problem areas using probability theory. The Bayesian network representation of a problem can be used to provide information on a subset of variables given information on others. A Bayesian network consists of a set of variables (nodes) and a set of directed edges (connections between variables). Each variable has a set of mutually exclusive states. The variables together with the directed edges form a directed acyclic graph (DAG). For each variable $v$ with parents $w_1, \ldots, w_n$, there is defined a conditional probability table $P(v \mid w_1, \ldots, w_n)$. If $v$ has no parents, this table reduces to the marginal probability $P(v)$.
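As a minimal illustration of these definitions, the following Python sketch encodes a two-variable network (one parentless cause variable and one symptom variable conditioned on it) and computes the posterior over causes by direct enumeration. The variable names, states, and probability values are hypothetical and chosen only for illustration; they are not taken from any printer model.

```python
# Hypothetical network: Cause -> Symptom.
# Each variable has mutually exclusive states.

# P(Cause): the cause has no parents, so its table is a marginal distribution.
p_cause = {"toner_low": 0.2, "driver_bug": 0.1, "none": 0.7}

# P(Symptom | Cause): one row per parent state, each row sums to 1.
p_symptom_given_cause = {
    "toner_low":  {"faint_print": 0.9,  "ok": 0.1},
    "driver_bug": {"faint_print": 0.3,  "ok": 0.7},
    "none":       {"faint_print": 0.05, "ok": 0.95},
}


def posterior_cause(observed_symptom):
    """Return P(Cause | Symptom = observed_symptom) by enumeration (Bayes' rule)."""
    # Joint probability P(Cause = c, Symptom = observed_symptom) for each c.
    joint = {
        c: p_cause[c] * p_symptom_given_cause[c][observed_symptom]
        for c in p_cause
    }
    evidence = sum(joint.values())  # P(Symptom = observed_symptom)
    return {c: value / evidence for c, value in joint.items()}


if __name__ == "__main__":
    for cause, prob in posterior_cause("faint_print").items():
        print(f"{cause}: {prob:.3f}")
```

Enumeration is only practical for very small networks; larger networks rely on methods such as the junction-tree approach discussed in this section.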
Bayesian networks have been used in many application domains with uncertainty, such as medical diagnosis, pedigree analysis, planning, debt detection, bottleneck detection, etc. However, one of the major application areas has been diagnosis. Diagnosis (i.e., underlying factors that cause diseases/malfunctions that again cause symptoms) lends itself nicely to the modeling techniques of Bayesian networks.
Currently, the most efficient method for exact belief updating of Bayesian networks is the junction-tree method, which transforms the network into a so-called junction tree, described in Jensen, F. V., Lauritzen, S. L., and Olesen, K. G., Bayesian Updating in Causal Probabilistic Networks by Local Computations, Computational Statistics Quarterly, 4:269-282 (1990). The junction tree clusters the variables such that a tree is obtained (i.e., all loops are removed) and the clusters are as small as possible. In this tree, a message passing scheme can then update the beliefs of all unobserved variables given the observed variables. Exact updating of Bayesian networks is NP-hard (Cooper, G. F., The Computational Complexity of Probabilistic Inference using Bayesian Belief Networks, Artificial Intelligence, 42:393-405, (1990)); however, it is still very efficient for some classes of Bayesian networks. The network for the printing system contains several thousand variables and many loops, but can still be transformed to a junction tree with reasonably efficient belief updating.
Heckerman et al. 1995 presents a method for performing sequential troubleshooting based on Bayesian networks.
For a device with $n$ components represented by the variables $c_1, \ldots, c_n$, Heckerman et al. 1995 follow the single-fault assumption, which requires that exactly one component is malfunctioning and that this component is the cause of the problem. If $p_i$ denotes the probability that component $c_i$ is abnormal given the current state of information, then $\sum_{i=1}^{n} p_i = 1$ under the single-fault assumption. Each component $c_i$ has a cost of observation, denoted $C^o_i$ (measured in time and/or money), and a cost of repair $C^r_i$.
Under some additional mild assumptions not reproduced here (see Heckerman et al. 1995 for more information), it can then be shown that with failure probabilities $p_i$ updated with current information, it is always optimal to observe the component that has the highest ratio $p_i / C^o_i$. This is intuitive, as the ratio balances probability of failure against cost of observation and favors the component with the highest probability of failure and the lowest cost of observation. Under the single-fault assumption, an optimal observation-repair sequence is thus given by the plan set out in Table 1 below:
In the plan described in Table 1 above, if a component is repaired in step 3, it is known from the single-fault assumption that the device must be repaired, and the troubleshooting process can be stopped. The algorithm works reasonably well if the single-fault assumption is lifted, in which case step 1 will take into account new information gained in steps 2 and 3, and step 4 will be replaced as in Table 2 below:
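As a rough illustration of the observe-and-repair loop described in the preceding paragraphs, the following Python sketch repeatedly observes the component with the highest ratio of fault probability to observation cost, repairs it if it is faulty, and otherwise rules it out. It is a sketch under the single-fault assumption, not the tabulated plan itself. The `Component` class, the `is_faulty` callback (standing in for a real observation), the renormalization used to update the probabilities after a component is ruled out, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    p_fault: float       # p_i: probability that the component is faulty
    cost_observe: float  # C^o_i: cost of observing the component
    cost_repair: float   # C^r_i: cost of repairing the component


def troubleshoot(components, is_faulty):
    """Observe components in order of highest p_i / C^o_i until the fault is found.

    is_faulty is a hypothetical callback (name -> bool) standing in for a real
    observation. Returns the observation order and the total cost incurred.
    """
    probs = {c.name: c.p_fault for c in components}
    by_name = {c.name: c for c in components}
    order, total_cost = [], 0.0

    while any(p > 0 for p in probs.values()):
        # Pick the candidate with the highest ratio p_i / C^o_i.
        candidates = [n for n, p in probs.items() if p > 0]
        name = max(candidates, key=lambda n: probs[n] / by_name[n].cost_observe)
        order.append(name)
        total_cost += by_name[name].cost_observe

        if is_faulty(name):
            # Single-fault assumption: repairing this component fixes the device.
            total_cost += by_name[name].cost_repair
            return order, total_cost

        # Component observed to be fine: rule it out and renormalize the rest.
        probs[name] = 0.0
        remaining = sum(probs.values())
        if remaining == 0:
            break
        probs = {n: p / remaining for n, p in probs.items()}

    return order, total_cost


if __name__ == "__main__":
    parts = [
        Component("cable", 0.2, 1.0, 5.0),
        Component("toner", 0.5, 5.0, 20.0),
        Component("driver", 0.3, 2.0, 10.0),
    ]
    print(troubleshoot(parts, lambda name: name == "driver"))
```

In this example the cable is observed first even though it is the least likely culprit, because it is cheap to check; the ratio favors inexpensive observations when probabilities are comparable.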
Heckerman et al. 1995 introduce a theory for handling a service call, which is used when the expected cost of the optimal troubleshooting sequence is higher than the cost of a service call (e.g., calling the manufacturer for assistance). The theory also changes the above plan so that it can approximately handle systems with multiple faults and non-base observations. Non-base observations are observations on something that is not a component but potentially provides useful information for the troubleshooting process. In a companion paper (Breese, J. S. and Heckerman, D., Decision-theoretic Troubleshooting: A Framework for Repair and Experiment, Technical Report MSR-TR-96-06, (1996) Microsoft Research, Advanced Technology Division, Microsoft Corporation, Redmond, USA), the method is further advanced to also enable configuration changes in the system to provide further useful information that can potentially lower the cost of the optimal troubleshooting sequence.
However, the Bayesian-network based troubleshooters described by Heckerman et al. 1995 assume a one-to-one correspondence between causes and actions, which does not hold in reality; use myopic (one-step lookahead) selection of questions; and select questions too slowly when there are many of them. Furthermore, Heckerman et al. 1995 present no method of knowledge acquisition for their troubleshooters.
In accordance with a preferred embodiment of the present invention, knowledge acquisition is performed in preparation to troubleshoot a system. An issue to troubleshoot is identified. Causes of the issue are identified. Subcauses of the causes are identified. Troubleshooting steps are identified. Troubleshooting steps are matched to causes and subcauses. Probabilities for the causes and the subcauses identified are estimated. Probabilities for the actions and questions are estimated. Costs for actions and questions are estimated.
In the preferred embodiment of the present invention, actions and questions that require special handling are also identified. Domain experts are used to identify causes of the issue. Troubleshooting steps include actions that can solve any of the causes or subcauses, and questions that provide additional information about causes or subcauses. Each troubleshooting step that includes an action is matched to any cause or subcause the action can solve and each troubleshooting step that includes a question is matched to any cause or subcause to which the question is related.
When estimating costs for actions and questions, a determination is made for each related cause or related subcause as to whether performing a first action correctly will solve the issue. In addition a determination is made as to the likelihood a customer will perform the first action correctly. The costs for the first action can include factors that take into account the time to perform the first action, the risk of breaking something when performing the first action, the amount of money required to purchase any parts necessary to perform the first action, and the degree of insult a user may experience when the first action is suggested.
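One way to picture how these cost factors might be combined is a weighted sum, sketched below in Python. The text does not specify how the factors are combined, so the linear form, the `StepCost` fields, the scales, and the default weights are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StepCost:
    time_minutes: float  # time needed to perform the action
    risk: float          # risk of breaking something, on a hypothetical 0..1 scale
    money: float         # money needed for any required parts
    insult: float        # degree of insult to the user, hypothetical 0..1 scale


def combined_cost(cost, w_time=1.0, w_risk=10.0, w_money=0.5, w_insult=5.0):
    """Collapse the four cost factors into one number (hypothetical weights)."""
    return (
        w_time * cost.time_minutes
        + w_risk * cost.risk
        + w_money * cost.money
        + w_insult * cost.insult
    )


if __name__ == "__main__":
    check_power_cord = StepCost(time_minutes=1.0, risk=0.0, money=0.0, insult=0.8)
    replace_cartridge = StepCost(time_minutes=5.0, risk=0.1, money=20.0, insult=0.0)
    print(combined_cost(check_power_cord), combined_cost(replace_cartridge))
```

A single scalar cost makes steps directly comparable when ranking them against their probability of solving the issue; the weights would need to be set by domain experts.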
The troubleshooter utilizes Bayesian networks. For example, a Bayesian network that models a system component causing failure of a system includes an indicator node, a plurality of cause nodes, and a first plurality of troubleshooting nodes. The indicator node has a state that indicates whether the system component is causing a failure. Each cause node represents a cause of the system component producing a failure. Each troubleshooting node represents a troubleshooting step. Each troubleshooting step suggests an action to remedy causes represented by any cause nodes to which the troubleshooting node is coupled. A causes node represents a probability distribution over causes for failure of the system component.
The Bayesian network can additionally include question nodes. A question node represents a question, which when answered, provides potential information about causes represented by any cause nodes to which the question node is coupled. A question node can also represent a question of a general type which is not necessarily related to a symptom or a cause of failure of the system component.
For a first troubleshooting step suggesting a first action, when calculating whether the first action will solve a first cause, an inaccuracy factor can be utilized. The inaccuracy factor represents a probability that a user will incorrectly perform the first action. In the preferred embodiment, at least two cause nodes, from the plurality of cause nodes, can have a common subcause.
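The effect of the inaccuracy factor can be sketched as follows. This Python example assumes that an incorrectly performed action never solves the cause, so the probability that an action solves a cause is the probability that it would do so when performed correctly, multiplied by one minus the inaccuracy factor. That assumption, the function name, and the example numbers are hypothetical.

```python
def p_action_solves(p_causes, p_solves_if_correct, inaccuracy):
    """Probability that suggesting an action solves the issue.

    p_causes:            {cause: P(cause)} over mutually exclusive causes
    p_solves_if_correct: {cause: P(action solves cause | performed correctly)}
    inaccuracy:          P(user performs the action incorrectly)

    Assumes (for illustration) that an incorrectly performed action never helps.
    """
    p_correct = 1.0 - inaccuracy
    return sum(
        p_cause * p_solves_if_correct.get(cause, 0.0) * p_correct
        for cause, p_cause in p_causes.items()
    )


if __name__ == "__main__":
    causes = {"paper_jam": 0.6, "driver": 0.4}
    clear_paper_path = {"paper_jam": 1.0}  # has no effect on the driver cause
    print(p_action_solves(causes, clear_paper_path, inaccuracy=0.1))
```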
For example, the troubleshooter is a printing diagnosis system which utilizes five Bayesian networks. A first Bayesian network handles all errors where a customer does not get output from a printer when attempting to print, and where the customer gets corrupted output. A second Bayesian network handles all errors where the customer gets unexpected output. A third Bayesian network handles all types of error codes that can be seen on a control panel of the printer. A fourth Bayesian network handles miscellaneous erroneous behavior of the printer not covered by the first Bayesian network, the second Bayesian network and the third Bayesian network. A fifth Bayesian network represents all possible settings in a printing system for the printer.
The invention presents several new ideas that combine to make the quality of the diagnostic process as high as possible while keeping the effort required for knowledge acquisition as low as possible.
The method used for selecting troubleshooting actions and questions, combined with the precise estimates of costs, enables the preferred embodiment of the present invention to reach a diagnosis in as few steps as possible.
An automated troubleshooter in accordance with a preferred embodiment of the present invention allows easy logging of customers' troubleshooting sequences, including information obtained programmatically from the customer system, and the outcome of the troubleshooting session (success or failure to diagnose). All this information can be directly logged by an automated troubleshooter with no human labor required.
Additionally, an automated troubleshooter provides for easy integration of programmatic data-collectors. It is relatively easy to improve the interactive gathering of data from the customer with programmatic data-collectors that directly query the customer's PC, printer, etc. for relevant data that can be used in the troubleshooting process to speed up the diagnosis.
An automated troubleshooter in accordance with the preferred embodiment of the present invention also provides for easy integration of multi-media presentations. It is possible to utilize multi-media presentations such as graphic pictures and sound to help illustrate problems and guide the customer to the correct selections. Graphic pictures in particular can in many situations greatly simplify the description of problem scenarios and make it much more likely that the customer makes the correct selection.
The preferred embodiment of the present invention presents a knowledge acquisition (authoring) method for constructing automated troubleshooters in a highly efficient manner, by following a clearly defined process. Knowledge acquisition is commonly recognized as the bottleneck of automated troubleshooters, as it is usually cumbersome and very time-consuming. An automated troubleshooter in accordance with a preferred embodiment of the present invention puts constraints on the general Bayesian network modeling phase and only allows very strict, simpler structures, thus limiting the scope and increasing the efficiency of the knowledge acquisition.
An automated troubleshooter in accordance with the preferred embodiment has several other advantages. As the troubleshooting is controlled by a computer program, it is possible to log everything that transpires between the troubleshooter and the user. In the situation where the troubleshooter is not able to solve the problem, it will be able to give control to an experienced support agent who can take over the troubleshooting process. This agent will be able to see the log of the previously suggested and performed steps, and the final probabilities on the causes. He can use this information to decide whether skipped steps should be re-suggested, whether performed steps with doubtful answers should be re-suggested, or whether more advanced steps not included in the troubleshooter should be suggested. The automated troubleshooter will then not only cut down on the number of calls that reach support agents, but will also aid the support agents in some of the cases that cannot be handled by it.
The logging of all information also allows fine-tuning of the probabilities and costs in the troubleshooting models, using so-called learning techniques. For someone familiar with the area of Bayesian networks, it is easy to see that the probabilities of questions, actions and causes can be improved using the large amounts of information that will be gathered, e.g., the identified causes, question answers, successful or failed actions, etc. It will also be possible to improve the time component of the cost of actions and questions, simply by measuring the time span from when the step is suggested until it is answered. On average, over a large number of cases, this will yield the true time requirements of the step.
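The time-measurement idea can be sketched as a running average over logged sessions. The following Python example is a minimal sketch; the `StepTimer` class and its method names are hypothetical, and a real system would also have to deal with outliers such as a user leaving the session idle.

```python
class StepTimer:
    """Running average of the time between suggesting a step and its answer."""

    def __init__(self):
        self.count = 0
        self.mean_seconds = 0.0

    def record(self, elapsed_seconds):
        """Fold one logged duration into the running mean (incremental update)."""
        self.count += 1
        self.mean_seconds += (elapsed_seconds - self.mean_seconds) / self.count


if __name__ == "__main__":
    timer = StepTimer()
    for elapsed in (30.0, 60.0, 90.0):  # hypothetical logged durations
        timer.record(elapsed)
    print(timer.count, timer.mean_seconds)
```

The incremental form avoids storing every logged duration, which matters when a large number of sessions is collected.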
An automated troubleshooter constructed with Bayesian networks, as described herein, solves problems related to the expensive diagnosis of printer systems. The present invention allows users to diagnose many problems themselves, saving the support agents phone calls and cutting down on expenses.
The present invention also improves quality and speed of diagnosis. The invention produces optimal troubleshooting sequences (as short as possible, given weak assumptions).
The present invention also improves the consistency of diagnosis. The invention also makes progress in removing the knowledge acquisition bottleneck seen in many diagnostic projects by limiting the modeling flexibilities of the troubleshooter builders and defining a clear, well-structured knowledge acquisition process that can be followed without having any knowledge about Bayesian networks.
Compared with the automated troubleshooters suggested by Heckerman et al. 1995, the present invention improves on and extends them in several respects. The invention contains a complete method for knowledge acquisition of troubleshooter models, something not presented before. The invention also extends the algorithms for selecting the best next step in several areas.