After a software and/or hardware system is developed and deployed, it must be maintained. Generally speaking, currently known system maintenance policies fall into two categories: manual approaches and automatic approaches.
Manual approaches rely on training and assigning professional technicians or experts who are responsible for resolving problems and maintaining specific software/hardware products. However, training an experienced, highly skilled professional usually takes considerable time and cost. Statistics show that about 60% of the total time spent detecting and resolving a system problem goes to determining and identifying the problem. Worse still, statistics further show that more than 95% of the problems whose root causes are determined through lengthy communication between a user and a technician have already been encountered and resolved by other users or by the same user. Manual system maintenance and problem handling therefore waste manpower, resources, and time.
Automatic approaches generally rely on a knowledge repository, built for example on a server, that stores previously encountered problems and their solutions. However, most such systems merely query the knowledge repository using the initially collected problem symptoms and return to the user the root causes identified by the query, together with the corresponding solutions. The information collected immediately after a problem occurs may not suffice to determine the real root cause of the problem. For example, many different problems or exceptions may exhibit the same or similar symptoms during their initial stage. In such cases, a diagnosis based only on these initial symptoms may fail to identify the exact root cause.
In fact, problems with the same symptoms in a software and/or hardware system may have different root causes. For example, in a large-scale storage system, many factors can produce the symptom “the user cannot access a particular storage array.” Conversely, the same root cause in the same system may produce different symptoms under different conditions and states. It is therefore insufficient to determine the root cause of a problem based solely on its initial symptoms.
Therefore, there is a need in the art for a more effective problem diagnosis and recovery solution.