1. Field of the Invention
Portions of the disclosure of this patent document contain material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office file or records, but otherwise reserves all copyright rights whatsoever.
2. Background Art
Computing environments are becoming increasingly complex. Today, business enterprises operate in distributed computer networks, which have skyrocketed in complexity as the simple client-server architecture has given way to three-tiered and multi-tiered computer architectures. Resources and data are increasingly pooled and accessed remotely from stripped-down user terminals.
This growing complexity has made it harder for system administrators not only to keep these systems functioning, but also to keep them functioning optimally.
Reliability, Availability, and Serviceability (RAS)
RAS has become a foundation of strategic success for most enterprises. Reliability refers to making a system as reliable as possible. Availability is directly related to downtime: the more time a system is down, the less available it is. Serviceability refers to the processes that take place when a system is down.
Maximizing a system's RAS is essential in a computerized world. For instance, modern systems are routinely capable of meeting user requirements 99% of the time. Even so, 1% downtime still exceeds 80 hours per year on a system run 24 hours a day, 7 days a week. The loss of end-user productivity in a 20-user system, for just a single hour, exceeds $1,000.
If the application itself generates revenue, the loss of a single hour of server availability could cost many thousands, or even hundreds of thousands, of dollars. At Federal Express, for example, the loss of a single hour of server availability is estimated to cost a million dollars. The cost of losing even a single minute of global transaction availability for Visa or Mastercard approaches ten million dollars.
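The 80-hour figure follows from simple arithmetic, and the same arithmetic turns the per-hour costs quoted above into annual figures. The Python sketch below is illustrative only: it assumes a 365-day year and a constant cost for every hour of outage, and the function names are hypothetical. Real outage costs are rarely uniform, so the results are rough bounds rather than estimates.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 h for a system run 24 hours a day, 7 days a week (leap years ignored)


def annual_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for an availability given as a fraction (0.99 = 99%)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


def annual_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Downtime hours multiplied by an assumed, constant cost per hour of outage."""
    return annual_downtime_hours(availability) * cost_per_hour


# 99% availability: 0.01 * 8,760 = 87.6 hours, hence "exceeds 80 hours per year".
print(f"{annual_downtime_hours(0.99):.1f} h/year")
# Productivity loss for the 20-user example at the $1,000-per-hour lower bound.
print(f"${annual_downtime_cost(0.99, 1_000):,.0f}/year")
# The same downtime at the estimated $1,000,000 per hour cited for Federal Express.
print(f"${annual_downtime_cost(0.99, 1_000_000):,.0f}/year")
```

Each additional "nine" of availability cuts the downtime, and therefore the cost under this linear model, by a factor of ten.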
Solutions to Increase RAS
Traditionally, when problems occurred, system administrators would contact an expert, such as the computer manufacturer or another technician, who would either come to their site or communicate by telephone or electronically. The expert would walk the user through the problem until a solution was found and the system was running again.
One problem with this approach is that it is wastefully repetitive: the same problems recur frequently across different users. An expert called upon to repair a problem will therefore often repeat the same steps over and over to correct the same problems for different system administrators. Alternatively, different experts may each separately solve different users' identical problems in inconsistent ways.
Another problem with this approach is that it is applied only when something goes wrong (i.e., it is not proactive). If a system is poorly configured and likely to fail soon, nothing intervenes until the system actually crashes. Under this approach there is therefore always a period during which the system is unavailable, and the system is correspondingly less reliable.
The present invention provides an automated problem identification system. The invention analyzes a customer's computing environment, including administration practices and system configuration (hardware, software, and operating system). The invention then compares the computing environment to an internal rules database, which is a compilation of problems known to exist on particular configurations. Thus, instead of calling an expert when there is a problem and repeating that process for every customer, the invention takes a proactive approach: it analyzes a given system configuration and compares it to a body of known problems before the system breaks down.
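The comparison step can be pictured as evaluating each entry in the rules database against a snapshot of the customer's environment. The Python sketch below is a minimal illustration under stated assumptions: the configuration keys (`ram_mb`, `swap_mb`, `os_patch_level`, `backups_scheduled`), the three sample rules, and the names `Rule` and `compare_to_rules` are all hypothetical and are not taken from the invention.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical snapshot of a customer's computing environment.
Config = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    """One known problem: a predicate over a configuration plus a description."""
    problem_id: str
    description: str
    applies: Callable[[Config], bool]


# Illustrative internal rules database; the rules themselves are invented examples.
RULES_DB: List[Rule] = [
    Rule("R1", "Swap space is smaller than physical memory",
         lambda c: c.get("swap_mb", 0) < c.get("ram_mb", 0)),
    Rule("R2", "Operating system patch level is out of date",
         lambda c: c.get("os_patch_level", 0) < 5),
    # An administration-practice rule, not just a hardware/software rule.
    Rule("R3", "No scheduled backups are configured",
         lambda c: not c.get("backups_scheduled", False)),
]


def compare_to_rules(config: Config, rules: List[Rule]) -> List[Rule]:
    """Return every known problem whose predicate matches this configuration."""
    return [rule for rule in rules if rule.applies(config)]


customer_config: Config = {
    "ram_mb": 4096,
    "swap_mb": 2048,
    "os_patch_level": 7,
    "backups_scheduled": False,
}

for rule in compare_to_rules(customer_config, RULES_DB):
    print(f"{rule.problem_id}: {rule.description}")
```

Because each rule is a self-contained predicate, the check runs before anything fails, which is the proactive property described above.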
In one embodiment, the invention generates a list of problems or non-optimized aspects of the system, prioritized by severity. To analyze the customer's computing environment, one embodiment of the invention generates a list of questions that relate to the user's computing environment. Another embodiment implements a tool that gathers and analyzes data about the computing environment automatically. When a problem is encountered that is not in the rules database, the problem is transferred to a human engineer, who solves the problem and updates the rules database with the solution for that configuration.
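The prioritization and escalation path can be sketched as follows. In this Python example, findings are sorted by severity (an assumed scale where 1 is most severe), each is looked up in a rules database keyed by symptom and configuration, and anything without a known solution is queued for a human engineer, whose fix is then recorded for that configuration. `Finding`, `RulesDatabase`, `triage`, and all sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """A problem or non-optimized aspect detected on a system."""
    symptom: str
    severity: int  # assumed scale: 1 = most severe


@dataclass
class RulesDatabase:
    """Known solutions keyed by (symptom, configuration); the keying scheme is an assumption."""
    solutions: dict = field(default_factory=dict)

    def lookup(self, symptom: str, configuration: str) -> Optional[str]:
        return self.solutions.get((symptom, configuration))

    def add_solution(self, symptom: str, configuration: str, solution: str) -> None:
        self.solutions[(symptom, configuration)] = solution


def triage(findings, configuration, db):
    """Return (severity-ordered report of solved findings, findings to escalate)."""
    report, escalated = [], []
    for finding in sorted(findings, key=lambda f: f.severity):
        solution = db.lookup(finding.symptom, configuration)
        if solution is None:
            escalated.append(finding)  # unknown to the rules database: send to an engineer
        else:
            report.append((finding, solution))
    return report, escalated


db = RulesDatabase()
db.add_solution("disk 95% full", "config-A", "Archive old logs and enlarge the volume")
db.add_solution("outdated patches", "config-A", "Apply the current recommended patches")

findings = [
    Finding("outdated patches", severity=3),
    Finding("disk 95% full", severity=1),
    Finding("intermittent NFS timeouts", severity=2),
]

report, escalated = triage(findings, "config-A", db)
for finding, solution in report:
    print(f"[severity {finding.severity}] {finding.symptom}: {solution}")

# A human engineer solves each escalated problem and records the fix for this configuration.
for finding in escalated:
    db.add_solution(finding.symptom, "config-A", "<fix supplied by the engineer>")

report, escalated = triage(findings, "config-A", db)
print(f"{len(escalated)} problem(s) still need an engineer")
```

Because the engineer's fix is stored against the configuration, a second run on the same configuration no longer escalates that problem, so the rules database grows with each new problem solved.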
Another embodiment of the present invention uses a user interface, a knowledge base, and a knowledge engine. The user interface is where human interaction with the system occurs. The knowledge base comprises a series of checks that contain granular information about problems that may be encountered. The knowledge engine is a software component that interacts with the knowledge base and the user interface to interpret checks and produce recommendations.
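One way to picture the separation of these three components is a knowledge base that only holds checks, an engine that interprets them against gathered facts, and a thin interface that renders the results. This Python sketch assumes each check carries a condition, a recommendation, and a severity; the fact names, check IDs, and the classes `Check`, `KnowledgeBase`, and `KnowledgeEngine` are illustrative assumptions, not the invention's actual structures. A check whose required fact was never gathered is skipped rather than guessed at.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Check:
    """One granular piece of knowledge: when a problem applies and what to do about it."""
    check_id: str
    condition: Callable[[dict], bool]  # evaluated against gathered facts
    recommendation: str
    severity: int  # assumed scale: 1 = most severe


class KnowledgeBase:
    """Holds checks only; it does not evaluate them."""
    def __init__(self, checks: Iterable[Check]):
        self._checks = list(checks)

    def checks(self) -> Iterator[Check]:
        return iter(self._checks)


class KnowledgeEngine:
    """Interprets each check against the facts and produces ordered recommendations."""
    def __init__(self, kb: KnowledgeBase):
        self.kb = kb

    def recommend(self, facts: dict) -> list:
        fired = []
        for check in self.kb.checks():
            try:
                if check.condition(facts):
                    fired.append(check)
            except KeyError:
                # A fact this check needs was never gathered; skip it instead of guessing.
                continue
        return sorted(fired, key=lambda c: c.severity)


def text_user_interface(engine: KnowledgeEngine, facts: dict) -> list:
    """Minimal stand-in for the user interface: one line of text per recommendation."""
    lines = [f"{c.check_id} (severity {c.severity}): {c.recommendation}"
             for c in engine.recommend(facts)]
    return lines or ["No known problems found."]


# Hypothetical checks and facts, for illustration only.
kb = KnowledgeBase([
    Check("CHK-001", lambda f: f["free_disk_pct"] < 10,
          "Free space on the root file system", 1),
    Check("CHK-002", lambda f: f["os_release"] in {"5.6", "5.7"},
          "Upgrade to a supported operating system release", 2),
    Check("CHK-003", lambda f: f["time_sync_configured"] is False,
          "Configure time synchronization", 3),
])
facts = {"free_disk_pct": 7, "os_release": "5.7"}  # time_sync_configured was not gathered

for line in text_user_interface(KnowledgeEngine(kb), facts):
    print(line)
```

Keeping checks as data, separate from the engine, means new checks can be added to the knowledge base without changing the engine or the interface.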