1. Field of the Invention
This invention relates to computer systems and, more particularly, to computer systems management tools.
2. Description of the Related Art
The complexity of managing computer systems has been increasing rapidly. Mission-critical enterprise applications may be distributed over a large number (e.g., hundreds or thousands) of computer hosts and storage devices, and may in some cases comprise multiple independent layers or tiers provided by different vendors. Even applications intended for single users (e.g., intended for execution on a single desktop or laptop computer) may incorporate components from multiple vendors, and may rely on numerous hardware and software devices. Typically, different hardware and/or software vendors have developed their own custom approaches to the detection, diagnosis, debugging and resolution of problems and errors, and also to other systems management tasks such as logging audit trails, monitoring application behavior, etc. In the event of unexpected operating conditions or failures, hardware devices or software modules may be configured to generate messages (such as error messages or warnings) using terminology and formatting that may often be hard to decipher for non-experts. For example, the error or warning message may include a hexadecimal version of an address or identifier, instead of a descriptive name for an object. As a result, the user or administrator whose work is interrupted or made less productive by the unexpected operating condition may spend a substantial amount of time attempting to understand the message or messages, often without success. Ultimately, and especially for problems that may affect mission-critical applications, the appropriate expert may have to be found and consulted (sometimes at considerable expense to the end user and/or the vendor providing support), even in cases where the best response to the problem may require a relatively simple set of actions that the user could have performed if the generated messages had been more intelligible. 
Similar issues related to possible confusion caused by unclear system-generated messages may also arise in other systems management arenas not directly related to error diagnosis, such as event auditing, application monitoring, etc.
The problem of diagnosing computer systems has become even more complicated as the set of skills needed to understand and respond to systems management events has become more geographically dispersed, and as the user base for applications has expanded internationally. It may be a common occurrence, for example, for a particular user in a first country (e.g., Brazil) to buy a software application originally developed in a second country (e.g., the United States) and run the application on a computer system produced in a third country (e.g., Malaysia). In some cases, various levels of the support organizations for the computer hardware vendor and the application vendor may be physically located in a fourth and fifth country, respectively (e.g., Ghana and India). The computer system may be configured to gather error, warning and/or status messages from various hardware and software components in a central message repository, e.g., in a system-provided “Event Log” on computer systems employing versions of Windows™ operating systems from Microsoft Corporation or in “syslog” files or their equivalents on systems employing UNIX™-based operating systems. Over time, a large number of entries may be accumulated in such repositories, and each individual entry itself may include a large number of fields, which may be hard to assimilate using the interfaces traditionally provided to view such repositories.
If the user encounters a problem, such as, for example, a “hanging” (i.e., unresponsive) application or an unexpected reduction in performance, the user may be advised to consult the message repository in an attempt to troubleshoot the problem. A typical user (or even an expert user) may encounter several types of difficulties at this point, such as identifying which specific messages are relevant, understanding what the message contents or fields may mean, and/or identifying and performing corrective actions if any are needed. If the user cannot resolve the problem without external help, he or she may initiate a support call or open a “bug report” on the software or hardware component that is suspected to be at fault. Depending on the specific nature of the problem, the support call or bug report may have to be channeled through several levels of support organizations, e.g., among support personnel that may not all be fluent in the same set of languages, until the right expert is found. Each party involved in the problem resolution (e.g., the end user and one or more support staff members) may have to spend considerable time and effort trying to assemble and correlate the information provided by other parties (e.g., a description of the problem, the contents of one or more systems management messages, details of the environment in which the problem occurred, etc.). Language difficulties (i.e., a lack of an adequate level of fluency in a common language) may increase the chances of miscommunication and/or incomplete communication between the parties involved in the problem diagnosis and resolution, and may further increase the already high costs of support organizations.