1. Field of the Invention
The present invention relates, in general, to systems and methods for analyzing core files and correcting and/or addressing bugs or errors in software applications executing on a computer system, and, more particularly, to an automated system and method for processing kernel and user core files created upon occurrence of an unexpected exception and for searching bug and patch records to rank the available patches and to recommend corrective actions that may be taken to enhance operation of the computer system.
2. Relevant Background
Computer system designers and analysts face the ongoing and often difficult task of determining how to fix or improve operation of a computer system that has experienced an unexpected exception or is failing to operate as designed (e.g., is experiencing errors caused by software problems or “bugs”). When a problem or bug in the computer system software is serious enough to stop or interrupt the execution of a running program, this failure is known as a crash. To assist in identifying bugs in the software operating on a computer system, software applications are often configured to create a crash dump or memory dump when an unexpected exception occurs to generate a memory image of the existing state of software executing on the system at the time of the crash or exception. These memory images are sometimes called core files (or dump files).
The system-level commands or programs in the operating system, i.e., the kernel software, are of particular interest to system analysts in correcting bugs in a crashed computer system. For example, in a UNIX®-based system, the kernel is the program that contains the device drivers, the memory management routines, the scheduler, and system calls. Often, fixing bugs begins with analysis of these executables, which have their state stored in a kernel core file. Similarly, user programs or binaries (e.g., binary, machine readable forms of programs that have been compiled or assembled) can have their state stored in user core files for later use in identifying the bugs causing the user applications to crash or run ineffectively.
Instead of writing a new, complete replacement version of the software (that crashed or had bugs), the designer or developer often prepares one or more small additions or fixes to the original software code (i.e., patches) written to correct specific bugs. For example, when a specific bug is identified, a patch is written or obtained from a third party to correct the specific problem and the patch is installed on the computer system. For convenience, a single patch often contains fixes for many bugs. However, a particular bug is usually, but not always, fixed by a single patch (i.e., multiple patches usually do not address the same bugs). Typically, system analysts or operators keep or acquire records of previously identified bugs and corresponding patches installed for each identified bug. Then, when a bug is encountered in a system, the system analyst's efforts to fix the problem begin with a search of these records of prior bugs to identify the bug or find a similar, previously-identified bug. Once the bug is identified, a relevant patch is selected that may correct the problem or a new patch may be written similar to or based on the previous patch. Additionally, the analyst may determine if a newer version of the patch is now available.
For example, a bug may be identified that causes an exception, such as causing the computer system to fall into panic when two specific programs are run concurrently. A record of the bug would then be created and stored in a database including a bug identifier (e.g., alpha-numeric identification code) along with descriptive information such as a synopsis describing the problem (for the above example, “system falls into panic while shutdown procedure is executed during writing”) and information describing the results or symptoms of the bug (e.g., a crash, hang, stack trace, type of panic, and the like). Once a fix for the bug is available, a patch may be created containing the bug fix and other bug fixes. A patch record is associated with each patch. The patch record includes identifying information such as a patch identifier (e.g., an alpha-numeric code), references to corrected or addressed bugs, textual description of the purposes of the patch, references to specific software useful with the patch (e.g., a specific user application, kernel software for specific operating systems, and the like), dependent packages, related patches, and other useful identifying and patch-user information.
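By way of illustration only, the bug and patch records described above might be modeled as follows. This is a minimal sketch: the class names, field names, and sample identifiers are hypothetical and are not drawn from any particular bug or patch database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BugRecord:
    """One previously identified bug (all field names are hypothetical)."""
    bug_id: str                      # alpha-numeric identification code
    synopsis: str                    # short description of the problem
    symptoms: List[str] = field(default_factory=list)  # e.g. crash, hang, panic type


@dataclass
class PatchRecord:
    """One patch and the bugs it addresses (all field names are hypothetical)."""
    patch_id: str                    # alpha-numeric code
    fixed_bug_ids: List[str]         # references to corrected or addressed bugs
    description: str                 # textual description of the patch's purpose
    packages: List[str] = field(default_factory=list)            # software the patch applies to
    dependent_packages: List[str] = field(default_factory=list)
    related_patches: List[str] = field(default_factory=list)


def bugs_fixed_by(patch: PatchRecord, bugs_by_id: Dict[str, BugRecord]) -> List[BugRecord]:
    """Resolve a patch's bug references to the bug records that are on file."""
    return [bugs_by_id[b] for b in patch.fixed_bug_ids if b in bugs_by_id]


# Illustrative data only; the identifiers below are invented.
bug = BugRecord(
    bug_id="B1001",
    synopsis="system falls into panic while shutdown procedure is executed during writing",
    symptoms=["panic"],
)
patch = PatchRecord(
    patch_id="P2002-01",
    fixed_bug_ids=["B1001", "B9999"],
    description="kernel update",
    packages=["kernel-core"],
)
fixed = bugs_fixed_by(patch, {bug.bug_id: bug})
```

Because one patch record references many bugs, a search over patch records generally has to follow these references into the bug records to compare synopses and symptoms.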
While providing useful information to a system analyst, the volume of information in these bug and patch files usually grows into a very large, unmanageable amount of information (e.g., 500,000 and more bug entries for widely-used operating computer systems and networks), and the amount of data in these files continues to grow as new bugs and patches are identified, created, and installed. Hence, identifying appropriate patches for an identified bug is a difficult task, and system analysts often resort to making educated guesses when searching these lengthy patch records.
Existing searching methods for identifying appropriate patches to correct bugs do not meet the needs of system analysts. Searching methods and tools are typically fully or partially manual processes involving manually entering search terms to process the large patch record lists, identifying potentially relevant patches, and then selecting one or more patches.
In addition, the more direct approach of analyzing the resulting core file to accurately identify the bug causing the problem is an even more difficult task. The core file analysis tools available are typically only useful for kernel core files and are difficult to effectively use (e.g., require extensive training and knowledge of the system being analyzed which often can only be gained with years of working experience).
Often, the operator is unable to identify a specific patch for the problem and is forced to install numerous patches to increase the likelihood that the bug will be corrected. This inaccurate “over” patching is often time consuming, costly, and disruptive to the computer system, which may not be acceptable to users of the system. Some patch tools are available to identify patches that are installed on the computer system for which new versions are available (which in many systems is hundreds of patches at any given time), but these tools do not assist in identifying a particular patch for correcting an identified bug.
Hence, there remains a need for an improved method and system for identifying patches for installation in a computer system to correct or address software bugs or glitches. Such a method and system preferably would leverage existing tools and files (e.g., bug and patch files) and be configured to be easy to use with little or no operator training while still providing an accurate identification of appropriate patches to correct bugs identifiable in a core file (such as a kernel core file and, also, a user core file).
The present invention addresses the above discussed and additional problems by providing an automated core analysis system including a core analysis tool to allow a user, such as a system analyst, to quickly process a core file and search through available patches to identify one or more patches that address the problems (i.e., bugs) found in the core file. Significantly, the analysis of the core dump or core file is performed automatically by the core analysis tool, thereby reducing the need for special training and system knowledge. Patch searching is also performed automatically and in one embodiment, is more effective because it includes an initial step of creating a patch search set based on the software packages actually installed on the client computer system that generated the core file. The patch search set may be further narrowed based on the identified problem type. The patches in the patch search set are then ranked or scored by the core analysis tool based on a patch scoring system (e.g., based on matches between patch and bug descriptions and a crashed program and/or based on panic metrics or other search criteria). The scoring of the patches is then utilized (along with other relevant patch and bug information) to create a detailed patch search report or patch list that includes recommended courses of action for correcting the bugs in the client computer system (such as to install one or more identified and highly ranked patches).
More particularly, a method is provided for analyzing a core file created by or for a computer system. The core file is generally a memory image including information on programs executing on the computer system at the time of an unexpected interrupt. The core analysis method includes determining the packages installed on the computer system to narrow the field of patches that are processed during the analysis method. The set of packages is narrowed based on the type of problem identified. Next, patch files comprising descriptive data for previously identified patches are accessed and a patch search set is created that includes the patches in the patch file that are configured for use with the reduced set of packages. Each patch in the patch search set is then scored by assigning a number of points to each patch based on a predefined set of scoring rules. A patch search report is then created providing details on the scoring of all relevant patches, such as identifying which bug and patch matched specific search criteria. Update recommendations may also be included in this report by including a step for determining which patches have been previously installed on the computer system and identifying if newer versions of the installed patches are available.
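The steps just described (narrowing by package, building a patch search set, scoring by predefined rules, and reporting) can be sketched as follows. This is an illustrative sketch only: the dictionary keys, the rule names, and the point values are assumptions made for the example and are not the scoring rules of the invention.

```python
from typing import Callable, Dict, List, Set, Tuple

Patch = Dict[str, object]                       # hypothetical patch record layout
Rule = Tuple[str, int, Callable[[Patch], bool]]  # (rule name, points, predicate)


def build_search_set(patches: List[Patch], relevant_packages: Set[str]) -> List[Patch]:
    """Keep only patches that apply to at least one relevant installed package."""
    return [p for p in patches if relevant_packages & set(p["packages"])]


def score_patch(patch: Patch, rules: List[Rule]) -> Tuple[int, List[str]]:
    """Sum the points of every rule the patch satisfies and list those rules."""
    matched = [(name, pts) for name, pts, pred in rules if pred(patch)]
    return sum(pts for _, pts in matched), [name for name, _ in matched]


def patch_search_report(patches: List[Patch], relevant_packages: Set[str],
                        rules: List[Rule]) -> List[Dict[str, object]]:
    """Score each patch in the search set and return rows ranked by score."""
    rows = []
    for p in build_search_set(patches, relevant_packages):
        score, matched = score_patch(p, rules)
        rows.append({"patch_id": p["id"], "score": score, "matched_rules": matched})
    return sorted(rows, key=lambda r: r["score"], reverse=True)


# Illustrative data; identifiers, package names, and point values are invented.
patches = [
    {"id": "P1", "packages": ["pkgA"], "description": "fixes panic in write path"},
    {"id": "P2", "packages": ["pkgB"], "description": "fixes panic in write path"},
    {"id": "P3", "packages": ["pkgA"], "description": "documentation update"},
]
rules = [
    ("mentions_panic", 5, lambda p: "panic" in p["description"]),
    ("mentions_write", 2, lambda p: "write" in p["description"]),
]
report = patch_search_report(patches, {"pkgA"}, rules)
```

Recording which rules matched alongside each score is what lets the report explain why a patch was ranked where it was, rather than presenting a bare number.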
According to a unique feature of the invention, the core analysis method is useful for providing a patch search report for user core files and for kernel core files. When the core file is a user core file, the method includes identifying matches between the descriptive data for the patches in the patch search set and program descriptive information in the core file. Additionally, cumulative scoring is provided for bugs referenced by the patch and program descriptive information in the core file. When the core file is a kernel core file, the method includes identifying a type of fault, gathering fault metrics, and creating search criteria based on the identified type of fault and the gathered fault metrics. In a UNIX™-based application of the method, the fault type is a panic type and the fault metrics are panic strings, a number of pre-panic functions, and/or a number of pre-panic modules. Matches with each bug in the patch search set are determined for each of the fault metrics in the search criteria and a number of points is awarded or added to the relevant patch score. According to one embodiment of the method, the type of panic or fault is used as part of the method to adapt or modify the method by selecting which ones of the fault metric matches to award points. For example, it may be useful in identifying patches for installation to not award points for certain matches if the fault type indicates this match may be less relevant to correcting the actual problem in the computer system.
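The kernel core file case might be sketched as follows, with the fault type selecting which metric matches earn points. The panic type names, point values, metric keys, and plain substring matching are all assumptions chosen for illustration; the text above does not specify them.

```python
from typing import Dict, List, Set

# Hypothetical points awarded per match, by metric.
METRIC_POINTS: Dict[str, int] = {"panic_string": 10, "functions": 3, "modules": 1}

# Hypothetical mapping from panic type to the metrics that earn points for it.
METRICS_BY_PANIC_TYPE: Dict[str, Set[str]] = {
    "type_a": {"panic_string", "functions", "modules"},
    "type_b": {"functions", "modules"},  # panic string treated as less relevant
}


def score_bug(bug_text: str, panic_type: str, metrics: Dict[str, List[str]]) -> int:
    """Award points for each enabled metric value found in a bug description."""
    enabled = METRICS_BY_PANIC_TYPE.get(panic_type, set(METRIC_POINTS))
    text = bug_text.lower()
    score = 0
    for metric, values in metrics.items():
        if metric not in enabled:
            continue  # the fault type says this kind of match should not count
        hits = sum(1 for v in values if v.lower() in text)
        score += hits * METRIC_POINTS.get(metric, 0)
    return score


def score_patch(bug_texts: List[str], panic_type: str,
                metrics: Dict[str, List[str]]) -> int:
    """Cumulative patch score: the sum over every bug the patch references."""
    return sum(score_bug(t, panic_type, metrics) for t in bug_texts)


# Illustrative search criteria gathered from a kernel core file (invented values).
criteria = {
    "panic_string": ["bad mutex"],
    "functions": ["fs_write", "lock_enter"],
    "modules": ["diskmod"],
}
bug_text = "panic: bad mutex seen after lock_enter called from fs_write in diskmod"
score_a = score_bug(bug_text, "type_a", criteria)
score_b = score_bug(bug_text, "type_b", criteria)
patch_total = score_patch([bug_text, "unrelated printing bug"], "type_a", criteria)
```

Keeping the per-type metric selection in a table separate from the matching loop means the scoring can be adapted to a new fault type without changing the matching code.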