Because of the complexity of computer software, defects may exist in a software product even though great care is taken in designing, implementing, and testing the product. Software products include operating systems, applications, programs, device drivers, etc. Once a software product is deployed, a software defect may manifest itself as one or more software failures. Software failures lie along a spectrum ranging from failures having clear and easily reproducible symptoms to failures having ambiguous and/or difficult to reproduce symptoms.
Examples of software failures with clear, easily reproducible symptoms are automatic reboots, improper system accesses, operating system lockups, pronounced or severe performance degradation, severe connectivity problems, and/or severe data corruption. Such software failures usually leave evidence in sources that are easy to find and interpret, such as: (a) the memory of a computing device; (b) files stored on magnetic disks or like storage media; and/or (c) the execution stack of the computing device. Prior art tools and techniques make software failures having clear, easily reproducible symptoms relatively easy to discover. As a result, the software defects that cause such software failures are relatively easy to correct. Software failures with ambiguous and/or difficult to reproduce symptoms, hereafter referred to as “soft failures,” are more difficult to discover and, thus, the defects that cause such software failures are more difficult to correct.
Examples of soft failures are a loss of control of software components that provide an interface to hardware components, a loss of control of an executing process, intermittent poor performance of an operating system or software applications, and/or intermittent connection problems. Soft failures often require substantial investigation before the symptoms can be reliably reproduced and analyzed to identify the software defect or defects causing the failure. Soft failures require substantial investigation because they require: (a) an understanding of most, or all, of the major components of the computing device and the operating system running on the computing device; (b) the intervention of multiple specialists; (c) expertise in the use of multiple tools; and (d) specific defect detection experience for most or all of the major components of the computing device and the operating system.
The difficulties of reproducing the symptoms of a soft failure and the difficulties of identifying a soft failure are exacerbated when the soft failure happens on a software product installed in a customer's computer, i.e., in the field. When a soft failure happens in the field, the engineer or technician supporting the software product is often unable to gather information about the failure directly and must guide a user through an information gathering process. Besides being inconvenient for both the user and the support engineer, this information gathering process is usually time-consuming and expensive.
The information gathering process involves extracting and interpreting information about the computing system and its components at, or just before, the time of failure. One way to accomplish this extraction is to add functions to the computing system's software to generate a memory dump when a soft failure happens. A memory dump may be considered a sort of snapshot of the condition of the computing system at the time of the failure. A memory dump is usually stored in one or more files that reside on the computing device's hard disk or other storage volume. The condition of the computing system at the time the soft failure occurred can be assessed by analyzing information in the memory dump about the various hardware and software components in the computing system.
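By way of illustration only, the idea of adding a function that writes a snapshot to a file when a failure is detected can be sketched at the level of a single process. The sketch below is not a true memory dump and is not the mechanism of any particular operating system; it merely records the call stack of each running thread to a JSON file. The function name `write_snapshot`, the file naming scheme, and the recorded fields are all assumptions introduced for this example.

```python
import json
import os
import platform
import sys
import tempfile
import threading
import time
import traceback


def write_snapshot(directory, reason):
    """Write a JSON snapshot of the current process and return its path.

    Hypothetical helper: it captures far less than a real memory dump,
    only the per-thread call stacks and a few identifying fields.
    """
    frames = sys._current_frames()  # maps thread id -> current stack frame
    timestamp = time.time()
    snapshot = {
        "reason": reason,
        "timestamp": timestamp,
        "platform": platform.platform(),
        "pid": os.getpid(),
        "threads": {
            str(thread.ident): traceback.format_stack(frames[thread.ident])
            for thread in threading.enumerate()
            if thread.ident in frames
        },
    }
    # Assumed naming scheme: process id plus a millisecond timestamp.
    name = f"snapshot-{os.getpid()}-{int(timestamp * 1000)}.json"
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(snapshot, handle, indent=2)
    return path


if __name__ == "__main__":
    # Demonstration: treat an unexpected exception as the failure event.
    try:
        raise RuntimeError("simulated failure")
    except RuntimeError as error:
        print(write_snapshot(tempfile.mkdtemp(), str(error)))
```

A snapshot of this kind is only as useful as the analysis applied to it afterwards, which is the interpretation burden discussed next.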
Substantial interpretation and analysis are usually required to transform memory dump information into knowledge leading to a solution that eliminates the software defect or defects that caused the soft failure. Often such interpretation and analysis are provided by one or more computer software programmers. The programmer or programming team receives a description of the symptoms of the soft failure and the memory dump. The description and the memory dump information are used by the programmer or members of the programming team to attempt to reproduce the soft failure, deduce the cause or causes of the failure, and develop a deployable solution that keeps the failure from recurring.
A programmer or programming team may spend weeks or months performing the aforementioned procedure. To help reduce the time and effort programmers expend to investigate soft failures, computing systems may include one or more software tools designed to assist the programmer in diagnosing soft failures. Computer software diagnosis tools are usually designed to run in the background while a computing system is operating. Such tools gather information intermittently or when triggered by certain events. When a soft failure occurs, the gathered information is passed on to a computer system support technician, engineer or software product programmer with the hope that the gathered information will narrow the scope of investigation.
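By way of illustration only, a diagnosis tool that runs in the background and gathers information both at intervals and when triggered by events might be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any existing tool: the class name `DiagnosticCollector`, the `probe` callable that supplies the data to record, and the bounded buffer size are all hypothetical.

```python
import collections
import threading
import time


class DiagnosticCollector:
    """Hypothetical background collector with a bounded record buffer."""

    def __init__(self, probe, interval_seconds=1.0, capacity=100):
        self._probe = probe  # callable returning the data to record
        self._interval = interval_seconds
        # A bounded deque discards the oldest record once capacity is reached.
        self._records = collections.deque(maxlen=capacity)
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = None

    def record(self, trigger):
        """Record one probe result; call directly for event-driven capture."""
        entry = {"time": time.time(), "trigger": trigger, "data": self._probe()}
        with self._lock:
            self._records.append(entry)

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            self.record("interval")

    def start(self):
        """Begin periodic sampling on a daemon thread."""
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        """Stop periodic sampling and wait for the thread to finish."""
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def report(self):
        """Return a copy of the gathered records, oldest first."""
        with self._lock:
            return list(self._records)
```

In this sketch, calling `record("event")` from a failure handler corresponds to event-triggered gathering, and `report()` yields the information that would be passed on to support personnel. Bounding the buffer is one possible design choice for keeping the overhead of a continuously running tool small.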
Computer software diagnosis tools can be loosely categorized by the scope of the problem domain a diagnosis tool encompasses and by the level of automation a diagnosis tool provides. FIG. 1 shows these two axes of categorization with the vertical axis representing the scope of the problem domain from strict to open-ended and the horizontal axis representing the level of automation from partial to complete.
In order to maximize the utility of a computer software diagnosis tool, it is desirable to maximize the scope of the problem domain the tool encompasses and to maximize the level of automation of the tool. A computer software diagnosis tool applicable to a wide variety of problems is more desirable than a tool that is only applicable to a small number of problems, or a single problem. Further, a computer software diagnosis tool that can be operated with little or no intervention by a human operator is more desirable than a tool that must be constantly tended by a human operator. Prior art software tools are represented in FIG. 1 by the block labeled “Existing Diagnostic Tools” while more desirable tools are represented by the block labeled “Future Software Diagnostic Tools.” In FIG. 1, it can be seen that prior art software diagnosis tools have low automation, i.e., prior art software diagnosis tools require a high amount of human interaction. Further, the scope of the problem domain of prior art tools is strict, i.e., relatively small. In contrast, “Future Software Diagnostic Tools,” i.e., desired but not yet available software tools, have higher automation and a problem domain scope that is more open-ended, i.e., broader.
Although the prior art does include some software diagnosis tools and tool sets that are fairly automated and have a reasonably broad scope, in the past, such tools and tool sets have required large amounts of new and original computer software design and programming to develop. As a result, such tools tend to be expensive. Such tools also tend to be restrictively complex and difficult to operate.
What is needed is a relatively inexpensive, easy to use software diagnosis tool and associated method that provide a useful level of automation, address a reasonably broad problem domain, and are capable of assisting computer support personnel and computer software programmers in resolving soft failures. The present invention is directed toward providing such a tool and method.