Testing involves performing an operation to determine if the actual result matches an expected result. If the actual and expected results do not match, the test is failed. The classic test is that of a child in school. A child is asked to perform an operation, e.g., a math problem. If the child produces an incorrect result, the test is failed. This information is valuable to the teacher, because it provides an indicator of whether the lessons are effective. It may also provide an indicator of the child's ability.
Today, testing is widely used in virtually every industry to determine valuable information about products, systems, employees, organizations, and more. Moreover, a single test may be insufficient to gather desired information. A company that is attempting to increase the safety of a product, for example, may conduct many tests in different scenarios. A large car may perform quite well in a head-on collision test, but may perform poorly in a rollover test. The overall safety of a vehicle may be measured by a number of tests that are prioritized by frequency of the test scenario in the real world.
Software developers in particular make heavy use of testing. Buyers of software products expect those products to work on their computer systems. A product that has not been fully tested may simply cause irritation if it causes computer system malfunctions, but it may also cause more serious problems such as opening a security loophole for attackers or causing the loss of large amounts of valuable data. In response to the need for software testing, there have been a number of advancements in the field. These advancements are generally directed to determining the appropriate software tests to run, test results analysis, and automation of performing tests.
First, determining which scenarios to test is important in software testing. Just as a car encounters many scenarios on the road, software operations occur in many scenarios within computer systems. Computer systems are built using a wide variety of components, which may be configured differently. The state of the components changes when they are called upon to execute software. Therefore modern software testing involves not only testing the many operations that an application may perform, but testing those operations in a subset of the various scenarios in which the operations are likely to occur. It may be significant that an operation is performed while a computer is also running MICROSOFT WORD®. It may also be significant that a computer has a wireless internet connection, or that the computer has both a wireless internet connection and runs MICROSOFT WORD® when an operation is performed. There are so many variables that testing an operation in every single possible computer state is impractical. Therefore, a determination of which computer states to test is an important aspect of software testing.
Second, test results analysis is an area of advancement in software testing. This term, however, can mean several different things. In a traditional sense, it refers to investigation of why a particular operation failed a test. Products developed by VECTOR SOFTWARE®, VIASOFT®, and MERCURY INTERACTIVE® provide some tools for test result analysis. Some such tools also provide statistics on failure rates, e.g., they compute the percentage of tested operations that failed. They may also compute failure percentages for each operation, thereby providing the percentage of scenarios in which a given operation, such as “open file,” failed. Developers may set a target pass rate for their product, such as 99%, meaning that 99% of the scenarios in which an operation is performed will not yield failures. As soon as a given operation works 99% of the time, investigation of failures for the product can cease and the product is considered ready to ship. This approach, however, is weak in that the failures left unsolved may be particularly troublesome. Therefore, tools that provide failure statistics do not lend themselves to ideal techniques for software testing.
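The per-operation failure statistics described above can be illustrated with a minimal Python sketch. The record layout (operation, scenario, pass/fail), the function name, and the sample data are assumptions made for illustration only; they are not drawn from any of the named products.

```python
from collections import defaultdict


def failure_rates(results):
    """Return {operation: percentage of scenarios in which it failed}.

    `results` is an iterable of (operation, scenario, passed) tuples.
    This record shape is hypothetical, chosen only for the example.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for operation, _scenario, passed in results:
        totals[operation] += 1
        if not passed:
            failures[operation] += 1
    return {op: 100.0 * failures[op] / totals[op] for op in totals}


# Hypothetical sample data: one operation fails in one of four scenarios.
sample_results = [
    ("open file", "XP / Celeron", True),
    ("open file", "XP / Pentium IV", False),
    ("open file", "Windows 2000 / Celeron", True),
    ("open file", "Windows 2000 / Thunderbird", True),
    ("close file", "XP / Celeron", True),
    ("close file", "XP / Pentium IV", True),
]

rates = failure_rates(sample_results)
for operation, rate in sorted(rates.items()):
    print(f"{operation}: {rate:.1f}% of scenarios failed")
```

A statistic of this kind reports how often an operation fails but not which failures matter most, which is the weakness noted above.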
Finally, the software testing industry has seen much advancement in automation of software testing. This is largely because of the sheer volume of tests that are generally considered desirable. Because software is often quite complex, there are many operations performed by any given application that may need testing. For example, an application may both open a document and close a document. It may also manipulate a document in any number of ways, which can be appreciated by any computer user.
The many software operations that may be tested, combined with the many test scenarios, produce a potentially enormous number of tests that may be desirable in testing software. This concept is demonstrated in FIG. 1. For example, consider the testing that may be desired by a hypothetical software developer who writes an application 100 called “Jammer” for playing and editing music files. One of the many operations performed by Jammer is opening a file 101. Imagine that our hypothetical application 100 is opening a music file, e.g., “Smooth” sung by Santana and Rob Thomas. To ensure that this opening operation 101 will be performed smoothly in all scenarios in which it may be performed, the Jammer 100 developer may first test it in all of the various operating system environments 120 in which it may be performed. The Jammer 100 developer may acquire one computer for the MICROSOFT WINDOWS XP (“XP”) operating system, another for the MICROSOFT WINDOWS 2000 operating system, another for the APPLE MAC OS X operating system, etc. Testing only these operating systems 120 would require three tests: opening “Smooth” with Jammer 100 running on each of the operating systems 120. However, the operating systems may be used in connection with various processors 130 that affect the way the operating systems 120 run. For example, imagine that each of the operating systems 120 may run on a computer using any of the processor families INTEL CELERON®, AMD THUNDERBIRD®, and INTEL PENTIUM IV®. By introducing an additional variable, namely processors, which itself has three variations, suddenly there are nine tests to perform.
By extrapolating from FIG. 1, the potential magnitude of tests for software products becomes apparent. The operating systems 120 shown are not representative of all operating systems, and each operating system may have different versions for different languages. For example, there is an XP English version, an XP German version, an XP Spanish version, etc. Likewise, the processor families shown are just that: families of processors. Testing for each individual processor, as well as for other popular processor families, may be desired. Still further, operating systems and processors are only two of many variables that may be adjusted. Every time a new variable is added, the number of tests can multiply by the number of possible variations, or states, of the new variable. This explosion is illustrated in the table 140 at the bottom of FIG. 1. A number of variables 150 is displayed across the top of the table 140. A number of states of each variable 160 is displayed on the left side of the table. The corresponding number of tests to be performed is presented. For 8 variables, each with 6 states, there are 6^8 = 1,679,616 tests to perform. As a result of this explosion in the number of tests, there have been significant advancements in automating software testing, directed to the automatic set up of tests and return of result files bearing information about the test failures that may help the process of failure investigation.
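The multiplication of tests described above can be checked with a short Python sketch. The variable names and the helper `count_tests` are hypothetical; the operating systems, processor families, and the 8-variable, 6-state case are taken from the text.

```python
from itertools import product
from math import prod

# The two variables discussed in connection with FIG. 1.
operating_systems = ["Windows XP", "Windows 2000", "Mac OS X"]
processors = ["Celeron", "Thunderbird", "Pentium IV"]

# Every (operating system, processor) pairing is one test scenario
# for a single operation such as opening a file.
scenarios = list(product(operating_systems, processors))
print(len(scenarios))  # 9


def count_tests(states_per_variable):
    """Number of scenarios when every combination of states is tested."""
    return prod(states_per_variable)


print(count_tests([3, 3]))    # 9
print(count_tests([6] * 8))   # 1679616
```

Each added variable multiplies the total by its number of states, so the count grows exponentially with the number of variables.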
There are many software testing products currently available. AUTOTESTER® from AUTOTESTER®, HOURGLASS 2000® from MAINWARE®, OPTA2000® from TANDSOFT®, PANORAMA-2® from INTERNATIONAL SOFTWARE AUTOMATION®, SIM2000® from DPE & ASSOCIATES®, SIMULATE 2000® from PRINCE SOFTWARE®, TARGET2000® from TARGETFOUR®, TRANSCENTURY DATE SIMULATOR® and ENTERPRIZE TESTER® from PLATINUM®, TALC 2000® from EXECOM®, TICTOC® from CICS®, TEST 2000® and AGER 2000® from MVS®, VECTORCAST® from VECTOR SOFTWARE®, VIA/AUTOTEST® from VIASOFT®, TEST SUITE 2000® from MERCURY INTERACTIVE®, and WORKBENCH/2000® from MICROFOCUS® are all products aimed at software testing. These products are software that may generally help in analyzing relevant scenarios for software testing, determining why failures occurred, and automating the set up of tests in a way that returns useful test result files.
Available testing products, and testing technology generally, have improved software testing to the point that a large volume of useful tests can be run quickly and return result files that aid in the investigation of failures. Perhaps in part as a result of these advancements, another problem has appeared in the industry: the proliferation of test results. Currently, a set of test operations may be run automatically in a lab run that involves performing a number of tests in a variety of scenarios. A short lab run for a commercial software developer, which tests only a subset of operations against a single operating system, may approach 300,000 tests. A full lab run—all tests performed in all scenarios—may go well beyond 1,000,000 tests.
A good lab run, with few failed tests, may yield an average pass rate of approximately 95%. If a developer wants to investigate all failures, this means that a full lab run of well over 1,000,000 tests may leave well over 50,000 failures to investigate. Furthermore, numerous lab runs may be conducted per week for products in the final stages of development. There may also be multiple products to be tested, along with updates to products that are often distributed by software developers. In this environment, verification of test results quickly becomes an unmanageable task. Employees hired to investigate failures may spend significant time verifying lab run results, thereby diminishing resources for other testing activities such as writing new tests or improving existing tests.
The proliferation of test results, and the corresponding demand on test analysis resources, has not been met with sufficient technological advancement in reducing the labor involved in test result analysis. Currently, test result files can be differentiated, whereby identical result files can be categorized together. This provides some help in allowing test result analyzers to group identical failures over multiple lab runs, but result files may differ slightly even if a failure occurred for the same reason, simply because the failure occurred in different computing environments. Categorization based on entire result files therefore often requires redundant attention from result analyzers to slightly different result files.
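The limitation of categorizing by entire result files can be seen in a minimal Python sketch. The file names, file contents, and the function `group_identical` are hypothetical, and hashing the whole file is only one assumed way of detecting identical files; it is not attributed to any existing product.

```python
import hashlib
from collections import defaultdict


def group_identical(result_files):
    """Group result files whose full contents are byte-for-byte identical.

    `result_files` maps a (hypothetical) file name to its text.
    Returns a sorted list of sorted groups of file names.
    """
    groups = defaultdict(list)
    for name, text in result_files.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        groups[digest].append(name)
    return sorted(sorted(names) for names in groups.values())


# Hypothetical result files: the same failure, recorded in two environments.
result_files = {
    "run1_xp.log": "open file FAILED: access denied\nos=XP",
    "run2_xp.log": "open file FAILED: access denied\nos=XP",
    "run1_w2k.log": "open file FAILED: access denied\nos=Windows 2000",
}

groups = group_identical(result_files)
print(groups)
```

The two XP files fall into one group, while the Windows 2000 file lands in a group of its own even though it records the same failure, so an analyzer would have to examine it separately.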
Another technique currently in use allows a result analyzer to identify one or more particular result files that they know are associated with an expected “bug” or imperfection in software. All result files that match an exact specified description associated with the expected bug can be stripped from the set of result files to be examined. Alternatively, the tests that produce failures associated with the known bug can be discontinued from future lab runs until the bug is resolved. This solution is practical but less than perfect, because it may be desirable to continue running the test associated with a bug for other computing environments and to keep the generated result files for analysis. Simply discontinuing tests raises a coverage problem, creating a blind spot in the testing of a software product.
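The exact-match stripping technique can be sketched in Python as follows. The function name, file names, and file contents are hypothetical, and treating the specified description as the full text of a result file is an assumption made for the example.

```python
def strip_known_bug(result_files, known_bug_text):
    """Drop result files whose text exactly matches a known-bug description.

    `result_files` maps a (hypothetical) file name to its text.
    Only exact matches are removed; anything else is kept for analysis.
    """
    return {
        name: text
        for name, text in result_files.items()
        if text != known_bug_text
    }


# Hypothetical description of an expected bug, as recorded on XP.
known_bug = "open file FAILED: access denied\nos=XP"

result_files = {
    "run1_xp.log": "open file FAILED: access denied\nos=XP",
    "run1_w2k.log": "open file FAILED: access denied\nos=Windows 2000",
    "run1_save.log": "save file FAILED: disk full\nos=XP",
}

remaining = strip_known_bug(result_files, known_bug)
print(sorted(remaining))
```

Only the exactly matching XP file is stripped. The Windows 2000 file for the same bug remains in the set to be examined, while discontinuing the test altogether would instead remove that environment from coverage.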
In light of the foregoing deficiencies in the analysis of test results, there is a heretofore unaddressed need in the industry to provide improved techniques for automated test result analysis.