Manual test generation methods have existed for several years in which testers manually evaluate a software system to be tested (referred to herein as “the program under test”), write test cases based on the evaluation that will test various aspects of the software system, combine the test cases to form a single test (hereinafter referred to as a “test suite”), review the test coverage after running the test suite on the program under test to determine the adequacy of the test suite, and then refine the test suite by revising existing test cases of, or adding additional test cases to, the test suite.
The above-described process of generating a suitable test suite is an iterative process which begins with the identification of “coverage criteria.” Coverage criteria typically comprise an informal list of tasks to be accomplished by the test suite. These may be a subset of states, transactions, or paths within the software under test that need to be covered by the test suite. Next, the tester writes the test cases to cover the coverage criteria. The test cases are individual programs which carry out the tasks to be accomplished as identified by the coverage criteria. The programs which make up the test cases typically will be coded instructions which, when input to the program under test, will cause the program to take some action which “exercises” a portion of the program under test to see if it works. In addition, a particular test case may also include coded instructions that will verify the expected response to these actions. For example, if one of the coverage criteria for testing a word processing program is to test the print function, then a test case written to test this function might include instructions which would cause the word processing program to open a particular file, select the print function, display a “print properties” dialog box (this is an example of an expected response), select a particular print function (e.g., print the current page), and issue the print command to the printer port.
As part of the testing process, the software under test is modified to output a test trace during the running of the test suite in a well-known manner. This test trace may be a list of the states of the application under test after the execution of each step in the test case, and it may also include other details of the test execution path, including a list of procedures called, processes spawned, and the like. This test trace is input to a “coverage tool,” such as FOCUS™ by IBM, and compared with the coverage that was expected as identified by the coverage criteria. Both the test trace and the coverage criteria are input to the test coverage tool in a language (i.e., code) compatible with the coverage tool. A test coverage report is generated listing which elements of the desired coverage were actually covered and which elements of the desired coverage were not covered. The test engineer will then take this test coverage report, analyze it and, based on this analysis, refine the test suite, typically by adding more test cases or revising existing test cases in the test suite. While this method works adequately for testing simple programs, as the software programs being tested become more complex, the process of developing and refining the test suite becomes unwieldy.
Test suites developed using this manual method are typically very thorough in that many hours of thought go into the process of developing them. Even though the primary focus of the test suite development pertains to the areas identified by the coverage criteria, because a human being (the tester or developer) is involved in the process from beginning to end, the final test suite benefits from the insight and intuition of the human being. However, as a product matures and goes through several testing cycles, these manually-coded test suites can grow to contain thousands of test cases. Test cases may be added as the test suite is refined and as more function is added to the program under test, and many times the added test cases contain redundant or otherwise unnecessary elements that may go unnoticed due to the sheer size of the test suite. This problem becomes worse as the complexity of the software being tested increases.
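The comparison performed by a coverage tool, as described above, can be illustrated with a minimal sketch. The sketch is not the interface of FOCUS or of any other actual coverage tool; the function name `coverage_report`, the state names, and the trace contents are all hypothetical and chosen only for illustration.

```python
def coverage_report(criteria, trace):
    """Compare the elements a test run visited with the elements that were required.

    criteria: set of element names the test suite is expected to cover (assumed format).
    trace:    sequence of element names recorded while the test suite ran (assumed format).
    """
    visited = set(trace)
    covered = sorted(c for c in criteria if c in visited)
    missed = sorted(c for c in criteria if c not in visited)
    return {"covered": covered, "missed": missed}


# Hypothetical coverage criteria: states of a word processor the suite should reach.
criteria = {"file_open", "print_dialog", "page_printed", "file_saved"}

# Hypothetical test trace: the state recorded after each step of the test cases.
trace = ["file_open", "print_dialog", "page_printed", "file_open"]

report = coverage_report(criteria, trace)
print(report["covered"])  # elements of the desired coverage that were covered
print(report["missed"])   # elements of the desired coverage that were not covered
```

In this sketch the "missed" list is what the test engineer would analyze when deciding which test cases to add or revise.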
To speed up the process of generating test programs, more recently software testers have turned to automated test generators (e.g. Object Geode by Telelogic) which utilize both “behavioral models” and coverage criteria to automatically generate the test programs. The behavioral model is a formal description of part of the behavior of the software under test, and this behavioral model can be utilized by an automated test generator to generate a test suite according to separately defined coverage criteria.
The behavioral models are typically designed to identify and generate test cases to exercise portions of the program under test using an abstraction tailored to that specific purpose. The behavioral models represent the properties of the system as viewed through the lens of the abstraction; these properties are referred to herein as the “properties of interest” and represent only the aspects which are the focus of the behavioral model. All details outside of the focus of the abstraction are omitted from the behavioral models. The use of such abstractions is necessary due to the so-called “state explosion” problem often encountered in model-based test generation. When modeling complex software in any detail, the state explosion problem often causes automated test generators to fail to generate test cases using a reasonable amount of computing resources (time and storage space).
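The growth behind the state explosion problem can be shown with simple arithmetic: the number of states in a model is the product of the number of values each modeled variable can take. The following sketch assumes a hypothetical connection client; the variable names and value ranges are illustrative assumptions and are not taken from any actual model.

```python
def count_states(domains):
    """Return the number of distinct states: the product of the domain sizes."""
    total = 1
    for values in domains.values():
        total *= len(values)
    return total


# Abstracted model: only the properties of interest are represented.
abstract_model = {
    "address": ["numeric IP", "domain name"],
    "port": ["default port", "user-specified port"],
}

# More detailed model (hypothetical extra variables added for illustration).
detailed_model = dict(
    abstract_model,
    timeout_seconds=range(1, 61),  # 60 possible values
    retries=range(0, 6),           # 6 possible values
    encrypted=[False, True],       # 2 possible values
)

print(count_states(abstract_model))  # 2 * 2
print(count_states(detailed_model))  # 2 * 2 * 60 * 6 * 2
```

Adding only three further variables multiplies the state count by 720 in this sketch, which suggests why abstractions that omit such details are used.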
The coverage criteria serve to focus the test generator on aspects of the model that require an individual test case to be generated. For example, one coverage criterion might be directed solely towards a method of selecting a port of a particular server being accessed using the software under test; another coverage criterion might be directed solely towards testing the various methods of designating an IP address of a particular server using the software under test. While each of these coverage criteria functions appropriately for the specific task with which it is associated, the overall testing of a software program using test suites based on these specific combinations of coverage criteria and behavioral models may suffer from their narrow focus, since no other aspects will be tested.
In a typical use of an automated test generator, a test engineer writes behavioral models in a formal modeling language (also known as a “functional coverage modeling language”) that is “understood” by the automated test generator being used. For example, test engineers may use finite state machines to model externally observable behavior of a program under test. They then input the models and coverage criteria to an automated test generator to generate test cases that are combined to form the test suite. There are many well-known methods of this type (and functional coverage modeling languages) as disclosed in commonly assigned co-pending U.S. patent application Ser. No. 09/847,309 entitled “Technique Using Persistent Foci for Finite State Machine-Based Test Generation,” filed on May 31, 2001, incorporated herein fully by reference.
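A minimal sketch of finite-state-machine-based test generation follows. It is not the method of the referenced application or of any particular automated test generator; it simply assumes a hypothetical behavioral model expressed as a transition table and a single coverage criterion ("reach every state at least once"), and uses a breadth-first search to produce one test case per state. All state and action names are assumptions.

```python
from collections import deque

# Hypothetical behavioral model: state -> {action: next_state}.
MODEL = {
    "idle": {"open_file": "file_open"},
    "file_open": {"select_print": "print_dialog", "close_file": "idle"},
    "print_dialog": {"confirm": "printing", "cancel": "file_open"},
    "printing": {"done": "file_open"},
}


def shortest_path(model, start, target):
    """Breadth-first search: shortest list of actions leading from start to target."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if state == target:
            return actions
        for action, next_state in model[state].items():
            if next_state not in seen:
                seen.add(next_state)
                queue.append((next_state, actions + [action]))
    return None  # target is unreachable from start


def generate_suite(model, start):
    """Generate one test case (an action sequence) for every state in the model."""
    return {state: shortest_path(model, start, state) for state in model}


suite = generate_suite(MODEL, "idle")
for state, actions in suite.items():
    print(state, actions)
```

Each generated action sequence would then be translated into instructions for the program under test; that translation step is outside the scope of this sketch.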
As with manual systems, when the program under test is tested by the test suite, a test trace is output which is input to a coverage tool and compared with the expected coverage, and a test coverage report is generated, analyzed, and if changes are deemed necessary, the test engineer will manually refine the test suite by modifying the behavioral model.
Automatically generated test suites are not without their problems. Sometimes the coverage criteria used with the behavioral models may conflict with each other. For example, in finite-state-machine-based test generation, one coverage criterion may be to reach, in at least one generated test, all of the states of the system-to-be-tested that are represented in the behavioral model. A second coverage criterion may be that none of the generated tests shall enter one specific “forbidden” state represented in the behavioral model. These goals (i.e., “reach all states” vs. “never reach forbidden state”) conflict in that they cannot both be completely satisfied at the same time. To achieve a compromise, one goal must take precedence over the other, and some automated test generators are configured to default to the most restrictive or least restrictive option.
For example, assume a simple client program for opening a connection to a server, with four distinct methods of making the connection. Specifically, assume that the client program can use either a numeric IP address or a domain name to identify the server to which the connection is requested. In addition, assume that the client program must identify a particular port for access to the server, using either a default port or a user-specified port. One coverage criterion for testing this client program might require that all four possible states occur (i.e., numeric IP/default port; domain name/default port; numeric IP/user-specified port; and domain name/user-specified port). Another coverage criterion might specify that the default port should never be used (i.e., any test state that would require use of the default port is a “forbidden state”). In this example, if the automated test generator is configured to favor a more restrictive test over a less restrictive test, the default port connection method will not be tested, possibly leading to an incomplete test of the software system.
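The conflict in the client example above can be sketched as follows. The function `select_targets` and its `favor_restrictive` flag are hypothetical names introduced only to illustrate the two default resolution policies; they do not correspond to the configuration of any particular automated test generator.

```python
from itertools import product

ADDRESS_METHODS = ["numeric IP", "domain name"]
PORT_METHODS = ["default port", "user-specified port"]


def select_targets(required, forbidden, favor_restrictive=True):
    """Resolve "reach all states" against "never reach a forbidden state".

    favor_restrictive=True : the forbidden-state criterion takes precedence.
    favor_restrictive=False: the reach-all-states criterion takes precedence.
    """
    if favor_restrictive:
        return [state for state in required if state not in forbidden]
    return list(required)


# Criterion 1: all four combinations of address method and port method.
all_states = list(product(ADDRESS_METHODS, PORT_METHODS))

# Criterion 2: any state that uses the default port is a forbidden state.
forbidden = {state for state in all_states if state[1] == "default port"}

restrictive = select_targets(all_states, forbidden, favor_restrictive=True)
untested = [state for state in all_states if state not in restrictive]

print(restrictive)  # states for which tests are generated
print(untested)     # states silently dropped by the resolution policy
```

Under the restrictive policy, both default-port states fall into the untested list, which corresponds to the incomplete test described above.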
The problem is magnified as the test engineer specifies more coverage criteria to the test generator, since the conflicts (and the unforeseen side-effects resulting from the method of resolution of conflicts used by the automated test generator) quickly multiply.
To summarize, while each of the test generation methods (manual and automatic) has its advantages and drawbacks, improving the test suites that they generate is desirable but still requires the manual analysis of the test coverage report output by the test coverage tools and the subsequent repetition of the original process, albeit in more condensed form, to create better tests.
Accordingly, it would be desirable to have available a method and system which would integrate test coverage measurement with model-based test generation so that the results developed by the test coverage measurements can be input directly to an automated test generator, thereby realizing automated test improvement capability.