Software testing has historically accounted for about 50% of software development effort. Coverage-based testing is one way to improve testing efficiency: software reliability tends to grow as test coverage increases, and test coverage provides a way to quantify how thorough the testing is.
Code coverage is measured after the tests are executed. Most research in the area of code-coverage based testing focuses on defining meaningful criteria and measuring coverage after tests.
Comparatively little research has been done on improving testing before test cases are constructed. One area of such research is software design for testability. This work attempts to give guidelines on how to design software that is easy to test, with the aim of reducing the cost of testing.
The other area of pre-testing effort is code prioritization for testing. This research area attempts to analyze programs and prioritize the code so as to guide test construction toward maximal coverage according to various criteria. The question of which lines of code should be tested first is often raised before test construction. Many criteria can be used to prioritize code for testing, such as change frequency, complexity metrics and potential code coverage. Two kinds of code coverage analysis may be used in code prioritization, i.e., a control flow based analysis and a data flow based analysis. The control flow based analysis uses criteria such as source line coverage, basic block coverage and decision coverage (these terms are described in the Terms and Description section hereinabove). The data flow based analysis uses criteria such as p-use and c-use, as one skilled in the art will understand.
One traditional method of code prioritization uses what is known in the art as a dominator analysis to determine code priorities, wherein the higher the priority of a portion (P) of code, the greater the amount of code that is covered by test cases designed to execute the code for P. Thus, the dominator analysis provides a technique for efficiently testing the code of a software system in that test cases for high priority portions of code are designed and input to the software system first. Dominator analysis was originally invented for C programs, in which each procedure can be quite large. However, dominator analysis is limited when applied to object-oriented programs. For example, one limitation is that it considers only the node relationships within an object-oriented class method. That is, it does not consider dependencies among object-oriented classes and methods. Additionally, the calculations performed in a dominator analysis can consume large computational resources, both in computation time and in data storage.
Unit testing has become an important step in software development. It is used in both extreme programming and conventional programming. It promises to move the costly testing and defect removal activities to earlier stages of software development, thus reducing such costs, since it is well known that the earlier in development defects are identified, the more cost effective the development effort. Writing unit tests is an essential part of the internal deliverables. However, unit test code is often not part of the code that is delivered to the customer. It is sometimes difficult to justify spending as much time writing tests as writing code for a customer. Therefore, it is important to reduce the effort of unit testing by using automation, so that unit testing can be more widely adopted by developers.
Many parts of unit testing have been automated. For example, since unit tests are often written in the same language as the source code, they can be compiled with the source and executed automatically. Generation of unit testing frameworks has also been automated, e.g., by JUnit (www.junit.org). JUnit is a regression testing framework written by Erich Gamma and Kent Beck and is used by developers who implement unit tests in Java. JUnit is Open Source Software, released under the Common Public License Version 1.0 and hosted on SourceForge. Another automated testing framework is CUnit, written by Anil Kumar and Jerry St. Clair, with documentation available at http://cunit.sourceforge.net. However, the tests generated by such frameworks are mocks or stubs, and users still need to fill in the detailed algorithms before fully functioning test cases can be executed. Furthermore, none of the prior art generation methods emphasizes generating efficient test data to increase code coverage effectively. Coverage-based testing tools, for their part, do not consider automatic test generation. Even though some, such as χSuds, provide hints on which part of the code should be tested first, they fail to generate the test sequence and fail to generate actual test cases.
Much research on automatic test generation is based on specifications/models other than source code. For example, studies have applied control flow and data flow-based test selection criteria to system specifications in SDL for generating tests. Similar research has also been conducted on how to generate tests from UML models, FSM/EFSM/CEFSM-based models, and combinatorial designs, as one skilled in the art will understand. While a model-based method may be suitable for system level testing, it is not practical for unit testing because of the high cost in writing an additional model for each source unit.
Using various coverage criteria, dominator analysis prioritizes portions of a program for increasing code coverage. A program block A dominates a block B if covering A implies covering B, that is, a test execution cannot reach block A without going through block B, or it cannot proceed from block A to an exit without going through block B. This method is applicable to both data flow and control flow analysis. Without loss of generality, control flow is used for the examples throughout the present disclosure.
The dominator analysis starts with the construction of a control flow graph for each function or method. Traditional dominator analysis for coverage-based code prioritization considers only control flow structural factors inside a function/method.
To explain how the traditional dominator analysis works, consider a C program that includes only basic source lines without any function calls. A control flow graph (alternatively, data flow graph) corresponding to the C program is generated, and the dominator analysis uses this graph to rank the importance of the various portions (e.g., lines of code) of the C program: a portion is more important when executing it, e.g., via a particular test case, requires a greater number of other program code lines to be executed as well.
One such illustrative C program (wordcount.c) is given in FIG. 1. This program includes one function definition and the function does not call any other functions. The goal of testing coverage in this situation is to cover (i.e., execute) as many basic blocks (or decisions or other important code characteristics) within this function as possible with the least number of test cases.
The dominator analysis method first constructs the corresponding control flow graph (FIG. 2), wherein each node of the control flow graph corresponds to one basic block, which is defined in the Terms and Description section hereinbelow. The control flow graph of FIG. 2 includes a total of 10 basic blocks, each of which is represented by one oval-shaped node. A double oval-shaped node (e.g., node n1) represents the starting point of the program, and an oval with a square box around it denotes an exiting node (e.g., node n10). Each program usually has one starting node and may have multiple exiting nodes.
The dominator analysis approach for basic block priority calculation includes five steps: 1) generating a pre-dominator tree, 2) generating a post-dominator tree, 3) combining the two trees, 4) identifying the strongly connected components to form a super-block dominator tree, and 5) performing a priority calculation using the super-block dominator tree.
An example of how to obtain code priorities using the five steps will be discussed with reference to FIGS. 1 and 2.
1) Generate the Pre-Dominator Graph.
Using the algorithms given in, e.g., Ref. 9 identified in the References section hereinbelow, the pre-dominator tree corresponding to the control flow graph of FIG. 2 can be generated, as given in FIG. 3. A node x predominates a node y if every path from the entry node to node x includes node y. In the pre-dominator tree, node x is then a descendant of node y. In FIG. 3, n9 predominates n5, n3, n2 and n1; that is, all paths from the starting node to n9 also go through nodes n1, n2, n3, and n5.
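As an illustration of the pre-dominator relationship defined above, the following minimal sketch computes, for each node x, the set of nodes that x predominates. It uses a simple iterative set-intersection scheme, which is not necessarily the algorithm of Ref. 9, and it runs on a small hypothetical control flow graph (nodes a through e), not on the graph of FIG. 2. The function name predominated_sets is an assumption introduced for this example.

```python
def predominated_sets(succ, entry):
    """For each node x, return the set of nodes y such that every path
    from `entry` to x includes y (x itself is included)."""
    nodes = set(succ)
    # Build the predecessor map from the successor map.
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    # Start from the largest candidate sets and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Hypothetical graph (an assumption, not FIG. 2): a -> b -> (c | d) -> e
cfg = {"a": ["b"], "b": ["c", "d"], "c": ["e"], "d": ["e"], "e": []}
dom = predominated_sets(cfg, "a")
for node in sorted(dom):
    print(node, "predominates", sorted(dom[node] - {node}))
```

In this hypothetical graph, node e predominates a and b but neither c nor d, because a path to e may pass through either branch.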
2) Generate a Post-Dominator Graph.
The post-dominator relationship is the same as the pre-dominator relationship in the reversed control flow graph. A node x post-dominates a node y if every path from node x to any exiting node includes node y. Node x is then a descendant of node y in the post-dominator tree. The post-dominator tree of the control flow graph of FIG. 2 is given in FIG. 4.
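The post-dominator relationship can be sketched in the same spirit. The example below works directly on successor lists, which is equivalent to running a pre-dominator computation on the reversed graph, and treats every node without successors as an exiting node. The graph is a hypothetical one (not FIG. 2), the function name postdominated_sets is an assumption, and the sketch assumes that every node can reach at least one exiting node.

```python
def postdominated_sets(succ):
    """For each node x, return the set of nodes y such that every path
    from x to any exiting node includes y (x itself is included)."""
    nodes = set(succ)
    exits = {n for n in nodes if not succ[n]}
    # Exiting nodes post-dominate only themselves; others start maximal.
    pdom = {n: ({n} if n in exits else set(nodes)) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes - exits:
            # A node is on every onward path only if it is on every
            # onward path of each successor.
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


# Hypothetical graph (an assumption, not FIG. 2): a -> b -> (c | d) -> e
cfg = {"a": ["b"], "b": ["c", "d"], "c": ["e"], "d": ["e"], "e": []}
pdom = postdominated_sets(cfg)
for node in sorted(pdom):
    print(node, "post-dominates", sorted(pdom[node] - {node}))
```

Here the entry node a post-dominates b and e: once a is executed, execution must continue through b and finish at e, whichever branch is taken.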
3) Combine Pre- and Post-Dominator Graphs
The combination of FIG. 3 and FIG. 4 generates a graph as given in FIG. 5.
4) Identify and Group Strongly Connected Components
Strongly connected components are groups of nodes in which every member dominates all the other member nodes of that group, so that covering any one member implies covering the whole group. After grouping the strongly connected nodes and removing redundant edges, the super block dominator graph is obtained, as given in FIG. 6.
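The grouping step can be sketched as follows. The combined graph used here is hypothetical (it is not the graph of FIG. 5); its edges are the parent-to-child edges of the pre- and post-dominator trees of a small diamond-shaped control flow graph a -> b -> (c | d) -> e. The sketch groups nodes by mutual reachability, which is simple but quadratic; a linear-time algorithm such as Tarjan's would normally be preferred. The function names are assumptions introduced for this example.

```python
def reachable(graph, start):
    """Return all nodes reachable from `start`, including `start` itself."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def super_blocks(graph):
    """Group nodes into strongly connected components (super blocks)."""
    reach = {n: reachable(graph, n) for n in graph}
    blocks, assigned = [], set()
    for n in graph:
        if n in assigned:
            continue
        # Two nodes share a super block when each can reach the other.
        group = frozenset(m for m in reach[n] if n in reach[m])
        blocks.append(group)
        assigned |= group
    return blocks


# Hypothetical combined dominator graph (an assumption, not FIG. 5).
combined = {
    "a": ["b"],
    "b": ["c", "d", "e", "a"],
    "c": [],
    "d": [],
    "e": ["b", "c", "d"],
}
blocks = super_blocks(combined)
print([sorted(b) for b in blocks])
```

In this hypothetical example the entry node, the decision node and the exit node fall into one super block, while each branch forms a super block of its own.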
5) Assign Coverage Priority to Each Node of the Original Control Flow Graph
Based on the super-block dominator graph of FIG. 6, the priority of each original node can be calculated. First, assign a weight to each original node, defined as the number of source lines included in that node. For example, the weight of node n1 is 5 because it includes 5 source lines. Second, traverse the super block dominator graph top-down and assign a priority to each super block node, wherein the priority is the sum of the weights of the individual nodes inside the strongly connected group (super block) plus the priority of the parent super block. For example, the super block (strongly connected group) “n1,2,10” has a priority value of 9, which is the sum of the weights of nodes “n1” (5), “n2” (2) and “n10” (2) of the control flow graph of FIG. 2 (since the super block node “n1,2,10” has no parent node, no additional priority value from another super block node is added). The super block “n3,5,9”, by contrast, has a priority value of 13, which is the sum of its parent node's priority of 9 plus the node weights of nodes n3, n5, and n9 (i.e., 2+1+1).
In summary, a priority is obtained for each node of the original control flow graph. Nodes n1, n2 and n10 each have a priority of 9 because covering any one of them guarantees that the 9 lines of code in those three nodes are covered. Nodes n3, n5, and n9 each have a priority of 13. Nodes n4, n6 and n7 each have a priority of 14. Node n8 has the highest priority of 16 (i.e., 13 from node “n3,5,9” of FIG. 6, plus 3 from “n8:13,14,15” of FIG. 2). The complexity of the dominator analysis method is O(N+E), where N is the number of nodes and E is the number of edges in the original control flow graph.
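The priority values summarized above can be reproduced with a short sketch. The weights of n1, n2, n10, n3, n5, n9 and n8 are taken from the text. The weights of n4, n6 and n7 (1 each) and their parent super block “n3,5,9” are assumptions inferred from their stated priority of 14, since FIG. 6 is not reproduced here. The names weights, parent and block_priority are introduced only for this example.

```python
# Source lines per basic block. Values for n4, n6 and n7 are ASSUMED
# (inferred from their stated priority of 14); the rest are from the text.
weights = {
    "n1": 5, "n2": 2, "n10": 2,
    "n3": 2, "n5": 1, "n9": 1,
    "n4": 1, "n6": 1, "n7": 1,
    "n8": 3,
}

# Super block -> parent super block. The parents of n4, n6 and n7 are ASSUMED.
root = ("n1", "n2", "n10")
mid = ("n3", "n5", "n9")
parent = {
    root: None,
    mid: root,
    ("n4",): mid,
    ("n6",): mid,
    ("n7",): mid,
    ("n8",): mid,
}


def block_priority(block):
    """Own weight of the super block plus the priority of its parent."""
    own = sum(weights[n] for n in block)
    up = parent[block]
    return own if up is None else own + block_priority(up)


# Every original node inherits the priority of its super block.
node_priority = {n: block_priority(b) for b in parent for n in b}
for n in sorted(node_priority, key=lambda name: int(name[1:])):
    print(n, node_priority[n])
```

Under these assumptions the sketch yields 9 for n1, n2 and n10, 13 for n3, n5 and n9, 14 for n4, n6 and n7, and 16 for n8, matching the values stated above.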
The original dominator analysis method does not account for the impact on global coverage. Consider the following practical scenario. Suppose a large, complex piece of software is to be tested, and the software includes 10 packages, each of which has an average of, say, 200 classes, with each class having an average of, say, 50 methods. The question is which package, which class and which method should be tested first to achieve the highest coverage, i.e., which part of the code has the highest priority. To answer this question, the global coverage impact of dominators must be considered, which the conventional dominator analysis method does not provide.
Note that the dependency relationships among “invocable program elements” (e.g., packages, classes and methods) without control flow graph analysis cannot guarantee execution relationships among such invocable program elements. For example, the dependency of a method x calling a method y cannot guarantee that y will be covered whenever x is covered. Moreover, dependency diagrams such as one or more call graphs do not give dominator information among classes and methods.
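The point that a call dependency does not guarantee coverage can be illustrated with a minimal sketch. The functions x and y and the covered set are hypothetical and exist only to record which functions were executed.

```python
covered = set()


def y():
    covered.add("y")


def x(call_y):
    covered.add("x")
    # x depends on y in the call graph, but the call is conditional.
    if call_y:
        y()


x(call_y=False)
print(sorted(covered))
```

A call graph would show an edge from x to y, yet this test covers x without covering y. Only the control flow inside x reveals whether the call to y lies on every path through x.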
Accordingly, it is desirable for such higher-level dependency relationships to be added into the prior art control flow graph analysis methods.