A large share of the resources for software development in industry is spent on testing and validating software. Industry adoption of automated validation techniques and automated test generation techniques has been limited to a few specialized software domains, such as automotive and control software. The main technical reason for this limited use of automated techniques is scalability. For instance, for moderately large code bases, some automated test generation techniques tend to generate millions of test cases, making them impractical. The predominant way that industry currently tests software is to generate test cases manually and then carry out the tests. Manual testing, however, is resource intensive and incomplete, and does not scale to increasingly complex modern software.
One approach to software testing is automatic test generation. Several bodies of research address the use of automated testing techniques both to improve coverage and to reduce cost. Automatic test generation techniques can be classified into three types: random testing, symbolic test generation, and directed testing. Each of these types is discussed below.
Random testing is arguably one of the more successful industry uses of automatic test generation techniques. However, the limitations of random testing are well known. For example, some branches of the program (e.g. if (x==42) ... where x is an input) have very little chance of being executed using traditional randomly generated inputs.
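The limitation described above can be illustrated with a minimal sketch. The function names and the assumption that x is a uniformly distributed 32-bit integer are hypothetical and chosen only for illustration.

```python
import random


def program_under_test(x: int) -> str:
    # Mirrors the "if (x==42)" branch from the text.
    if x == 42:
        return "rare"
    return "common"


def hit_probability(bits: int = 32) -> float:
    # Chance that one uniformly random `bits`-bit input equals 42.
    return 1.0 / (2 ** bits)


def random_test(trials: int, seed: int = 0, bits: int = 32) -> int:
    """Count how many random inputs reach the rare branch."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.getrandbits(bits)  # assumed input distribution
        if program_under_test(x) == "rare":
            hits += 1
    return hits


if __name__ == "__main__":
    print("per-trial probability:", hit_probability())
    print("hits in 100000 trials:", random_test(100_000))
```

Under these assumptions the per-trial probability of reaching the branch is 1 in 2^32, so even a large random test budget is unlikely to cover it.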
One of the first automatic test generation techniques proposed is to symbolically execute the program and solve the symbolic constraints on each path for each potentially error-inducing statement. Since then, several improvements have been proposed, including a recent state-of-the-art technique called KLEE. KLEE improves on traditional symbolic test generation techniques in many ways, including modeling the environment to detect errors caused by the environment, using a powerful constraint solver, and using parallel processing to solve many constraints efficiently. KLEE has shown that it is indeed possible to scale symbolic execution to thousands of lines of code. In the case of property testing, however, KLEE is still a brute-force solution that looks for problems in all areas of the code.
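The idea of solving the constraints on each path can be sketched as follows. This is only a toy illustration and not how KLEE is implemented: the two-input program, the path names, and the brute-force search over a small integer range (standing in for a real constraint solver) are all assumptions made for this example.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Constraint = Callable[[int, int], bool]

# Toy program: if (x > 10) { if (y == 2 * x) error(); }
# Each path is described by the branch conditions that hold along it.
PATHS: Dict[str, List[Constraint]] = {
    "x<=10": [lambda x, y: not (x > 10)],
    "x>10,y!=2x": [lambda x, y: x > 10, lambda x, y: not (y == 2 * x)],
    "x>10,y==2x (error)": [lambda x, y: x > 10, lambda x, y: y == 2 * x],
}


def solve(constraints: List[Constraint],
          lo: int = -50, hi: int = 50) -> Optional[Tuple[int, int]]:
    """Stand-in for a constraint solver: exhaustive search on a small range."""
    for x, y in product(range(lo, hi + 1), repeat=2):
        if all(c(x, y) for c in constraints):
            return (x, y)
    return None  # path infeasible within the searched range


def generate_tests() -> Dict[str, Optional[Tuple[int, int]]]:
    # One concrete test input per path, when the path is satisfiable.
    return {name: solve(cs) for name, cs in PATHS.items()}


if __name__ == "__main__":
    for name, test_input in generate_tests().items():
        print(name, "->", test_input)
```

Every path is solved regardless of whether it is relevant to a property of interest, which reflects the brute-force character noted above.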
Directed testing techniques use random testing to avoid intractability and use symbolic analysis to improve coverage. An example of an existing directed testing technique is Concolic Unit Testing Engine (CUTE). “Concolic” is a portmanteau of concrete and symbolic. Concolic testing, as implemented in CUTE, is a hybrid software verification technique that interleaves concrete execution (testing on particular inputs) with symbolic execution, a classical technique that treats program variables as symbolic variables. Symbolic execution is used in conjunction with an automated theorem prover to generate new concrete inputs (test cases) with the aim of maximizing code coverage. The main focus of concolic testing is finding bugs in real-world software, rather than demonstrating program correctness. CUTE starts off by randomly generating inputs, as is the case with pure random testing. While executing the program on the random input, CUTE also collects symbolic constraints on the program statements and branches along the current execution path. After finishing the execution, CUTE generates new inputs for subsequent executions by analyzing the constraints collected in the previous executions. New inputs are generated by iteratively negating the symbolic constraints collected on the branches of the previous iteration and then solving them. If a solution exists, then the new inputs are certain to drive the program along the alternative paths, thus increasing coverage. In cases where it is hard to solve the constraints (e.g. due to limitations of the constraint solver) or no symbolic constraints exist because of external functions, CUTE falls back on using the concrete values of the previous iteration. While directed techniques like CUTE solve the practical problems of external calls (at the expense of some fault detection capability), they still suffer from the problem of path explosion. For example, a branch inside a loop that is executed N times for one input could potentially generate 2^N new inputs.
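The execute, collect, negate, and solve loop described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not CUTE itself: the instrumented toy program, the worklist strategy, and the brute-force search that stands in for a theorem prover are all hypothetical.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

Cond = Callable[[int, int], bool]
Trace = List[Tuple[Cond, bool]]  # (branch condition, outcome taken)


def program(x: int, y: int, trace: Trace) -> str:
    """Toy program that records each branch condition and its outcome."""
    c1: Cond = lambda x, y: x > 10
    t1 = c1(x, y)
    trace.append((c1, t1))
    if t1:
        c2: Cond = lambda x, y: y == 2 * x
        t2 = c2(x, y)
        trace.append((c2, t2))
        if t2:
            return "error"
    return "ok"


def solve(wanted: Trace, lo: int = -50, hi: int = 50) -> Optional[Tuple[int, int]]:
    # Stand-in for a constraint solver: exhaustive search on a small range.
    for x, y in product(range(lo, hi + 1), repeat=2):
        if all(cond(x, y) == taken for cond, taken in wanted):
            return (x, y)
    return None


def concolic(seed_input: Tuple[int, int], max_runs: int = 20):
    worklist = [seed_input]
    seen_paths = set()
    results = []
    while worklist and len(results) < max_runs:
        x, y = worklist.pop()
        trace: Trace = []
        outcome = program(x, y, trace)  # concrete run, constraints collected
        path = tuple(taken for _, taken in trace)
        if path in seen_paths:
            continue
        seen_paths.add(path)
        results.append(((x, y), path, outcome))
        # Negate each branch in turn, keeping the prefix before it fixed.
        for i, (cond, taken) in enumerate(trace):
            new_input = solve(trace[:i] + [(cond, not taken)])
            if new_input is not None:
                worklist.append(new_input)
    return results


if __name__ == "__main__":
    for test_input, path, outcome in concolic((0, 0)):
        print(test_input, path, outcome)
```

Starting from the input (0, 0), this sketch explores all three paths of the toy program. With a branch inside a loop, the number of distinct outcome sequences, and therefore of candidate inputs, grows exponentially with the number of iterations.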
Another approach, referred to as “property checking,” is based on partial program verification and/or static analysis techniques. Recent research that adopts this approach has focused on partial verification or validation of software. These techniques check whether the software obeys certain properties of interest. The properties are typically expressed as typestate properties. Some of these techniques have been shown to scale to hundreds of thousands of lines of code. However, the biggest limitation of these techniques is the large number of false positives they report. This is mainly due to limitations on the analysis capability of static techniques in the presence of heap data structures and calls to external libraries.
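As an illustration of what a typestate property looks like, the following sketch encodes a hypothetical "a file may only be read while it is open" property as a small state machine. The states, operations, and function name are assumptions for this example. The sketch checks a single concrete sequence of operations, whereas the static techniques discussed above reason about all possible sequences without running the program.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical typestate automaton for a file-like resource.
# Key: (current state, operation); value: next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("closed", "open"): "opened",
    ("opened", "read"): "opened",
    ("opened", "close"): "closed",
}


def first_violation(ops: Iterable[str], start: str = "closed") -> Optional[int]:
    """Return the index of the first operation that violates the property,
    or None if the whole sequence obeys it."""
    state = start
    for i, op in enumerate(ops):
        next_state = TRANSITIONS.get((state, op))
        if next_state is None:
            return i  # operation not allowed in the current state
        state = next_state
    return None


if __name__ == "__main__":
    print(first_violation(["open", "read", "close"]))   # obeys the property
    print(first_violation(["open", "close", "read"]))   # read after close
```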
Property-based testing describes a method for property testing that first slices the program with respect to the property of interest. It then performs unit testing with respect to the reduced program. The main problem with the conventional property-based approach is the reliance on static slicing algorithms. It is known that static slicing algorithms do not work well in the presence of aliasing and heap data structures.