1. Field of the Invention
This invention relates in general to a system for managing code coverage data, especially for a large scale development project.
2. Description of the Related Art
The testing of software during program development, from unit testing to functional and regression testing, is an ongoing task. The challenge of delivering quality tested code products has never been greater. The goal of testing is to verify the functionality of new and modified software. As the size and complexity of a program increase, so does the amount of testing needed. Finding the right tools and processes to provide a better tested product is very difficult. Investing in the wrong tools or processes can be costly and possibly fatal for the product. A code coverage system seems to be a wise investment. Studies show that with 100% code coverage at the unit testing phase, one would detect 15% of the defects in the product. Another 45% of the defects could be found in the functional test phase. Questions which arise when evaluating a code coverage system include the following. Will a code coverage system really be a benefit? How does one collect code coverage data in a large scale development project when the code is constantly changing? How does one store all of the data for a module that most of the test cases exercise?
With a code coverage system one can make intelligent decisions on what testing is needed. One can query the code coverage information and answer a number of questions:
1. What code has not been tested?
2. What is the test case overlap and which test cases can be deleted or combined?
3. When a code change is made to a module, what test cases need to be run?
4. When a set of test cases is run for changed code, what is the code coverage of the changed code?
5. Are new test cases needed for the new code?
6. If a defect was found, was the code tested? Were there test cases that exercised the defective area? (Causal Analysis).
7. What test cases are finding problems? Which ones are good candidates for regression testing?
Through this analysis, the testing cycle time can be reduced because test cases are selected and executed deliberately rather than at random.
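Several of the questions above reduce to set operations on the collected data. The following is a minimal sketch, not the system described here: it assumes coverage is recorded as a mapping from each test case to the set of (module, line) pairs it executed. The names `coverage`, `all_lines`, `untested_lines`, `subsumed_tests` and `changed_code_coverage`, and the sample data, are hypothetical.

```python
# Hypothetical coverage data: test case name -> set of (module, line) pairs executed.
coverage = {
    "tc_init": {("parser.c", 1), ("parser.c", 2)},
    "tc_parse": {("parser.c", 1), ("parser.c", 2), ("parser.c", 3)},
    "tc_emit": {("emit.c", 1)},
}

# Hypothetical list of every executable line known to the system.
all_lines = {
    ("parser.c", 1), ("parser.c", 2), ("parser.c", 3), ("parser.c", 4),
    ("emit.c", 1),
}


def untested_lines(coverage, all_lines):
    """Question 1: lines that no test case executed."""
    covered = set().union(*coverage.values())
    return all_lines - covered


def subsumed_tests(coverage):
    """Question 2: test cases whose coverage is a proper subset of another's.

    These are candidates for deletion or combination, not automatic deletions.
    """
    return sorted(
        a for a in coverage
        if any(a != b and coverage[a] < coverage[b] for b in coverage)
    )


def changed_code_coverage(coverage, tests_run, changed):
    """Question 4: fraction of the changed lines executed by the tests that were run."""
    if not changed:
        return 1.0  # nothing changed, so nothing is left uncovered
    hit = set().union(*(coverage[t] for t in tests_run)) & changed
    return len(hit) / len(changed)
```

With this sample data, `untested_lines(coverage, all_lines)` reports line 4 of `parser.c`, `subsumed_tests(coverage)` flags `tc_init` because `tc_parse` executes everything it does, and running only `tc_parse` against a change to lines 3 and 4 of `parser.c` covers half of the changed code.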
Now that it has been established that a code coverage system will help in delivering a better tested product, the characteristics of a large project will be examined. A large project usually has over 3 million lines of code, over 2000 modules and over 3000 test cases. The average release for the project is about one half million lines of new or changed code. It takes approximately 3 months to run all of the test cases.
During the functional testing phase of a project, new test cases are being executed to validate the product. Code defects are found and fixed during this phase. How does one gather code coverage data during this testing phase? If one takes the approach of freezing the code, i.e., not allowing any changes, while collecting the code coverage data, there will be a never ending loop. Under this model, changes cannot be made to the code until after the code coverage data collection process is complete and the code can be un-frozen. Once the code has been changed, the code coverage data collection process must be repeated, and so on. If code coverage data collection is delayed until the functional testing phase is over, the benefits of a code coverage system are not reaped. Also, the problem of continuous code changes being integrated into the product still exists, even though the changes may not be as frequent during later testing phases.
Another problem that arises with a large project is the amount of code coverage data that is generated. The question arises as to whether data for every test case should be saved for each line of code. Some code is common code, such as initialization code, and will be touched by all test cases. With 3000 test cases and 3 million lines of code, a very large database will be needed to store each line of code and each test case that exercised that line of code.
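The scale of the problem can be estimated with simple arithmetic. The sketch below computes the worst case, in which every test case touches every line, using the project figures given above. The figure of 8 bytes per stored (line, test case) pair is an assumption chosen only for illustration, as are the names used.

```python
TEST_CASES = 3_000          # from the project description above
LINES_OF_CODE = 3_000_000   # from the project description above
BYTES_PER_PAIR = 8          # assumption: storage cost of one (line, test case) pair


def worst_case_pairs(test_cases, lines):
    """Upper bound on stored pairs if every test case executed every line."""
    return test_cases * lines


pairs = worst_case_pairs(TEST_CASES, LINES_OF_CODE)
gigabytes = pairs * BYTES_PER_PAIR / 1e9

print(pairs)      # 9000000000
print(gigabytes)  # 72.0
```

Real coverage is far sparser than this bound, but common code such as initialization routines approaches it for the lines involved, which is where the storage pressure concentrates.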
There exist today many code coverage systems in the industry.
For example, in one system, as tests are executed, each line of a test matrix that is executed is marked as such. This provides not only an indication of how many paths were executed by a given test, but also which specific paths. As the matrix is updated during all testing, it will be clear which paths have not been tested, and therefore what additional tests are needed to reach the target percentage of code coverage. “Automatic Unit Text Matrix Generation,” IBM Technical Disclosure Bulletin Vol. 37, No. 6A (June, 1994).
Other systems also provide a way to execute test cases and determine the effectiveness or coverage of the testing. See, e.g., “Software Test Coverage Measurement,” IBM Technical Disclosure Bulletin Vol. 39, No. 8 (August, 1996); and Bradley et al., “Determination of Code Coverage,” IBM Technical Disclosure Bulletin Vol. 25, No. 6 (November, 1982).
In yet another system, described in U.S. Pat. No. 5,673,387 to Chen et al., when a software system is changed, the set of changed entities is identified. This set of changed entities is then compared against each set of covered entities for the test units. If one of the covered entities of a test unit has been identified as changed, then that test unit must be rerun. A user may generate a list of changed entities to determine which test units must be rerun in the case of a hypothetical system modification.
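The comparison described in this paragraph amounts to a set intersection per test unit. The following is an illustrative sketch of that general idea only, not the implementation of the cited patent; the function name `units_to_rerun` and the entity and unit names are hypothetical.

```python
def units_to_rerun(covered_by_unit, changed_entities):
    """Return the test units whose covered entities include any changed entity.

    covered_by_unit: dict mapping test unit name -> set of covered entity names.
    changed_entities: iterable of entity names that were (or might be) changed.
    """
    changed = set(changed_entities)
    return sorted(
        unit for unit, covered in covered_by_unit.items()
        if not changed.isdisjoint(covered)
    )


# Hypothetical data: entities may be functions, types or variables.
covered_by_unit = {
    "unit_a": {"parse", "Token"},
    "unit_b": {"emit"},
    "unit_c": {"parse", "emit"},
}
```

Passing a list of entities that have not actually been changed gives the hypothetical-modification query mentioned above: `units_to_rerun(covered_by_unit, {"emit"})` selects `unit_b` and `unit_c`.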
In yet another system, described in U.S. Pat. No. 5,778,169 to Reinhardt, a programmer can use a test coverage tool to identify a subset of tests that executed one or more coverage points corresponding to modified statements. This saves the programmer development time because the programmer can now run the subset of tests on an executable, compiled from the source code including the modified statements, and does not have to run the complete set of regression tests.
These past systems merely collect code coverage data and report on the data collected. Additionally, they describe how to collect more and better data to determine what test cases should be run or rerun.
However, none of these prior systems provides a tool or methodology to manage, preserve and keep track of code coverage data in a dynamic development environment.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus and article of manufacture for a computer-implemented system for managing code coverage data.
In accordance with the present invention there is provided a way to collect, maintain and preserve code coverage data each time test cases are executed and a way to handle code churn, i.e., code changes, without unnecessarily rerunning any test cases. Thus, in accordance with the present invention, the code coverage data, which may be stored in a database, is updated or resequenced when code changes are made to a program. This resequencing eliminates the need to freeze the program code while collecting the code coverage data. When a code change is incorporated into the system, the resequencing routine makes the necessary adjustments to the code coverage data.
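This section does not spell out the adjustments the resequencing routine makes. One plausible reading, offered only as a sketch under stated assumptions, is that coverage is keyed by line number and that an edit shifts the records for later lines while discarding records for removed lines. The function `resequence`, its parameters, and the data layout are all hypothetical.

```python
def resequence(line_coverage, start, deleted, inserted):
    """Adjust line-keyed coverage data after an edit to a module.

    Assumed model (not taken from the specification): the edit replaces
    `deleted` lines beginning at 1-based line `start` with `inserted` new lines.

    line_coverage: dict mapping line number -> set of test case names.
    Returns a new dict keyed by the post-edit line numbers.
    """
    shift = inserted - deleted
    result = {}
    for line, tests in line_coverage.items():
        if line < start:
            # Lines before the edit keep their numbers and their coverage.
            result[line] = set(tests)
        elif line >= start + deleted:
            # Lines after the edit keep their coverage under a shifted number.
            result[line + shift] = set(tests)
        # Lines inside the replaced range are gone; their coverage is dropped.
    return result


# Hypothetical coverage before the edit.
before = {1: {"tc_a"}, 2: {"tc_a", "tc_b"}, 3: {"tc_b"}, 4: {"tc_b"}}

# Line 2 is replaced by three new lines.
after = resequence(before, start=2, deleted=1, inserted=3)
```

In this example the old lines 3 and 4 become lines 5 and 6 with their coverage intact, while the new lines 2 through 4 have no entry and therefore show up as untested, so only the test cases needed for the new code have to be run.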
Furthermore, it may not be feasible to build a table in a database to store code coverage data for every test case. A user can reduce the table size needed by creating a table with a fixed number of columns in which to store the code coverage data. Then, in accordance with the present invention, the last column of the table may contain a pointer to a file. This file may then contain the test case results that did not fit in the fixed columns of the table.
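The fixed-width table with an overflow pointer can be sketched as follows. This is a minimal illustration under assumptions: a row is modeled as a Python list, the pointer is a file path, the overflow file is JSON, and the limit of three inline columns is arbitrary. The names `MAX_INLINE`, `build_row` and `read_row` are hypothetical.

```python
import json
import tempfile
from pathlib import Path

MAX_INLINE = 3  # assumption: number of fixed columns that hold test case names


def build_row(line_id, test_cases, overflow_dir):
    """Build one fixed-width row for a line of code.

    The first MAX_INLINE columns hold test case names (None when unused).
    The last column is None, or the path of a file holding the remainder.
    """
    inline = list(test_cases[:MAX_INLINE])
    inline += [None] * (MAX_INLINE - len(inline))
    extra = list(test_cases[MAX_INLINE:])
    pointer = None
    if extra:
        path = Path(overflow_dir) / f"overflow_{line_id}.json"
        path.write_text(json.dumps(extra))
        pointer = str(path)
    return inline + [pointer]


def read_row(row):
    """Return every test case recorded for a row, following the file pointer."""
    *inline, pointer = row
    tests = [t for t in inline if t is not None]
    if pointer is not None:
        tests += json.loads(Path(pointer).read_text())
    return tests
```

For example, with `d = tempfile.mkdtemp()`, calling `build_row(42, ["t1", "t2", "t3", "t4", "t5"], d)` yields a four-column row whose last column names a file containing `t4` and `t5`, and `read_row` on that row returns all five names. Every row has the same width no matter how many test cases touch the line, which is what keeps heavily exercised common code from widening the table.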
Thus, an object of the present invention is to provide an effective code coverage data management system which eliminates the need to rerun an entire test case collection to re-collect code coverage data in order to have valid code coverage data.
It is another object of the present invention to reduce storage problems associated with code coverage data collection systems.