This invention is directed to a method and system for providing data coverage analysis of a computer program.
Compilers convert a source program that is usually written in a high level language into low level object code, which typically consists of a sequence of machine instructions or assembly language. The constructs in the source program are converted into a sequence of assembly language instructions. To obtain a reasonable level of confidence in the correctness of the program, it is advantageous to test the compiler-generated code on a wide variety of inputs so that each and every block of code is exercised.
There are currently a number of tools available for doing program flow analysis or path coverage analysis of an application program. During program development, these instrumenting tools allow a developer to see which control paths are executed in the program. The tool instruments the code, adding monitoring capabilities so that it can determine which blocks of code get executed, and how often.
After the application program has been run along with the instrumenting tool, a visualization facility typically color codes sections of code in the application program based upon the frequency of execution of each code section. A product called PURE COVERAGE™ (available from Rational Software Corporation of Lexington, Mass.) provides such a service. It has been observed that code which is tested during the development phase is less likely to contain bugs than code developed without such testing.
A related issue arose in a commonly available processor whose floating point divide operation was defective because a large data table driving the division algorithm contained some incorrect values. The testing employed at the time failed to detect the errors because it did not exhaustively exercise all elements in the data table.
This example exposes a problem in the art. Whereas the path flow analysis may have been straightforward, well tested, and have contained no errors, no comparable analysis was performed on the data tables required for a floating point divide operation. Consequently, an incorrect entry in this table went undetected by whatever instrumenting techniques were applied to that program.
This experience therefore demonstrates a shortcoming of the path flow analysis technique. Even if the logic flow of an application program is thoroughly analyzed, and found to be correct by an instrumenting program, there remains the possibility that the program could malfunction upon execution because there has not been a comprehensive evaluation of the correctness of entries in the data table employed by the application program.
Therefore, there is a need in the art for a method and system for identifying, after execution of a program which accesses data tables, the number of times each element of each data table was accessed.
There is a further need in the art for a method and system for identifying elements in data tables which have not been accessed at all.
These and other objects, features and technical advantages are achieved by a system and method which generates a data coverage specification which identifies functions of interest and memory locations within a range of interest, and then instruments program statements which satisfy one or more criteria of the data coverage specification.
In order to determine the number of entries in a data table accessed during operation of an application program, the areas in computer memory associated with this data table must be identified. The developer lists functions of interest and memory locations associated with data tables which are of interest to the developer. The resulting package of information is the data coverage specification. The application program and data coverage specification are provided to the data coverage instrumentation tool which actually searches through the program looking for program instructions which access memory locations of interest.
The existence of the data coverage specification permits the instrumentation tool to concentrate on instructions which access selected areas of memory, rather than instrumenting all memory access instructions, thus reducing the workload of the instrumentation program. Alternatively, all code which accesses constant data could be instrumented thereby providing greater simplicity to the instrumenting algorithm, but also incurring the additional processing time of instrumenting a greater total number of instructions.
In a preferred embodiment, the data coverage specification identifies both functions of interest which can be mapped to code regions of interest as well as data tables to be checked which are located in memory areas of interest.
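By way of illustration only, the data coverage specification described above might be represented as follows. This is a minimal sketch and not the format used by the invention; the names MemoryRegion, DataCoverageSpec, fp_divide and divide_table, the addresses, and the element size are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class MemoryRegion:
    """A data table of interest, described by its address range (hypothetical layout)."""
    name: str
    start: int          # address of the first element
    element_size: int   # size of one table element, in bytes
    element_count: int  # number of elements in the table

    @property
    def end(self) -> int:
        """First address past the end of the table."""
        return self.start + self.element_size * self.element_count

    def contains(self, address: int) -> bool:
        return self.start <= address < self.end


@dataclass
class DataCoverageSpec:
    """Functions of interest plus memory regions of interest."""
    functions_of_interest: Set[str] = field(default_factory=set)
    regions_of_interest: List[MemoryRegion] = field(default_factory=list)

    def region_for(self, address: int) -> Optional[MemoryRegion]:
        """Return the region of interest containing the address, if any."""
        for region in self.regions_of_interest:
            if region.contains(address):
                return region
        return None


# Assumed example values: one function and one 100-element table of 4-byte entries.
spec = DataCoverageSpec(
    functions_of_interest={"fp_divide"},
    regions_of_interest=[MemoryRegion("divide_table", 0x4000, 4, 100)],
)
```

An instrumentation tool could consult such a structure to decide which functions to scan and whether a given address falls within a table of interest.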
Mapping of function names to code areas of interest requires mapping information connecting the function names to the code areas in memory. Such mapping information is commonly found in executable image files. In an alternative embodiment, if such function to memory location mapping is unavailable, the functions to be instrumented could be identified by explicitly stating the addresses where the functions are found. This latter approach is, however, less convenient for the developer.
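The mapping step, together with the explicit-address alternative, may be sketched as follows. The symbol table is modeled here as a plain dictionary; how such a table would be extracted from a particular executable image format is not shown, and all names and addresses are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

CodeRange = Tuple[int, int]  # (start address, end address exclusive)


def resolve_code_regions(
    function_names: Iterable[str],
    symbol_table: Dict[str, CodeRange],
    explicit_ranges: Optional[Dict[str, CodeRange]] = None,
) -> Dict[str, CodeRange]:
    """Map each function of interest to a code range.

    The symbol table is consulted first; developer-supplied explicit
    addresses are used only when no mapping information is available.
    """
    explicit_ranges = explicit_ranges or {}
    resolved: Dict[str, CodeRange] = {}
    for name in function_names:
        if name in symbol_table:
            resolved[name] = symbol_table[name]
        elif name in explicit_ranges:
            resolved[name] = explicit_ranges[name]
        else:
            raise KeyError(f"no code range known for function {name!r}")
    return resolved


# Assumed example: one function found in the symbol table, one given explicitly.
symbols = {"fp_divide": (0x1000, 0x1080)}
explicit = {"fp_sqrt": (0x2000, 0x2040)}
code_regions = resolve_code_regions(["fp_divide", "fp_sqrt"], symbols, explicit)
```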
In a preferred embodiment, the mechanism executes a two phase process for keeping track of reads from data tables of interest. The first phase involves examining only instructions associated with functions of interest, such instructions being instructions of interest. The mechanism then acts to determine whether each instruction of interest accesses or may access a memory region of interest. If the instruction either does not access memory, or accesses memory which is definitely outside the memory region of interest, no further action is taken. If the instruction of interest either may read from or definitely reads from a memory location of interest, the second phase of the memory access tracking is activated, which preferably consists of inserting dynamic tracing code.
Preferably, the dynamic tracing code determines whether the instruction of interest identified in phase one as possibly accessing the memory region of interest in fact accesses this region. If the dynamically traced instruction is ultimately found not to access the memory region of interest, no further action is taken with regard to that instruction. If the dynamically traced instruction does in fact access a memory region of interest, the counter for the data element in the region of interest accessed by the instruction is appropriately incremented. The code added by the instrumentation tool will execute along with the application program in which it is embedded, and create an auxiliary table containing coverage information relating to memory access operations from the regions of interest.
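The two phases described above may be sketched as follows. The instruction model is a deliberate simplification and an assumption of this example: an instruction either reads no memory, reads a statically known address, or reads an address known only at run time. The Access, Instruction, classify, needs_tracing and trace names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

Region = Tuple[int, int]  # (start address, end address exclusive)


class Access(Enum):
    NONE = "does not access memory"
    OUTSIDE = "definitely outside the region of interest"
    MAYBE = "may read from the region of interest"
    DEFINITE = "definitely reads from the region of interest"


@dataclass(frozen=True)
class Instruction:
    opcode: str
    reads_memory: bool
    static_address: Optional[int] = None  # set only when the address is a constant


def classify(instr: Instruction, region: Region) -> Access:
    """Phase one: decide statically how the instruction relates to the region."""
    if not instr.reads_memory:
        return Access.NONE
    if instr.static_address is None:
        return Access.MAYBE  # address is computed at run time
    start, end = region
    if start <= instr.static_address < end:
        return Access.DEFINITE
    return Access.OUTSIDE


def needs_tracing(instr: Instruction, region: Region) -> bool:
    """Only 'may' and 'definitely' reads receive dynamic tracing code."""
    return classify(instr, region) in (Access.MAYBE, Access.DEFINITE)


def trace(effective_address: int, region: Region) -> bool:
    """Phase two, at run time: did the read actually fall in the region?"""
    start, end = region
    return start <= effective_address < end


region = (0x4000, 0x4190)  # assumed: 100 elements of 4 bytes each
```

Under these assumptions, an instruction whose address cannot be resolved statically is conservatively traced, and the run-time check then discards reads that fall outside the region.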
By way of example, if the memory region of interest is a table of 100 data items, then a data coverage table would be created which also contained 100 elements, with each element having a counter initialized to “0” and which corresponds to a data element in the memory region of interest for the purpose of keeping count of the number of times that data element gets accessed during execution. For the purpose of this example, let us assume that the instruction “ADD” is associated with a function of interest. Encountering an “ADD” instruction will trigger further examination of the instruction. Now that the current instruction is known to be an instruction of interest, it remains to determine whether the instruction accesses a memory region of interest. If the instruction does not access a memory region of interest, no further action is taken.
If the instruction does access a memory region of interest, the data element in the region which has been accessed is identified along with its counterpart in the data coverage table. The appropriate element in the data coverage table is then incremented to reflect the read operation performed by the instruction of interest. This and other counters will be similarly incremented as subsequent instructions of interest are found to read from memory regions of interest.
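The counter update in the 100-element example may be sketched as follows. The CoverageTable name, the base address, and the 4-byte element size are assumptions made for illustration.

```python
class CoverageTable:
    """One counter per element of a data table of interest."""

    def __init__(self, start: int, element_size: int, element_count: int) -> None:
        self.start = start
        self.element_size = element_size
        self.counters = [0] * element_count  # every counter starts at 0

    def record_read(self, address: int) -> bool:
        """Increment the counter for the element at this address.

        Returns False, and counts nothing, if the address lies outside the table.
        """
        offset = address - self.start
        if not 0 <= offset < self.element_size * len(self.counters):
            return False
        self.counters[offset // self.element_size] += 1
        return True


# Assumed layout: 100 elements of 4 bytes each, starting at 0x4000.
table = CoverageTable(start=0x4000, element_size=4, element_count=100)
for address in (0x4000, 0x4004, 0x4004, 0x9000):  # the last read is outside the table
    table.record_read(address)
```

After these four reads, element 0 has been counted once and element 1 twice, while the read at 0x9000 leaves every counter unchanged.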
Upon completion of execution, each counter would have a value equal to the total number of times that memory location was accessed. Any counter having a value of 0 after program execution would trigger attention from the developer, since the memory location associated with that counter has not yet been tested. Counter value data is then dumped out to a coverage file after execution of the instrumented program. There is a facility to merge the data coverage files resulting from different runs of the instrumented program.
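The merging of coverage data from different runs may be sketched as an element-wise sum of counters. The JSON layout used here is purely an assumption of this example, as the format of the coverage file is not specified above; dump_counts and merge_dumps are hypothetical names.

```python
import json
from typing import Iterable, List, Optional


def dump_counts(counters: List[int]) -> str:
    """Serialize one run's counters (assumed JSON layout)."""
    return json.dumps({"counters": counters})


def merge_dumps(dumps: Iterable[str]) -> List[int]:
    """Sum the counters of several runs element by element."""
    merged: Optional[List[int]] = None
    for text in dumps:
        counters = json.loads(text)["counters"]
        if merged is None:
            merged = list(counters)
        elif len(counters) != len(merged):
            raise ValueError("coverage dumps describe tables of different sizes")
        else:
            merged = [a + b for a, b in zip(merged, counters)]
    return merged if merged is not None else []


run_a = dump_counts([3, 0, 1, 0])
run_b = dump_counts([0, 0, 2, 5])
merged = merge_dumps([run_a, run_b])
```

In this example the second element remains at zero after merging, indicating that neither run exercised it.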
Next, this coverage information is read from the merged file using a visualization tool which displays the number of times each element in the data table has been accessed. One approach to visualization would be to represent different ranges of access in different colors. In a preferred embodiment, black would be used to indicate a high access level, pink to indicate a low access level, and red to indicate unaccessed items.
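The color scheme of the preferred embodiment may be sketched as a simple mapping from access count to color. The boundary between a low and a high access level is not specified above; the threshold of 10 used here is an arbitrary assumption, and color_for is a hypothetical name.

```python
def color_for(count: int, high_threshold: int = 10) -> str:
    """Map an access count to a display color.

    red   : element never accessed
    pink  : accessed, but fewer than high_threshold times
    black : accessed at least high_threshold times
    """
    if count < 0:
        raise ValueError("access counts cannot be negative")
    if count == 0:
        return "red"
    if count < high_threshold:
        return "pink"
    return "black"


counters = [0, 3, 10, 250]
colors = [color_for(c) for c in counters]
```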
The above approach will identify for the developer those elements in the data table which have not been accessed by the application program in the course of running the test suite. With this information, the developer may either modify the test suite to ensure that all elements in the table are accessed, or examine the unaccessed elements by hand to ensure that they are correct.
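For a large table, the unaccessed elements might conveniently be reported as contiguous index ranges rather than one by one. The following is a minimal sketch of such a report; the unaccessed_ranges name and the sample counters are hypothetical.

```python
from typing import List, Tuple


def unaccessed_ranges(counters: List[int]) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) index ranges whose counters are all zero."""
    ranges: List[Tuple[int, int]] = []
    run_start = None
    for index, count in enumerate(counters):
        if count == 0 and run_start is None:
            run_start = index                        # a run of zeros begins
        elif count != 0 and run_start is not None:
            ranges.append((run_start, index - 1))    # the run ended just before here
            run_start = None
    if run_start is not None:                        # run extends to the end of the table
        ranges.append((run_start, len(counters) - 1))
    return ranges


gaps = unaccessed_ranges([1, 0, 0, 3, 0])
```

Each reported range identifies table elements which the developer may target with additional tests or inspect by hand.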
Therefore, it is a technical advantage of the present invention that the number of accesses to each element in data tables of interest during execution of a program is identified.
It is a further technical advantage of the present invention that elements in data tables of interest which have not been accessed at all are identified.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.