The present invention relates generally to testing software, and specifically to testing concurrent and/or distributed software.
Tests for checking sequential software, i.e., software which runs on one platform and which has only one thread, are well known in the art. The tests are used to give a level of assurance that the software is fault-free, and may also detect and locate faults in the software. The tests vary in their ability to perform these tasks. For example, one may create a set of tests which ensures that every statement in a program has been executed. In this case, however, even successful completion of the tests will not necessarily detect a fault due to a statement which incorrectly defines a value of a variable. In order to detect such a fault, a second test checking and/or setting values of variables may be necessary. In general, whether a particular set of tests on sequential software detects a fault depends on the tests and on values of variables which are used by the tests.
In the context of the present patent application and in the claims, the terms "coverage," "coverage level," and "level of coverage" refer to a metric of completeness with respect to a test or a set of tests. Considering the example above, a test or set of tests may ensure that all statements are executed, in which case these tests provide 100% statement coverage. However, as shown in the example above, tests which provide 100% statement coverage for specific software being tested do not necessarily complete testing coverage for the software in all respects, so that the overall level of coverage of the software may be low.
Coverage analysis defines the concept of a "good set of tests" by formalizing the tests developed to ensure that as many areas of a program as possible are tested, to ensure that the coverage level of each of the areas is known, and to give measures for these coverage levels and for the overall coverage for the program. The purpose of coverage analysis is to direct the sampling process of a set of tests so that the end result is effective in revealing defects. Coverage analysis creates a sequence of testing goals, usually expressed as a hierarchy of coverage requirements, so that achieving each goal marks a higher confidence level in the "correctness" of the program. Thus, when a required overall level of coverage is achieved, we can state with some level of certainty (related to the overall level of coverage) that the software under test is free of defects. Coverage may also be used as a stopping criterion for the testing process, i.e., when a certain level of coverage is achieved testing is stopped.
In the context of the present patent application and in the claims, a "coverage task" denotes a Boolean function on a test, and a "coverage model" denotes a definition of a particular type of coverage which is able to generate a corresponding set of coverage tasks. The outcome of a Boolean function on a test is the result of the function. For example, in a model designed to determine statement coverage of a program, the model will provide a set of outcomes of coverage tasks, such as ". . . , statement 534 was executed in this test, statement 535 was not executed in this test, statement 536 was executed in this test, . . . ."
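By way of illustration only, the definitions above may be sketched as follows. The sketch assumes that a test is represented by the set of statement numbers it executed; the names `statement_coverage_model` and `evaluate` are hypothetical and do not appear in the present application.

```python
from typing import Callable, Dict, Iterable, Set

# Assumption: a "test" is summarized by the set of statement numbers it executed.
Test = Set[int]
CoverageTask = Callable[[Test], bool]


def statement_coverage_model(statement_ids: Iterable[int]) -> Dict[int, CoverageTask]:
    """Generate one coverage task per statement.

    Each task is a Boolean function on a test: "statement s was executed in this test".
    """
    # The default argument s=s binds the current statement number to each task.
    return {s: (lambda test, s=s: s in test) for s in statement_ids}


def evaluate(model: Dict[int, CoverageTask], test: Test) -> Dict[int, bool]:
    """Return the outcome of every coverage task of the model on the given test."""
    return {s: task(test) for s, task in model.items()}


model = statement_coverage_model([534, 535, 536])
outcomes = evaluate(model, {534, 536})  # this test executed statements 534 and 536
print(outcomes)  # {534: True, 535: False, 536: True}
```

The printed outcomes correspond to the example list given above: statements 534 and 536 were executed in the test, and statement 535 was not.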
FIG. 1 is a flowchart that schematically shows a method for testing software, as is known in the art. In order to test the software, in a first step 10, a set of tests is generated according to predetermined guidelines, such as in the example described above. In a run step 11, the tests are run on the software, and the coverage is estimated. In a check step 12, missing coverage is estimated by comparing a coverage list comprising all possible coverage tasks with the actually covered coverage tasks from the tests. Further tests are performed, in a generate new tests step 13, in order to reduce the missing coverage. The further tests may follow the initial guidelines and/or may comprise further guidelines. The testing procedure concludes when a required level of coverage has been reached.
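The loop of FIG. 1 may be sketched, purely as an illustration, as shown below. The software under test (`run_test`), its four coverage tasks, and the use of successive integers as the test generation guideline are all assumptions made for the sketch and are not taken from the figure.

```python
# Hypothetical coverage list comprising all possible coverage tasks.
ALL_TASKS = {"entry", "even_branch", "odd_branch", "large_branch"}


def run_test(x):
    """Hypothetical software under test: return the coverage tasks hit by input x."""
    covered = {"entry"}
    covered.add("even_branch" if x % 2 == 0 else "odd_branch")
    if x > 90:
        covered.add("large_branch")
    return covered


def run_until_covered(required_level, candidate_inputs):
    """Run tests until the required level of coverage is reached."""
    covered = set()
    level = 0.0
    tests_run = 0
    for x in candidate_inputs:                 # steps 10 and 13: generate a (further) test
        covered |= run_test(x)                 # step 11: run the test, record coverage
        tests_run += 1
        level = len(covered) / len(ALL_TASKS)  # step 12: compare with the coverage list
        if level >= required_level:
            break                              # required level of coverage reached
    return level, ALL_TASKS - covered, tests_run


level, missing, tests_run = run_until_covered(1.0, range(100))
print(level, missing, tests_run)
```

In this sketch the last task is only covered by an input greater than 90, so full coverage requires many more tests than the first 75% of coverage, which is a typical reason for using a coverage level below 100% as the stopping criterion.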
FIG. 2 is a block diagram schematically showing a process for testing sequential software 16, as is known in the art. Initially, sets of tests 14 are developed which are intended to perform checks 15 corresponding to executing each statement, executing each branch, and executing each define-use relation of the program. The sets of tests are run on the software, with initial inputs from a sample space 17 of all possible inputs for the software. After the tests have been run, software 16 outputs coverage lists 18 of which statements, branches, and define-use relations have been executed. From these lists an estimate of the coverage of the tests can be made. The tests may then be modified to change how the tasks listed above are performed and/or to choose different inputs from the sample space. The modified tests are run and the process is continued, as described above with reference to FIG. 1, until a required level of coverage is achieved. It will be appreciated that because the sample space of a typical program is in general extremely large or even infinite, it is impossible to run all possible tests on the program.
Testing concurrent and/or distributed programs (CDPs), which comprise a plurality of threads and/or operate on a plurality of distributed platforms, is known to be significantly more complicated than testing sequential software. Whereas in sequential software the result produced by the program is uniquely determined by the inputs selected, in the case of CDPs the result produced depends both on the input space and on the order in which different tasks implemented by the CDP are performed. Thus, in order to determine the result for a specific CDP, additional information in the form of a sequence of schedule decisions in the case of concurrent programs, and/or an order of message arrival in the case of distributed programs, is required. In the context of the present patent application and in the claims, a set of such additional information is termed an "interleaving," and the set of all possible interleavings for a CDP is termed the interleaving space for the CDP. A test on the CDP is of the form (input, interleaving), wherein the interleaving is applied in conjunction with the given input. While a set of tests which covers all possible inputs and all possible interleavings will theoretically detect all defects, such a set cannot be implemented in practice.
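The dependence of the result on the interleaving may be illustrated by the following sketch, which simulates a hypothetical two-thread program deterministically. The program, in which each of threads "A" and "B" reads a shared variable and then writes back the value read plus one, and the name `run_cdp` are assumptions chosen for the illustration.

```python
from itertools import permutations


def run_cdp(initial, interleaving):
    """Run the test (input, interleaving) on a simulated two-thread program.

    Each thread performs two steps: read the shared variable into a local
    copy, then write the local copy plus one back to the shared variable.
    The interleaving is the sequence of schedule decisions, i.e., which
    thread performs its next step.
    """
    shared = initial
    local = {}
    step = {"A": 0, "B": 0}
    for thread in interleaving:
        if step[thread] == 0:
            local[thread] = shared          # read step
        elif step[thread] == 1:
            shared = local[thread] + 1      # write step
        else:
            raise ValueError("thread scheduled after it has finished")
        step[thread] += 1
    return shared


# The interleaving space of this small program: every ordering of two steps per thread.
interleaving_space = sorted(set(permutations("AABB")))
outcomes = {run_cdp(0, interleaving) for interleaving in interleaving_space}

print(len(interleaving_space))  # 6
print(run_cdp(0, "AABB"))       # 2: thread A completes before thread B starts
print(run_cdp(0, "ABAB"))       # 1: both threads read 0 before either writes
print(outcomes)                 # {1, 2}
```

The same input thus yields two different results over an interleaving space of only six interleavings. For realistic programs the interleaving space grows far too quickly to enumerate in this manner.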
Practical methods for testing CDPs are known in the art. For example, in an article entitled "All-define-use-path Coverage for Parallel Programs" by Cheer-Sun D. Yang et al., presented at the International Symposium on Software Testing and Analysis, 1998 (ISSTA 98), of the Association for Computing Machinery's Special Interest Group on Software Engineering (ACM SIGSOFT), which is incorporated herein by reference, the authors suggest applying a define-use coverage criterion to a generalized control-graph of the CDP.
In an article entitled "Testing Concurrent Programs: A Formal Evaluation of Coverage Criteria" by Factor et al., in Proceedings of the 7th Israel Conference on Computer Systems and Software Engineering (1996), which is incorporated herein by reference, the authors describe an adaptation of techniques used for evaluating sequential program coverage criteria to an abstract concurrent language.
Methods for testing CDPs typically look for defects which occur in patterns. For example, access to a common variable by two or more external agents could affect the result of the CDP, depending on which agent accesses the variable first. Thus, a general defect pattern which could cause defects is access to a global common variable.
In a section of a "User Manual of a Generic Coverage Tool (GCT)" entitled "Using Race Coverage with GCT" by Marick, which manual can be found at http://www.mirror.ac.uk/sites/ftp.cs.uiuic.edu/pub/testing/gct.files/ftp.txt, and which is incorporated herein by reference, the author describes testing a concurrent program by investigating a general defect pattern wherein defects are likely to occur if a method can be executed by two threads simultaneously. The program is tested by ensuring that such methods are so executed.
In the context of the present patent application and in the claims, the following definitions apply:
An event is an operation performed by a central processing unit (CPU) which, if its order is changed, might affect an outcome of a CDP being run by the CPU. For example, access to a non-local variable of the CDP is an event, since the order of access to the non-local variable could affect the outcome of the CDP.
An interfering event, also termed herein an interference, is an event which affects the operation of a thread of a CDP. The thread is said to be "interfered with" by the interfering event.
A critical event is an event which might affect a scheduler in a concurrent case (for example by causing it to change an order of operation execution), or an order of message arrival in a distributed case; additionally, a critical event is any interfering event.
An atomic event is a segment of a thread or a process execution which cannot be stopped by a scheduler.
A non-atomic event is a segment of a thread or a process execution which is not atomic, i.e., which can be stopped by the scheduler; a non-atomic event may contain one or more critical events.
In preferred embodiments of the present invention, a plurality of coverage models are defined. Each model is used to develop one or more tests to be performed on a concurrent and/or a distributed program (CDP). The models are arranged in a plurality of hierarchies of complexity, wherein within each hierarchy the least complex model provides the least extensive coverage and is correspondingly the simplest to implement. Within a hierarchy, as the models increase in complexity, they become increasingly "stronger" in their ability to increase the level of confidence of freedom from defects, since each model includes the coverage of the previous models and adds to this coverage significantly, at the cost of being less simple to implement. Each hierarchy is most preferably utilized by testing sequentially from the least complex model in the hierarchy until a required overall coverage level is achieved. Unlike other methods known in the art for testing CDPs, preferred embodiments of the present invention have all of the following attributes:
Testing is defined directly on an interleaving space of the program.
The coverage models within the plurality of hierarchies are derived from general defect patterns, and so there is a greater chance of uncovering defects.
The plurality of hierarchies comprises coverage models which can each be implemented in practice. Each model in a specific hierarchy is stronger than the previous model.
In some preferred embodiments of the present invention, at least some of the plurality of coverage models comprise interleavings defined on the basis of interference with an event, wherein the interleavings are defined in a time-independent manner, so that the order of occurrence of interferences for these coverage models is immaterial.
In some preferred embodiments of the present invention, at least some of the plurality of coverage models comprise interleavings defined on the basis of interference with an event, wherein the interleavings are defined in a time-dependent manner, so that the order of occurrence of interferences is considered. Models so defined are higher in their hierarchy, i.e., give more coverage, than models using time-independent interleavings, since they include the coverage of the lower models.
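One possible reading of the distinction between time-independent and time-dependent interleavings is sketched below. The sketch assumes, for illustration only, that a coverage task is a group of m distinct interferences, unordered in the time-independent case and ordered in the time-dependent case; the interference labels and function names are hypothetical.

```python
from itertools import combinations, permutations


def time_independent_tasks(interferences, m):
    """Tasks in which the order of occurrence of the m interferences is immaterial."""
    return {frozenset(group) for group in combinations(interferences, m)}


def time_dependent_tasks(interferences, m):
    """Tasks in which the order of occurrence of the m interferences is considered."""
    return set(permutations(interferences, m))


interferences = ["i1", "i2", "i3"]  # hypothetical interferences with one thread
unordered = time_independent_tasks(interferences, 2)
ordered = time_dependent_tasks(interferences, 2)

# Disregarding the order within each time-dependent task recovers every
# time-independent task, so covering the ordered tasks covers the unordered ones.
projected = {frozenset(task) for task in ordered}

print(len(unordered), len(ordered))  # 3 6
print(projected == unordered)        # True
```

Under these assumptions the time-dependent model has more coverage tasks and includes the coverage of the time-independent model, which is consistent with its higher position in the hierarchy.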
At least some of the plurality of coverage models comprise interleavings combining time-independent and time-dependent interleavings. Models having these combined interleavings are highest in their hierarchy, giving most coverage, compared with other models. In testing a CDP, tests related to the least complex model in a hierarchy are preferably performed first, after which any defects discovered in the tests, which will typically be relatively easy to find, are rectified. Tests related to more complex models, typically comprising tests which are increasingly complex to implement, are then performed. The more complex defects detected by these tests are rectified, and the process of testing and defect rectification continues until a required level of confidence of freedom from defects in the CDP is achieved.
There is therefore provided, according to a preferred embodiment of the present invention, a method for analyzing software, including:
defining a plurality of coverage models for testing a non-sequential program responsive to an interleaving of the program, each of the coverage models having a respective coverage level;
arranging the plurality of coverage models in a hierarchy of increasing coverage level; and
testing the program using at least a subset of the coverage models in a sequence according to the hierarchy so as to achieve a predetermined overall level of coverage.
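The three steps above can be sketched as follows. The sketch is illustrative only: it assumes that each coverage model is represented by a set of named coverage tasks, that the number of tasks serves as a proxy for the coverage level of a model, and that the overall level of coverage is the fraction of all tasks covered. The names `CoverageModel`, `arrange`, `apply_hierarchy` and `run_tests` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class CoverageModel:
    name: str
    tasks: FrozenSet[str]


def arrange(models: List[CoverageModel]) -> List[CoverageModel]:
    """Arrange the models in a hierarchy of increasing coverage level.

    Assumption: the number of coverage tasks is used as a proxy for coverage level.
    """
    return sorted(models, key=lambda model: len(model.tasks))


def apply_hierarchy(
    models: List[CoverageModel],
    run_tests: Callable[[CoverageModel], Set[str]],
    required_level: float,
) -> Tuple[List[str], float]:
    """Test with successive models until the required overall coverage is achieved."""
    all_tasks = frozenset().union(*(model.tasks for model in models))
    if not all_tasks:
        return [], 1.0
    covered: Set[str] = set()
    used: List[str] = []
    level = 0.0
    for model in arrange(models):
        covered |= run_tests(model) & model.tasks  # tasks covered by this model's tests
        used.append(model.name)
        level = len(covered) / len(all_tasks)
        if level >= required_level:
            break
    return used, level


weak = CoverageModel("weak", frozenset({"t1", "t2"}))
medium = CoverageModel("medium", frozenset({"t1", "t2", "t3"}))
strong = CoverageModel("strong", frozenset({"t1", "t2", "t3", "t4"}))

# Hypothetical test runner that covers every task of the model it is given.
used, level = apply_hierarchy([strong, weak, medium], lambda m: set(m.tasks), 0.75)
print(used, level)  # ['weak', 'medium'] 0.75
```

In this example the required level is reached after the second model, so only a subset of the coverage models is used and the most complex model is never applied.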
Preferably, the non-sequential program includes a program having a plurality of threads.
Preferably, the non-sequential program includes a program which is implemented on a plurality of central processing units (CPUs).
Further preferably, the interleaving includes a time-independent interference, and defining the plurality of coverage models includes defining at least some models responsive to the time-independent interference.
Alternatively, the interleaving includes a time-dependent interference, and defining the plurality of coverage models includes defining at least some models responsive to the time-dependent interference.
Preferably, the interleaving includes a time-dependent interference and a time-independent interference, and defining the plurality of coverage models includes defining at least some models responsive to the time-dependent interference and the time-independent interference.
Preferably, the interleaving includes one or more events which occur between a pair of sequential synchronization primitives, and defining the plurality of coverage models includes defining at least some models responsive to the one or more events.
Preferably, the interleaving includes an m-tuple of interferences, wherein m includes a whole number.
Further preferably, defining the plurality of coverage models includes defining at least some of the coverage models responsive to an input to the non-sequential program.
Preferably, the plurality of coverage models includes a first coverage model and a second coverage model, and the first coverage model includes the second coverage model, so that a first plurality of coverage tasks of the first model includes a second plurality of coverage tasks of the second model.
Preferably, the plurality of coverage models includes a first coverage model defined responsive to a single thread interleaving and a second coverage model defined responsive to a substantially simultaneous interleaving, so that coverage tasks of a third model included in the plurality of models include a Cartesian product of coverage tasks of the first model and the second model.
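The Cartesian product referred to above may be illustrated with the following sketch, in which the task names of the first and second models are hypothetical placeholders.

```python
from itertools import product

# Hypothetical coverage tasks of the first model (single thread interleaving)
# and of the second model (substantially simultaneous interleaving).
first_model_tasks = ["s1", "s2", "s3"]
second_model_tasks = ["p1", "p2"]

# Each coverage task of the third model pairs one task from each model.
third_model_tasks = list(product(first_model_tasks, second_model_tasks))

print(len(third_model_tasks))  # 6
print(third_model_tasks[0])    # ('s1', 'p1')
```

The number of tasks in the third model is the product of the numbers of tasks in the first and second models, which indicates how quickly the cost of testing may grow as models are combined.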
There is further provided, according to a preferred embodiment of the present invention, apparatus for analyzing software, including a computing system which is adapted to define a plurality of coverage models for testing a non-sequential program responsive to an interleaving of the program, each of the coverage models having a respective coverage level, arrange the plurality of coverage models in a hierarchy of increasing coverage level, and test the program using at least a subset of the coverage models in a sequence according to the hierarchy so as to achieve a predetermined overall level of coverage.
Preferably, the non-sequential program includes a program having a plurality of threads.
Further preferably, the non-sequential program includes a program which is implemented on a plurality of central processing units (CPUs).
Preferably, the interleaving includes a time-independent interference, and the plurality of coverage models includes at least some models defined responsive to the time-independent interference.
Preferably, the interleaving includes a time-dependent interference, and the plurality of coverage models includes at least some models defined responsive to the time-dependent interference.
Further preferably, the interleaving includes a time-dependent interference and a time-independent interference, and the plurality of coverage models includes at least some models defined responsive to the time-dependent interference and the time-independent interference.
Preferably, the interleaving includes one or more events which occur between a pair of sequential synchronization primitives, and the plurality of coverage models includes at least some models defined responsive to the one or more events.
Preferably, the interleaving includes an m-tuple of interferences, wherein m includes a whole number.
Further preferably, the plurality of coverage models includes at least some of the coverage models defined responsive to an input to the non-sequential program.
Preferably, the interleaving includes a first interleaving and a second interleaving, and the plurality of coverage models includes a first model defined responsive to the first interleaving and a second model defined responsive to the second interleaving, wherein the first interleaving includes the second interleaving, so that a first plurality of coverage tasks of the first model comprises a second plurality of coverage tasks of the second model.
There is further provided, according to a preferred embodiment of the present invention, a computer software product for analyzing software, including a computer-readable medium having computer program instructions recorded therein, which instructions, when read by a computer, cause the computer to define a plurality of coverage models for testing a non-sequential program responsive to an interleaving of the program, each of the coverage models having a respective coverage level, to arrange the plurality of coverage models in a hierarchy of increasing coverage level, and to test the program using at least a subset of the coverage models in a sequence according to the hierarchy so as to achieve a predetermined overall level of coverage.