The purpose of testing software is to provide an assurance that the software performs as specified, without defects. Methods for testing sequential software, i.e., software which runs on one platform and/or which has only one thread, are well known in the art. Testing non-sequential software, i.e., multi-threaded, concurrent, and/or distributed software that contains two or more threads or processes running concurrently, is more difficult. Such non-sequential, or parallel, software may contain defects resulting specifically from concurrent execution, such as “race conditions” (explained in detail subsequently). Methods for detecting race conditions are known in the art; however, these methods frequently report false alarms, i.e., detected race conditions that do not represent a defect.
By definition, sequential software executes in a deterministic manner. That is, given the same input, the sequence of statement execution is fixed and unvarying in repeated executions of the software. This determinism holds even in the case of interrupt-driven sequential software, where one form of input comes from unsynchronized elements external to the software such as hardware devices or a network, since the term “same input” refers not only to the content of the information but also to its timing.
Methods used for testing sequential software generally involve generating tests designed to discover faults in the software, combined with some measurement of coverage to ensure that the software is sufficiently exercised. Coverage measurement may be used to direct creation of new tests (for conditions not covered heretofore in the testing) or as a criterion for stopping the testing.
FIG. 1 illustrates a general method for testing software, as is known in the art. In an initial step 10, criteria which are to be met by the testing, such as required coverage percentage, are defined. In a test construction step 11, a test suite is generated according to predetermined guidelines, by techniques known in the art. The test suite comprises a collection of test conditions, the conditions comprising information and inputs designed to exercise the software under test (SUT) in different ways, using a test harness. The test harness effects the execution of a test and typically comprises an oracle function. The oracle function determines if a test outcome is correct. The oracle may take the form of a person who examines the outcome and verifies correctness, or a software tool that compares an expected outcome to an actual result. The test conditions are loaded to the test harness in a step 12. In a step 14, the software under test is executed under the test conditions. A condition 16 checks if the result is correct, i.e., if the actual result corresponds to an expected result. If the results do not correspond, the defect is debugged and corrected in a step 17 and the execution of step 14 is repeated. After the defect has been remedied, a condition 19 checks whether the stopping criteria set in step 10 have been met, i.e., whether sufficient testing has been performed. Sufficient testing is constrained in practice by coverage metrics, time, cost, error tolerance and/or other relevant parameters for the software under test. If the stopping criteria are not met, testing continues by selecting another set of test conditions from the test suite.
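The loop of FIG. 1 can be sketched in a few lines of Python. This is a minimal illustration only, not a harness described in the source: the names `Condition`, `software_under_test`, `oracle`, and `run_tests` are hypothetical, the software under test is a toy absolute-value function, and a simple count of passing tests stands in for the stopping criteria of step 10 (which in practice would involve coverage, time, or cost). The debugging of step 17 is a manual activity and is represented only by recording the failing condition.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Condition:
    """One entry of the test suite: an input and its expected outcome."""
    value: int
    expected: int


def software_under_test(x: int) -> int:
    """Hypothetical SUT, specified to return the absolute value of x."""
    return x if x >= 0 else -x


def oracle(expected: int, actual: int) -> bool:
    """Software oracle: compares the expected outcome to the actual result."""
    return expected == actual


def run_tests(sut: Callable[[int], int],
              suite: List[Condition],
              required_passes: int) -> Tuple[int, List[Condition]]:
    """Walk the suite until the (assumed) stopping criterion is met."""
    passed = 0
    failures: List[Condition] = []
    for cond in suite:                       # step 12: load the test conditions
        actual = sut(cond.value)             # step 14: execute the SUT
        if oracle(cond.expected, actual):    # condition 16: is the result correct?
            passed += 1
        else:
            failures.append(cond)            # step 17 would debug, fix, and rerun
        if passed >= required_passes:        # condition 19: stopping criteria met?
            break
    return passed, failures


suite = [Condition(3, 3), Condition(-4, 4), Condition(0, 0)]
passed, failures = run_tests(software_under_test, suite, required_passes=3)
```

For sequential software this loop is repeatable: rerunning it with the same suite yields the same `passed` and `failures` values every time.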
In the context of the present patent application and in the claims, multi-threaded, concurrent, and/or distributed software is termed “parallel software” or a “parallel program.” Parallel software comprises a plurality of threads and/or operates on a plurality of distributed platforms, so that a sequence of statement execution is no longer deterministic, but rather is dependent on scheduler decisions, order of message arrival, synchronization mechanisms, and/or relative speed of hardware involved. Whereas in sequential software a result produced by the program is uniquely determined by the inputs selected, in the case of parallel software, the result produced depends both on an input space and on an order in which different tasks implemented by the parallel software are performed. In the context of the present patent application and in the claims, an “interleaving” is assumed to be a set of information which describes the sequence in which the statements of a parallel program were executed in a given run, and the set of all possible interleavings for a parallel program is termed the “interleaving space” for the parallel program.
FIG. 2 and FIG. 3 are a first and a second schematic timing diagram for a parallel program 26 containing two threads, as are known in the art. A thread 1 comprises ordered statements or code segments denoted by {A, B, C, D, E}. A thread 2 comprises ordered statements or code segments denoted by {F, G, H, I}. A first interleaving 25 shows an order in which the program executed during a run of the software: A F B G C H D I E. FIG. 3 shows another run of the same parallel program as in FIG. 2, under the same test conditions. In the second run, a different interleaving 28 results: F A G B H C I D E.
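To give a sense of the size of the interleaving space, the following sketch enumerates every interleaving of the two threads of FIG. 2 and FIG. 3. It rests on simplifying assumptions that are not stated in the source: each statement is treated as atomic, no synchronization restricts the order, and an interleaving is modeled as a single total order of statements. The helper name `interleavings` is hypothetical.

```python
from itertools import combinations

THREAD_1 = ["A", "B", "C", "D", "E"]
THREAD_2 = ["F", "G", "H", "I"]


def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that preserves each thread's own order."""
    n = len(t1) + len(t2)
    # Choose which of the n positions are occupied by statements of t1;
    # the remaining positions are filled, in order, by statements of t2.
    for slots in combinations(range(n), len(t1)):
        chosen = set(slots)
        it1, it2 = iter(t1), iter(t2)
        yield tuple(next(it1) if i in chosen else next(it2) for i in range(n))


space = set(interleavings(THREAD_1, THREAD_2))

INTERLEAVING_25 = tuple("AFBGCHDIE")   # run shown in FIG. 2
INTERLEAVING_28 = tuple("FAGBHCIDE")   # run shown in FIG. 3
```

Under these assumptions the space holds 126 interleavings (the number of ways to choose 5 of 9 positions for thread 1), and both `INTERLEAVING_25` and `INTERLEAVING_28` are members of it. Even this two-thread, nine-statement program therefore has far more possible executions than a single test run can exercise.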
It is known in the art that most software defects (bugs) belong to a limited number of identified classes. Some examples of these classes are failure to initialize variables, inadequate handling of boundary conditions, and array overflow or underflow. These classes of defects may appear in either sequential or multi-threaded, concurrent, and/or distributed software.
Because of the complexity introduced by the multiplicity of possible interleavings, parallel software is prey to further types of defects, in addition to those defect classes endemic to all software. In Appendix E of Threads Primer by Lewis and Berg, published by Prentice Hall, 1996, which is incorporated herein by reference, the authors provide a taxonomy of common bugs in parallel software, such as depending on scheduling order and not recognizing shared data. Some of these defects may cause a situation known as a “race” or a “race condition.”
Race conditions occur when two or more simultaneously executing threads or processes access the same shared memory location without proper synchronization, and at least one access is a write. Synchronization refers to a method by which access to a resource, in this case, a memory location, is controlled so that each competing access or set of accesses is assured of completion, in an intended sequence, without interference. The resource is locked by a thread or process until its access is complete, assuring that no other thread or process interferes. Synchronization is achieved by explicit use of methods known in the art, such as semaphores, mutexes, and critical sections.
A distinction is made between synchronization races and data races. A synchronization race occurs intentionally in a parallel program, where there is competition among the component threads and/or processes to seize a synchronizing resource, for example, to lock a semaphore or enter a critical section. Synchronization races are considered a useful feature of parallel programs, supporting the intentionally non-deterministic behavior of the software.
A data race also contributes to the non-deterministic behavior of the software; however, a data race occurs unintentionally, generally as a result of improper (or absent) synchronization, and often signifies a defect in the software. The result produced by the program will vary from one execution to the next, based on the order in which the threads or processes accessed the shared memory location. In the context of the present patent application and in the claims, “race” or “race condition” refers to a data race, and a data race is assumed to consist of two accesses to the same memory location if at least one of the accesses is a write and the two accesses are not synchronized.
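The definition above can be expressed as a small predicate over pairs of memory accesses. This is a sketch under stated assumptions, not a detection method described in the source: the `Access` record and the `is_data_race` function are hypothetical, the two accesses are assumed to come from different threads, and "not synchronized" is approximated by the two accesses holding no lock in common, which is a lockset-style simplification.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Access:
    """One memory access, as a race detector might record it (hypothetical)."""
    thread: int
    location: str
    is_write: bool
    locks_held: FrozenSet[str] = frozenset()


def is_data_race(a: Access, b: Access) -> bool:
    """Apply the definition: same location, at least one write, unsynchronized."""
    return (
        a.thread != b.thread                    # accesses by different threads
        and a.location == b.location            # same memory location
        and (a.is_write or b.is_write)          # at least one access is a write
        and not (a.locks_held & b.locks_held)   # no common lock (assumed proxy)
    )


write_1 = Access(thread=1, location="X", is_write=True)
write_2 = Access(thread=2, location="X", is_write=True)
read_1 = Access(thread=1, location="X", is_write=False)
read_2 = Access(thread=2, location="X", is_write=False)
locked_1 = Access(1, "X", True, frozenset({"m"}))
locked_2 = Access(2, "X", True, frozenset({"m"}))
```

With these records, `is_data_race(write_1, write_2)` and `is_data_race(read_1, write_2)` hold, while two reads, or two writes protected by the same lock `m`, do not qualify.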
FIG. 4 is a schematic representation of a first execution of parallel program 26, with reference to interleaving 25 (FIG. 2), as is known in the art. FIG. 4 focuses on an initial section of interleaving 25. Thread 1 contains statements A and B, and Thread 2 contains statement F. In this example, statements A and F both perform a write access to a shared memory location X. Statement B reads the contents of X, which determines a program result in the example. Interleaving 25 of FIG. 2 determined the sequence of the execution and the program result. Since statement F updated X last, the program result is 0.
FIG. 5 is a schematic representation of a second execution of parallel program 26, with reference to interleaving 28 (FIG. 3), as is known in the art. FIG. 5 focuses on the same initial section as FIG. 4. In interleaving 28, statement A updates memory location X after statement F, and the program result is 7.
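The two executions of FIG. 4 and FIG. 5 can be replayed deterministically by simulating the shared memory location. In this sketch the function `run` is hypothetical, and the written values are inferred from the stated results rather than given explicitly in the source: statement A is assumed to write 7 and statement F is assumed to write 0. Statement G is treated as having no effect on X.

```python
def run(interleaving):
    """Execute statements in the given order against a simulated location X."""
    memory = {"X": None}
    result = None
    for statement in interleaving:
        if statement == "A":      # Thread 1 writes X (value 7 is inferred)
            memory["X"] = 7
        elif statement == "F":    # Thread 2 writes X (value 0 is inferred)
            memory["X"] = 0
        elif statement == "B":    # Thread 1 reads X; this is the program result
            result = memory["X"]
        # Any other statement (e.g. G) is assumed not to touch X.
    return result


result_25 = run("AFB")    # initial section of interleaving 25 (FIG. 4)
result_28 = run("FAGB")   # initial section of interleaving 28 (FIG. 5)
```

Here `result_25` is 0 and `result_28` is 7: the same program with the same inputs yields different results purely because of the order in which A and F executed.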
Since statement A and statement F wrote to memory location X without proper synchronization, statement A and statement F are said to constitute a race. Assuming that a single correct result is defined for the program, one of the two interleavings yielded a faulty result, indicating a program defect deriving from a race condition. For example, the developer may have intended that accesses A and B in Thread 1 complete before any other access to X occurs. In that case, the developer should have included explicit synchronization commands, locking memory location X before A and unlocking it after B, to prevent any intervening access.
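A minimal sketch of such a correction, using Python's standard `threading.Lock`, is given below. It assumes that the intended behavior is for B to read the value written by A, that A writes 7 and F writes 0 (values inferred from the results reported for FIG. 4 and FIG. 5), and that Thread 2 also acquires the same lock before writing X; a lock taken by Thread 1 alone would not exclude F. The function name `run_once` is hypothetical.

```python
import threading


def run_once():
    """Run both threads once, with A and B protected as one critical section."""
    shared = {"X": None}
    lock = threading.Lock()
    out = {}

    def thread_1():
        with lock:                        # lock X before A ...
            shared["X"] = 7               # statement A: write X
            out["result"] = shared["X"]   # statement B: read X
        # ... and release the lock after B

    def thread_2():
        with lock:                        # F must use the same lock
            shared["X"] = 0               # statement F: write X

    threads = [threading.Thread(target=thread_1),
               threading.Thread(target=thread_2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out["result"]


# Repeat the run many times; scheduling may differ, the result should not.
results = {run_once() for _ in range(200)}
```

Whichever thread the scheduler runs first, F can execute only before A or after B, so the value read by B is always the one written by A, and `results` contains the single value 7. The competition for the lock itself remains, but it is a synchronization race rather than a data race.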
A distinction is made between an apparent race and a feasible race. In an article entitled “What are race conditions? Some issues and formalizations” by Netzer and Miller, in ACM Letters on Programming Languages and Systems, March 1992, which is incorporated herein by reference, the authors define a feasible data race as involving events that either did execute concurrently or could have. An apparent race refers to a condition in which some implicit synchronization mechanism is used, and therefore the data race will never occur.
Practical methods for detecting race conditions are known in the art. For example, in an article entitled Non-intrusive on-the-fly data race detection using execution replay by Michiel Ronsse and Koen de Bosschere, published in Proceedings of the Fourth International Workshop on Automated Debugging (AADEBUG2000), August, 2000, which is incorporated herein by reference, the authors describe a tool using a three-phase approach (record, replay and detect, and identify) which employs recorded synchronization information and time-stamped shared data access information to detect data races with minimum disturbance to the parallel program’s normal execution. In a doctoral dissertation submitted to the University of California, Santa Cruz, Sep. 28, 1994, entitled A taxonomy of race detection algorithms by Helmbold and McDowell, which is incorporated herein by reference, the authors cite a conventional classification of race detection methods (static analysis, post-mortem, and on-the-fly), then propose an alternative taxonomy based on characterization of control flow constructs and synchronization methods used in the software.
Tools for recording and replaying parallel software executions are known in the art. Such tools are useful because of the possibility of recreating the precise interleaving that was identified as producing incorrect results. In an article, Deterministic Replay of Distributed Java Applications by Choi, Konuru, and Srinivasan, published in Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools, August 1998, which is incorporated herein by reference, the authors describe a system for distributed Java applications which records the “logical thread schedule information and the network interaction information of the execution, while the Java program runs . . . [and] reproduces the execution behavior of the program.” In a paper entitled Multithreaded Java Program Test Generation by Edelstein, Fachi, Nir, Ratsaby, and Ur, presented at the Joint ACM Java Grande—ISCOPE 2001 Conference at Stanford University in June, 2001, which is incorporated herein by reference, the authors describe an architecture called ConTest, for rerunning tests under alternate scheduling decisions, which makes use of the replay technique defined by Choi, et al.
Current race detection methods frequently report false alarms. In an article entitled Improving the Accuracy of Data Race Detection by Netzer and Miller, published in Proceedings of the Third ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, April 1991, and which is incorporated herein by reference, the authors note that “[d]ata race reports generated by most existing methods can include potentially many artifacts, which can overwhelm the programmer with irrelevant information.” A false alarm or artifact is a detected race condition that does not represent a program defect. False alarms may result from numerous causes, for example, infeasible data races, implicit synchronization, data that does not affect program outcome and/or limitations and shortcomings of the particular race detection method used.