An important part of the development of software systems is testing. Because software systems can involve millions of lines of source code in separate modules or routines which must interact, testing is necessary before a system can be shipped, so as to confirm that a given system performs as expected under various configurations and with various inputs. This complexity is only increased in the case of distributed systems or multi-threaded systems, which involve separately-executing threads or agents. Because these threads or agents may execute in different orders or on completely different machines or processors, interactions between the threads or agents are typically more complex than those in single-threaded systems, increasing the difficulty of testing. Oftentimes, extensive testing at different development levels, and under a wide variety of testing conditions, helps developers feel confident that the system is unlikely to exhibit unexpected behavior when used by consumers.
Different types of software system testing are used at different stages in development. For example, source code is tested at compile time for syntactic and logical errors before being compiled into executable code. Or, system implementations, either in part or in whole, are tested by users manually affecting inputs and configurations of the system to test against expected outputs. In yet other examples, this testing is automated, using a separate software module or application to automatically run software through batteries of tests in order to more efficiently examine system behaviors under pre-determined classes of testing conditions.
Software testing is often performed with reference to a specification of behaviors for the software system being tested. This is done, for example, when the software development process involves development of a behavioral specification before a system implementation is created by writing code. By testing the implementation against the behavioral specification, errors which have been introduced during the coding process can be identified and corrected.
The behavioral specification that underlies testing may include static and/or dynamic aspects. It may give actions as static definitions that are invoked dynamically to produce discrete transitions of the system state. In this case, the specification is often called a model program. Or, the specification may define possible transitions dynamically. In this case, the specification may be called a labeled transition system, finite-state machine or method sequence chart. Either way, the behavioral specification denotes a transition system.
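The relationship between a model program and the transition system it denotes can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the bounded counter, the action names Increment and Reset, and the helper explore are assumptions, not part of any particular specification language. Each action is defined statically as a guard and an update; invoking the enabled actions from each reachable state unfolds the model program into explicit labeled transitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    # Hypothetical system state: a counter bounded at 2.
    count: int = 0


# Static action definitions: each action is a (guard, update) pair.
def increment_enabled(s: State) -> bool:
    return s.count < 2


def increment(s: State) -> State:
    return State(s.count + 1)


def reset_enabled(s: State) -> bool:
    return s.count > 0


def reset(s: State) -> State:
    return State(0)


ACTIONS = {
    "Increment": (increment_enabled, increment),
    "Reset": (reset_enabled, reset),
}


def explore(initial: State) -> set:
    """Unfold the model program into a set of (source, label, target) transitions."""
    transitions = set()
    seen = {initial}
    frontier = [initial]
    while frontier:
        source = frontier.pop()
        for label, (enabled, update) in ACTIONS.items():
            if enabled(source):
                target = update(source)
                transitions.add((source, label, target))
                if target not in seen:
                    seen.add(target)
                    frontier.append(target)
    return transitions


if __name__ == "__main__":
    for source, label, target in sorted(
        explore(State()), key=lambda t: (t[0].count, t[1])
    ):
        print(f"{source.count} --{label}--> {target.count}")
```

In this sketch the model program and the explicit set of transitions are two presentations of the same behavior, which is the sense in which either form of specification denotes a transition system.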
One important distinction in software testing is between glass-box and black-box testing. In typical glass-box testing, a test developer or automated testing software module has access to the source code for a particular module, library, or application being tested and can insert code into the implementation in order to affect execution of the implementation or receive information during execution. In this way, the code can be tested at whatever level of specificity the test developer desires. By contrast, in typical black-box testing, a tester or testing software application can only manipulate a particular system implementation through the interfaces the system presents to a user or to other pieces of software. This provides an experience closer to that of a customer, and allows the tester to focus on the ways the implementation will perform once it becomes a product.
Conformance testing is a common method of black-box testing based on an executable behavioral specification and some correctness criteria. This kind of testing checks that an implementation of a software system conforms to its system specification by executing the implementation in a test environment that is aware of the states and transitions envisioned by the specification. Conformance testing of this type is often known as “model-based testing.” Oftentimes, records are made during execution of the implementation being tested which demonstrate the states and transitions that the implementation finds itself in during execution. This is sometimes called a “trace” of the execution. Conformance testing with a transition system involves checking whether an observed series of transitions in the implementation under test exists as a valid trace of the specified transition system.
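A minimal sketch of the trace check described above follows. It assumes a serialized trace is already available, and it uses a hypothetical specification of a simple lock with states "idle" and "held"; the names SPEC and accepts are illustrative assumptions. The check tracks the set of specification states reachable after each observed label and rejects the trace as soon as that set becomes empty.

```python
# Hypothetical specification: a lock that must be acquired before it is released.
# Each transition is a (source state, label, target state) triple.
SPEC = {
    ("idle", "acquire", "held"),
    ("held", "release", "idle"),
}


def accepts(transitions, initial, trace):
    """Return True if the observed labels form a valid trace of the transition system."""
    current = {initial}
    for label in trace:
        # Follow every transition with this label from any state we might be in.
        current = {
            target
            for (source, lbl, target) in transitions
            if source in current and lbl == label
        }
        if not current:
            # No specified transition explains the observed action.
            return False
    return True


if __name__ == "__main__":
    print(accepts(SPEC, "idle", ["acquire", "release", "acquire"]))
    print(accepts(SPEC, "idle", ["acquire", "acquire"]))
```

Tracking a set of states rather than a single state lets the same check handle nondeterministic specifications, in which one label may lead to several target states.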
The computer instructions for a software program may be performed along a single path of instructions with a single computer processor, with no other software executing concurrently. More often, however, the computer instructions execute concurrently with other threads of execution in the same software program or another software program, with a single computer processor or multiple processors, at a single site or multiple sites. Current techniques for conformance testing based on transition systems rely on the comparison of a particular interleaving of system events to a specification; typically this interleaving is obtained by simply observing events at runtime. Yet for many real-world systems, such as multi-threaded programs and distributed systems, it is not possible to directly observe a totally ordered, or serialized, sequence of system actions. This prevents existing techniques for conformance testing from being used on multi-threaded and distributed systems.
Prior techniques for conformance testing do not work well for multi-threaded or distributed software systems. These prior techniques include time-stamping and using a central event log facility.
One technique is to fully serialize the system. In such a method, a “time stamp” is given to each transition with respect to a global clock, and then transitions are sorted by time stamp. In one sense, this is equivalent to taking the position that a total ordering always existed, in other words, that only finer-grained instrumentation was needed to report the ordering of events. Modern computer hardware architectures illustrate the infeasibility of time-stamping, however. Consider a software program written for a hardware architecture in which memory writes are considered to be “in-flight” until an explicit memory-serialization operation occurs. Here the intuition of linear system time fails. During normal operation the system may never arrive at a single, stable state that can be seen uniformly by all agents. This arises from the fact that the hardware (as an abstract machine) does not respect the temporal order of reads and writes and provides different views of a given memory location depending on the context (such as CPU number) of the read operation itself. Hence, there exists no possible “time stamp” of a global clock that could serialize the actions of such computer hardware.
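The time-stamping approach can be sketched as sorting reported transitions by their stamps. The sketch below is an illustration under stated assumptions: the Event type, the agents A and B, and the skew values are hypothetical, and the failure it shows is ordinary clock skew between agents, a simpler relative of the memory-ordering problem described above rather than a model of it. Agent B's reading of the supposedly global clock runs five units behind, so the sorted order places a receive before the send that caused it.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    stamp: int   # time stamp as reported by the agent
    agent: str
    action: str


def stamp(true_time: int, skew: int) -> int:
    """Hypothetical clock reading: the true time offset by the agent's skew."""
    return true_time + skew


def serialize(events: List[Event]) -> List[Event]:
    """Produce a total order by sorting transitions on their time stamps."""
    return sorted(events, key=lambda e: e.stamp)


# Agent A sends at true time 10; agent B receives at true time 12,
# but B's clock reads five units behind A's.
events = [
    Event(stamp(10, 0), "A", "send(m)"),
    Event(stamp(12, -5), "B", "receive(m)"),
]

order = [e.action for e in serialize(events)]

if __name__ == "__main__":
    print(order)
```

A conformance check applied to this serialized order would report a receive with no preceding send, an error that exists only in the reconstructed ordering and not in the system itself.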
A second technique is to keep a centralized log of system events. In this scheme, each agent or processor reports its transitions to a central, serialized log. Unfortunately, such a global log introduces serialization of its own and therefore could materially affect the possible runs of the system. For example, in the case of multi-threaded programs, the very act of serialization by a test harness could eliminate certain classes of program errors. In other words, the act of testing the system would itself prevent some invalid behaviors from occurring. However, such errors could occur when the system was no longer under test.
What is needed are tools and techniques that facilitate testing of multi-threaded and distributed software systems.