1. Technical Field
The present invention relates generally to the field of data structures and, in particular, to methods and systems for testing parallel queues.
2. Background Description
Operations on basic data structures/objects such as shared queues, priority queues, stacks, and counters can often dominate the execution time of a parallel program. This dominance arises due to the large number of operations on the data structure by the processors, including multiple operations contending for the shared data structure at the same time. An example would be a shared queue used to hold tasks available for execution by the processors, where multiple processors grab tasks from the head of the queue and deposit new tasks to the tail of the queue. There are considerable performance gains that arise from the development of highly-optimized, asynchronous, distributed, cache-conscious, parallel implementations of such data structures. Such implementations may employ a variety of “tricks” to reduce latencies and avoid serial bottlenecks, including servicing multiple requests simultaneously or even out-of-order. Examples include implementations based on the following: counting networks, as described by J. Aspnes, M. Herlihy, and N. Shavit, in “Counting Networks”, Journal of the ACM, 41(5):1020-1048, 1994; elimination trees, as described by N. Shavit and D. Touitou, in “Elimination Trees and the Construction of Pools and Stacks”, Proc. 7th ACM Symp. on Parallel Algorithms and Architectures, pp. 54-63, July 1995; diffracting trees, as described by N. Shavit, E. Upfal, and A. Zemach, in “A Steady State Analysis of Diffracting Trees”, Proc. 8th ACM Symp. on Parallel Algorithms and Architectures, pp. 33-41, June 1996; or combining funnels with elimination, as described by N. Shavit and A. Zemach, in “Combining Funnels”, Proc. 17th ACM Symp. on Principles of Distributed Computing, pp. 61-70, June-July 1998.
In fact, the only requirement of the implementation is that it preserves the (serial) semantics of the data structure, as observed by the processors interacting with the data structure. The complexity of the implementation and the difficulty in reasoning about asynchronous parallel systems increase concerns regarding possible bugs in the implementation.
Prior testing of parallel executions has involved both distinct values and arbitrary values. For distinct values, it is guaranteed that each value inserted into a data structure is distinct. In contrast, for arbitrary values, there is no such guarantee.
Prior testing of parallel executions has also involved linearizable data objects and non-linearizable data objects. In a linearizable data object, each operation takes place over a time interval, and consists of two events, the first being the invocation of the operation by the processor, and the second being the receipt of the response (either a value or an acknowledgment) by the processor. This is described further by M. P. Herlihy and J. M. Wing, in “Linearizability: A Correctness Condition for Concurrent Objects”, ACM Trans. on Programming Languages and Systems, 12(3):463-492, 1990.
For a trace to be valid for a linearizable data object, there must be a topological sort that (i) respects the order between any two events with non-overlapping intervals, and (ii) obeys the serial semantics of the data object. Thus the partial order is an interval order.
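As an illustration of condition (i) only, the following minimal Python sketch models each operation as a time interval and checks whether a candidate total order respects the interval order. The names `Interval`, `precedes`, and `respects_interval_order` are hypothetical and chosen for this example; checking condition (ii) would additionally require replaying the order against the serial semantics of the data object.

```python
from typing import NamedTuple, Sequence


class Interval(NamedTuple):
    name: str
    start: float  # invocation time
    end: float    # response time


def precedes(a: Interval, b: Interval) -> bool:
    """a precedes b in the interval order when a ends before b starts."""
    return a.end < b.start


def respects_interval_order(order: Sequence[Interval]) -> bool:
    """True if no operation is listed before one that strictly precedes it."""
    return not any(
        precedes(later, earlier)
        for i, earlier in enumerate(order)
        for later in order[i + 1:]
    )
```

Two operations whose intervals overlap are unordered by this relation, so either may come first in the sort.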
Linearizable data objects (also known as atomic objects) are well-studied (see e.g., N. Lynch, Distributed Algorithms, Morgan Kaufmann, San Francisco, Calif., chap. 13, 1996), as they have a number of desirable properties. In contrast, with non-linearizable data structures, there are no time intervals to respect. In this case, typically, the only partial order information is a total order within each processor. Thus the partial order is a union of chains (total orders), one per processor. The correctness condition does not impose any restrictions based on the real time that events occur. This correctness condition is denoted sequential consistency, and is a popular correctness condition for shared memory multiprocessors. Sequential consistency is described by L. Lamport, in “How to Make a Multiprocessor Computer that Correctly Executes Multiprocess Programs”, IEEE Trans. on Computers, C-28(9):690-691, 1979. For comparison, more general non-linearizable data structures in which the partial order can be a series-parallel order (modeling some form of fork-join parallelism) or even an arbitrary partial order, were considered by J. L. Bruno, P. B. Gibbons, and S. Phillips, in “Testing Concurrent Data Structures”, Technical report, AT&T Bell Laboratories, Murray Hill, N.J., December 1994.
A summary of related work in testing parallel executions will now be given. The problem of testing parallel executions of arbitrary linearizable shared data structures is studied by J. M. Wing and C. Gong, in “Testing and Verifying Concurrent Objects”, J. Parallel and Distributed Computing, 17:164-182, 1993. In that article, the problem of testing arbitrary linearizable data structures is shown to be NP-complete, and an exponential time algorithm is devised. Wing and Gong also developed a simulation environment for implementing their testing algorithms.
Certification trails for testing sequential executions of balanced binary trees, priority queues, union-find structures, and mergeable priority queues have been defined and studied by G. F. Sullivan and G. M. Masson, in “Using Certification Trails to Achieve Software Fault Tolerance”, Proc. 20th IEEE Fault-Tolerant Computing Symp., pp. 423-31, 1990; G. F. Sullivan and G. M. Masson, in “Certification Trails for Data Structures”, Proc. 21st IEEE Fault-Tolerant Computing Symp., pp. 240-47, 1991; and J. Bright and G. Sullivan, in “Checking Mergeable Priority Queues”, Proc. 24th IEEE Fault-Tolerant Computing Symp., pp. 144-53, June 1994. In this approach, the data structure code is modified to output additional information to assist in testing.
Other work on sequential testing and related issues includes: K.-H. Huang and J. Abraham, “Algorithm-based Fault Tolerance for Matrix Operations”, IEEE Trans. on Computers, C-33(6):518-528, 1984; B. Dixon, M. Rauch, and R. E. Tarjan, “Verification and Sensitivity Analysis of Minimum Spanning Trees in Linear Time”, SIAM Journal on Computing, 21(6):1184-1192, 1992; and P. Ramanan, “Testing the Optimality of Alphabetic Trees”, Theoretical Computer Science, 93:279-302, 1992.
Note that unlike testing sequential executions, testing parallel executions focuses on topological sorting since it does not assume a centralized serialization point or a central module implementing the data structure. This serialization point or central module is undesirable since it imposes a serial bottleneck in the parallel program. The sequential trace work, in contrast, focuses on testing procedures that are more efficient in time and/or space than the implementation being tested. These are also the concerns of the works on sequential program checking, such as those described by, for example: M. Blum and S. Kannan, “Designing Programs that Check Their Work”, Proc. 21st ACM Symp. on Theory of Computing, pp. 86-97, May 1989; M. Blum, M. Luby, and R. Rubinfeld, in “Self-testing/correcting with Applications to Numerical Problems”, Proc. 22nd ACM Symp. on Theory of Computing, pp. 73-83, May 1990; and M. Blum, W. Evans, P. Gemmell, S. Kannan, and M. Naor, in “Checking the Correctness of Memories”, Algorithmica, 12(2/3):225-244, 1994.
In a recent independent work, algorithms for checking sequential priority queues were presented by U. Finkler and K. Mehlhorn, in “Checking Priority Queues”, Proc. 10th ACM-SIAM Symp. on Discrete Algorithms, pp. S901-02, January 1999. Their algorithms observe the sequential stream of operations at the data structure, and check to see if this stream is legal.
Testing the serializability of database transactions has been proven to be NP-complete by C. Papadimitriou, in “The Theory of Database Concurrency Control”, Computer Science Press, 1986. Testing a shared memory for sequential consistency or linearizability under a range of scenarios has been studied by P. B. Gibbons and E. Korach, in “Testing Shared Memories”, SIAM Journal on Computing, 26(4):1208-1244, 1997.
Other work on testing and related issues for parallel machines includes: P. Banerjee and J. A. Abraham, “Bounds on Algorithm-based Fault Tolerance in Multiple Processor Systems”, IEEE Trans. on Computers, C-35(4):296-306, 1986; V. Balasubramanian and P. Banerjee, “Compiler-assisted Synthesis of Algorithm-based Checking in Multiprocessors”, IEEE Trans. on Computers, C-39(4):436-459, 1990; P. Banerjee, J. T. Rahmeh, C. Stunkel, V. S. Mair, K. Roy, V. Balasubramanian, and J. A. Abraham, “Algorithm-based Fault Tolerance on a Hypercube Multiprocessor”, IEEE Trans. on Computers, C-39(9):1132-1245, 1990; Y. Afek, D. S. Greenberg, M. Merritt, and G. Taubenfeld, “Computing with Faulty Shared Memory”, Proc. 11th ACM Symp. on Principles of Distributed Computing, pp. 47-58, August 1992; and J. L. Bruno and E. C. Coffman, Jr., “Optimal Fault-tolerant Computing on Two Parallel Processors”, Technical report, AT&T Bell Laboratories, Murray Hill, N.J., October 1994.
In addition, J. L. Bruno, P. B. Gibbons, and S. Phillips, in “Testing Concurrent Data Structures”, Technical report, AT&T Bell Laboratories, Murray Hill, N.J., December 1994, presented an O(n^3) time algorithm for testing parallel queues and priority queues, wherein n is the length of the trace. However, that algorithm is too slow for large n.
In summary, the prior art methods cannot be used for fast testing of parallel queues; each method does not apply to queues, applies only to sequential queues or parallel queues implemented with a serial bottleneck, requires modification of the implementation code, and/or is too slow (e.g., O(n^3) running time or worse).
Thus, it would be desirable and highly advantageous to have methods and systems for testing parallel queues that overcome the above-mentioned deficiencies in the prior art methods and systems.
The present invention is directed to methods and systems for testing parallel queues. In particular, the present invention provides an O(n) time method/system for testing linearizable FIFO queues, an O(n log n) time method/system for testing linearizable priority queues, and an O(np^2) time method/system for testing non-linearizable FIFO queues, where n is the number of enqueue or dequeue operations and p is the number of processors. The methods consider testing in the context of a single run of the program. This has the advantage of testing an actual run of the implementation under real conditions, not an abstraction.
According to a first aspect of the present invention, and with respect to a computer processing system comprising a linearizable queue and a plurality of processors, there is provided a method for verifying correct function of the queue with respect to a program executed by the processors. A distinct-values trace is given that includes the operations on the queue and an identifier associated with each of the operations. Each operation is associated with two timestamps respectively corresponding to a start time and an end time of the operation. The method includes the step of sorting the timestamps in ascending or descending order and placing the timestamps in an array A. The operations are matched to generate corresponding operation pairs, based on the identifiers. Each element of an array B is populated, such that B[i], the ith element of the array B, is equal to a start time of an enqueue operation of a given operation pair when A[i], the ith element of array A, is equal to an end time of a dequeue operation of the given operation pair, and such that B[i] is equal to zero when A[i] is not equal to the end time of the dequeue operation of the given operation pair. Similarly, each element of an array C is populated such that C[i], the ith element of array C, is equal to a maximum value corresponding to all values in the array B from one to i when the sorting is ascending, and from i to an end value in the array B when the sorting is descending. The function of the queue is identified as correct, when there does not exist i such that A[i] is equal to a start time of a dequeue operation of a respective operation pair and an end time of an enqueue operation of the respective operation pair is less than C[i].
According to a second aspect of the present invention, the method further includes the step of, upon performing the matching step, identifying the function of the queue as incorrect when there exists one of an unpaired operation and an operation pair such that a dequeue operation of the pair ends before an enqueue operation of the pair begins.
According to a third aspect of the present invention, the method further includes the step of identifying the function of the queue as incorrect, when there exists i such that A[i] is equal to a start time of the dequeue operation of the respective operation pair and the end time of the enqueue operation of the respective operation pair is less than C[i].
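The first three aspects can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the `Op` record and the function name `check_fifo` are hypothetical, timestamps are assumed to be positive and pairwise distinct, and the sort is ascending. Arrays B and C are folded into a single running maximum, since C[i] depends only on B[1..i]. Because the sketch calls a general-purpose `sorted`, it runs in O(n log n) time unless the timestamps arrive already sorted.

```python
from typing import NamedTuple, Sequence


class Op(NamedTuple):
    kind: str     # "enq" or "deq"
    value: int    # identifier; assumed distinct per enqueue
    start: float  # invocation timestamp (assumed positive)
    end: float    # response timestamp


def check_fifo(trace: Sequence[Op]) -> bool:
    """Return True if the trace passes the checks of aspects one to three."""
    enq = {op.value: op for op in trace if op.kind == "enq"}
    deq = {op.value: op for op in trace if op.kind == "deq"}
    # Matching step: reject duplicated or unpaired operations.
    if len(enq) + len(deq) != len(trace) or enq.keys() != deq.keys():
        return False
    # Reject a pair whose dequeue ends before its enqueue begins.
    if any(deq[v].end < enq[v].start for v in enq):
        return False
    # Array A: all timestamps in ascending order, tagged with their origin.
    A = sorted(
        (t, op.kind, edge, op.value)
        for op in trace
        for t, edge in ((op.start, "start"), (op.end, "end"))
    )
    running_max = 0  # plays the role of C[i]; zero means "nothing seen yet"
    for _, kind, edge, v in A:
        if kind != "deq":
            continue
        if edge == "end":
            # B[i] is the start time of the matching enqueue.
            running_max = max(running_max, enq[v].start)
        elif enq[v].end < running_max:
            # Some dequeue already finished whose enqueue began after this
            # pair's enqueue ended: the FIFO order was violated.
            return False
    return True
```

For example, a trace in which value 1 is enqueued strictly before value 2 but dequeued strictly after it is rejected, whereas the same dequeue order is accepted when the two enqueues overlap in time.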
According to a fourth aspect of the present invention, and with respect to a computer processing system comprising a linearizable priority queue and a plurality of processors, wherein the queue supports insert and deletemax operations, there is provided a method for verifying correct function of the queue with respect to a program executed by the processors. A distinct-values trace is given that includes the operations on the queue and an identifier associated with each of the operations. Each operation is associated with two timestamps respectively corresponding to a start time and an end time of the operation. The method includes the step of sorting the timestamps in ascending or descending order and placing the timestamps in an array A. The operations are matched to generate corresponding operation pairs, based on the identifiers. For all i, in ascending order when the sorting is ascending, in descending order when the sorting is descending, starting with a set being initially empty, a value associated with a given operation pair is inserted into the set when A[i] is equal to an end time of an enqueue operation of the given operation pair that precedes a start time of a dequeue operation of the given operation pair, and the value associated with the given operation is deleted from the set when A[i] is equal to the start time of the dequeue of the given operation pair that succeeds the end time of the enqueue operation of the given operation pair. For each i, an array B[i] is populated with a maximum value in the set upon processing A[i]. 
The function of the queue is identified as correct, when there does not exist a respective operation pair such that a minimum value in the array B in an entire range max (a start time of an enqueue operation of the respective operation pair, a start time of a dequeue operation of the respective operation pair) to an end time of the dequeue operation of the respective operation pair is greater than the value associated with the respective operation pair.
According to a fifth aspect of the present invention, the method further includes the step of, upon performing the matching step, identifying the function of the queue as incorrect when there exists one of an unpaired operation and an operation pair such that a dequeue operation of the pair ends before an enqueue operation of the pair begins.
According to a sixth aspect of the present invention, the method further includes the step of identifying the function of the queue as incorrect, when there exists the respective operation pair such that the minimum value in the array B in the entire range max (the start time of the enqueue operation of the respective operation pair, the start time of the dequeue operation of the respective operation pair) to the end time of the dequeue operation of the respective operation pair is greater than the value associated with the respective operation pair.
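The fourth through sixth aspects can be sketched in Python along the same lines. Again this is a hedged illustration rather than the claimed implementation: the `Op` record and `check_priority_queue` are hypothetical names, timestamps are assumed pairwise distinct, the sort is ascending, and the set is modeled as a binary heap with lazy deletion. For brevity the range minimum over B is computed with a plain slice, which costs O(n) per pair; reaching the stated O(n log n) bound would require a range-minimum structure instead.

```python
import heapq
from typing import NamedTuple, Sequence


class Op(NamedTuple):
    kind: str     # "ins" (insert) or "del" (deletemax)
    value: int    # value inserted or returned; assumed distinct
    start: float  # invocation timestamp
    end: float    # response timestamp


def check_priority_queue(trace: Sequence[Op]) -> bool:
    ins = {op.value: op for op in trace if op.kind == "ins"}
    dele = {op.value: op for op in trace if op.kind == "del"}
    # Matching step: reject duplicated or unpaired operations.
    if len(ins) + len(dele) != len(trace) or ins.keys() != dele.keys():
        return False
    # Reject a pair whose deletemax ends before its insert begins.
    if any(dele[v].end < ins[v].start for v in ins):
        return False
    # Array A: all timestamps in ascending order, tagged with their origin.
    A = sorted(
        (t, op.kind + edge, op.value)
        for op in trace
        for t, edge in ((op.start, "_start"), (op.end, "_end"))
    )
    heap, removed = [], set()  # max-heap via negation, with lazy deletion
    B, pos = [], {}
    for i, (_, tag, v) in enumerate(A):
        pos[(tag, v)] = i
        # Only pairs whose insert ends before the deletemax starts are
        # ever placed in the set.
        if ins[v].end < dele[v].start:
            if tag == "ins_end":
                heapq.heappush(heap, -v)
            elif tag == "del_start":
                removed.add(v)
        while heap and -heap[0] in removed:
            removed.discard(-heapq.heappop(heap))
        B.append(-heap[0] if heap else float("-inf"))
    for v in ins:
        lo = max(pos[("ins_start", v)], pos[("del_start", v)])
        hi = pos[("del_end", v)]
        # If a larger value is in the set throughout the whole range,
        # no point exists at which deletemax could have returned v.
        if min(B[lo:hi + 1], default=float("-inf")) > v:
            return False
    return True
```

For example, with values 5 and 3 inserted one after the other, deleting 5 first is accepted, while deleting 3 before 5 is rejected because 5 remains in the set for the entire duration of that deletemax.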
These and other aspects, features and advantages of the present invention will become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.