The present invention relates generally to pattern matching between sequences of transactions, and more particularly to matching sequences of multi-character elements in which the elements are interleaved differently in each sequence.
Databases are routinely upgraded to new versions, have new software patches applied to existing versions, or are migrated to new database systems. In each of these situations, it is typical to compare the performance of a benchmark transaction workload in the new database environment with its performance in the old database environment. A benchmark transaction workload is a sequence of transactions of different types. In a typical database environment, each transaction can be a sequence of one or more Structured Query Language (SQL) statements. To compare the performance of the benchmark transaction workloads, corresponding instances of transactions in the new and old database environments are matched. Matching corresponding transactions before comparing them is necessary for two reasons: the database environments typically also run workloads that are extraneous to the benchmark workloads, and the performance of a given transaction type varies during workload execution because the underlying data is continuously modified. The benchmark transaction workload comparison typically involves comparing the transaction logs of the old and new database environments.
A simple form of matching involves identifying the occurrences of a short character string within a longer character string. A straightforward approach to this problem is to advance the shorter string through the longer string one character at a time, determining at each position whether the shorter string matches the corresponding characters of the longer string. If there is no match, the shorter string is advanced by one character in the longer string and the comparison is repeated.
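The brute-force approach described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the goal is to report every starting position of the shorter string (overlapping occurrences included); the function name `naive_find_all` is hypothetical.

```python
def naive_find_all(text: str, pattern: str) -> list[int]:
    """Return every start index at which `pattern` occurs in `text`.

    The pattern is slid across the text one character at a time and
    compared against the aligned characters at each position.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return []  # treat an empty pattern as matching nowhere
    matches = []
    for start in range(n - m + 1):
        # Compare the pattern with the window of text beginning at `start`.
        if text[start:start + m] == pattern:
            matches.append(start)
        # Whether or not this window matched, advance by one character.
    return matches


print(naive_find_all("ABABCABAB", "ABAB"))  # [0, 5]
```

In the worst case this performs on the order of n × m character comparisons for a text of length n and a pattern of length m, which is why it serves as the baseline against which faster string-matching algorithms are usually measured.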
Another well-known matching problem involves sequence alignment. This problem is perhaps best known in the context of DNA matching. A DNA sequence can be represented as a string over a four-character alphabet, in which each character is one of A, C, G, or T, representing the four nucleotide bases. Various techniques have been developed to determine the alignment of two DNA sequences that gives the highest level of correspondence between the two sequences.
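The text does not name a particular alignment technique. As one well-known example, the Needleman-Wunsch dynamic-programming algorithm computes an optimal global alignment of two sequences. The sketch below assumes a deliberately simple scoring scheme (+1 per match, -1 per mismatch, -1 per gap) chosen only for illustration; practical DNA aligners typically use substitution matrices and affine gap penalties. The function name `global_align` is hypothetical.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment (illustrative scoring assumed).

    Returns (best_score, aligned_a, aligned_b), with '-' marking gaps.
    """
    def sub(x: str, y: str) -> int:
        # Score for aligning character x against character y.
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align both
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

Like the string-matching sketch, this compares individual characters; it has no notion of multi-character elements whose constituents are interleaved differently in each sequence.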
Neither of these known matching solutions addresses the matching of transactions in the more complicated environments typically associated with databases.