Data warehouses as well as other data repositories are used to integrate data across an enterprise. Such data may periodically be accessed/characterized through queries of the data warehouse. In some arrangements, the results of a query may be calculated at two or more different times and/or using two or more different query techniques. In order to determine whether there are any deviations in the query results, a comparison among the calculated results may be performed.
In some cases, a query may have been processed for which the corresponding result table has been proven or verified to be accurate. Subsequently, the same query is executed again in order to determine whether recent code changes or alternative query execution procedures still deliver the correct result table. In other cases, a query may be simultaneously executed on two systems and the results may be subsequently compared in order to guarantee accuracy. Such simultaneous query execution may be implemented, for example, in N-version software (NVS), for fault tolerance purposes in mission-critical tasks.
FIG. 3 illustrates an example of a query result that consists of a table 300 of m columns and n rows. In total, there are m*n values v(i,j) with 1≦i≦n and 1≦j≦m, where v(i,j) is the value found in row i, column j (e.g., v(2,3)=35078.00). FIG. 4 shows a second example, a query result table 400 that is similar to the table 300 of FIG. 3 but has minor differences (indicated in bold and underline).
Conventional techniques for verifying that two query results (e.g., result tables 300, 400 from FIG. 3 and FIG. 4) are similar and/or identical typically (i) check that the numbers of result rows (or cells) are identical and, if that is the case, (ii) conduct a cell-wise comparison of the results. However, an n*m cell comparison operation may take a long time to process (i.e., the worst-case time complexity is O(n*m)). Moreover, such an operation may also consume significant memory/storage resources (i.e., the space required in memory or on disk is 2*n*m, so the space complexity is also O(n*m)).
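By way of illustration only, the conventional two-step check described above may be sketched as follows. The function name cellwise_equal, the representation of a result table as a list of rows, and the sample values are assumptions made for this sketch and are not taken from the figures.

```python
from typing import Sequence

Row = Sequence[object]


def cellwise_equal(table_a: Sequence[Row], table_b: Sequence[Row]) -> bool:
    """Hypothetical helper: conventional comparison of two result tables.

    Both tables are held in memory at once (2*n*m cells), and in the
    worst case every one of the n*m cell pairs is visited.
    """
    # Step (i): the row counts must match.
    if len(table_a) != len(table_b):
        return False
    # Step (ii): compare cell by cell, row i against row i.
    for row_a, row_b in zip(table_a, table_b):
        if len(row_a) != len(row_b):
            return False
        for value_a, value_b in zip(row_a, row_b):
            if value_a != value_b:
                return False
    return True


# Illustrative values only.
reference = [("A", 1, 100.00), ("B", 2, 35078.00)]
same = [("A", 1, 100.00), ("B", 2, 35078.00)]
changed = [("A", 1, 100.00), ("B", 2, 35078.01)]

print(cellwise_equal(reference, same))     # True
print(cellwise_equal(reference, changed))  # False
```

In this sketch, rows are compared position by position, so two tables holding the same rows in a different order would be reported as different.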
Furthermore, in some configurations, the cell-wise comparison may require the result tables to be sorted in advance. As sorting has a time complexity of O(n*log n), such an operation also greatly increases the amount of time required to verify the query results. Alternatively, simple search techniques requiring even more time and consuming more resources may be employed to selectively retrieve the value of certain cells for comparison purposes.
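The sort-before-compare variant may be sketched, under the same kind of assumptions, as follows. The name sorted_tables_equal is hypothetical, and the sketch assumes that the values in each column are mutually orderable (e.g., no mixing of numbers and strings within a column) so that rows can be sorted.

```python
from typing import Sequence

Row = Sequence[object]


def sorted_tables_equal(table_a: Sequence[Row], table_b: Sequence[Row]) -> bool:
    """Hypothetical helper: order-insensitive comparison via sorting.

    Each table is sorted first (O(n*log n) row comparisons), after which
    the rows are compared pairwise, cell by cell.
    """
    if len(table_a) != len(table_b):
        return False
    # Convert rows to tuples so they sort lexicographically by column.
    rows_a = sorted(tuple(row) for row in table_a)
    rows_b = sorted(tuple(row) for row in table_b)
    return rows_a == rows_b


# Illustrative values only: the same rows delivered in a different order.
first_run = [("B", 2, 35078.00), ("A", 1, 100.00)]
second_run = [("A", 1, 100.00), ("B", 2, 35078.00)]

print(sorted_tables_equal(first_run, second_run))  # True
```

The sorted copies are held in addition to the original tables, which further adds to the memory required for the comparison.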