Data stored in computer systems may be used for many different purposes. Thus, different storage formats and indexing/retrieval systems for data have been developed in response to these different purposes or needs that users may have for data. These different storage and operation systems are known as data domains. For example, a relational database is a type of data domain. The relational database stores data and provides access to the data, for example via a database query language.
While a data domain may be useful for many purposes, data stored in one data domain may also be useful in a different data domain. For example, as discussed above, a relational database is a type of data domain. Another data domain is the object-oriented domain, in which data exists as objects that have both data and behavioral characteristics, and object-oriented technology supports the building of applications out of such objects. Yet it is often useful for an application in the object-oriented domain to use data from the relational domain of the database. This situation and the resulting problems have been termed the “object-relational impedance mismatch.”
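The mismatch can be illustrated with a minimal Python sketch. It assumes a hypothetical two-table schema (`customers`, `orders`) held in an in-memory SQLite database and hypothetical `Customer` and `Order` classes; none of these names come from the systems discussed here. The relational domain returns flat joined rows, while the object-oriented domain wants nested objects with behavior, so a hand-written mapping is needed between the two.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int
    total: float


@dataclass
class Customer:
    customer_id: int
    name: str
    orders: list = field(default_factory=list)

    def total_spent(self) -> float:
        # Behavior lives on the object; the table has no equivalent.
        return sum(order.total for order in self.orders)


def make_demo_db():
    """Build a small in-memory relational source (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0);
    """)
    return conn


def load_customers(conn):
    """Map flat joined rows onto nested objects by hand."""
    rows = conn.execute(
        "SELECT c.id, c.name, o.id, o.total "
        "FROM customers c LEFT JOIN orders o ON o.customer_id = c.id "
        "ORDER BY c.id, o.id"
    ).fetchall()
    customers = {}
    for cid, name, oid, total in rows:
        customer = customers.setdefault(cid, Customer(cid, name))
        if oid is not None:  # a customer without orders yields NULL columns
            customer.orders.append(Order(oid, total))
    return list(customers.values())
```

Even in this small case the mapping has to regroup repeated parent rows and handle the NULL columns produced by the outer join, which is the kind of work a cross domain data system is meant to hide.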
Similarly, extensible markup language (XML) is a language used to create documents which are processed by an XML processor. XML is increasingly used to facilitate data exchange, among many other uses, over the World Wide Web and elsewhere. However, XML documents containing data and other facets of the XML data domain are not directly compatible with, e.g., the relational domain.
In order to address these data domain mismatches, a number of systems have been developed which bridge multiple data domains. These are called cross domain data systems. Object relational systems are cross domain data systems which bridge the object-oriented data domain and the relational database data domain. XML databases are cross domain data systems which bridge the XML data domain and the relational database data domain.
Cross domain data systems allow data in one data domain to be accessed from another data domain. In cross domain data systems, the domain which serves to actually store data is called the source domain. The domain in which data is retrieved and manipulated is called the target domain. Systems which bridge source and target domains typically offer a query language for retrieving data stored in the source domain in a manner compatible with the target domain.
While source domains may have a query language of their own for retrieving data, cross domain data systems in many cases expose a separate query language. A cross domain data system query language is provided for a number of reasons. It hides the source data domain system, along with any complexities involved in accessing data from that system. In many cases, it is also closer to and/or consistent with the semantics of the target domain, as opposed to the source domain.
In creating a cross domain data system, one difficulty is in making sure the system functions correctly. Verifying the correctness of results of queries executed on the cross domain data system is important. One way to do this verification is by manual (human) checking. This requires extensive resource use and is prone to human error. Prior art automatic verification methods have drawbacks as well.
File baseline verification is one traditional automatic verification method for cross domain data systems. For file baseline verification, a baseline file is created containing the results which should be retrieved by a cross domain data system for a particular query. The baseline file is created or checked by a human checker. The cross domain data system then runs the query to obtain results. The baseline file is then used in order to verify results obtained by the cross domain data system. This verification method has obvious drawbacks. First, human checkers may introduce errors into the process. Second, beyond the cost of a human checker, this verification does not scale well: larger data sets require more time to produce baseline files.
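File baseline verification as described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the baseline is assumed to be a JSON file of expected rows, and `run_query` is a hypothetical callable standing in for the cross domain data system. Neither the file format nor the function names are taken from any particular system.

```python
import json
from pathlib import Path


def write_baseline(path, expected_rows):
    """Store the expected results; in practice a human creates or checks these."""
    Path(path).write_text(json.dumps(expected_rows))


def verify_against_baseline(run_query, query, baseline_path):
    """Run the query and compare its results with the stored baseline.

    run_query is a hypothetical stand-in for the cross domain data system.
    """
    expected = json.loads(Path(baseline_path).read_text())
    # Round-trip the actual results through JSON so that tuples and lists
    # compare equal to the form in which the baseline was stored.
    actual = json.loads(json.dumps(run_query(query)))
    return actual == expected
```

The comparison itself is trivial; the cost lies in `write_baseline`, since every query needs its own hand-checked file and each file grows with the data set.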
Another verification method involves running a query on the cross domain data system and an equivalent query on the source data system. Though concise and scalable, this approach is limited by potential structural mismatches of data between the source and target domains. The comparison may not be straightforward, for reasons similar to those that make a cross domain data system useful: because of the differences in data domains, it can be difficult to translate a query in the query language of the cross domain data system (which, as discussed, is often closer to and consistent with the semantics of the target domain) into a query on the source domain. Thus, manual translation of the cross domain query language queries is required. This introduces a human checker, and with it the same concerns about error and resource use.
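A rough sketch of this second method, again under assumptions: the source domain is an in-memory SQLite database with a hypothetical schema, the cross domain results are represented as nested dictionaries, and `TRANSLATED_SQL` plays the role of the manually translated source query. All names are illustrative only.

```python
import sqlite3

# Hand-written source-domain equivalent of a cross domain query such as
# "each customer with the totals of that customer's orders" (hypothetical).
TRANSLATED_SQL = (
    "SELECT c.name, o.total "
    "FROM customers c JOIN orders o ON o.customer_id = c.id"
)


def make_source_db():
    """Build a small in-memory relational source (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0);
    """)
    return conn


def run_source_query(conn, sql):
    """Execute the translated query directly against the source domain."""
    return conn.execute(sql).fetchall()


def flatten(target_results):
    """Flatten nested target-domain results into sorted (name, total) rows."""
    return sorted(
        (customer["name"], total)
        for customer in target_results
        for total in customer["order_totals"]
    )


def results_match(target_results, source_rows):
    """Compare the two result sets once both share the same flat shape."""
    return flatten(target_results) == sorted(source_rows)
```

Both `TRANSLATED_SQL` and `flatten` are written by hand for this one query, and they must agree on details such as how a customer with no orders is treated. That per-query effort is the manual translation step described above.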
Thus, there is a need for a scalable and automated technique for verifying cross domain data system queries.