The eXtensible Markup Language (XML) is a World Wide Web Consortium (W3C®) endorsed standard (see the W3C website at w3c.org/tr) for document formatting that provides a generic syntax to mark up data with human-readable tags. Although XML can readily describe the contents of a document in a well-defined format, other sources of data are not so easily described, either because their structure is inconsistent with that of a standard text document or because of some other characteristic that is not compatible with XML. Examples of such data sources include a spreadsheet or a relational database.
The challenge of performing an XML-like search over data sources having diverse data programming models is termed virtual XML. The term is generally interpreted as including querying over virtual XML views. Virtual XML is a concept which establishes consistency across data access programming models and allows users to work with their data in the way they think about it rather than in its actual storage format. The concept of querying over virtual XML data involves treating the data as if it were XML without ever actually converting it to XML. One advantage of this concept is that the overhead of XML encoding is kept to a minimum. It is desirable that a virtual XML implementation be able to use an XML query language to query a non-XML data source as if the data source were XML. It is also desirable that the mapping between the actual data and the virtual XML representation be of high fidelity.
There are numerous challenges inherent in implementing virtual XML. One problem is efficiency. One could simply expose a data source through a virtual XML interface, such as an XML Reader, and then query over it with an existing XML query implementation, such as an XML Document Object Model (DOM). In that case, however, all of the work occurs in the XML query engine instead of being performed by the data source itself. The data source and its associated data management system are assumed to be more efficient at querying their own data in their specifically designed language than a foreign query system having a different data model.
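By way of illustration only, the efficiency problem described above can be sketched in Python using the standard sqlite3 and xml.etree.ElementTree modules. The table name, column names, sample rows, and function names below are hypothetical and are not taken from any particular system. The first function materializes every row as XML and lets the XML engine do the filtering; the second hands the same filter to the data source.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_db():
    """Build a small in-memory relational source (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
        [(1, "Ada", "London"), (2, "Grace", "New York"), (3, "Alan", "London")],
    )
    return conn


def query_via_materialized_xml(conn, city):
    """Naive approach: convert every row to XML, then filter in the XML engine."""
    root = ET.Element("customers")
    rows = conn.execute("SELECT id, name, city FROM customers ORDER BY id")
    for row_id, name, row_city in rows:
        customer = ET.SubElement(root, "customer", id=str(row_id))
        ET.SubElement(customer, "name").text = name
        ET.SubElement(customer, "city").text = row_city
    # All rows have already been read and encoded before this filter runs.
    # (The city value is assumed to contain no quote characters.)
    matches = root.findall(f"./customer[city='{city}']")
    return [match.findtext("name") for match in matches]


def query_pushed_down(conn, city):
    """Same question, but the data source evaluates the filter itself."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY id", (city,)
    )
    return [name for (name,) in rows]
```

Both functions return the same names for a given city. The difference lies in where the work is done: the first reads and encodes the entire table regardless of the filter, whereas the second allows the data management system to apply its own query processing.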
This aspect introduces another problem in implementing a virtual XML query: the XML data model does not always align well with the underlying data model and its type system. One could map all of the types of the underlying data source into XML types, but this process both loses fidelity and is inefficient. Furthermore, types in one system may have no obvious equivalent in another. For example, representing binary data such as images in XML requires a costly conversion to the XML character set (e.g., base64 encoding).
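The cost of the base64 conversion mentioned above can be illustrated with a short Python sketch using the standard base64 module. The function name and the stand-in payload are hypothetical. Base64 represents every three bytes of input as four characters of output, so the encoded form is roughly one third larger than the binary original, in addition to the processing required to encode and decode it.

```python
import base64


def base64_overhead(payload: bytes):
    """Return (raw size in bytes, encoded size in characters) for a payload."""
    encoded = base64.b64encode(payload)
    # The encoding is lossless, but a consumer must decode it to recover the bytes.
    assert base64.b64decode(encoded) == payload
    return len(payload), len(encoded)


# A stand-in for image data; real image bytes would behave the same way.
raw_size, encoded_size = base64_overhead(bytes(3072))
print(raw_size, encoded_size)  # prints "3072 4096"
```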
Prior attempts to query over virtual XML approached the problem by constructing two different data structures, one for the query and one for the mapping, and then traversing them in tandem to generate an efficient query directly over the original data sources, without ever materializing the virtual XML view. Although this approach initially works well, development becomes enormously difficult as the query and mapping languages increase in complexity. Concepts in the query or mapping often do not translate directly into the target data model, and composing complex queries with complex XML views requires extensive semantic analysis and rewriting.
Additionally, a system architecture which can support the transformation of queries in one language into either query representations or query results over many data sources normally requires the costly implementation of M times N paths, where M is the number of input options and N is the number of output options. With standard architectures, the number of such transformation compilers therefore grows with the product of M and N.
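The growth in the number of paths can be made concrete with a small Python sketch. The input and output labels and the function names are hypothetical placeholders. For comparison, the sketch also counts the components needed if every input were first translated into a single shared intermediate form; that comparison is a general observation about such designs and is offered as an assumption, not as a description of any particular architecture.

```python
from itertools import product


def direct_paths(inputs, outputs):
    """One dedicated translator for every (input, output) pair: M * N paths."""
    return list(product(inputs, outputs))


def shared_form_components(inputs, outputs):
    """One front end per input plus one back end per output: M + N components."""
    front_ends = [("front end", name) for name in inputs]
    back_ends = [("back end", name) for name in outputs]
    return front_ends + back_ends


# Hypothetical placeholders: M = 3 input options, N = 4 output options.
inputs = ["input_1", "input_2", "input_3"]
outputs = ["output_1", "output_2", "output_3", "output_4"]

print(len(direct_paths(inputs, outputs)))            # prints "12"
print(len(shared_form_components(inputs, outputs)))  # prints "7"
```

Adding a fourth input option in this sketch adds four direct paths but only one front end, which illustrates why the pairwise approach becomes costly as options are added.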
Thus, there is a need for a unifying representation and a single system architecture to implement virtual XML for XML queries and views over XML and non-XML data sources. The present invention addresses these needs with both an architecture utilizing a unifying representation and an application programming interface for users of the inventive system.