The eXtensible Markup Language (XML) is a World Wide Web Consortium (W3C)-endorsed standard for document formatting that provides a generic syntax to mark up data with human-readable tags. XML does not have a fixed set of tags and elements and thus allows users to define such tags as long as they conform to the XML standard. Data may be stored in XML documents as strings of text that are surrounded by text markup. The W3C has codified XML's abstract data model in a specification called the XML information set (XML Infoset). The Infoset describes the logical structure of an XML document in terms of property-containing nodes. Although XML may easily describe the contents of a document in a well-defined format, there are other sources of data that may not be so easily described, either because their structure is inconsistent with that of a standard text document or because of some other characteristic that is incompatible with XML. Examples of such data sources include a spreadsheet or a relational database.
Virtual XML is a concept which establishes consistency across diverse data access programming models and allows users to work with their data in the way they think about it instead of in its actual storage format. The concept of querying over virtual XML data involves treating the data as if it were XML without ever actually converting it to XML. One advantage of this concept is that the overhead of XML encoding is kept to a minimum. It is desirable that a virtual XML implementation be able to use a query language to query over a non-XML data source as if the data source were XML. It is also desirable that the mapping between the actual data and the virtual XML representation be of high fidelity.
There are numerous challenges inherent to implementing a virtual XML. One problem is efficiency. One could simply expose a data source with a virtual XML interface, such as XMLReader™, and then query over it with the existing XML query implementations such as XPathNavigator™. However, all of the work occurs in the XML query engine instead of being performed by the data source itself.
By way of example, consider the following virtual XML query over a structured query language (SQL) database embodied in SQLServer:

    sql:database("Northwind")/Customer[@ID='ALKFI']/Order

In this query, sql:database("Northwind") exposes the Northwind database of SQLServer as virtual XML, and then the XPath /Customer[@ID='ALKFI']/Order selects all the Order elements from one of the Customer elements. An implementation might attempt something like the following:

    SQLServerMapping map = new SQLServerMapping("Northwind");
    SQLServerXmlReader data = new SQLServerXmlReader(map);
    XPathNavigator nav = new XPathNavigator(data, "/Customer[@ID='ALKFI']/Order");

There are at least two flaws in this approach. First, the entire mapping is performed by the XmlReader even though only a part of it is used by the query. Second, SQLServer can select Customers by ID far more efficiently than XPathNavigator can. Note that the example above has XPathNavigator performing all of the work. A better solution to this challenge may be to offload as much of the query as possible into the data source (here, a SQLServer database). However, this may involve significant query analysis and rewriting.
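The difference between the two strategies can be illustrated with a minimal Python sketch. It is not the implementation discussed above: an in-memory SQLite database stands in for SQLServer, Python's xml.etree.ElementTree stands in for the XmlReader and XPathNavigator pair, and the table layout, the sample rows, and the function names are all assumptions made for illustration. The rows_read counter is only a rough proxy for the work each strategy performs.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_db():
    """Create a tiny stand-in database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customer (ID TEXT PRIMARY KEY, Name TEXT);
        CREATE TABLE "Order" (OrderID INTEGER PRIMARY KEY,
                              CustomerID TEXT, Total REAL);
        INSERT INTO Customer VALUES ('ALKFI', 'Customer A'),
                                    ('BONAP', 'Customer B');
        INSERT INTO "Order" VALUES (1, 'ALKFI', 10.0),
                                   (2, 'BONAP', 20.0),
                                   (3, 'ALKFI', 30.0);
    """)
    return conn


def orders_via_materialized_xml(conn, customer_id):
    """Map the whole database to XML first, then filter in the XML layer."""
    root = ET.Element("Northwind")
    rows_read = 0
    customers = conn.execute(
        "SELECT ID, Name FROM Customer ORDER BY ID").fetchall()
    for cid, name in customers:
        rows_read += 1
        cust = ET.SubElement(root, "Customer", ID=cid, Name=name)
        orders = conn.execute(
            'SELECT OrderID, Total FROM "Order" '
            "WHERE CustomerID = ? ORDER BY OrderID", (cid,)).fetchall()
        for oid, total in orders:
            rows_read += 1
            ET.SubElement(cust, "Order", ID=str(oid), Total=str(total))
    # The XPath-style filter runs only after every row has been mapped.
    # (Simple string interpolation; assumes customer_id contains no quotes.)
    matches = root.findall(f"./Customer[@ID='{customer_id}']/Order")
    return [int(o.get("ID")) for o in matches], rows_read


def orders_via_pushdown(conn, customer_id):
    """Translate the filter into SQL so the data source does the selection."""
    cur = conn.execute(
        'SELECT OrderID FROM "Order" WHERE CustomerID = ? ORDER BY OrderID',
        (customer_id,))
    ids = [row[0] for row in cur]
    return ids, len(ids)


if __name__ == "__main__":
    conn = build_db()
    print(orders_via_materialized_xml(conn, "ALKFI"))
    print(orders_via_pushdown(conn, "ALKFI"))
```

Both functions return the same Order identifiers, but the first reads every Customer and Order row before filtering, while the second reads only the rows that match. The hand-written SQL in the second function is exactly the kind of rewrite that becomes difficult to derive automatically as queries and mappings grow more complex.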
Another problem in implementing a virtual XML is that the XML data model does not always align well with the underlying data model and its type system. One could map all of the types of the underlying data source into XML types, but this process loses fidelity and is also inefficient. Furthermore, types in one system may have no obvious equivalent in another. For example, representing binary data such as images in XML requires a costly conversion into characters that XML permits (e.g., base64 encoding).
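The cost of the binary-to-text conversion mentioned above can be seen in a short Python sketch. The Image element name, the helper function names, and the sample payload are assumptions for illustration only; base64 and xml.etree.ElementTree are standard-library modules.

```python
import base64
import xml.etree.ElementTree as ET


def embed_binary(payload: bytes) -> str:
    """Wrap raw bytes in a (hypothetical) <Image> element as base64 text."""
    elem = ET.Element("Image")
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def extract_binary(xml_text: str) -> bytes:
    """Recover the original bytes from the element text."""
    text = ET.fromstring(xml_text).text or ""
    return base64.b64decode(text)


if __name__ == "__main__":
    payload = bytes(range(256)) * 3  # 768 bytes of arbitrary binary data
    xml_text = embed_binary(payload)
    encoded_len = len(base64.b64encode(payload))
    print(len(payload), encoded_len)  # base64 emits 4 characters per 3 bytes
    assert extract_binary(xml_text) == payload
```

Base64 expands the data by roughly one third, and every read or write pays for an encode or decode pass. A virtual XML layer that leaves the value in its native binary type avoids both costs.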
Prior attempts to query over virtual XML approached the problem by constructing two different data structures, one for the query and one for the mapping, and then traversing them in tandem to generate an efficient query directly over the original data sources, without ever materializing the virtual XML view. Although this approach works well initially, development becomes enormously difficult as the query and mapping languages increase in complexity. Concepts in the query or mapping often do not translate directly into the target data model, and composing complex queries with complex views requires an abundance of semantic analysis and rewriting.
Thus, there is a need for a unifying representation to implement virtual XML for XML queries and views over XML and non-XML data sources. Methods and systems that reduce the problem of implementing queries over complex mappings to the less complex problems of composing queries and performing them over less complex mappings are desired. The present invention addresses the aforementioned needs and drawbacks and solves them with additional advantages as expressed herein.