The Extensible Markup Language (XML) is a World Wide Web Consortium (W3C) endorsed standard for document and data representation that provides a generic syntax for marking up data with human-readable tags. XML does not have a fixed set of tags; users may define their own tags as long as they conform to the XML standard. Data may be stored in XML documents as strings of text surrounded by text markup.
As XML's usage has grown, it has become generally accepted that XML is not only useful for describing new document formats for the Web but is also suitable for describing structured data. Examples of structured data include information that is typically contained in spreadsheets, program configuration files, and network protocols. XML is preferable to previous data formats because it can easily represent both tabular data, such as relational data from a database or spreadsheet, and semi-structured data, such as a web page or business document. XML may therefore be used to format any kind of data, not just textual data. XML documents may also have other XML documents embedded in them, forming compound XML documents. The embedded documents in a compound XML document may be expressed as encoded documents that contain many different types of data, and the data in each embedded document may be encoded differently. Examples include embedded documents encoded as HTML or as Base64. Other encoding mechanisms are possible.
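As an illustration of such a compound document, the following minimal sketch builds one with Python's standard library. The element names (`record`, `payload`, `note`, `order`, `item`) and the `encoding` attribute are hypothetical and chosen only for this example; they are not part of any standard.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical inner XML document to be embedded in the primary document.
inner_xml = '<order><item sku="A1">Widget</item></order>'

# Primary document with two embedded documents, each encoded differently.
outer = ET.Element("record")

# Embedded document 1: the inner XML, carried as Base64 text.
payload = ET.SubElement(outer, "payload", encoding="base64")
payload.text = base64.b64encode(inner_xml.encode("utf-8")).decode("ascii")

# Embedded document 2: an HTML fragment, carried as character data.
# The serializer escapes its angle brackets, so it is text, not child elements.
note = ET.SubElement(outer, "note", encoding="html")
note.text = "<p>Ship <b>today</b></p>"

compound_xml = ET.tostring(outer, encoding="unicode")
print(compound_xml)
```

In the serialized result, neither embedded document appears as element structure of the primary document: one is an opaque Base64 string and the other is escaped text.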
Many languages, such as XPath, XSLT, and XQuery, allow queries on XML documents. The navigation model of these languages reaches XML elements and their values within a target XML document by specifying a path consisting of the XML names of tags or nodes in the target document. While this method has proved very powerful on simple XML documents, there is a category of compound XML documents for which the standard navigation model does not allow values to be retrieved from the nested encoded documents embedded in the primary XML document. Such compound XML documents usually occur when pieces of XML are stored as attribute values in the primary document, or when the primary document represents a dataset retrieved from a database in which some of the table columns contain XML documents that may be encoded.
Standard XML query languages can retrieve a nested document as a single large text string, but they do not allow its contents to be queried as part of the same query. Querying the nested documents requires generating a second, third, or further query, depending on the level of nesting. It is desirable to query compound XML documents using fewer queries.
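The limitation can be sketched with the XPath subset supported by Python's `xml.etree.ElementTree`. The document layout below (a `rows` dataset whose `doc` column holds a Base64-encoded XML document) is a hypothetical example, not a format taken from any particular system.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical nested document, stored Base64-encoded in a dataset column.
nested_source = b'<order><item sku="A1">Widget</item></order>'
primary = ET.fromstring(
    '<rows><row id="1"><doc encoding="base64">{}</doc></row></rows>'.format(
        base64.b64encode(nested_source).decode("ascii")
    )
)

# A single path cannot reach into the nested document, because its
# elements are not nodes of the primary document.
direct = primary.find("./row[@id='1']/doc/order/item")
print(direct)  # None

# Query 1: retrieve the nested document as one text string.
encoded = primary.find("./row[@id='1']/doc").text

# Decode and parse it, then issue query 2 against the nested document.
nested = ET.fromstring(base64.b64decode(encoded))
item = nested.find("./item")
print(item.get("sku"), item.text)  # A1 Widget
```

Each further level of nesting would add another decode, parse, and query step of the same kind.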