Networks and networked applications have grown dramatically in number, size and complexity over the past decade. One of the more recent data encoding formats to enjoy wide adoption, especially on the Internet, is XML (Extensible Markup Language), a simplified subset of SGML (Standard Generalized Markup Language). XML was developed as a document format for the Web that is more flexible than HTML. XML allows the tags that define the elements of a page or document to be chosen by the developer of the page. Web pages can thus be designed to function much like database records, with selectively defined tags representing data items for specific applications (e.g., product code, department and price in the context of a purchase order or invoice document). In the world of Web content, the use of XML is growing as it becomes the preferred data format across the business-to-business (B2B), business-to-consumer (B2C) and peer-to-peer sectors of Web commerce (e-business).
The World Wide Web Consortium (W3C) drives the standards for the various interoperable specifications that cover the features and extensions of XML. These include XSL (a stylesheet language), XPath (a language for selecting and querying nodes), XSLT (a language for transforming XML), XQuery (a query language) and XML Schema (a language for defining shared vocabularies). XML Schemas express shared vocabularies and provide a means for defining the structure, content and semantics of XML documents. The use of data schemas (such as DTDs, XML Schema or RELAX NG) allows a set format to be used for all similar transactions. A schema defines the type, order and layout of data for a particular XML format. Once an XML schema has been defined, “instance documents” that conform to that schema may be created.
An XML document may be validated against one or more XML Schema documents through a process called schema validation, in which the data in the XML document is checked against the structure defined in the XML Schema document. Schema validation requires identifying which element and attribute declarations and type definitions in the schemas should be used to check which elements and attributes in the instance document. A document that fails validation can be rejected without any further processing.
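The idea of matching declarations to instance elements can be illustrated with a deliberately simplified sketch. It is not a conforming XML Schema processor: it understands only a top-level element declared as a sequence of children typed `xs:string` or `xs:decimal`, and it compares the `type` attribute as a literal prefixed string rather than resolving it as a qualified name. The schema, the element names (`item`, `productCode`, `department`, `price`) and the function names are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: an <item> with three children in a fixed order.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="item">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="productCode" type="xs:string"/>
        <xs:element name="department" type="xs:string"/>
        <xs:element name="price" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")

# Only two built-in types are supported in this sketch.
TYPE_CHECKS = {
    "xs:string": lambda text: True,
    "xs:decimal": lambda text: _DECIMAL.fullmatch(text.strip()) is not None,
}


def load_declarations(schema_text):
    """Map each top-level element name to its ordered (child name, type) list."""
    root = ET.fromstring(schema_text)
    decls = {}
    for el in root.findall(XS + "element"):
        children = el.findall(f"{XS}complexType/{XS}sequence/{XS}element")
        decls[el.get("name")] = [(c.get("name"), c.get("type")) for c in children]
    return decls


def validate(instance_text, decls):
    """Return a list of error strings; an empty list means the document passed."""
    root = ET.fromstring(instance_text)
    if root.tag not in decls:
        return [f"no declaration for <{root.tag}>"]
    expected = decls[root.tag]
    actual = list(root)
    # Check order and layout first, then the type of each child's content.
    if [c.tag for c in actual] != [name for name, _ in expected]:
        return ["children do not match the declared sequence"]
    errors = []
    for child, (name, type_name) in zip(actual, expected):
        check = TYPE_CHECKS.get(type_name)
        if check is None:
            errors.append(f"<{name}>: unsupported type {type_name}")
        elif not check(child.text or ""):
            errors.append(f"<{name}>: {child.text!r} is not a valid {type_name}")
    return errors


GOOD = ("<item><productCode>A1</productCode>"
        "<department>toys</department><price>9.99</price></item>")
BAD = GOOD.replace("9.99", "cheap")

declarations = load_declarations(SCHEMA)
for doc in (GOOD, BAD):
    problems = validate(doc, declarations)
    # A failing document is rejected before any further processing.
    print("accepted" if not problems else f"rejected: {problems}")
```

A production system would rely on a full schema processor instead; the sketch only shows where the type, order and layout checks fit.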
In order to validate the document, the schema document must be obtained. The XML document typically carries one or more references to the appropriate XML Schema documents in its header. A document may specify or hint at its schema by using an XML processing instruction (PI), an XML Namespace declaration, a schema-location attribute or a similar special declaration. For example, when a PI is used, the value of the PI is the URI of the schema, which is commonly treated as a URL, that is, its location on the Web. The processor must therefore obtain these schema documents in order to validate the instance document. Further, these schema documents can include other schema documents, so the processor may need to perform multiple fetches over the Web to validate a single document.
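As a rough sketch of the schema-location case, the code below reads the `xsi:schemaLocation` and `xsi:noNamespaceSchemaLocation` hints from the root element of an instance document and then follows `xs:include` and `xs:import` references until every reachable schema has been fetched once. The URLs, the `FAKE_WEB` dictionary and the function names are hypothetical; the dictionary stands in for real network fetches so that the example runs offline, and hints on elements other than the root are ignored.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical instance document that hints at its schema.
INSTANCE = """<order xmlns="http://example.com/order"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.com/order http://example.com/schemas/order.xsd"/>"""

# Stand-in for the Web: maps a URL to the schema text stored there.
FAKE_WEB = {
    "http://example.com/schemas/order.xsd":
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
        '<xs:include schemaLocation="types.xsd"/></xs:schema>',
    "http://example.com/schemas/types.xsd":
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"/>',
}


def schema_hints(instance_text):
    """Return the schema URIs hinted at on the root element."""
    root = ET.fromstring(instance_text)
    # xsi:schemaLocation holds whitespace-separated (namespace, location) pairs.
    pairs = root.get(XSI + "schemaLocation", "").split()
    hints = pairs[1::2]
    no_namespace = root.get(XSI + "noNamespaceSchemaLocation")
    if no_namespace:
        hints.append(no_namespace)
    return hints


def collect_schemas(start_uris, fetch):
    """Fetch every schema reachable via xs:include / xs:import, each at most once."""
    fetched = {}
    pending = list(start_uris)
    while pending:
        uri = pending.pop()
        if uri in fetched:
            continue
        fetched[uri] = fetch(uri)  # one fetch per distinct schema document
        root = ET.fromstring(fetched[uri])
        for ref in root.findall(XS + "include") + root.findall(XS + "import"):
            location = ref.get("schemaLocation")
            if location:
                # Relative locations are resolved against the including schema.
                pending.append(urljoin(uri, location))
    return fetched


schemas = collect_schemas(schema_hints(INSTANCE), FAKE_WEB.__getitem__)
print(f"{len(schemas)} schema documents fetched:", sorted(schemas))
```

Passing the fetch function as a parameter keeps the traversal separate from the transport, so the same loop could be driven by an HTTP client or a local cache.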
A document's schema may also be specified “out of band,” such as through configuration information. This allows a “trusted” copy of the XML Schema document to be used. The trusted copy will typically be located at one specific web address or file location, and any processor wishing to validate a document against it must fetch that trusted copy. This results in higher security at the expense of additional manual configuration and possibly lower runtime efficiency.
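One way to picture out-of-band configuration is a lookup table keyed by namespace that overrides whatever the document itself claims. The sketch below assumes such a table; the namespace, the file path and the names `TRUSTED_SCHEMAS` and `resolve_schema` are hypothetical and are not taken from any particular processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical out-of-band configuration: namespace -> trusted schema copy.
TRUSTED_SCHEMAS = {
    "http://example.com/order": "file:///etc/schemas/order.xsd",
}


def namespace_of(tag):
    """Extract the namespace from an ElementTree tag of the form '{ns}local'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def resolve_schema(instance_text, trusted=TRUSTED_SCHEMAS):
    """Pick the schema location from configuration, ignoring in-document hints."""
    root = ET.fromstring(instance_text)
    namespace = namespace_of(root.tag)
    if namespace not in trusted:
        # No configured schema: refuse instead of trusting the document's hint.
        raise LookupError(f"no trusted schema configured for {namespace!r}")
    return trusted[namespace]


# The document points at an untrusted location, which is deliberately ignored.
UNTRUSTED_HINT = """<order xmlns="http://example.com/order"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.com/order http://attacker.example/order.xsd"/>"""

print(resolve_schema(UNTRUSTED_HINT))
```

The table has to be maintained by hand for every supported namespace, which is the manual configuration cost noted above.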