1. Technical Field
Embodiments of the invention relate generally to information processing and more particularly to validating Extensible Markup Language (XML) documents.
2. Prior Art
Over the last few years, the use of XML as a data exchange format has increased tremendously. XML schema is a language or a model for describing the structure and constraining the contents of an XML document. The constraints defined for the XML documents follow the basic syntax constraints imposed by XML. An XML schema provides a view of an XML document at a relatively high level of abstraction.
There are languages developed specifically to express XML schemas. The Document Type Definition (DTD) language, which is native to the XML specification, is a schema language that is of relatively limited capability, but has other uses in XML aside from the expression of schemas. Another very popular and more expressive XML schema language is XML Schema, standardized by the World Wide Web Consortium (W3C). The mechanism for associating an XML document with an XML schema varies according to the schema language. The process of checking whether an XML document conforms to an XML schema is called validation. XML documents are considered valid if they satisfy the requirements of the XML schema with which they have been associated.
In a typical XML document validation, there may be identity constraints which need to be identified and evaluated. Identity constraint definitions provide for uniqueness and reference constraints with respect to the contents of multiple elements and attributes. Identity constraints in XML schema are expressed using the ‘unique’, ‘key’ and ‘keyref’ constructs. The ‘unique’ construct is used to specify that a particular element or attribute value, or a combination of one or more of these, is unique within the given scope of an element. The ‘key’ construct can serve the same purpose as ‘unique’. However, the ‘key’ construct in combination with the ‘keyref’ construct allows one to specify referential integrity constraints. In other words, the ‘key’ construct is used to specify that the values of a selected element or attribute are unique in a given scope, and the ‘keyref’ construct is used to specify that a selected element or attribute value has a corresponding element or attribute with the same value in the subset identified by the ‘key’ construct. The ‘key’ and ‘keyref’ constructs are further related in that the ‘keyref’ construct refers to the name of the ‘key’ construct, which is unique in a given scope.
Usage of identity constraints is explained using the following example. Consider an XML document including a list of customers and the orders placed by these customers. Every customer has a unique customer ID. Every order also has a unique order ID. In addition, an order also has the ID of the customer who placed the order. There may be multiple orders referring to the same customer ID. These constraints can be specified in the XML schema using XML schema identity constraints. The ‘unique’ or ‘key’ construct can be used to specify that customers and orders have unique IDs. Further, the ‘keyref’ construct can be used to specify that every order refers to a valid customer identified by the ‘key’ construct. XML schema uses a subset of the XPath 1.0 language to express the elements or attributes referred to in the ‘unique’, ‘key’ and ‘keyref’ constructs.
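By way of illustration only, the customer and order example above may be sketched in Python as follows. The sketch does not use an XML schema processor; it checks the three constraints by hand so that the intended semantics of ‘key’, ‘unique’ and ‘keyref’ are visible. The element names (store, customers, customer, orders, order), the attribute names (id, customerId), the sample document and the function name are all assumptions introduced for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; all element and attribute names are illustrative.
SAMPLE = """<store>
  <customers>
    <customer id="C1"/>
    <customer id="C2"/>
  </customers>
  <orders>
    <order id="O1" customerId="C1"/>
    <order id="O2" customerId="C1"/>
    <order id="O3" customerId="C9"/>
  </orders>
</store>"""


def check_identity_constraints(xml_text):
    """Return a list of messages describing identity constraint violations."""
    root = ET.fromstring(xml_text)
    errors = []

    # 'key' semantics: every customer id must be present and unique.
    customer_ids = set()
    for customer in root.findall("customers/customer"):
        cid = customer.get("id")
        if cid is None:
            errors.append("customer without id")
        elif cid in customer_ids:
            errors.append(f"duplicate customer id {cid}")
        else:
            customer_ids.add(cid)

    # 'unique' semantics: order ids, where present, must not repeat.
    order_ids = set()
    for order in root.findall("orders/order"):
        oid = order.get("id")
        if oid is not None:
            if oid in order_ids:
                errors.append(f"duplicate order id {oid}")
            order_ids.add(oid)

        # 'keyref' semantics: a customerId must match some customer key.
        ref = order.get("customerId")
        if ref is not None and ref not in customer_ids:
            errors.append(f"order {oid} refers to unknown customer {ref}")

    return errors


if __name__ == "__main__":
    for message in check_identity_constraints(SAMPLE):
        print(message)
```

In this sketch, orders O1 and O2 may both refer to customer C1 without violating any constraint, whereas order O3 is reported because no customer with ID C9 exists.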
In a conventional approach for enforcing identity constraints, the XPath expressions are evaluated and the XML nodes referred to in identity constraints are identified at runtime of an XML processing system. An XPath processor in the XML validator is needed to evaluate the XPath expressions at runtime and further enforce the identity constraints. The Simple API for XML (SAX) events are fed into the XPath processor to validate the XPath expressions at runtime and subsequently enforce the identity constraints. Validating XPath expressions at runtime using an XPath processor and enforcing identity constraints significantly degrades the performance of the XML processing system, which contributes to inefficiency in terms of time and cost.
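The runtime character of the conventional approach may be sketched, under simplifying assumptions, as follows. The sketch feeds SAX events into a handler that compares the current element path against a selector on every start-element event and checks a single attribute field for uniqueness. It is a simplified stand-in and not an actual XPath processor: the selector is restricted to an absolute path of child steps, and the class name, function name, selector string and sample document are hypothetical.

```python
import xml.sax

# Hypothetical sample document with a repeated customer id.
SAMPLE = ('<store><customers>'
          '<customer id="C1"/><customer id="C1"/><customer id="C2"/>'
          '</customers></store>')


class KeyCollector(xml.sax.ContentHandler):
    """Matches a simple child-step path at runtime and tracks field values."""

    def __init__(self, selector, field):
        super().__init__()
        self.selector = selector.split("/")  # e.g. ["store", "customers", "customer"]
        self.field = field                   # attribute name holding the key value
        self.path = []                       # names of the currently open elements
        self.seen = set()
        self.duplicates = []

    def startElement(self, name, attrs):
        self.path.append(name)
        # The path comparison is repeated for every element in the document.
        if self.path == self.selector:
            value = attrs.get(self.field)
            if value is not None:
                if value in self.seen:
                    self.duplicates.append(value)
                self.seen.add(value)

    def endElement(self, name):
        self.path.pop()


def find_duplicate_keys(xml_text, selector, field):
    """Parse xml_text with SAX and return the field values that repeat."""
    handler = KeyCollector(selector, field)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.duplicates


if __name__ == "__main__":
    print(find_duplicate_keys(SAMPLE, "store/customers/customer", "id"))
```

Even in this reduced form, the path matching is performed for each element encountered while the document is being processed, which illustrates where the runtime cost described above arises.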
In light of the foregoing discussions, there is a need for efficient XPath evaluation in XML document validation.