1. Technical Field
Embodiments of the invention relate to validating Extensible Markup Language (XML) documents.
2. Prior Art
Hypertext Markup Language (HTML) is a markup language designed for creating web pages with hypertext and other information to be displayed in a web browser. XML is a metalanguage that describes the structure of data and, unlike HTML, does not have a fixed set of elements. XML is a general-purpose specification for creating custom markup languages, and is classified as an extensible language because it allows users to define their own elements. XML facilitates the sharing of structured data across different information systems, particularly via the Internet. XML is also used for encoding documents and serializing data. Over time, the use of XML as a data exchange format has increased tremendously.
An XML schema is a language or a model for describing the structure and constraining the contents of an XML document. The constraints defined for the XML document apply in addition to the basic syntax constraints imposed by XML itself. An XML schema provides a view of an XML document at a relatively high level of abstraction.
There are languages developed specifically to express XML schemas. The Document Type Definition (DTD) language, which is native to the XML specification, is a schema language of relatively limited capability, but it has other uses in XML aside from the expression of schemas. Another very popular and more expressive XML schema language is XML Schema, standardized by the World Wide Web Consortium (W3C). The mechanism for associating an XML document with an XML schema varies according to the schema language. The process of checking whether an XML document conforms to an XML schema is called validation. XML documents are considered valid if they satisfy the requirements of the XML schema with which they have been associated.
A conventional XML document validation method is explained as follows. An XML document is received by an XML parser. The XML parser parses the XML document to generate Simple API for XML (SAX) events. An XML schema validator subscribes to the SAX events from the XML parser and determines whether the XML document is in accordance with the conditions specified in an XML schema. The XML schema validator uses the XML schema for validation of the XML document.
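The conventional flow described above can be illustrated with a minimal sketch, assuming Python's standard `xml.sax` module as the parser. The `TOY_SCHEMA` table, the `ToyValidator` class, and the `validate` function are hypothetical names introduced only for illustration. The table merely lists which child elements each element may contain, which is far simpler than a real XML Schema.

```python
import xml.sax

# Hypothetical toy "schema": each element name maps to the set of child
# element names it may contain. This is an assumption for illustration,
# not a real XML Schema.
TOY_SCHEMA = {
    "order": {"item"},
    "item": {"name", "qty"},
    "name": set(),
    "qty": set(),
}


class ToyValidator(xml.sax.ContentHandler):
    """Subscribes to SAX events and checks each element against the toy schema."""

    def __init__(self, schema, root):
        super().__init__()
        self.schema = schema
        self.root = root
        self.stack = []   # names of the currently open elements
        self.errors = []  # human-readable violation messages

    def startElement(self, name, attrs):
        if not self.stack:
            # First start-element event: must be the expected root.
            if name != self.root:
                self.errors.append(f"unexpected root element <{name}>")
        else:
            parent = self.stack[-1]
            if name not in self.schema.get(parent, set()):
                self.errors.append(f"<{name}> is not allowed inside <{parent}>")
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()


def validate(xml_text, schema=TOY_SCHEMA, root="order"):
    """Parse xml_text and return the list of violations (empty if none)."""
    handler = ToyValidator(schema, root)
    # The parser emits SAX events; the handler receives them as callbacks.
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.errors
```

In this sketch the parser and the validator are decoupled: the parser only reports events, and the validator holds whatever representation of the schema it needs. How that representation is organized is what determines the cost of each callback.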
The format in which an XML schema is organized or modeled affects the runtime performance of the XML schema validator. Currently available techniques convert the XML schema into complex data structures, for example, Non-deterministic Finite Automata (NFA) and Deterministic Finite Automata (DFA). These complex data structures degrade the runtime performance of the XML schema validator. Further, because the XML schema validator has to process the complex data structures, Central Processing Unit (CPU) time and memory are utilized inefficiently. Where the XML schema itself is long and complex, the resulting data structure becomes even more complex and is difficult to read and maintain.
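To make the automaton-based representation concrete, the following is a small sketch, assuming a hypothetical content model `name, qty+` for a single element. The state numbering, the `TRANSITIONS` table, and the function `matches_content_model` are illustrative assumptions and are not taken from any particular validator.

```python
# Hypothetical DFA for the content model "name, qty+":
#   state 0 = start
#   state 1 = <name> seen
#   state 2 = one or more <qty> seen (accepting)
TRANSITIONS = {
    (0, "name"): 1,
    (1, "qty"): 2,
    (2, "qty"): 2,
}
ACCEPTING = {2}


def matches_content_model(children, transitions=TRANSITIONS,
                          accepting=ACCEPTING, start=0):
    """Return True if the sequence of child element names is accepted."""
    state = start
    for child in children:
        key = (state, child)
        if key not in transitions:
            # No transition defined: the child is not allowed at this point.
            return False
        state = transitions[key]
    # The sequence is valid only if it ends in an accepting state.
    return state in accepting
```

Even this trivial content model needs a transition table and a set of accepting states. A schema with many element declarations, nested groups, and occurrence ranges yields one such automaton per content model, so the tables grow accordingly.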
In light of the foregoing discussion, there is a need for an efficient organization of XML schema data structures for XML document validation.