The present invention relates to computer document processing technology, and particularly relates to a method and system for validating XML documents.
Standard Generalized Markup Language (SGML) is an information management standard adopted by the International Organization for Standardization (ISO) for providing documents that are independent of platform and application software. Such documents preserve their formatting, indexing, and links. SGML provides a grammar-like mechanism for defining document structure and tags, and the tags are used to represent the format of different documents.
Extensible Markup Language (XML) is a standard language recommended by the World Wide Web Consortium (W3C). It is a simplified subset of SGML. XML gives web developers and designers more flexibility to create customized tags and to organize and represent information. XML is used to exchange documents and data in Service-Oriented Architecture (SOA) and Web services. One advantage of XML as a data exchange format is that its validation technology is standardized.
Many XML application developers expect a method that guarantees that all XML instances comply with specific rules, for example as part of validation processing. Many therefore turn directly to a schema language, e.g. DTD, W3C XML Schema (WXS), or RELAX NG, and enforce the constraints by applying rules to XML instances.
Usually, validation technology is grammar based. As an alternative, Schematron is a structural validation language that allows rules to be expressed directly, without the need to create a whole grammar. Tree patterns, defined as XPath expressions, are used to make assertions and to provide user-centric reports about XML documents. Expressing a validation rule using patterns is often easier than defining the same rule using a content model. Tree patterns are collected together to form a Schematron schema. Schematron is a useful companion to other schema languages, and a useful tool for applying rules to an XML document or validating a document against rules. Schematron is flexible and may be used to express many different kinds of rules. Its expressive capability may be more suitable than that of other schema languages (e.g., DTD, W3C XML Schema (WXS), and RELAX NG).
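As an illustration of rule-based (rather than grammar-based) validation, the following minimal Python sketch checks assertions against tree patterns. It is not a Schematron implementation: the rule set, element names, and the `check_rules` helper are hypothetical, and each test is simplified to "this relative path must select at least one node" because Python's standard `xml.etree.ElementTree` module supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: a context path selecting the nodes to check, a relative
# test path that must match at least one node, and a message for the report.
RULES = [
    {"context": ".//book", "test": "title",
     "message": "A book must have a title."},
    {"context": ".//book", "test": "author[@id]",
     "message": "A book must have an author with an id attribute."},
]

def check_rules(xml_text, rules):
    """Return the messages of all assertions that fail for the document."""
    root = ET.fromstring(xml_text)
    failures = []
    for rule in rules:
        # Every node selected by the context pattern must satisfy the test.
        for node in root.findall(rule["context"]):
            if node.find(rule["test"]) is None:
                failures.append(rule["message"])
    return failures

# Illustrative instance: the second book has no title and no author id.
DOC = """<library>
  <book><title>XML Basics</title><author id="a1">Ann</author></book>
  <book><author>Bob</author></book>
</library>"""

report = check_rules(DOC, RULES)
for message in report:
    print(message)
```

Each rule is stated on its own, so a new constraint can be added without describing the full content model of the document, which is the property the paragraph above attributes to pattern-based rules.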
Industry and academia have made efforts to implement Schematron. FIG. 1 shows a widely used and referenced Schematron implementation method. One may refer to http://www.schematron.com, which provides a typical open source implementation that is frequently used by various projects. Schematron builds on Extensible Stylesheet Language Transformations (XSLT): it defines a schema language which, when transformed through a meta-stylesheet (i.e. a style sheet which generates other style sheets), produces an XSLT validation document that is then applied to the XML instance. FIG. 1 illustrates this process.
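The two-stage character of this approach (first compile the schema into a validator, then run the validator on an instance) can be sketched in Python as follows. This is only an analogy under stated assumptions: the reference implementation generates an XSLT style sheet, whereas this sketch returns a Python closure; `compile_schema` and the sample schema are hypothetical; the `context` attribute is restricted to a plain element name and the `test` attribute to a relative path that `xml.etree.ElementTree` can resolve, whereas real Schematron accepts general XPath expressions.

```python
import xml.etree.ElementTree as ET

# Namespace of Schematron 1.5 schemas, in ElementTree's {uri}tag notation.
SCH_NS = "{http://www.ascc.net/xml/schematron}"

def compile_schema(schema_text):
    """Stage 1: turn a schema into a validator (stands in for the meta-stylesheet)."""
    schema = ET.fromstring(schema_text)
    checks = []
    for rule in schema.iter(SCH_NS + "rule"):
        context = rule.get("context")
        for assertion in rule.findall(SCH_NS + "assert"):
            message = (assertion.text or "").strip()
            checks.append((context, assertion.get("test"), message))

    def validator(instance_text):
        """Stage 2: apply the compiled checks (stands in for the generated XSLT)."""
        root = ET.fromstring(instance_text)
        return [message
                for context, test, message in checks
                for node in root.iter(context)   # simplified context matching
                if node.find(test) is None]      # simplified assertion test
    return validator

# Hypothetical schema in the simplified dialect described above.
SCHEMA = """<schema xmlns="http://www.ascc.net/xml/schematron">
  <pattern name="Order structure">
    <rule context="order">
      <assert test="item">An order must contain at least one item.</assert>
      <assert test="customer[@id]">An order must name a customer with an id.</assert>
    </rule>
  </pattern>
</schema>"""

validate_orders = compile_schema(SCHEMA)
order_report = validate_orders(
    "<orders>"
    "<order><item/><customer id='c1'/></order>"
    "<order><item/></order>"
    "</orders>"
)
print(order_report)
```

The compiled validator can be reused across many instances, which mirrors how a generated validation style sheet is produced once per schema and then applied to each document.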
The web site http://www.ldodds.com/papers/schematron_xsltuk.html also introduces Schematron and other XSLT-based implementations.
Furthermore, the Community-driven Systems Management in Open Source (COSMOS) project aims to provide interoperable tools for system management. The COSMOS Resource Modeling sub-project aims to support building a common model to represent the information shared in a system management scenario. The project uses Service Modeling Language (SML) and Schematron as the XML schema languages to define this common model. It uses the XSLT-based approach and the skeleton1-5.xsl reference implementation (http://xml.ascc.net/schematron/1.5/) to extract the Schematron from the schema.