Extensible Markup Language (XML) was originally designed as a complete, platform-independent and system-independent environment for authoring and delivering information resources over the World Wide Web (hereinafter, “Web”). XML was intended to supplement, and in some cases replace, Hypertext Markup Language (HTML), which had been the prevalent method of authoring and referencing content over the Web.
XML is a set of technologies that define a universal data format for tree-structured, hierarchical information. A number of specifications that extend its range and power, such as Extensible Stylesheet Language (XSL), Document Object Model (DOM), and XSL Transformations (XSLT), are being developed. XML offers the advantages of platform independence and Web awareness, and many XML tools are open source and freely available. Thus, XML technologies can provide a simple, low-cost solution for enterprise-wide access to information.
Because XML describes structure as well as information, it is particularly well suited as a data description language. One of XML's particular strengths is that it allows entire industries, academic disciplines, and professional organizations to standardize the representation of information within their domains. To do so, communicating parties must agree on an XML dialect for their business domain and needs. This dialect is usually defined in a Document Type Definition (DTD) or XML Schema document, which specifies the syntax and data types to which all of its instance XML documents must conform. The data source generates XML data according to that DTD or schema definition, and the data consumer system can use a validating XML parser to verify the syntax of the incoming data before passing it to its data processing system.
While syntax validation is important in preventing erroneous data from disrupting the data consumer system, it cannot verify the equally important non-structural semantic constraints on XML data. In practice, the value or presence of one element may depend on the value or presence of another element, and the range of permissible values for an element may vary between document instances and depend on the system environment. A grammatically valid XML document is therefore not guaranteed to be meaningful. Even though XML Schema is much more powerful than DTD, it cannot be used to specify such non-structural constraints. There is a need for an extensible, expressive, platform-neutral, and domain-independent way of specifying semantic constraints on XML documents.
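The following Python sketch illustrates the kind of constraint a validating parser cannot check. The purchase-order vocabulary (`order`, `orderDate`, `shipDate`, `payment`, `cardNumber`) and the function `semantic_violations` are hypothetical; assume a DTD or schema already accepts the document's structure. Both rules below depend on the values or presence of other elements, so they have to be checked by separate code, here hard-coded in the program.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical purchase order: its structure could satisfy a DTD or
# XML Schema, yet its values violate two semantic constraints.
ORDER_XML = """<order>
  <orderDate>2004-03-10</orderDate>
  <shipDate>2004-03-01</shipDate>
  <payment method="credit"/>
</order>"""


def semantic_violations(xml_text):
    """Return descriptions of non-structural constraint violations."""
    root = ET.fromstring(xml_text)
    problems = []

    # Value dependency: the ship date must not precede the order date.
    # Assumes orderDate is present (a structural schema could enforce that).
    ordered = date.fromisoformat(root.findtext("orderDate"))
    shipped_text = root.findtext("shipDate")
    if shipped_text is not None and date.fromisoformat(shipped_text) < ordered:
        problems.append("shipDate precedes orderDate")

    # Presence dependency: a credit payment requires a card number.
    payment = root.find("payment")
    if payment is not None and payment.get("method") == "credit":
        if payment.find("cardNumber") is None:
            problems.append("credit payment without cardNumber")

    return problems


print(semantic_violations(ORDER_XML))
# ['shipDate precedes orderDate', 'credit payment without cardNumber']
```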
Another challenge for data integration is the specification of complex constraints on business data models. Although in theory such constraints can be written with a text editor in a particular constraint specification language, the complexity of real-world business data structures can make such specifications cryptic and error-prone. Ideally, such constraints would be specified at a more abstract data model level, so that human users can help verify them visually.
A further challenge relates to constraint validation. XML validating parsers cannot use constraint documents to validate non-structural constraints. Hard coding such constraints into a program is unattractive, because such a program may not faithfully implement the constraints, does not adapt easily to system modifications or extensions, and cannot be reused. Instead, mature XML technologies should be used to provide a generic framework for automatic constraint validation.
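As a contrast to hard coding, here is a minimal sketch of a generic validator whose constraints are supplied as data. The `Rule` record, its fields, and the element names are hypothetical and do not follow any standard rule language; the paths use the limited XPath subset supported by Python's `xml.etree.ElementTree`. Each rule states that every element matched by `context` must contain a match for `requires`.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Rule(NamedTuple):
    """A hypothetical declarative co-occurrence rule."""
    rule_id: str
    context: str    # ElementTree path selecting the elements to check
    requires: str   # path, relative to each context element, that must match
    message: str


# The rules are plain data: they could be loaded from a configuration file
# and changed without editing the validator below.
RULES = [
    Rule("R1", ".//payment[@method='credit']", "cardNumber",
         "credit payment needs a cardNumber"),
    Rule("R2", ".//item[@hazardous='yes']", "safetySheet",
         "hazardous item needs a safetySheet"),
]


def validate(xml_text, rules):
    """Generic engine: apply every rule to the document, collect failures."""
    root = ET.fromstring(xml_text)
    failures = []
    for rule in rules:
        for element in root.iterfind(rule.context):
            if element.find(rule.requires) is None:
                failures.append(f"{rule.rule_id}: {rule.message}")
    return failures


DOC = """<order>
  <payment method="credit"><cardNumber>4111</cardNumber></payment>
  <item sku="A1" hazardous="yes"/>
  <item sku="B2" hazardous="no"/>
</order>"""

print(validate(DOC, RULES))  # ['R2: hazardous item needs a safetySheet']
```

Because the engine knows nothing about purchase orders, the same `validate` function can be reused for any vocabulary; only the rule list changes.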
To address these limitations, extensions to XML Schema have been developed to express complex constraints for validation. There are at least three options for overcoming the limitations of XML Schema in expressing complex constraints. The first option uses additional schema languages. One drawback of this approach is that it does not handle all complex constraint cases, some of which can be expressed in a schema language only with difficulty. Also, each schema language has its own capabilities and limitations, so multiple schema languages may be required to express all the additional constraints. Further, users bear the burden of learning each of the additional schema languages. Finally, long-term support for these schema languages is uncertain, particularly for those created by a single author, who cannot necessarily be counted on to provide continuing support.
A second option is to use XSLT/XPath stylesheets to express additional constraints. One drawback of this option is that it likewise does not handle all complex constraint cases, some of which can be expressed in an XSLT/XPath stylesheet only with difficulty. Also, performance may suffer when a stylesheet encodes many constraints.
A third option is to use a programming language, such as Java or C++, to express additional constraints. The problem with this option is that it results in a tightly coupled programming model in which the constraints may be difficult to change at deployment time and at runtime. Also, with this option, the constraints cannot be expressed declaratively.
Accordingly, there is a need for systems and methods that can express complex constraints for XML Schema. There is also a need for a solution to the problems discussed above that utilizes the full power of a programming language while retaining the ability to express complex constraints in a declarative manner.
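To make the combination concrete, the following is an illustrative Python sketch only, not the approach of this document: constraints are registered declaratively with a context path, while each check is written in the host language and can therefore use arithmetic, aggregation, or library calls that a schema language cannot. The `constraint` decorator, the `RULES` registry, and the `order`/`item` attributes are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry of (context path, message, predicate) entries.
RULES = []


def constraint(context, message):
    """Register a predicate to run on every element matched by context."""
    def register(predicate):
        RULES.append((context, message, predicate))
        return predicate
    return register


@constraint(".", "order total must equal the sum of line amounts")
def total_matches_lines(order):
    # Aggregation across elements; assumes integer amounts in cents and
    # that the numeric attributes are present.
    lines = sum(int(item.get("qty")) * int(item.get("price"))
                for item in order.iter("item"))
    return int(order.get("total")) == lines


@constraint(".//item", "quantity must be a positive integer")
def positive_quantity(item):
    text = item.get("qty", "")
    return text.isdigit() and int(text) > 0


def check(xml_text):
    """Evaluate every registered constraint and return failure messages."""
    root = ET.fromstring(xml_text)
    return [message
            for context, message, predicate in RULES
            for element in root.iterfind(context)
            if not predicate(element)]


DOC = """<order total="800">
  <item sku="A1" qty="3" price="100"/>
  <item sku="B2" qty="2" price="200"/>
</order>"""

print(check(DOC))  # ['order total must equal the sum of line amounts']
```

The rule declarations stay separate from the evaluation loop, so new constraints can be added without touching `check`; the trade-off is that the predicates are still code, which must be deployed alongside the rules.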