1. Technical Field
This invention generally relates to unique particle attribution (UPA). Specifically, this invention relates to validating UPA constraints in extensible markup language (XML) schemas.
2. Description of Background
XML is a general-purpose markup language classified as an extensible language because it allows its users to define their own tags. One function of XML is to facilitate the sharing of data across different information systems, particularly via the Internet. An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntax constraints imposed by XML itself. An XML schema provides a view of the document type at a relatively high level of abstraction.
Several languages have been developed specifically to express XML schemas. The Document Type Definition (DTD) language, which is native to the XML specification, is a schema language of relatively limited capability, but it also has uses in XML beyond the expression of schemas. Two other widely used and more expressive XML schema languages are the W3C XML Schema Definition Language and RELAX NG. For the purposes of this description, the term “schema” refers to a schema expressed in the W3C XML Schema Definition Language.
Checking whether an XML document conforms to a schema is called validation, which is distinct from XML's core concept of syntactic well-formedness. Every XML document must be well-formed, but a document need not be valid unless it is processed by a validating XML parser, in which case the document is also checked for conformance with its associated schema.
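To make the distinction between well-formedness and validation concrete, the following Python sketch uses the standard library's `xml.etree.ElementTree`, which checks well-formedness only and does not perform XML Schema validation. The structural rule in `is_valid_order` (a root `<order>` whose children are all `<item>` elements) is a hypothetical stand-in for a schema, and both function names are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML (syntactic well-formedness only)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_valid_order(text: str) -> bool:
    """Toy 'validation': well-formedness first, then a hypothetical
    structural rule standing in for a schema (root <order>, only <item>
    children)."""
    if not is_well_formed(text):
        return False
    root = ET.fromstring(text)
    return root.tag == "order" and all(child.tag == "item" for child in root)


good = "<order><item/><item/></order>"
wrong_child = "<order><note/></order>"  # well-formed, but not valid
broken = "<order><item></order>"        # not well-formed (mismatched tags)

print(is_well_formed(good), is_valid_order(good))                # True True
print(is_well_formed(wrong_child), is_valid_order(wrong_child))  # True False
print(is_well_formed(broken), is_valid_order(broken))            # False False
```

A non-validating parser would accept the second document; only a check against the (hypothetical) schema rule rejects it.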
A document is considered valid only if it satisfies the requirements of the schema with which it is associated. The unique particle attribution (UPA) rule is the mechanism by which XML Schema prevents ambiguous content models. For a content model to satisfy UPA, it must be possible to attribute each element information item in a sequence to exactly one particle of the model unambiguously, without examining the item's content or attributes and without looking ahead at the items that follow it.
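The UPA rule can be illustrated with the textbook position-automaton (Glushkov) construction: each element particle becomes a position, and the model is unambiguous when no set of candidate next positions contains two positions with the same element name. The Python sketch below is a simplified illustration of that standard technique, not the method of this invention. It assumes content models built only from sequences, choices, and the occurrence ranges 1..1, 0..1, and 0..unbounded, with no wildcards or substitution groups; the tuple encoding and the names `analyze` and `upa_violation` are hypothetical.

```python
from itertools import count

# Hypothetical content-model encoding used only in this sketch:
#   ("elem", name)       element particle, minOccurs = maxOccurs = 1
#   ("seq", [nodes])     xs:sequence
#   ("choice", [nodes])  xs:choice
#   ("opt", node)        minOccurs=0, maxOccurs=1
#   ("star", node)       minOccurs=0, maxOccurs=unbounded


def analyze(node, names, follow, ids):
    """Glushkov construction: return (nullable, first, last) for `node`,
    numbering element particles ("positions") and filling `follow`."""
    kind = node[0]
    if kind == "elem":
        p = next(ids)
        names[p], follow[p] = node[1], set()
        return False, {p}, {p}
    if kind == "seq":
        nullable, first, last = True, set(), set()
        for child in node[1]:
            c_null, c_first, c_last = analyze(child, names, follow, ids)
            for p in last:            # whatever can end the prefix may be
                follow[p] |= c_first  # followed by the child's first items
            if nullable:
                first |= c_first
            last = (last | c_last) if c_null else set(c_last)
            nullable = nullable and c_null
        return nullable, first, last
    if kind == "choice":
        results = [analyze(c, names, follow, ids) for c in node[1]]
        return (any(r[0] for r in results),
                set().union(*(r[1] for r in results)),
                set().union(*(r[2] for r in results)))
    if kind not in ("opt", "star"):
        raise ValueError(f"unknown node kind: {kind!r}")
    c_null, c_first, c_last = analyze(node[1], names, follow, ids)
    if kind == "star":
        for p in c_last:              # loop back to the start of the body
            follow[p] |= c_first
    return True, c_first, c_last      # "opt" and "star" are both nullable


def upa_violation(model):
    """Return an element name that could be attributed to two different
    particles at the same point (a UPA violation), or None."""
    names, follow = {}, {}
    _, first, _ = analyze(model, names, follow, count())
    for candidates in [first, *follow.values()]:
        seen = set()
        for p in candidates:
            if names[p] in seen:
                return names[p]
            seen.add(names[p])
    return None


a, b, c = ("elem", "a"), ("elem", "b"), ("elem", "c")
print(upa_violation(("choice", [("seq", [a, b]), ("seq", [a, c])])))  # a
print(upa_violation(("seq", [a, ("choice", [b, c])])))                # None
print(upa_violation(("seq", [("opt", a), a])))                        # a
```

The first model is the classic ambiguous choice; the third corresponds to an optional `a` followed by a required `a`, where a leading `<a>` could match either particle.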
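However, validating UPA constraints is difficult to implement efficiently. Conventionally proposed solutions, such as expanding the numeric exponents of content models (their minOccurs and maxOccurs occurrence bounds) into finite state automata, may exhibit exponential or otherwise erratic behavior, because the size of the resulting automaton grows with the values of those bounds. This can make such solutions intractable in practice.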
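The blow-up can be seen by unrolling occurrence bounds naively: a particle with minOccurs=m and maxOccurs=n becomes m required copies followed by n - m optional copies (or a single repeating copy when n is unbounded). The Python sketch below, using a hypothetical tuple encoding and the illustrative names `expand`, `positions`, and `nested`, counts the element particles (automaton positions) that result. Each level of nesting multiplies the count by the bound, so a schema whose size grows only linearly with nesting depth expands into a model whose size is exponential in that depth.

```python
from math import inf

# Hypothetical encoding for this sketch:
#   ("elem", name)            a single element particle
#   ("seq", [nodes])          xs:sequence
#   ("rep", node, lo, hi)     minOccurs=lo, maxOccurs=hi (hi may be math.inf)
# After expansion, "opt" and "star" wrap a single body.


def expand(node):
    """Unroll every "rep" node into copies of its body, as a naive automaton
    construction would. Copies share one Python object, which is fine here
    because we only count positions."""
    kind = node[0]
    if kind == "elem":
        return node
    if kind == "seq":
        return ("seq", [expand(c) for c in node[1]])
    if kind != "rep":
        raise ValueError(f"unknown node kind: {kind!r}")
    body, lo, hi = expand(node[1]), node[2], node[3]
    required = [body] * lo
    if hi == inf:
        return ("seq", required + [("star", body)])
    return ("seq", required + [("opt", body)] * (hi - lo))


def positions(node):
    """Number of element particles, i.e. positions of the automaton."""
    kind = node[0]
    if kind == "elem":
        return 1
    if kind == "seq":
        return sum(positions(c) for c in node[1])
    return positions(node[1])  # "opt" / "star" wrap a single body


def nested(depth, k):
    """a{1,k} nested `depth` times: (((a{1,k}){1,k})...)."""
    node = ("elem", "a")
    for _ in range(depth):
        node = ("rep", node, 1, k)
    return node


for depth in range(1, 6):
    print(depth, positions(expand(nested(depth, 10))))
# depth 1 -> 10 positions, depth 2 -> 100, ..., depth 5 -> 100000
```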