The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
The use of hierarchical mark-up languages for structuring and describing data is finding wide acceptance in the computer industry. An example of a mark-up language is XML.
Data structured using a hierarchical mark-up language is composed of nodes. A node is delimited by a pair of corresponding start and end tags, which also specify the name of the node. For example, in the following structured data fragment,
<A><B>5</B><D>10</D></A>
the start tag <A> and the end tag </A> delimit a node having name A.
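As an illustrative sketch only, the fragment above can be parsed with a general-purpose XML parser to show how the tags delimit and name each node. Python's standard xml.etree.ElementTree module is used here purely for illustration; the variable names are assumptions and are not part of the described approaches.

```python
import xml.etree.ElementTree as ET

# The structured data fragment from the example above.
fragment = "<A><B>5</B><D>10</D></A>"

# Parsing yields an object for the node delimited by <A> and </A>.
node_a = ET.fromstring(fragment)

# The tag pair supplies the node's name.
print(node_a.tag)  # prints "A"

# The content of node A is itself a sequence of nodes.
for contained in node_a:
    print(contained.tag, contained.text)  # prints "B 5", then "D 10"
```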
The data between the corresponding tags is referred to as the node's content. A node's content can either be a scalar value (e.g. integer, text string), or one or more other nodes. A node that contains only a scalar value is referred to herein as a scalar node. A node that contains another node is referred to herein as a structured node. The contained nodes are referred to herein as descendant nodes.
In addition to containing one or more nodes, a structured node's content may also include a scalar value. Such content in a node is referred to herein as mixed content.
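The distinction between scalar nodes, structured nodes, and mixed content can be sketched as a small classification function. This is a minimal sketch under stated assumptions: the function name classify and the "empty" category are hypothetical, since the definitions above do not address a node with no content at all.

```python
import xml.etree.ElementTree as ET

def classify(node):
    """Classify a node using the terminology defined above (illustrative only)."""
    has_nodes = len(node) > 0
    # Text directly inside the node is split by ElementTree between the node's
    # own .text and the .tail of each contained node.
    pieces = [node.text] + [contained.tail for contained in node]
    has_scalar = any(piece and piece.strip() for piece in pieces)
    if has_nodes and has_scalar:
        return "structured (mixed content)"
    if has_nodes:
        return "structured"
    if has_scalar:
        return "scalar"
    # Assumption: a node with no content is not covered by the definitions.
    return "empty"

structured = ET.fromstring("<A><B>5</B><D>10</D></A>")
mixed = ET.fromstring("<A>note<B>5</B></A>")

print(classify(structured))     # structured
print(classify(structured[0]))  # scalar
print(classify(mixed))          # structured (mixed content)
```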
A structured node thus forms a hierarchy of nodes with multiple levels, the structured node being at the top level. A node at each level is linked to one or more nodes at a different level. Each node at a level below the top level is a child node of a parent node at the level above the child node. Nodes having the same parent are sibling nodes. A parent node may have multiple child nodes. A node that has no parent node linked to it is a root node, and a node that has no child nodes linked to it is a leaf node. For example, in structured node A, node A is the root node at the top level. Nodes B and D are descendant and child nodes of A, and with respect to each other, nodes B and D are sibling nodes. Nodes B and D are also leaf nodes.
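The parent, child, sibling, root, and leaf relationships described above can be sketched for structured node A as follows. The helper names (parent_of, is_root, is_leaf, siblings) are hypothetical and chosen only for this illustration; ElementTree does not store parent links, so a parent map is built explicitly.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<A><B>5</B><D>10</D></A>")

# Map each child node to its parent node.
parent_of = {child: parent for parent in root.iter() for child in parent}

def is_root(node):
    # A root node has no parent node linked to it.
    return node not in parent_of

def is_leaf(node):
    # A leaf node has no child nodes linked to it.
    return len(node) == 0

def siblings(node):
    # Sibling nodes share the same parent node.
    parent = parent_of.get(node)
    if parent is None:
        return []
    return [other for other in parent if other is not node]

node_b, node_d = list(root)
print(is_root(root))                      # True
print(parent_of[node_b].tag)              # A
print([n.tag for n in siblings(node_b)])  # ['D']
print(is_leaf(node_b), is_leaf(node_d))   # True True
```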
Schemas
A document is an arbitrary sequence of one or more structured nodes. Documents may be stored in various formats. For example, a document may be stored as a text file, or a document may be stored in an XML database in a Large Object (LOB) column of a row, or as a web page accessible as a resource on the Internet.
It is important that documents conform to the structures and constraints that computing devices are configured to handle. A document schema is a set of rules that constrain the structure and content of documents. A document that conforms to a document schema is referred to herein as a valid document and as an instance of the document schema.
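As a minimal sketch of what it means for a document to conform to a set of rules, the following checks the earlier example node against a hypothetical rule set. The rule format (a dictionary named SCHEMA) and the function conforms are assumptions made for illustration; they are not XML Schema syntax and are not the mechanism described herein.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: a node name maps either to the exact ordered list of
# child node names it must contain, or to a Python type for its scalar value.
SCHEMA = {"A": ["B", "D"], "B": int, "D": int}

def conforms(node, schema):
    """Return True if the node satisfies the hypothetical rules (sketch only)."""
    rule = schema.get(node.tag)
    if rule is None:
        return False  # no rule is declared for this node name
    if isinstance(rule, list):
        children = list(node)
        if [child.tag for child in children] != rule:
            return False  # structure constraint violated
        return all(conforms(child, schema) for child in children)
    if len(node) > 0:
        return False  # a scalar node must not contain other nodes
    try:
        rule(node.text or "")  # content constraint: value must convert
    except ValueError:
        return False
    return True

print(conforms(ET.fromstring("<A><B>5</B><D>10</D></A>"), SCHEMA))     # True
print(conforms(ET.fromstring("<A><B>five</B><D>10</D></A>"), SCHEMA))  # False
```

In this sketch, the first document would be an instance of the hypothetical schema, while the second would not, because the content of node B is not an integer.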
Generally speaking, a schema is a set of rules for structure and constraints for units of data. The term schema is used herein to refer either to a single schema, i.e. rules for a single type of unit of data, or to a collection of schemas, each defining a different type of unit of data. For example, the term schema may refer to multiple document schemas or to a single document schema.
Schemas and the rules therein can be expressed using schema declarations. Schema declarations are expressions that, according to a schema standard and/or language, define a schema rule.
A schema standard used for XML documents is XML Schema. Standards governing XML schemas include: XML Schema, Part 0, Part 1, Part 2, W3C Recommendation, 2 May 2001, the contents of which are incorporated herein by reference; XML Schema Part 1: Structures, Second Edition, W3C Recommendation 28 Oct. 2004, the contents of which are incorporated herein by reference; XML Schema 1.1 Part 2: Datatypes, W3C Working Draft 17 Feb. 2006, the contents of which are incorporated herein by reference; and XML Schema Part 2: Datatypes Second Edition, W3C Recommendation 28 Oct. 2004, the contents of which are incorporated herein by reference. XML Schemas as described in this document are not restricted to W3C XML Schemas but include any other mechanisms for describing the structural and/or typing information of XML documents, for example, RELAX NG.
XML Schema provides for a type of schema referred to herein as a document-centralized schema. In a document-centralized schema, a document schema is defined by a schema declaration that expressly declares it to be a document schema.
Validation refers to the process of determining whether a document, or part thereof, conforms to a schema. A document, or part thereof, that has been determined to conform to a document schema is referred to herein as validated. Generally, validation mechanisms that have been developed are adept at validating documents against document-centralized schemas. However, not all forms of schemas are document-centralized; one example is the decentralized form described in the XAP patent application. Described herein are techniques that may be used to facilitate the process of validating documents against document schemas that are decentralized.
Based on the foregoing, there is a need for techniques and mechanisms for efficiently validating documents according to a decentralized document schema.