The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Hierarchical mark-up languages for structuring and describing data are finding wide acceptance in the computer industry. An example of such a mark-up language is the Extensible Markup Language (XML).
Data structured using a hierarchical mark-up language is composed of nodes. Nodes are usually delimited by a pair of corresponding start and end tags, which not only delimit the node, but also specify the name of the node. For example, in the following structured data fragment,
<A><B>5</B><D>10</D></A>
the start tag <A> and the end tag </A> delimit a node having the name A.
The data between the corresponding tags is referred to as the node's content. A node's content can either be a scalar value (e.g. integer, text string), or one or more other nodes. A node that contains only a scalar value is referred to herein as a scalar node. A node that contains another node is referred to herein as a structured node. The contained nodes are referred to herein as descendant nodes.
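The distinction between scalar nodes and structured nodes can be illustrated with a minimal sketch that parses the fragment above using Python's standard xml.etree.ElementTree module. The helper name is_structured is an assumption made for illustration only, and the sketch treats any node without child nodes as a scalar node.

```python
import xml.etree.ElementTree as ET

# The structured data fragment from the text.
fragment = "<A><B>5</B><D>10</D></A>"
node_a = ET.fromstring(fragment)

def is_structured(node):
    """Hypothetical helper: a node is structured if it contains another node."""
    return len(node) > 0

# Classify every node in the fragment, including node A itself.
kinds = {
    node.tag: ("structured" if is_structured(node) else "scalar")
    for node in node_a.iter()
}

# Descendant nodes of A are all nodes contained in A, excluding A itself.
descendants = [node.tag for node in node_a.iter() if node is not node_a]

# Content of the scalar nodes, keyed by node name.
scalar_values = {node.tag: node.text for node in node_a.iter() if not is_structured(node)}
```

Under these assumptions, node A is classified as structured, while nodes B and D are classified as scalar nodes holding the values 5 and 10.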
In addition to containing one or more nodes, a structured node's content may also include a scalar value. Such content in a node is referred to herein as mixed content.
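Mixed content may be detected as in the following sketch, which again relies on Python's xml.etree.ElementTree. The fragment containing the text "note" and the helper name has_mixed_content are hypothetical and are introduced only for illustration. ElementTree exposes text preceding the first child node as the text attribute of the parent and text following a child node as the tail attribute of that child.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(node):
    """Hypothetical helper: True if a structured node also holds scalar text."""
    if len(node) == 0:
        # A node without child nodes is not a structured node.
        return False
    # Text before the first child, plus text following each child.
    pieces = [node.text] + [child.tail for child in node]
    return any(piece and piece.strip() for piece in pieces)

# Hypothetical fragment: node A contains both the text "note" and node B.
mixed = ET.fromstring("<A>note<B>5</B></A>")

# The fragment from the text: node A contains only nodes B and D.
plain = ET.fromstring("<A><B>5</B><D>10</D></A>")

mixed_result = has_mixed_content(mixed)
plain_result = has_mixed_content(plain)
```

The sketch ignores whitespace-only text, on the assumption that such text is formatting rather than content.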
A structured node thus forms a hierarchy of nodes with multiple levels, the structured node being at the top level. A node at each level is linked to one or more nodes at a different level. Each node at a level below the top level is a child node of a parent node at the level above the child node. Nodes having the same parent are sibling nodes. A parent node may have multiple child nodes. A node that has no parent node linked to it is a root node, and a node that has no child nodes linked to it is a leaf node. For example, in structured node A, node A is the root node at the top level. Nodes B and D are descendant and child nodes of A, and, with respect to each other, nodes B and D are sibling nodes. Nodes B and D are also leaf nodes.
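The relationships described above for structured node A can be sketched as follows. Because xml.etree.ElementTree does not record parent links, the sketch builds a parent map first; the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

node_a = ET.fromstring("<A><B>5</B><D>10</D></A>")

# Map each child node to its parent node.
parent_of = {child: parent for parent in node_a.iter() for child in parent}

# A root node has no parent; a leaf node has no child nodes.
root_names = [node.tag for node in node_a.iter() if node not in parent_of]
leaf_names = [node.tag for node in node_a.iter() if len(node) == 0]

# Sibling nodes share the same parent node.
node_b = node_a.find("B")
sibling_names = [node.tag for node in parent_of[node_b] if node is not node_b]
```

For this fragment, the sketch identifies A as the root node, B and D as leaf nodes, and D as the sibling of B.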
Schemas
A “hierarchical data object” is an arbitrary sequence of one or more structured nodes. Hierarchical data objects may be stored in various formats. For example, a hierarchical data object may be stored as a text file, or a hierarchical data object may be stored in an XML database in a Large Object (LOB) column of a row, or as a web page accessible as a resource on the Internet. A hierarchical data object is also referred to herein as a “data object”.
A schema constrains the structure and content of data objects. Generally speaking, a schema is a set of rules for structure and constraints for units of data. The term schema is used herein to refer both to a single schema, that is, rules for a single type of unit of data, and to a collection of schemas, each defining a different type of unit of data. For example, the term schema may refer to multiple document schemas, to a single document schema, or to a structure defined by a document schema.
Schemas and the rules therein can be expressed using schema declarations. Schema declarations are expressions that, according to a schema standard and/or language, define a schema rule.
A schema standard used for XML documents is XML Schema. XML Schema provides for a type of schema referred to herein as a document-centralized schema. In a document-centralized schema, a document schema is defined by a schema declaration that expressly declares itself to be a document schema.
In a decentralized schema, a corpus of elements declares schemas for a collection of data objects and for nodes in the collection. As the term is used herein, an “element” associates a name with a set of rules declared for the content of the nodes having that name. A node in a data object having the name of an element is referred to as an instance of the element.
This definition of “element” should not be confused with the definition conventionally ascribed to “element” by the XML community, which is that an element is a node in a document.
Validation
Validation refers to the process of determining whether a data object, or part thereof, conforms to a schema. Validating a data object requires a determination of what rules are needed to validate the data object. The set of rules needed to validate a data object is referred to herein as the schema rule set. The operation of determining the schema rule set is referred to herein as schema rules collection. Schema rules collection can be a computationally complex task and, therefore, improving the efficiency with which this task is performed is important.
For document-centralized schemas, schema rules collection can be made more efficient by performing schema rules collection in advance of validating documents against the schema rules. For a given document schema, a schema rule set may be generated and subsequently applied to validate documents purporting to belong to that document schema.
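Performing schema rules collection in advance can be illustrated with the following simplified sketch. The dictionary-based schema, the rule representation, and the function names collect_rule_set and validate are all hypothetical; they are not XML Schema constructs and stand in for whatever form a real document schema takes. The point of the sketch is only that the schema rule set is built once and then reused for every document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document schema: the root name and the type of each child node.
DOCUMENT_SCHEMA = {"root": "A", "children": {"B": int, "D": int}}

def collect_rule_set(schema):
    """Schema rules collection, performed once in advance of validation."""
    return [(name, convert) for name, convert in schema["children"].items()]

def validate(document_text, root_name, rule_set):
    """Apply a previously collected schema rule set to one document."""
    document = ET.fromstring(document_text)
    if document.tag != root_name:
        return False
    for name, convert in rule_set:
        node = document.find(name)
        if node is None:
            return False
        try:
            convert(node.text)
        except (TypeError, ValueError):
            return False
    return True

# Collected once, then applied to any number of documents.
RULE_SET = collect_rule_set(DOCUMENT_SCHEMA)

results = [
    validate(text, DOCUMENT_SCHEMA["root"], RULE_SET)
    for text in ("<A><B>5</B><D>10</D></A>", "<A><B>five</B><D>10</D></A>")
]
```

In this sketch the second document fails validation because the content of node B is not an integer, and neither call to validate repeats the collection step.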
However, it is not feasible to perform schema rules collection in advance for a decentralized schema. Unlike a document-centralized schema, a schema rule set for a data object of a decentralized schema can only be determined by examining the data object, for reasons discussed in the Validation Application. Even though data objects may be instances of the same element, the schema rule set needed to validate each may differ.
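The following sketch is a simplified illustration of why a schema rule set may differ between instances of the same element. It does not reproduce the reasoning of the Validation Application. The corpus of elements, the rule strings, the node name E, and the function name collect_schema_rule_set are all hypothetical. The sketch assumes that the applicable rules depend on which nodes actually occur in the data object, so the data object must be examined before the rule set is known.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus of elements: each name is associated with a set of rules.
ELEMENTS = {
    "A": ["A contains child nodes"],
    "B": ["B holds an integer"],
    "D": ["D holds an integer"],
    "E": ["E holds a text string"],
}

def collect_schema_rule_set(data_object):
    """Collect rules by examining the nodes that occur in the data object."""
    rule_set = []
    for node in data_object.iter():
        for rule in ELEMENTS.get(node.tag, []):
            if rule not in rule_set:
                rule_set.append(rule)
    return rule_set

# Two data objects that are both instances of element A.
first = ET.fromstring("<A><B>5</B><D>10</D></A>")
second = ET.fromstring("<A><E>text</E></A>")

first_rules = collect_schema_rule_set(first)
second_rules = collect_schema_rule_set(second)
```

Although both data objects are instances of element A, the two collected rule sets differ, so neither could have been prepared without examining the corresponding data object.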
Based on the foregoing, there is a need for techniques and mechanisms for efficiently generating schema rule sets for data objects of decentralized schemas.