1. Technical Field
The invention relates generally to XML schema evolution techniques. More particularly, the invention relates to an apparatus and a method for providing schema manipulation operations and validating schema changes.
2. Description of the Prior Art
XML (Extensible Markup Language), developed by the World Wide Web Consortium (W3C), is a system for organizing and tagging elements of a document. It allows designers to create their own customized tags, enabling the definition, transmission, validation, and interpretation of data between applications and between organizations. It is a flexible way to create common information formats and share both the format and the data on the World Wide Web, intranets, and other networks. For example, computer makers might agree on a standard way to describe information, such as processor speed, memory size, and so forth, about a computer product and then describe the product information format with XML. Such a standard way of describing data enables a user to send an intelligent agent to each computer maker's Web site, gather data, and then make a valid comparison. XML can be used by any individual or group of individuals or company that wants to share information in a consistent way.
XML elements and attributes can be identified and accessed with XPath expressions. XPath is a language that describes a way to locate and process items in XML documents by using an addressing syntax based on a path through the document's logical structure or hierarchy. This makes writing programming expressions easier than if each expression had to understand typical XML markup and its sequence in a document. XPath also allows the programmer to deal with the document at a higher level of abstraction. It uses the information abstraction defined in the XML Information Set.
XPath uses the concept of a node (the point from which the path address begins), the logical tree that is inherent in any XML document, and the concepts expressing logical relationships that are defined in the XML Information Set, such as ancestor, attribute, child, parent, and self. XPath includes a small set of expressions for specifying mathematical functions and the ability to be extended with other functions.
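The path-based addressing described above can be illustrated with a minimal sketch. It uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The catalog document, the maker names, and the element names (computer, processorSpeed, memorySize) are hypothetical, chosen to echo the computer-product example given earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical product catalog; all names and values are illustrative only.
CATALOG = """
<catalog>
  <computer maker="Acme">
    <processorSpeed unit="GHz">3.2</processorSpeed>
    <memorySize unit="GB">16</memorySize>
  </computer>
  <computer maker="Globex">
    <processorSpeed unit="GHz">2.8</processorSpeed>
    <memorySize unit="GB">32</memorySize>
  </computer>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Child-axis path: every processorSpeed element under a computer element.
speeds = [e.text for e in root.findall("./computer/processorSpeed")]

# Attribute predicate: the memorySize of the computer whose maker is "Globex".
globex_memory = root.find("./computer[@maker='Globex']/memorySize").text

# Attribute access on each selected node.
makers = [c.get("maker") for c in root.findall("./computer")]

print(speeds, globex_memory, makers)
```

Each expression addresses nodes by their position in the document's logical tree, so the calling code does not need to parse markup itself.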
The XML language itself does not limit the set of tags that can be used for element and attribute names. Because there is no definite set of element and attribute names and no definition of structure, confusion may arise when two different parties communicate via XML documents. This has led to the provision of many schema definition languages, one of which is XML Schema, which specifies how to describe the elements in an XML document formally. This description can be used to verify that each item of content in a document adheres to the description of the element in which the content is to be placed.
In general, a schema is an abstract representation of an object's characteristics and relationship to other objects. An XML schema represents the interrelationship between the attributes and elements of an XML object, for example, a document or a portion of a document. To create a schema for a document, its structure must be analyzed and each structural element must be defined. XML Schema has several advantages over earlier XML schema languages, such as Document Type Definition (DTD). For example, it is more direct: XML Schema, in contrast to the earlier languages, is written in XML, which means that it does not require intermediary processing by a parser. Other benefits include self-documentation, automatic schema creation, and the ability to be queried through XML Transformations (XSLT).
For an XML schema to endure over time, it must be capable of evolving to reflect changing information requirements. A set of operations, such as Insert, Delete, Update, and Query, has been proposed for manipulating XML documents. However, no mechanisms have been defined for manipulating XML schemas.
To allow an XML document to contain extended data, an XML schema can declare data types that have <xsd:any> as a subcomponent. An <xsd:any> element serves as a placeholder for any extended data because it does not constrain its content in any way. An extremely extensible XML schema is illustrated as follows:
<xs:element name="myData">
  <xs:complexType>
    <xs:sequence>
      <xs:any processContents="skip" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Although this approach does allow extended data to be contained in XML documents of the schema, it does not provide any control of the extended data.
What is desired is a technique for performing schema manipulation operations so that an XML schema can be evolved in a controlled, pragmatic way. Because many XML documents, e.g. thousands, may already exist under an XML schema, the schema must evolve in a way that ensures that all existing XML documents remain valid under the new XML schema that results from such schema manipulations.
What is further desired is a technique to determine whether all XML documents are still valid after schema manipulation without individually examining these XML documents. It is time-consuming to examine thousands of XML documents. In certain applications, for example Web Services that use XML to represent user data logically in a distributed set of computers, it is substantially impossible to examine XML documents individually.
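To make the desired property concrete, the following is a minimal sketch of a schema-level check that decides whether a change keeps existing documents valid without reading any document. It is an illustration under strong simplifying assumptions, not the technique of the invention: a schema is reduced to a single sequence of distinctly named element particles with occurrence bounds, and the names Particle, widens, and preserves_validity are hypothetical. The check is conservative, so it may reject some changes that are in fact safe.

```python
from typing import List, NamedTuple, Optional


class Particle(NamedTuple):
    """One element particle in a sequence (hypothetical toy model)."""
    name: str
    min_occurs: int
    max_occurs: Optional[int]  # None stands for "unbounded"


def widens(old: Particle, new: Particle) -> bool:
    """True if every occurrence count allowed by old is also allowed by new."""
    if new.min_occurs > old.min_occurs:
        return False
    if new.max_occurs is None:
        return True
    return old.max_occurs is not None and new.max_occurs >= old.max_occurs


def preserves_validity(old_seq: List[Particle], new_seq: List[Particle]) -> bool:
    """Sufficient condition: old particles survive in order with equal or
    wider bounds, and every newly inserted particle is optional."""
    remaining = list(old_seq)
    for particle in new_seq:
        if remaining and particle.name == remaining[0].name:
            if not widens(remaining[0], particle):
                return False
            remaining.pop(0)
        elif particle.min_occurs != 0:
            return False  # a newly added required element breaks old documents
    return not remaining  # a deleted or reordered old particle is rejected


old = [Particle("title", 1, 1), Particle("author", 1, None)]
safe = [Particle("title", 1, 1), Particle("subtitle", 0, 1), Particle("author", 0, None)]
unsafe = [Particle("title", 1, 1), Particle("isbn", 1, 1), Particle("author", 1, None)]

print(preserves_validity(old, safe), preserves_validity(old, unsafe))
```

Because the check compares only the two schemas, its cost does not depend on how many documents exist, which is the property sought when individual examination is impractical.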