1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular to an optimized method and system for context-sensitive decomposition of markup-based documents, such as XML documents, into a relational database, based on schemas with reusable element/attribute declarations.
2. Description of Related Art
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) that uses relational techniques for storing and retrieving data. RDBMS software using a Structured Query Language (SQL) interface is well known in the art. SQL has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
Extensible Markup Language (XML) is a standard data-formatting mechanism used for representing data on the Internet in a hierarchical format and for exchanging information. An XML document consists of nested element structures, starting with a root element. Numerous conventional software products exist for decomposing the data contained in an XML document and storing it in a database.
Decomposition of an XML document is the process of breaking the document into component pieces and storing those pieces in a database. The pieces, and where they are to be stored, are specified by means of a mapping document. The mapping document may take the form of a set of XML schema documents that describe the structure and data types used in conforming XML instance documents. These XML schema documents are augmented with annotations that describe the mapping of XML components to tables and columns in a relational database. Annotations are a feature of XML Schema that allow application-specific information to be supplied to programs processing the schema.
In the context of decomposition, the key pieces of an XML document are elements and attributes. The corresponding XML schema describes the structure of each element and attribute in the form of an element/attribute declaration. Annotations may be added to these declarations to specify the target table-column pair in which the content of an element/attribute from an XML instance document is to be stored. Presently known decomposition methods that use XML schemas are limited: when a single declaration is used in multiple places in an XML schema, they must map every occurrence of it to the same table-column pair and cannot store the occurrences in different destinations. The problem is best illustrated by FIG. 1.
An exemplary user-defined XML schema having element declaration annotations mapping it to a relational database is shown in FIG. 1. Mapping annotations are indicated by the prefix “db2-xdb”, which is associated with the namespace for DB2's decomposition feature: http://www.ibm.com/xmlns/product/db2/xdb1. The element declaration of FIG. 1 shows that the components of <address> are mapped to columns “street”, “city”, “zipcode” of table “tabA.”
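To make the FIG. 1 mapping concrete, the following Python sketch models it as a plain dictionary from each child of <address> to a (table, column) pair and applies it to a single <address> instance. The dictionary ADDRESS_MAPPING, the function decompose_address, and the sample data are hypothetical illustrations only; they are not DB2's annotation syntax or decomposition API.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the FIG. 1 annotations: each child element of
# <address> is mapped to a (table, column) pair in table "tabA".
ADDRESS_MAPPING = {
    "street": ("tabA", "street"),
    "city": ("tabA", "city"),
    "zipcode": ("tabA", "zipcode"),
}


def decompose_address(address: ET.Element) -> dict[str, dict[str, str]]:
    """Return {table: {column: value}} for one <address> element."""
    rows: dict[str, dict[str, str]] = {}
    for child in address:
        target = ADDRESS_MAPPING.get(child.tag)
        if target is None:
            continue  # components without a mapping are not stored
        table, column = target
        rows.setdefault(table, {})[column] = (child.text or "").strip()
    return rows


if __name__ == "__main__":
    doc = ET.fromstring(
        "<address><street>1 Main St</street>"
        "<city>Springfield</city><zipcode>12345</zipcode></address>"
    )
    # Prints a single tabA row holding street, city and zipcode.
    print(decompose_address(doc))
```

Note that the mapping depends only on the element names inside <address>; nothing in it refers to where the <address> element appears in the document.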
This example illustrates a limitation of this mapping approach: the element <address> may be used in many contexts in an XML schema, because its declaration is a global one and other elements can contain <address> as a child element by referring to <address> in their declarations:
<xsd:element name="hospital">
 <xsd:complexType>
  <xsd:sequence>
   <xsd:element ref="address"/>
  ...
</xsd:element>
<xsd:element name="customer">
 <xsd:complexType>
  <xsd:sequence>
   <xsd:element ref="address"/>
  ...
</xsd:element>
In this example, both hospital and customer include <address> in their declarations, yet decomposition is performed without regard to context, so every <address> is stored in the columns of table "tabA." It is unlikely that an application would want hospital addresses to be decomposed into the same table as customer addresses. With the exemplary mapping of FIG. 1, context-sensitive decomposition is not possible for global element/attribute declarations that are used in multiple places in an XML schema.
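The limitation can be reproduced with a small Python sketch of a context-insensitive decomposer. Because its mapping is keyed only on the element name address, it cannot distinguish a hospital's address from a customer's, and every <address> lands in tabA. The mapping TABLE_FOR_ELEMENT, the function decompose, and the instance document are hypothetical assumptions chosen for illustration, not the behavior of any particular product.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping keyed only by element name, mirroring FIG. 1:
# the global <address> declaration carries one fixed target table.
TABLE_FOR_ELEMENT = {"address": "tabA"}
COLUMNS = ("street", "city", "zipcode")


def decompose(root: ET.Element) -> dict[str, list[dict[str, str]]]:
    """Collect one row per mapped element, ignoring the element's parent."""
    tables: dict[str, list[dict[str, str]]] = {}
    for name, table in TABLE_FOR_ELEMENT.items():
        # iter() finds <address> at any depth; the enclosing element
        # (hospital or customer) is never consulted, so context is lost.
        for elem in root.iter(name):
            row = {col: (elem.findtext(col) or "").strip() for col in COLUMNS}
            tables.setdefault(table, []).append(row)
    return tables


INSTANCE = """
<root>
  <hospital><address><street>1 Care Rd</street><city>A</city><zipcode>111</zipcode></address></hospital>
  <customer><address><street>2 Shop St</street><city>B</city><zipcode>222</zipcode></address></customer>
</root>
"""

if __name__ == "__main__":
    result = decompose(ET.fromstring(INSTANCE))
    # Both addresses end up as rows of the same table, tabA.
    print(result)
```

Distinguishing the two addresses would require the target table to depend on the enclosing element, which a mapping attached only to the reusable declaration cannot express.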
The example of FIG. 1 illustrates the problem as it occurs for global element and attribute declarations. However, the problem also exists for element/attribute declarations that are part of named model groups, named attribute groups, or named complex types. Part of the problem lies in the W3C XML Schema Recommendation, which does not fully specify the requirements a conformant schema processor must meet in providing application access to annotations attached to element/attribute references. However, even if the Recommendation were updated to address the accessibility of annotations on element/attribute references, the problem of providing context-sensitive decomposition for element/attribute declarations that are part of named model groups, named attribute groups, or named complex types would remain.
Global element and attribute declarations, named model groups, named attribute groups, and named complex types are all reusable declarations. Presently, there is no solution to the problem of context-sensitive decomposition for global element and attribute declarations or for declarations that are part of named model groups, named attribute groups, or named complex types.
While various techniques, including application-specific ones, have been developed for decomposing markup-based documents, such as XML documents, and storing them in a database, there is a need for a general method that allows context-sensitive decomposition into a relational database, based on XML schemas with reusable element/attribute declarations, where the mapping document can be any user-defined XML schema and the mapping is user-controlled.