1. Field of the Invention
The invention relates to database systems. Specifically, the invention relates to methods for defining a metadata schema to facilitate passing data between an eXtensible Markup Language (XML) document and a hierarchical database.
2. Description of the Related Art
Today, business applications increasingly rely on XML documents to exchange data. Modern software applications generally communicate with each other over the Internet using XML documents as a common data interchange language for Business to Business (B2B) and Business to Consumer (B2C) communications. Technologies such as web servers, servlets, web applications, and web services generally rely on data organized in some fashion according to the eXtensible Markup Language Specification.
Typically, these same software applications then pass the data in the XML document to database servers for storage in a database. Before an XML document is stored in a database, it is generally analyzed to ensure that it is a “valid” XML document; an XML schema is used to perform this validation. As used herein, references to “an XML document” mean that the XML document is a valid XML document according to a predefined XML schema. Because XML permits such flexibility in the organization and types of XML elements, XML documents are validated to ensure that they are organized as expected. An invalid XML document may lead to unpredictable or erroneous results in software modules that use it.
An XML schema defines the structure, organization, and data types that are acceptable in all corresponding XML documents. The XML schema defines a set of XML elements, XML element attributes, and the desired organization among the XML elements, and thus serves as a vocabulary for the XML elements. Consequently, the XML schema defines the full range of valid XML documents: each valid XML document includes one or more of the XML elements, XML attributes, and structural relationships among the XML elements defined in the XML schema.
Once the XML document has been validated, its data may generally be stored in either of two types of database: hierarchical or relational. Each type of database has different benefits and limitations, which are discussed in more detail below.
Generally, databases store the data of an XML document in one of two formats. In one aspect, the raw data contained in the elements of the XML document are extracted from the XML document and stored in the database. Data stored in this manner is referred to herein as “decomposed” data because the formatting of the XML document is removed so that only the raw data is stored. In another aspect, the raw data together with the formatting that makes up the XML document are stored in the database. Storing the XML document in this manner is referred to herein as storing the XML document “intact” because the formatting of the XML document or of an XML sub-tree is preserved within the database.
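As a rough illustration of the difference between the two storage formats, the following minimal sketch uses Python's standard `xml.etree.ElementTree` module. The sample document and the function names `decompose` and `store_intact` are hypothetical and do not correspond to the interface of any particular database product; the sketch only shows what survives in each format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are illustrative only.
SAMPLE = '<order id="42"><customer>Acme</customer><item><sku>A-1</sku><qty>3</qty></item></order>'


def decompose(xml_text):
    """Return (path, value) pairs for attribute values and leaf-element text.

    The markup itself is discarded; only the raw data values and the path
    that located each value survive, as with "decomposed" storage.
    """
    root = ET.fromstring(xml_text)
    rows = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        # Attribute values are raw data too; record them under an @name path.
        for name, value in elem.attrib.items():
            rows.append((f"{here}/@{name}", value))
        children = list(elem)
        # Only leaf elements carry data values in this simple sketch.
        if not children and elem.text is not None:
            rows.append((here, elem.text.strip()))
        for child in children:
            walk(child, here)

    walk(root, "")
    return rows


def store_intact(xml_text, subtree_tag=None):
    """Return the whole document, or its first <subtree_tag> sub-tree, as XML text.

    The markup is preserved, as with "intact" storage.
    """
    root = ET.fromstring(xml_text)
    node = root if subtree_tag is None else root.find(f".//{subtree_tag}")
    if node is None:
        raise ValueError(f"no <{subtree_tag}> element found")
    return ET.tostring(node, encoding="unicode")


if __name__ == "__main__":
    print(decompose(SAMPLE))
    print(store_intact(SAMPLE, "item"))
```

In the decomposed form, each value could be mapped to a field of a database record, and the original document can only be rebuilt if the mapping is known. In the intact form, the serialized `<item>` sub-tree is stored as a single unit and can be returned exactly as it was received.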
To control costs, it is desirable that modern technologies such as XML be able to interface readily with existing computer and information technology without significantly modifying that existing technology. For example, large corporations, governments, and other entities continue to use legacy applications, which are software programs designed, written, and maintained for large, mission-critical computers such as mainframes. These entities have invested large amounts of work and money into developing and maintaining the legacy applications. In addition, these applications have been tested and refined to operate very efficiently and with minimal errors. Legacy applications continue to manage a high percentage of the everyday transactions and data for these businesses.
Similarly, many of these legacy applications continue to store and retrieve data using hierarchical databases, such as IBM's Information Management System (IMS), instead of common relational databases such as the Oracle database available from Oracle Corporation. To facilitate storing and retrieving data in XML documents (referred to herein as “XML data”), functionality for passing XML data between XML documents and relational databases has been developed. Generally, this functionality is integrated into the database servers for relational databases. Consequently, users must upgrade their database servers to enable support for passing data between an XML document and a relational database.
Unfortunately, no tools exist for passing XML data between an XML document and a hierarchical database, one example of which is IMS. Tools do exist for passing XML data between an XML document and popular relational databases. These tools rely on schema information that relates the XML document to the relational database.
The schema information is stored in a proprietary format designed specifically to allow XML data to be passed between an XML document and the relational database. A proprietary format is often used because relational databases can vary considerably in how relationships are represented. Proprietary schema information requires that developers and users passing XML data to and from relational databases learn a new syntax and semantics, and this learning curve may be steep. Often, the proprietary schema information is stored in a binary format, which requires special editors to create and modify it.
In addition, the proprietary schema information must generally be revised repeatedly to ensure that it can handle every kind of valid XML document that users wish to pass. XML is a very flexible language that allows unique XML elements to be defined to meet a particular need, and the proprietary schema information must be specifically updated to handle XML documents that include such unique XML elements. Consequently, the proprietary schema information must either be updated constantly or, once defined, severely limits the set of XML documents that may be passed into and out of the relational database.
As mentioned, not even proprietary schema information is currently available to enable passing XML data between XML documents and a hierarchical database. In addition, hierarchical databases such as IMS may include user-defined database views. These views may control not only which part of the database a user or user application may access, but also security and authentication features for protecting the data. Examples of such views and user-defined access features include the Program Specification Block (PSB) and Program Communication Block (PCB) provided by IMS. To properly store and retrieve XML data in the hierarchical database, the applicable user-defined database views must be identified, and these views may be changed as needed by a database administrator. Proprietary schema information does not currently account for user-defined database views for either hierarchical or relational databases.
Accordingly, a need exists for a method for defining a metadata schema to facilitate passing data between an XML document and a hierarchical database. The method should define a metadata schema that complies with an accepted, text-based industry standard so that the learning curve is minimized. In addition, the method should define a metadata schema that serves a dual purpose: first, to define a set of valid, well-formed XML documents that may be passed into and out of a hierarchical database; and second, to facilitate mapping data from a source XML document into and out of a particular hierarchical database. Further, the method should define a metadata schema that accommodates user-defined database views of the hierarchical database and is readily modifiable to adjust to changes in those views.