This invention relates in general to computer-implemented database systems, and, in particular, to processing Extensible Markup Language (XML) documents.
The Internet is a collection of interconnected computer networks that exchange information using standard protocols, such as TCP/IP and, for Web traffic, the Hypertext Transfer Protocol (HTTP). Currently, the use of the Internet for commercial and non-commercial purposes is exploding. Through its many networks, the Internet enables users in different locations to access information held in data sources (e.g., databases) stored in different locations.
The World Wide Web (i.e., the "WWW" or the "Web") is a hypertext information and communication system used on the Internet computer network with data communications operating according to a client/server model. Typically, a Web client computer will request data stored in data sources from a Web server computer, at which Web server software resides. The Web server software interacts with an interface connected to, for example, a Database Management System ("DBMS"), which is connected to the data sources. These computer programs residing at the Web server computer will retrieve the data and transmit the data to the client computer. The data can be any type of information, including database data, static data, HTML data, or dynamically generated data.
With the fast-growing popularity of the Internet and the World Wide Web, there is also a fast-growing demand for Web access to databases.
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) that uses relational techniques for storing and retrieving data. Relational databases are organized into physical tables, which consist of rows and columns of data. The rows are formally called tuples. A database will typically have many physical tables, and each physical table will typically have multiple tuples and multiple columns. The physical tables are typically stored on random access storage devices (RASD), such as magnetic or optical disk drives, for semi-permanent storage. Additionally, logical tables, or "views," can be generated from the physical tables to provide a particular way of looking at the database. A view presents selected rows and columns in a particular arrangement without affecting the physical organization of the database.
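The distinction between physical tables and views can be illustrated with a minimal sketch. SQLite is used here only as a convenient stand-in for an RDBMS, and the `employee` table, its columns, and the `rd_staff` view are hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory SQLite database, standing in for an RDBMS's disk-resident tables.
conn = sqlite3.connect(":memory:")

# A physical table: every row (tuple) has the same columns.
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee (id, name, dept, salary) VALUES (?, ?, ?, ?)",
    [(1, "Ada", "R&D", 120), (2, "Bob", "Sales", 90), (3, "Cy", "R&D", 100)],
)

# A logical table (view): a named query over the physical table.
# Creating it copies no data and does not change how employee is stored.
conn.execute(
    "CREATE VIEW rd_staff AS SELECT name, salary FROM employee WHERE dept = 'R&D'"
)

# Views are queried like tables; ORDER BY imposes the presentation order.
rows = conn.execute("SELECT name, salary FROM rd_staff ORDER BY salary DESC").fetchall()
print(rows)  # [('Ada', 120), ('Cy', 100)]
```

The view holds no rows of its own; each query against `rd_staff` is evaluated against `employee`, so the physical table remains the single source of the data.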
RDBMS software using a Structured Query Language (SQL) interface is well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
The SQL interface allows users to formulate relational operations on the tables either interactively, in batch files, or embedded in host languages, such as C and COBOL. SQL allows the user to manipulate the data. The definitions for SQL provide that an RDBMS should respond to a particular query with a particular set of data given a specified database content, but the technique that the RDBMS uses to actually find the required information in the tables on the disk drives is left up to the RDBMS. Typically, there will be more than one technique that the RDBMS can use to access the required data. The RDBMS optimizes its choice of technique to minimize the computer time used and, therefore, the cost of performing the query.
Additionally, an index is an ordered set of references to the records or rows in a database file or table. The index is used to access each record in the file using a key (i.e., one of the fields of the record or attributes of the row). When data is to be retrieved, an index is used to locate records. Then, the data is sorted into a user-specified order and returned to the user.
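The separation between what a query asks for and how the RDBMS finds it, and the role of an index in that choice, can be observed in a small sketch. It assumes SQLite as a stand-in RDBMS and a hypothetical `orders` table; it runs the same query before and after creating an index and captures SQLite's `EXPLAIN QUERY PLAN` description of the chosen access path. The exact wording of that description varies between SQLite versions, so the comments below describe it only loosely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(f"cust{i % 50}", i) for i in range(1000)],
)

query = "SELECT id, total FROM orders WHERE customer = ?"


def plan(sql, params):
    # EXPLAIN QUERY PLAN reports the access path SQLite chose;
    # the last column of each returned row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[-1] for row in rows)


before = plan(query, ("cust7",))
result_before = conn.execute(query, ("cust7",)).fetchall()

# An index on the key column: an ordered set of (key, row reference) entries.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

after = plan(query, ("cust7",))
result_after = conn.execute(query, ("cust7",)).fetchall()

print(before)  # typically describes a full-table SCAN of orders
print(after)   # typically describes a SEARCH using idx_orders_customer
# The query's answer is the same either way; only the access path changed.
assert sorted(result_before) == sorted(result_after)
```

The SQL text never mentions the index; the RDBMS decides on its own whether to use it, which is the sense in which the access technique is "left up to the RDBMS."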
Extensible Markup Language (XML) is a new specification that is quickly gaining popularity for creating what are termed "XML documents." XML documents comprise structured data. XML documents are being shared between multiple businesses and between businesses and customers.
When XML documents are stored as column data, searching for desired XML data can be time-consuming. Typically, a search for XML data would require searching each XML document. This is usually called a document scan. Thus, there is a need in the art for an improved technique for searching for XML documents stored as column data.
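A document scan can be sketched as follows, assuming XML documents are kept as text in a hypothetical `orders` table in SQLite and parsed with Python's `xml.etree.ElementTree`. The `document_scan` helper and the `customer` element are illustrative names only. Every stored document must be read and parsed to answer even a simple question about one element.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")

# XML documents stored whole, as text in an ordinary column.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, doc TEXT)")
docs = [
    "<order><customer>Acme</customer><total>250</total></order>",
    "<order><customer>Globex</customer><total>75</total></order>",
    "<order><customer>Acme</customer><total>40</total></order>",
]
conn.executemany("INSERT INTO orders (doc) VALUES (?)", [(d,) for d in docs])


def document_scan(customer):
    """Return ids of orders for `customer` by parsing every stored document.

    No index on the column can help here: each row's XML is fetched and
    parsed, so the cost grows with the number and size of the documents.
    """
    matches = []
    for row_id, doc in conn.execute("SELECT id, doc FROM orders ORDER BY id"):
        root = ET.fromstring(doc)
        if root.findtext("customer") == customer:
            matches.append(row_id)
    return matches


print(document_scan("Acme"))  # [1, 3]
```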
With the longstanding use of relational databases, many businesses have stored their data in relational tables. In order to share this data with businesses that are using XML documents, the data in the relational databases may be manually selected, retrieved, and stored into XML documents. This is a long, tedious task. Thus, there is a need for an improved technique of selecting, retrieving, and storing relational data into XML documents.
In order to share relational data with other businesses that are using XML documents, a user may manually convert the relational data into XML documents. This is time-consuming and inefficient. Thus, there is a need for an improved technique of generating XML documents from relational data.
Additionally, when an XML document is received, a user may need to store the data from the XML document into a relational database. Currently, this is a time-consuming process in which a user manually transfers the data from the XML document to the relational database. Thus, there is a need for an improved technique of decomposing an XML document and storing the decomposed data into a relational database.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and article of manufacture for a computer implemented technique for processing XML documents.
In accordance with one aspect of the present invention, data is stored in a data store connected to a computer. A main table is created having a column for storing a document, wherein the document has one or more elements or attributes. One or more side tables are created, wherein each side table stores one or more elements or attributes. Then, the side tables are used to locate data in the main table.
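The main-table and side-table arrangement can be illustrated with a minimal sketch, assuming SQLite and `xml.etree.ElementTree` as stand-ins. The table names (`orders`, `orders_side`), the extracted element (`customer`) and attribute (`status`), and the `store` and `find_orders` helpers are all hypothetical. Values are extracted once, when a document is stored; a search then consults the indexed side table and fetches only the matching documents from the main table.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")

# Main table: one column holds each XML document in full.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, doc TEXT)")

# Side table: values extracted from each document (one element and one
# attribute here), keyed back to the main-table row they came from.
conn.execute(
    "CREATE TABLE orders_side (order_id INTEGER REFERENCES orders(id), customer TEXT, status TEXT)"
)
conn.execute("CREATE INDEX idx_side_customer ON orders_side (customer)")


def store(doc):
    """Insert a document into the main table and its extracted values into the side table."""
    root = ET.fromstring(doc)  # parse first, so malformed XML is never stored
    cur = conn.execute("INSERT INTO orders (doc) VALUES (?)", (doc,))
    conn.execute(
        "INSERT INTO orders_side (order_id, customer, status) VALUES (?, ?, ?)",
        (cur.lastrowid, root.findtext("customer"), root.get("status")),
    )
    return cur.lastrowid


def find_orders(customer):
    """Locate matching rows through the indexed side table, then fetch the documents."""
    return conn.execute(
        "SELECT o.id, o.doc FROM orders_side AS s JOIN orders AS o ON o.id = s.order_id "
        "WHERE s.customer = ? ORDER BY o.id",
        (customer,),
    ).fetchall()


for doc in [
    '<order status="open"><customer>Acme</customer><total>250</total></order>',
    '<order status="shipped"><customer>Globex</customer><total>75</total></order>',
    '<order status="open"><customer>Acme</customer><total>40</total></order>',
]:
    store(doc)

print([order_id for order_id, _ in find_orders("Acme")])  # [1, 3]
```

In this sketch the side table is populated only on insert; a fuller implementation would also keep it synchronized when documents are updated or deleted, for example with triggers.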
In accordance with another aspect of the present invention, data stored on a data storage device that is connected to a computer is transformed. A query that selects data in the data storage device is received. The selected data is retrieved into a work space. Then, one or more XML documents containing the selected data are generated.
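The query-to-XML steps can be sketched as follows, assuming SQLite as the data store and a simple, hypothetical mapping in which each result row becomes a `<row>` element and each column becomes a child element named after its column label. The `query_to_xml` helper and the `customer` table are illustrative; this version produces a single document per query, and producing one document per row would be a small change.

```python
import sqlite3
import xml.etree.ElementTree as ET


def query_to_xml(conn, query, params=(), root_tag="rows", row_tag="row"):
    """Run `query`, keep its result set in memory, and build one XML document.

    Each result row becomes a <row> element; each column becomes a child
    element named after the column label (assumed to be a valid XML name).
    """
    cur = conn.execute(query, params)
    columns = [desc[0] for desc in cur.description]
    work_space = cur.fetchall()  # the selected data, held in memory
    root = ET.Element(root_tag)
    for record in work_space:
        row_el = ET.SubElement(root, row_tag)
        for name, value in zip(columns, record):
            # SQL NULL becomes an empty element.
            ET.SubElement(row_el, name).text = None if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ada", "Paris"), (2, "Bob", "Oslo"), (3, "Cy", "Paris")],
)
xml_doc = query_to_xml(
    conn, "SELECT name, city FROM customer WHERE city = ? ORDER BY id", ("Paris",)
)
print(xml_doc)
# <rows><row><name>Ada</name><city>Paris</city></row><row><name>Cy</name><city>Paris</city></row></rows>
```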
In accordance with yet another aspect of the present invention, data stored on a data storage device that is connected to a computer is transformed. Initially, a document object model tree is generated using a document access definition. The document object model tree is traversed to obtain information to retrieve relational data. The relational data is mapped to one or more XML documents.
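The DAD-driven composition can be sketched as follows. The DAD format used here is a hypothetical simplification, not the syntax of any particular product: it names the source table and maps columns to an XML attribute and to child elements. The sketch builds a DOM tree from the DAD with `xml.dom.minidom`, traverses it to collect the table and column names, retrieves the relational data from SQLite, and maps each row to one XML document. The `read_mapping` and `compose` helpers are illustrative names.

```python
import sqlite3
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical, simplified DAD (not the syntax of any specific product).
DAD = """<dad>
  <element_node name="customer" table="customer">
    <attribute_node name="id" column="id"/>
    <element_node name="name" column="name"/>
    <element_node name="city" column="city"/>
  </element_node>
</dad>"""


def read_mapping(dad_text):
    """Build a DOM tree from the DAD and traverse it to collect the mapping."""
    dom = minidom.parseString(dad_text)
    top = dom.getElementsByTagName("element_node")[0]  # first in document order
    mapping = {"element": top.getAttribute("name"), "table": top.getAttribute("table"),
               "attributes": [], "children": []}
    for node in top.childNodes:
        if node.nodeType != node.ELEMENT_NODE:
            continue  # skip whitespace text between tags
        pair = (node.getAttribute("name"), node.getAttribute("column"))
        if node.tagName == "attribute_node":
            mapping["attributes"].append(pair)
        elif node.tagName == "element_node":
            mapping["children"].append(pair)
    return mapping


def compose(conn, dad_text):
    """Retrieve the relational data named by the DAD and map each row to an XML document."""
    m = read_mapping(dad_text)
    columns = [col for _, col in m["attributes"] + m["children"]]
    # Identifiers come from the (trusted) DAD, not from end-user input.
    sql = f"SELECT {', '.join(columns)} FROM {m['table']} ORDER BY 1"
    documents = []
    for row in conn.execute(sql):
        values = dict(zip(columns, row))
        el = ET.Element(m["element"])
        for name, col in m["attributes"]:
            el.set(name, str(values[col]))
        for name, col in m["children"]:
            ET.SubElement(el, name).text = str(values[col])
        documents.append(ET.tostring(el, encoding="unicode"))
    return documents


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [(1, "Ada", "Paris"), (2, "Bob", "Oslo")])
for doc in compose(conn, DAD):
    print(doc)
# <customer id="1"><name>Ada</name><city>Paris</city></customer>
# <customer id="2"><name>Bob</name><city>Oslo</city></customer>
```

Keeping the mapping in a separate definition means the same retrieval and generation code can produce differently shaped documents by changing only the DAD.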
In accordance with a further aspect of the present invention, data stored on a data store that is connected to a computer is transformed. Initially, an XML document containing XML data is received. A document access definition that identifies one or more relational tables and columns is received. The XML data is mapped from the application document type definition (DTD) to the relational tables and columns using the document access definition, based on the XPath data model.
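Decomposition can be sketched with a hypothetical mapping that stands in for the document access definition. For each target table it gives the path of the element that yields one row and, for each column, a path drawn from a tiny XPath-like subset: a child element name, `@attr` for an attribute, or a leading `/` to evaluate from the document root instead of the current row element. The mapping format, the `evaluate` and `decompose` helpers, and the `orders` and `order_line` tables are assumptions for illustration; the sketch does not validate the document against a DTD.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping standing in for a DAD.
DAD_MAPPING = {
    "orders": {"rows": ".", "columns": {"id": "@id", "customer": "customer"}},
    "order_line": {"rows": "line", "columns": {"order_id": "/@id", "sku": "@sku", "qty": "qty"}},
}


def evaluate(root, context, path):
    """Evaluate one column path against the current row element (or the root)."""
    if path.startswith("/"):
        context, path = root, path[1:]
    if path.startswith("@"):
        return context.get(path[1:])
    return context.findtext(path)


def decompose(conn, xml_text, mapping):
    """Shred one XML document into rows of the tables named in `mapping`."""
    root = ET.fromstring(xml_text)
    for table, spec in mapping.items():
        cols = list(spec["columns"])
        placeholders = ", ".join("?" for _ in cols)
        # Identifiers come from the (trusted) mapping, not from the document.
        sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
        for context in root.findall(spec["rows"]):
            conn.execute(sql, [evaluate(root, context, spec["columns"][c]) for c in cols])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.execute("CREATE TABLE order_line (order_id INTEGER, sku TEXT, qty INTEGER)")
decompose(
    conn,
    '<order id="17"><customer>Acme</customer>'
    '<line sku="A1"><qty>2</qty></line><line sku="B2"><qty>5</qty></line></order>',
    DAD_MAPPING,
)
# INTEGER column affinity converts numeric text such as "17" to integers.
print(conn.execute("SELECT id, customer FROM orders").fetchall())  # [(17, 'Acme')]
print(conn.execute("SELECT order_id, sku, qty FROM order_line ORDER BY sku").fetchall())
# [(17, 'A1', 2), (17, 'B2', 5)]
```

The repeating `<line>` element becomes one row per occurrence in `order_line`, while the parent's `id` attribute is copied into each of those rows as a foreign key.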