This invention relates generally to a system and method for storing documents in one format in a database having a different format and, in particular, to a system and method for storing and retrieving Extensible Markup Language (XML) documents using a relational database.
The new Extensible Markup Language (XML) protocol is poised to become the lingua franca of the Internet for capturing and electronically transmitting information. The advantage of XML, as compared to the older Hypertext Markup Language (HTML), is that it contains tags which lend semantic significance to the information between the tags (e.g., the text between the tags is the last name of an author). In contrast, HTML tags are used primarily for specifying how the information is to be displayed in a browser (e.g., show the text between the tags in bold Arial font). Additionally, using known stylesheets written in the Extensible Stylesheet Language (XSL), one may specify not only the format in which different XML elements are to be shown in a browser, but also the order in which they are to be displayed. These features of XML give a user much greater power and flexibility in searching for relevant information, since a search may be performed using the tags that carry the semantic information. In addition, XML permits the user to examine the information from different perspectives once it is found.
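By way of illustration only, the following minimal sketch shows how semantic tags permit a search by meaning rather than by appearance. The sample document, its element names (library, book, author, lastname) and the function name last_names are hypothetical and are not taken from this disclosure; the sketch uses only the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative assumptions.
XML_DOC = """
<library>
  <book>
    <title>Example Title</title>
    <author><firstname>Jane</firstname><lastname>Doe</lastname></author>
  </book>
  <book>
    <title>Another Title</title>
    <author><firstname>John</firstname><lastname>Roe</lastname></author>
  </book>
</library>
"""


def last_names(xml_text):
    """Return the text of every <lastname> element, in document order."""
    root = ET.fromstring(xml_text.strip())
    # The tag itself says what the text means, so the search can target it.
    return [element.text for element in root.iter("lastname")]


if __name__ == "__main__":
    print(last_names(XML_DOC))
```

An equivalent HTML page that merely rendered the same names in bold would offer no tag by which an author's last name could be distinguished from any other bold text.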
To take full advantage of the possibilities that the XML protocol affords, it is desirable to devise an efficient means of storing, indexing and retrieving (via queries) XML documents. Typical relational database management systems (RDBMS), object database management systems (ODBMS) and flat files are slow and inefficient at storing XML documents. A common approach, in which Document Object Model (DOM) representations of the XML documents are built and the resulting trees are then traversed to locate relevant nodes, is acceptable only for small documents, since memory becomes a limiting factor when the XML documents approach even moderate sizes. In addition, searches are not optimal, since every search must begin at the root of the document rather than at an arbitrary node in the document. Moreover, it is not possible to search across a collection of documents (e.g., poems, novels, short stories and plays) for a particular character or author.
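The DOM-based approach described above may be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation from this disclosure: the sample document, its element names (play, act, speech, speaker) and the helper names find_elements and speakers are hypothetical. The sketch uses the standard Python xml.dom.minidom module, which parses the whole document into an in-memory tree before any node can be located.

```python
from xml.dom import minidom

# Hypothetical sample document; element names are illustrative assumptions.
SAMPLE = (
    "<play><title>Example</title><act>"
    "<speech><speaker>A</speaker></speech>"
    "<speech><speaker>B</speaker></speech>"
    "</act></play>"
)


def find_elements(node, tag_name):
    """Depth-first walk from the given node, collecting matching elements."""
    matches = []
    if node.nodeType == node.ELEMENT_NODE and node.tagName == tag_name:
        matches.append(node)
    for child in node.childNodes:
        matches.extend(find_elements(child, tag_name))
    return matches


def speakers(xml_text):
    """Return the text of every <speaker> element found in the document."""
    dom = minidom.parseString(xml_text)  # entire tree is held in memory
    root = dom.documentElement           # the walk starts at the root
    return [node.firstChild.data for node in find_elements(root, "speaker")]


if __name__ == "__main__":
    print(speakers(SAMPLE))
```

Both limitations noted above are visible in the sketch: the tree must be fully built in memory before the search starts, and the search visits nodes from the root downward, one document at a time.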
At the same time, XML documents present unique challenges to storage in relational databases, since their semi-structured nature often leads to a proliferation of tables when normalization is carried out. Given that relational database technology has made great strides over the past couple of decades, it would be desirable and useful to provide a clean way of representing XML documents in relational terms. It is therefore the goal of the present invention to provide a system and method for the storage, indexing and retrieval of XML documents using relational databases.