The rapid increase in Internet usage has ushered in a boom in e-business activities around the globe. Every day, numerous organizations create hundreds of thousands of web pages touting their services and products. Further, an e-marketplace has rapidly emerged, where transactions between different organizations and between the individual customer and a collection of business partners are taking place seamlessly.
Those developments are facilitated by the power of the Web, which in turn is made possible by the use of Extensible Markup Language (XML). XML is being used as a standard format for document exchange. The popularization of that standard promotes integration and communication between organizations. Furthermore, the structural richness that is the hallmark of the language has helped with in-house document management.
However, to fully exploit the advantages of using XML, one must be able to archive such documents efficiently and to search them in a manner that takes advantage of their structured nature. That is especially true in e-business applications, where products might have to be searched based on their characteristics or on their position in a hierarchy, as is frequently the case with spare parts.
Relational databases are highly efficient for the archiving and querying of data that can be tabularized, i.e., organized as rows and columns. XML, however, represents data with a hierarchical structure, and might or might not follow a document type definition (DTD) or a schema. The depth of the hierarchy can be irregular and unpredictable. That significant difference requires different approaches to storing, indexing, and retrieving XML data. There is therefore a need for a search mechanism that can handle relational databases as well as collections of well-formed XML documents.
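The irregular hierarchy depth noted above can be illustrated with a short sketch. The catalog fragment, its element names, and the helper `leaf_paths` are hypothetical and serve only as an example; the parsing uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical spare-parts fragment; element and attribute names are
# illustrative assumptions, not taken from any real catalog.
CATALOG = """
<catalog>
  <part id="engine">
    <part id="cylinder-head">
      <part id="valve"><price>12.50</price></part>
    </part>
    <price>4200.00</price>
  </part>
  <part id="wiper-blade"><price>9.99</price></part>
</catalog>
"""

def leaf_paths(element, prefix=""):
    """Yield (path, text) for every leaf element at or below `element`."""
    path = f"{prefix}/{element.tag}"
    children = list(element)
    if not children:
        yield path, (element.text or "").strip()
    for child in children:
        yield from leaf_paths(child, path)

root = ET.fromstring(CATALOG)
paths = list(leaf_paths(root))

# The same kind of value (a price) occurs at different depths, so no
# single fixed set of columns describes every record.
depths = sorted({path.count("/") for path, _ in paths})

for path, text in paths:
    print(path, text)
print("distinct leaf depths:", depths)
```

In this fragment a price sits three levels deep for one part and five levels deep for another, which is the kind of variation a fixed row-and-column layout does not capture directly.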
The effective archival of XML data also requires a good methodology for indexing that data. Any indexing scheme must be flexible and adaptive. For example, if there is a likelihood of a certain class of query being repeated more often than others, the indexing scheme should adjust to that.
Except in trivial situations, retrieval efficiency depends directly on the quality of the indexing of the data. While there are several efficient indexing schemes for tabularized data, such methods cannot be applied directly to XML document collections, because of the additional structural information contained in XML data.
Further, an indexing scheme is more than simply an index: the scheme should efficiently serve the queries posed against the database. To that end, it is preferable that the index itself change based on the types of queries received. Thus, if a type of query is repeated often, the index should adapt to that type of query.
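One way to picture an index that responds to repeated queries is the following minimal sketch. It is an illustration under stated assumptions, not the technique of this disclosure: the class `AdaptiveIndex`, its `threshold` parameter, and the flattened sample records are all hypothetical. The index scans the records for rarely queried fields and builds a dedicated lookup table only after a field has been queried a set number of times.

```python
from collections import Counter, defaultdict

class AdaptiveIndex:
    """Toy index that builds a per-field lookup only for frequently
    queried fields (hypothetical illustration)."""

    def __init__(self, records, threshold=3):
        self.records = records        # list of dicts (flattened documents)
        self.threshold = threshold    # queries on a field before indexing it
        self.query_counts = Counter() # field -> number of queries seen
        self.indexes = {}             # field -> {value -> [record positions]}

    def _build(self, field):
        """Build a value-to-positions lookup for one field."""
        lookup = defaultdict(list)
        for pos, record in enumerate(self.records):
            if field in record:
                lookup[record[field]].append(pos)
        self.indexes[field] = lookup

    def find(self, field, value):
        """Return records whose `field` equals `value`."""
        self.query_counts[field] += 1
        if field not in self.indexes and self.query_counts[field] >= self.threshold:
            self._build(field)
        if field in self.indexes:
            # Frequently queried field: answer from the lookup table.
            return [self.records[p] for p in self.indexes[field].get(value, [])]
        # Rarely queried field: fall back to a linear scan.
        return [r for r in self.records if r.get(field) == value]

# Hypothetical sample data.
records = [
    {"name": "valve", "category": "engine"},
    {"name": "wiper blade", "category": "body"},
    {"name": "piston", "category": "engine"},
]
index = AdaptiveIndex(records, threshold=3)

first = index.find("category", "engine")     # answered by scanning
index.find("category", "body")               # answered by scanning
indexed_before = "category" in index.indexes
third = index.find("category", "engine")     # third query builds the lookup
indexed_after = "category" in index.indexes
```

The threshold trades the cost of building and maintaining a lookup against the cost of repeated scans; a practical scheme would also need to handle hierarchical paths and updates, which this sketch omits.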
There is therefore presently a need for methods and systems for querying large data archives containing XML files. In particular, there is a need for a technique for storing, indexing, and retrieving XML data, given the uneven hierarchy depths and other unique aspects of data stored in that manner. To the inventors' knowledge, no such techniques are currently available.