This application claims the priority of Korean Patent Application No. 2002-25398, filed on May 8, 2002, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.
1. Field of the Invention
The present invention relates to a path index lookup method, which is an extensible markup language (XML) indexing method by which a relational database can stably process a query regardless of how a user expresses the query, and more particularly, to a method for processing a regular path expression of a certain length with a single joining operation using a PathLookup table and an ExtendedEdge table.
2. Description of the Related Art
The extensible markup language (XML), which has been proposed as a standard language for information exchange on the Internet, is widely used in computers and network systems (SyncML, UPnP) and also in many other industrial fields, such as biological information (BSML, BioML), electronic commerce (ebXML, ECML), electronic data interchange (XML-EDI), geographic information and global positioning systems (GPS) (GML, NVML), multimedia (MPEG-7, IML), entertainment (MusicXML, GML), education (LMML, TML), medical care (CTDM, TDL), publication (BiblioML, DocBook), and TV broadcasting (TV-Anytime).
Data written in XML is different from typical data in an existing database like a relational database or an object-oriented database in many ways; for example, XML is semi-structured. In other words, XML provides a document type definition (DTD) adequate for an application field, but has a semi-structured characteristic in that XML does not have to strictly follow the DTD.
Due to the semi-structured characteristic of XML, data expression and interchange between different data sources are flexible. XML is used as a standard language in many application fields both because of this semi-structured characteristic and because it makes data easy to express.
Because of the semi-structured characteristic of XML, an author of an XML document is allowed to produce the document from XML data that deviates from a DTD. Also, a user of the XML document can search for data without knowing the exact structure of the XML data.
A data search based on the semi-structured characteristic of XML can be used effectively when a user does not know the exact structure of the XML document containing the data to be searched. If a user searches the web for XML documents in a particular field, the user can obtain query results from the found XML documents even with partial knowledge that does not exactly reflect the schema of that field.
A user's retrieval expression based on the semi-structured characteristic of XML data can be written as a regular path expression query, for example, in the XML query language XQuery. In contrast with a database query based on a typical schema, an XML regular path expression query does not exactly describe the structure of the data that satisfies the conditions of the query. A query processing system may therefore interpret and execute queries representing the same condition differently, depending on how the user formulates the query. Consequently, the way in which the query processing system interprets and optimizes an XML regular path expression query may greatly affect query processing performance.
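As an illustration of two queries that represent the same condition but are written differently, the following minimal sketch uses Python's standard `xml.etree.ElementTree` module, which supports only a limited XPath subset and is not an XQuery processor. The sample document and its element names are hypothetical and do not come from the present disclosure.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
doc = ET.fromstring(
    "<library>"
    "<book><author><name>Kim</name></author></book>"
    "<book><author><name>Lee</name></author></book>"
    "</library>"
)

# Fully specified path: every step below the root is spelled out.
exact = [e.text for e in doc.findall("./book/author/name")]

# Regular path expression: any name element at any depth.
loose = [e.text for e in doc.findall(".//name")]

# Both formulations select the same elements in this document.
print(exact == loose)
```

On this document the two expressions return the same elements, yet a query processor may evaluate them along very different execution plans.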
Examples of a method of storing an XML document in a relational database include an edge method and an attribute method. The edge method has the merit that an XML document can be stored and processed even if there is no schema data regarding the XML document. However, the edge method may degrade performance because it requires self-joining operations on a relatively large edge table, repeated as many times as the length of the path expression. A joining operation denotes an operation that computes a relationship between the elements of one table and those of another table. A self-joining operation denotes an operation that computes a relationship between elements within a single table.
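The growth of self-joining operations with path length can be sketched as follows. This is a minimal illustration using Python's `sqlite3` module; the `edge` table layout (`source`, `target`, `label`, `value`), the helper `path_query`, and the sample rows are assumptions made for the example rather than the schema of any particular edge-method implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed edge table: one row per parent-child edge of the XML tree.
conn.execute(
    "CREATE TABLE edge (source INTEGER, target INTEGER, label TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO edge VALUES (?, ?, ?, ?)",
    [
        (0, 1, "book", None),
        (1, 2, "author", None),
        (2, 3, "name", "Kim"),
        (1, 4, "title", "XML Indexing"),
        (0, 5, "book", None),
        (5, 6, "author", None),
        (6, 7, "name", "Lee"),
    ],
)

def path_query(labels):
    """Build SQL for /l1/l2/.../ln using one alias of edge per path step."""
    aliases = [f"e{i}" for i in range(len(labels))]
    from_clause = ", ".join(f"edge {a}" for a in aliases)
    conds = [f"{aliases[0]}.source = 0"]  # node 0 is the assumed root
    for i, a in enumerate(aliases):
        conds.append(f"{a}.label = ?")
        if i > 0:
            # Chain each step to the previous one: a self-join on edge.
            conds.append(f"{aliases[i - 1]}.target = {a}.source")
    last = aliases[-1]
    return (
        f"SELECT {last}.value FROM {from_clause} "
        f"WHERE {' AND '.join(conds)} ORDER BY {last}.target"
    )

labels = ["book", "author", "name"]
sql = path_query(labels)
result = [row[0] for row in conn.execute(sql, labels)]
print(sql)
print(result)
```

Each additional path step adds another alias of the `edge` table to the FROM clause, so the cost of the query rises with the length of the path expression.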
In the attribute method, entity-unit tables are produced and processed when the schema data of an XML document is already known. Thus, data can be divided and stored in many tables, providing higher performance than the edge method. However, in the attribute method, the number of tables may increase excessively depending on the XML schema, or data may be unnecessarily fragmented.
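The tendency of the table count to grow can be pictured with a small sketch. It assumes, as one possible reading that the text above does not spell out, that an entity-unit table corresponds to one table per distinct element label; the rows and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical edge rows: (source_id, target_id, label, value).
edges = [
    (0, 1, "book", None),
    (1, 2, "author", None),
    (2, 3, "name", "Kim"),
    (1, 4, "title", "XML Indexing"),
]

# Assumed partitioning: one table per label, each holding only its own rows.
tables = defaultdict(list)
for source, target, label, value in edges:
    tables[label].append((source, target, value))

table_count = len(tables)
print(table_count, sorted(tables))
```

Under this assumption every new label in the schema adds another table, and the rows of a single document are spread across all of them.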
In order to overcome these problems, research has been conducted on using a data mining method to determine the type and number of tables into which an XML document with no schema can be divided for storage. Since both the edge method and the attribute method are basically provided to process a path expression, the two methods require as many table-joining operations as the length of the path expression. Also, the two methods are not suitable for processing a regular path expression.
Variants of the edge method include a method of processing a regular path expression using information on the beginning and end offsets of a tag. This variant is ineffective when applied to a long path expression, and its query processing performance depends on how the user formulates the query.
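The use of beginning and end offsets can be sketched as an interval containment test: an element is a descendant of another if its offsets lie strictly inside those of the other. The following minimal illustration uses Python's `sqlite3` module; the `element` table, its column names, and the offsets are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed table: one row per element with the offsets of its start and end tags.
conn.execute(
    "CREATE TABLE element (tag TEXT, begin_off INTEGER, end_off INTEGER)"
)
conn.executemany(
    "INSERT INTO element VALUES (?, ?, ?)",
    [
        ("book", 0, 100),
        ("author", 10, 60),
        ("name", 20, 40),     # nested inside book and author
        ("title", 70, 90),
        ("name", 120, 140),   # outside any book element
    ],
)

# book//name: a name element whose interval lies inside a book interval.
sql = """
    SELECT d.begin_off
    FROM element a
    JOIN element d
      ON a.begin_off < d.begin_off AND d.end_off < a.end_off
    WHERE a.tag = ? AND d.tag = ?
    ORDER BY d.begin_off
"""
hits = [row[0] for row in conn.execute(sql, ("book", "name"))]
print(hits)
```

A single containment join resolves one ancestor-descendant step, but a path expression with several steps still needs one such join per step, which is consistent with the weakness on long path expressions noted above.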
An example of a method of indexing a path expression of XML data is an index fabric method, in which indices for various paths existing in an XML document are managed in a single indexing structure. The indexing structure is a structure extended to support a relational database. In the index fabric method, to process a regular path expression of a particular path, the particular path must be additionally specified as a refined path in the indexing structure.
A table structure that serves as an index has also been proposed. However, when the index fabric method is adopted to index an XML path expression, the number of index tables increases, or an index table may be fragmented. XML data path expression processing is similar to path expression processing performed in an object-oriented database in that it is performed by traversing a tree.