1. Field of the Invention
The present disclosure relates generally to a method, an apparatus, and a computer-readable storage medium for labeling and querying XML documents to determine the relationships between nodes.
2. Description of the Related Art
Labeling is a process by which each node in an XML tree is given a label that holds information about that node, such as its level, order, or unique identifier, so that its position and its relationships with other nodes are recognizable. Each node can be a parent, ancestor, child, descendant, or sibling of another node in the XML tree.
There are several available labeling schemes for XML trees. Examples of available labeling schemes are Range-based schemes, Prefix-based schemes, and Prime-based schemes. Range-based labeling schemes identify each node with a label that consists of a start number, an end number, and a level according to the pre-order traversal of the XML tree. Prefix-based labeling schemes store the labels of ancestors in the labels of their descendants using a delimiter, such as a “.”. There are also hybrid labeling schemes which combine the advantages of Range-based and Prefix-based labeling schemes. See S. C. Haw and C. S. Lee, “Extending path summary and region encoding for efficient structural query processing in native XML databases,” Journal of Systems and Software (2009), hereby incorporated by reference in its entirety.
A Range labeling scheme gives a node a label of the form (StartNo, EndNo, Level), for example (23, 44, 3). This labeling scheme can determine the Parent-Child and Ancestor-Descendant relationships between two nodes using arithmetic range comparison operations. However, sibling relationships cannot be identified from the labels themselves. This labeling scheme is not suitable for dynamic XML documents, since all nodes must be relabeled when a new node or a new subtree is inserted.
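The range comparisons described above can be illustrated with a minimal sketch. The names Node, RangeLabel, assign_range_labels, is_ancestor, and is_parent are hypothetical and chosen only for illustration, and the numbering convention (one counter incremented on entering and on leaving each node) is one common assumption rather than the method of any particular cited scheme.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


@dataclass(frozen=True)
class RangeLabel:
    start: int
    end: int
    level: int


def assign_range_labels(root: Node) -> Dict[str, RangeLabel]:
    """Label every node in one depth-first pass (node names assumed unique)."""
    labels: Dict[str, RangeLabel] = {}
    counter = 0

    def visit(node: Node, level: int) -> None:
        nonlocal counter
        counter += 1
        start = counter              # number taken on entering the node
        for child in node.children:
            visit(child, level + 1)
        counter += 1                 # number taken on leaving the node
        labels[node.name] = RangeLabel(start, counter, level)

    visit(root, 1)
    return labels


def is_ancestor(a: RangeLabel, d: RangeLabel) -> bool:
    # The descendant's range lies strictly inside the ancestor's range.
    return a.start < d.start and d.end < a.end


def is_parent(p: RangeLabel, c: RangeLabel) -> bool:
    # A parent is an ancestor exactly one level above the child.
    return is_ancestor(p, c) and c.level == p.level + 1


# Example tree: a has children b and c; b has child d; c has child e.
tree = Node("a", [Node("b", [Node("d")]), Node("c", [Node("e")])])
labels = assign_range_labels(tree)
```

In this example tree, nodes d and e both sit at level 3 but have different parents. Their two labels alone do not reveal whether they are siblings, which is consistent with the limitation noted above.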
J. H. Yun and C. W. Chung, “Dynamic interval-based labeling scheme for efficient XML query and update processing,” Journal of Systems and Software (2008), hereby incorporated by reference in its entirety, proposed a range-based labeling scheme with a nested tree structure which eliminates the limitations of the previous interval-based node labeling schemes while retaining their advantages. Their approach supports XML data updates with almost no node relabeling. In their scheme, the integer comparison operation is replaced by an integer list comparison operation.
Other examples of Range-based labeling schemes are by P. F. Dietz, “Maintaining order in a linked list,” ACM Symposium on Theory of Computing (1982), hereby incorporated by reference in its entirety, Q. Li and B. Moon, “Indexing and querying XML data for regular path expressions,” VLDB (2001), hereby incorporated by reference in its entirety, and R. Thonangi, “A concise labeling scheme for XML data,” COMAD 2006, Delhi, India (2006), hereby incorporated by reference in its entirety.
In a Prefix-based labeling scheme, with labels of the form (1.3.22.4), a given node X is a descendant of a node Y if the label of Y is a prefix of the label of X. All the structural information of node relationships can be captured by looking only at the labels. Storing this structural information requires large storage space for the labels. On the other hand, the scheme efficiently identifies the ancestor-descendant, parent-child, and sibling relationships between tree nodes via string matching operations.
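The prefix test described above can be sketched as follows. The function names parse, is_ancestor, is_parent, and is_sibling are hypothetical. The sketch compares labels component by component rather than as raw strings, an assumption made so that, for example, 1.3 is not mistaken for a prefix of 1.33.

```python
from typing import Tuple

Label = Tuple[int, ...]


def parse(label: str) -> Label:
    """Split a dotted label such as '1.3.22.4' into integer components."""
    return tuple(int(part) for part in label.split("."))


def is_ancestor(a: Label, d: Label) -> bool:
    # The ancestor's label is a proper prefix of the descendant's label.
    return len(a) < len(d) and d[:len(a)] == a


def is_parent(p: Label, c: Label) -> bool:
    # The parent's label is the child's label without its last component.
    return len(c) == len(p) + 1 and c[:-1] == p


def is_sibling(x: Label, y: Label) -> bool:
    # Siblings differ only in their last component.
    return x != y and len(x) == len(y) and x[:-1] == y[:-1]


node = parse("1.3.22.4")
print(is_ancestor(parse("1.3"), node))       # ancestor check
print(is_parent(parse("1.3.22"), node))      # parent check
print(is_sibling(parse("1.3.22.7"), node))   # sibling check
```

Every label carries the full path from the root, so all three relationships are decided from the two labels alone, at the cost of labels that grow with the depth of the tree.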
Dewey ID by I. Tatarinov et al., “Storing and querying ordered XML using a relational database system,” ACM SIGMOD (2002), hereby incorporated by reference in its entirety, and Extended Dewey by J. Lu et al., “From region encoding to extended dewey: on efficient processing of XML twig pattern matching,” VLDB 2005 (2005), hereby incorporated by reference in its entirety, are examples of prefix-based labeling schemes that are not suitable for dynamic XML documents, since both methods require relabeling of nodes if a new node is inserted.
Prefix-based labeling schemes started by using only integers to represent labels, but later a combination of integers and letters has been used to represent node labels. To make Dewey labeling dynamic, new approaches were proposed. One proposal called “sibling labeling scheme” is by H. A. Al-Jamimi, A. Barradah, and M. Salahadin, “Siblings labeling Scheme for updating XML trees dynamically,” International Conference on Computer Engineering and Technology (2010), hereby incorporated by reference in its entirety. Another proposal called “DDE” is by Liang Xu, Tok Wang Ling, Huayu Wu, Zhifeng Bao, “DDE: from Dewey to a fully dynamic XML labeling scheme,” SIGMOD Conference (2009), hereby incorporated by reference in its entirety. The “sibling labeling scheme” approach requires relabeling of at most two nodes when a new node is inserted, whereas DDE avoids relabeling completely.
Patrick O'Neil et al., “ORDPATHs: insert friendly XML node labels,” ACM SIGMOD (2004), hereby incorporated by reference in its entirety, introduced OrdPath, a dynamic labeling scheme that differs from Dewey but follows the same order. Node labels are assigned in Dewey order, except that even and negative integers are not used in the initial labeling, which yields labels of the form (1.5.7.9). The scheme reserves even and negative integers for later insertions into an existing tree. It also stores the label of each node as an encoded binary representation. OrdPath encounters a problem when the size of the codes overflows, in which case all the existing nodes must be relabeled. For more about the overflow problem, see C. Li and T. W. Ling, “QED: A Novel quaternary encoding to completely avoid re-labeling in XML updates,” CIKM (2005), hereby incorporated by reference in its entirety. The overflow problem affects other labeling schemes such as LSDX by M. Duong and Y. Zhang, “LSDX: new labeling scheme for dynamically updating XML data,” the 16th Australasian Database Conference, hereby incorporated by reference in its entirety, and SCOOTER by M. F. O'Connor and M. Roantree, “SCOOTER: a compact and scalable dynamic labeling scheme for XML updates,” Springer-Verlag Berlin Heidelberg (2012), hereby incorporated by reference in its entirety. Thus, these labeling schemes are not preferred when XML documents have deep trees.
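The idea of labeling initially with odd integers and reserving even integers for insertions can be illustrated with a simplified sketch. The function names initial_child_labels, insert_between, and level are hypothetical. The sketch assumes that a node inserted between two adjacent odd siblings receives the even value between them as a placeholder component followed by the ordinal 1. It does not model the binary encoding or negative ordinals, and it is not presented as the full OrdPath algorithm.

```python
from typing import List, Tuple

Label = Tuple[int, ...]


def initial_child_labels(parent: Label, count: int) -> List[Label]:
    """Give `count` children the odd ordinals 1, 3, 5, ... under `parent`."""
    return [parent + (2 * i + 1,) for i in range(count)]


def insert_between(left: Label, right: Label) -> Label:
    """Label for a new node between two adjacent, initially labeled siblings.

    Assumes `left` and `right` share a parent and end in consecutive odd
    ordinals (for example 3 and 5). The even value between them is used as
    a placeholder component, followed by a fresh odd ordinal 1.
    """
    same_parent = left[:-1] == right[:-1]
    adjacent_odds = left[-1] % 2 != 0 and right[-1] - left[-1] == 2
    if not (same_parent and adjacent_odds):
        raise ValueError("sketch only handles adjacent odd siblings")
    return left[:-1] + (left[-1] + 1, 1)


def level(label: Label) -> int:
    """Depth of a node: even placeholder components do not add a level."""
    return sum(1 for part in label if part % 2 != 0)


siblings = initial_child_labels((1,), 3)               # (1, 1), (1, 3), (1, 5)
new_label = insert_between(siblings[1], siblings[2])   # (1, 4, 1)
```

Under these assumptions the new label sorts between its neighbors in component-wise order and stays at the same level, so no existing label changes. Each such insertion lengthens the label, which relates to the overflow concern described above.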
H. Ko and S. Lee, “A Binary String Approach for Updates in Dynamic Ordered XML Data,” IEEE Transactions on Knowledge and Data Engineering (2010), hereby incorporated by reference in its entirety, proposed IBSL (“Improved Binary String Labeling”) as a labeling scheme. Their labeling scheme uses Dewey order but uses bit-strings of the form (101.1.100.111), with full support for updates without recalculation or relabeling. However, this scheme does not use the characteristics of binary numbers to perform bit matching; instead, it uses string matching to identify the relationships between nodes.
B. G. Assefa and B. Ergenc, “Orderbased labeling scheme for dynamic XML query processing,” CD-ARES 2012, LNCS 7465, pp. 287-301, 2012, International Federation for Information Processing (2012), hereby incorporated by reference in its entirety, proposed a dynamic OrderBased labeling scheme which optimizes the label size of every level. Their scheme demonstrated efficient query time when compared to Com-D by M. Duong and Y. Zhang, “Dynamic labeling scheme for XML data processing,” Meersman, R., Tari, Z. (eds.) OTM 2008, hereby incorporated by reference in its entirety. It also has a smaller label size and lower storage requirement when compared to LSDX.
Many recent prefix-based labeling schemes that are based on the Dewey structure use compression and decompression techniques to minimize the label size and space requirement, but query processing time suffers as a result. Other schemes attempt to modify the Dewey structure and shrink it, but they consequently spend considerable time processing queries, because the labels must be processed recursively.