1. Field of the Invention
The present invention relates to a method and apparatus for managing data written in a markup language, and a computer-readable recording medium for recording a program designed to perform the same method, and more particularly, to a method for generating, storing, deleting, and updating a fragment obtained by splitting data written in a markup language and generating, storing, deleting, and updating an index associated with the fragment, and a computer-readable recording medium for recording a program designed to perform the same method.
2. Description of the Related Art
Digital devices containing digital circuits for processing digital data are growing in popularity. Examples of digital devices include computers, printers, scanners, pagers, digital cameras, facsimiles, digital copiers, personal digital assistants (PDAs), cellular phones, digital home appliances, digital phones, digital projectors, home servers, digital video recorders, digital TV broadcast receivers, digital satellite broadcast receivers, and set-top boxes.
Meanwhile, digital data processed by digital devices can be represented in various languages. In particular, markup languages including Standard Generalized Markup Language (SGML), HyperText Markup Language (HTML), and Extensible Markup Language (XML) are gaining popularity due to their ability to convey structural information. Detailed information on markup languages can be found at http://www.w3.org or http://www.xml.com. The most commonly used markup language is XML, which is an official World Wide Web Consortium (W3C) standard, and various other markup languages are expected to be developed in the future.
As shown in FIG. 1, digital data formatted with a markup language is divided into structure and content. For example, as shown in FIG. 2, the digital data may have a hierarchical structure consisting of a root node “TVAMain” 10 and child nodes “ProgramDescription,” “ProgramLocationTable,” “BroadcastEvent” 11, “EventDescription,” “ServiceId” 12, “PublishedTime” 13, and “PublishedDuration” 14. FIG. 2 shows part of a hierarchical structure of digital data used as metadata for a broadcast program in the TV-Anytime Forum, a private organization founded in September 1999 to develop specifications to enable audio-visual and other services in a user environment such as a personal digital recorder (PDR) having high capacity storage for personal use.
Digital devices can gain quicker access to desired content using information on the hierarchical structure when processing digital data written in a markup language and formatted according to a predetermined protocol.
Thus, digital data written in a markup language and having a format predefined according to a predetermined protocol can be effectively processed in various digital devices such as computers, PDAs, and cellular phones conforming to this protocol.
Due to these advantages, digital data written in a markup language are commonly used in enterprise-class systems with excellent computing capability as well as small digital devices with restricted computing capability.
Digital data written in a markup language has a wide range of uses ranging from personal information such as a telephone number list to metadata for describing and managing multimedia data. There are a variety of different types of digital data having different content and structures for each different kind of use.
Various approaches have been proposed for managing digital data written in a markup language. For example, one representative method for storing and retrieving XML data is a node numbering scheme as shown in FIG. 3. The node numbering scheme has been presented in [1] Chun Zhang, Jeffrey F. Naughton, Qiong Luo, David J. DeWitt, and Guy M. Lohman “On Supporting Containment Queries in Relational Database Management Systems,” In Proc. of the 2001 ACM-SIGMOD conference, Santa Barbara, Calif., USA, May 2001, [2] Quanzhong Li and Bongki Moon “Indexing and Querying XML Data for Regular Path Expressions,” In Proc. of the 27th VLDB conference, Rome, Italy, September 2001, and [3] Torsten Grust “Accelerating XPath Location Steps,” In Proc. of the 2002 ACM-SIGMOD conference, pages 109-120, Madison, Wis., USA, June 2002.
The node numbering scheme allows each node (element or attribute) in an XML document to be naturally mapped to a tuple in a relational table. Each tuple is expressed in a structure <doc_id, begin_pos, end_pos, level>.
Here, doc_id is an ID of a document, begin_pos and end_pos denote information on the position of a node within the document, and level denotes the depth of the node from a root node. For example, a tuple corresponding to each node present within a ProgramInformation node in the XML document of FIG. 1 may be represented by each node of a tree shown in FIG. 3 using a node numbering scheme.
In FIG. 3, since all nodes belong to the same document, doc_id is set to 1, and a terminal node is used as a special node type designating a text value of a parent node.
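The numbering described above can be illustrated with a minimal sketch. It is not the procedure of the cited references: it assumes that positions come from a simple counter incremented at each element start, each text value, and each element end, that the root node has level 0, and that attributes and text following a child element are ignored. The names NodeTuple and number_nodes and the "#text" label for terminal nodes are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class NodeTuple(NamedTuple):
    """One row of the relational table (the name column is added for readability)."""
    name: str
    doc_id: int
    begin_pos: int
    end_pos: int
    level: int


def number_nodes(xml_text: str, doc_id: int = 1) -> List[NodeTuple]:
    """Assign <doc_id, begin_pos, end_pos, level> to every element and text value.

    Assumption: positions are taken from one counter shared by the whole
    document, so a node's interval strictly encloses those of its descendants.
    """
    root = ET.fromstring(xml_text)
    rows: List[NodeTuple] = []
    counter = 0

    def visit(elem: ET.Element, level: int) -> None:
        nonlocal counter
        counter += 1
        begin = counter  # position at which the element opens
        text = (elem.text or "").strip()
        if text:
            # Terminal node carrying the text value of its parent.
            counter += 1
            rows.append(NodeTuple("#text", doc_id, counter, counter, level + 1))
        for child in elem:
            visit(child, level + 1)
        counter += 1  # position at which the element closes
        rows.append(NodeTuple(elem.tag, doc_id, begin, counter, level))

    visit(root, 0)
    rows.sort(key=lambda row: row.begin_pos)  # document order
    return rows
```

Calling number_nodes on a small document yields one tuple per element and per text value, sorted in document order, which is the form in which the tuples would be loaded into a relational table.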
In the node numbering scheme, the ancestor-descendant and parent-child relationships between two nodes, which are expressed using the operators ‘//’ and ‘/’, respectively, in the XPath and XQuery languages, are determined by comparing the field values of the corresponding tuples. For example, if node ‘A’ is an ancestor of node ‘B’ (i.e., A//B), the nodes ‘A’ and ‘B’ satisfy the requirements ‘A.doc_id==B.doc_id, A.begin_pos<B.begin_pos, and A.end_pos>B.end_pos’. If node ‘A’ is a parent of node ‘B’ (i.e., A/B), ‘A.level==B.level-1’ is added to the above requirements.
A structural join can be performed using the above-mentioned requirements to find a pair of nodes that satisfy a query expressed as ‘A/B’ or ‘A//B’ in an XPath or XQuery language.
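The requirements stated above translate directly into tuple comparisons. The following sketch uses a naive nested-loop join for clarity; the cited references describe more efficient join algorithms, which are not reproduced here. The names NodeTuple, is_ancestor, is_parent, and structural_join are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class NodeTuple(NamedTuple):
    """A node stored as <doc_id, begin_pos, end_pos, level>, plus its name."""
    name: str
    doc_id: int
    begin_pos: int
    end_pos: int
    level: int


def is_ancestor(a: NodeTuple, b: NodeTuple) -> bool:
    """A//B: same document, and A's interval strictly contains B's."""
    return (a.doc_id == b.doc_id
            and a.begin_pos < b.begin_pos
            and a.end_pos > b.end_pos)


def is_parent(a: NodeTuple, b: NodeTuple) -> bool:
    """A/B: the ancestor requirements plus A.level == B.level - 1."""
    return is_ancestor(a, b) and a.level == b.level - 1


def structural_join(
    a_nodes: Iterable[NodeTuple],
    b_nodes: Iterable[NodeTuple],
    parent_child: bool = False,
) -> List[Tuple[NodeTuple, NodeTuple]]:
    """Return every (A, B) pair satisfying A//B, or A/B when parent_child is True.

    Nested loops keep the sketch short; they compare every A with every B.
    """
    b_list = list(b_nodes)
    predicate = is_parent if parent_child else is_ancestor
    return [(a, b) for a in a_nodes for b in b_list if predicate(a, b)]
```

With a_nodes holding the tuples named ‘A’ and b_nodes holding the tuples named ‘B’, calling structural_join with parent_child=False answers ‘A//B’ and with parent_child=True answers ‘A/B’.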
The node numbering scheme enables the storage and retrieval of general XML data without the need for information on a document type definition (DTD) or XML schema associated with input data. However, this scheme suffers from several drawbacks in small digital devices having restricted computing capability. The number of joins that must be performed to process a path expression to retrieve XML data is equal to the length of the path expression.
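To make the join count concrete, the following sketch evaluates a path expression one step at a time and counts the joins it performs. It assumes that evaluation starts from a virtual document node enclosing the whole document, so that each step of the path costs exactly one join; the function evaluate_path, the "#document" node, and the (axis, name) step encoding are hypothetical and serve only as an illustration.

```python
from typing import List, NamedTuple, Sequence, Tuple


class NodeTuple(NamedTuple):
    """A node stored as <doc_id, begin_pos, end_pos, level>, plus its name."""
    name: str
    doc_id: int
    begin_pos: int
    end_pos: int
    level: int


def evaluate_path(
    table: Sequence[NodeTuple],
    steps: Sequence[Tuple[str, str]],
    doc_id: int = 1,
) -> Tuple[List[NodeTuple], int]:
    """Evaluate a path given as (axis, name) steps, where axis is '/' or '//'.

    Returns the matching tuples and the number of joins performed.
    Assumption: a virtual document node at level -1 encloses every tuple.
    """
    max_end = max((r.end_pos for r in table if r.doc_id == doc_id), default=0)
    context = [NodeTuple("#document", doc_id, 0, max_end + 1, -1)]
    joins = 0
    for axis, name in steps:
        candidates = [r for r in table if r.name == name]
        next_context: List[NodeTuple] = []
        for b in candidates:
            for a in context:
                contained = (a.doc_id == b.doc_id
                             and a.begin_pos < b.begin_pos
                             and a.end_pos > b.end_pos)
                if contained and (axis == "//" or a.level == b.level - 1):
                    next_context.append(b)
                    break  # one matching context node is enough
        context = next_context
        joins += 1  # one join between the context and the candidates per step
    return context, joins
```

Under these assumptions, a two-step path such as ‘//BroadcastEvent/ServiceId’ performs two joins, and every additional step adds another pass over the table.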
Since small digital devices have insufficient memory available for computation, these joins generate frequent input/output operations, which may result in significant performance degradation. The same problem occurs when the XML data is restored from the stored tuples. In addition, updating a child node may require updating its parent node, which makes the update process inefficient.
Another method for storing and retrieving documents is an attribute inlining technique. This technique not only prevents excessive fragmentation caused by node numbering by inlining one or more XML nodes into a single table but also allows a relational database to be automatically implemented using a given DTD or XML schema.
For more information on attribute inlining, see [Jayavel Shanmugasundaram, Kristin Tufte, Chun Zhang, Gang He, David J. DeWitt, and Jeffrey F. Naughton “Relational Databases for Querying XML Documents: Limitations and Opportunities,” In Proc. of the 25th VLDB conference, pages 302-314, Edinburgh, Scotland, September 1999].
However, like the node numbering scheme, attribute inlining requires a large amount of computation when each of many nodes within a DTD has a plurality of cardinalities. Furthermore, since the method is very sensitive to the schema (DTD) of the XML data, changes in node attributes require the database to be recreated.
Therefore, there is a need for a method for effectively managing digital data written in a markup language.