1. Field of the Invention
The present invention generally relates to information processing technology, and more particularly to a method, program, and system for dividing the tree structure of a structured document.
2. Related Art
Companies in the industry have developed techniques for processing structured documents. As one example of the background art, Japanese Patent Application Publication No. 2007-0279964 (Patent Ref. 1; Ref. 1) discloses an information search device including: block division means for dividing, based on a markup language, information including blocks that are candidates for a search target into blocks of a required size; a storage medium on which multiple keywords unique to the field of the search target and values indicative of the frequency of occurrence of those keywords are stored in pairs; scoring means for scoring each of the blocks based on a keyword used in each block and the information stored on the storage medium; and search target specifying means for specifying, as the search target, a block whose score falls within a predetermined range.
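The block division and scoring flow described for Ref. 1 can be illustrated with a minimal sketch. It is not taken from Ref. 1: the keyword table, the function names, the choice of one block per child of the root element, and the score range are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword table: keyword -> value indicating frequency of occurrence.
KEYWORD_WEIGHTS = {"parser": 3.0, "tree": 2.0, "schema": 1.5}


def split_into_blocks(markup: str) -> list[str]:
    """Divide a document into blocks: here, the text of each child of the root."""
    root = ET.fromstring(markup)
    return ["".join(child.itertext()) for child in root]


def score_block(text: str) -> float:
    """Sum the stored value of every keyword occurrence in the block."""
    words = (w.strip(".,;:") for w in text.lower().split())
    return sum(KEYWORD_WEIGHTS.get(w, 0.0) for w in words)


def select_blocks(markup: str, low: float, high: float) -> list[str]:
    """Return the blocks whose score falls within the range [low, high]."""
    return [b for b in split_into_blocks(markup) if low <= score_block(b) <= high]


DOC = (
    "<doc>"
    "<p>The parser builds a tree.</p>"
    "<p>Nothing relevant here.</p>"
    "<p>A schema constrains the tree; the parser checks it.</p>"
    "</doc>"
)

if __name__ == "__main__":
    for block in split_into_blocks(DOC):
        print(score_block(block), block)
    print(select_blocks(DOC, 4.0, 6.0))
```

With the assumed table, the three blocks score 5.0, 0.0, and 6.5, so a range of 4.0 to 6.0 selects only the first block.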
As another example of the background art, International Publication WO2005/6192 (Patent Ref. 2; Ref. 2) discloses a structured document processing method including: a structured document holding step for holding a structured document including tags intact as text in memory means; a document structure holding step for holding, in the memory means, document structure information on the structured document in association with the position of each tag in the structured document; and a processing step for tracking a tree structure of the structured document according to the document structure information in response to a processing request to acquire information on elements in order to acquire part of the structured document based on the information acquired. In the structured document holding step, the structured document is divided into multiple divided portions of a predetermined size and held in those portions, so that changes in the size of a divided portion caused by updating it are absorbed at the boundaries between divided portions.
As still another example of the background art, Japanese Patent Application Publication No. 2007-0193660 (Patent Ref. 3; Ref. 3) discloses an information management device for managing information on a hierarchical tree structure of a storage device. This device includes: information storage means capable of storing a tree structure file describing, in a predetermined markup language, the information on the tree structure of the storage device; division determination means for determining whether to perform processing for dividing the tree structure file based on at least one of the state of the tree structure of the storage device and the processing power of the information management device; and control means for controlling the information storage means. When the division determination means determines that the division processing is not to be performed, the control means keeps the tree structure file stored in the information storage means. When the division determination means determines that the division processing is to be performed, the control means divides the tree structure file into a predetermined divided state including part of the tree structure, and each of the resulting portions of the tree structure file is stored in the information storage means.
As yet another example of the background art, Japanese Patent Application Publication No. 2002-0108844 (Patent Ref. 4; Ref. 4) discloses an XML data division editing apparatus for editing and retrieving XML data, including means for analyzing input source XML data based on tags of the input source XML data and tag values to generate a tag list, and means for dividing the XML data using a main key target tag and division target tags selected from the tag list. When the XML data is divided in the XML data division editing apparatus, a main key index indicative of association between the value of the main key target tag and the divided XML data, and a division target tag tree structure in which values of the division target tags are hierarchized as tag values, are created.
Wei Lu et al., “Parallel XML Processing by Work Stealing,” High Performance Distributed Computing, Proceedings of the 2007 workshop on Service-oriented computing performance: aspects, issues, and approaches, 2007, pp. 31-38, ISBN: 978-1-59593-717-9 (Non-patent Ref. 1; Ref. 5) discloses the background art as follows: “We present a parallel processing model for the XML document. The kernel of the model is a stealing-based dynamic load-balancing mechanism, by which multiple threads are able to process the disjointed parts of the XML document in parallel with balanced load distribution. The model also provides a novel mechanism to trace the stealing actions, thus the equivalent sequential result can be gotten by gluing the multiple parallel-running results together.”
Wei Lu et al., “A Parallel Approach to XML Parsing,” International Conference on Grid Computing, Proceedings of the 7th IEEE/ACM International Conference on Grid Computing, 2006, pp. 223-230, ISBN: 1-4244-0343-X (Non-patent Ref. 2; Ref. 6) discloses the background art as follows:
“This paper presents our design and implementation of parallel XML parsing. Our design consists of an initial preparsing phase to determine the structure of the XML document, followed by a full, parallel parse. The results of the preparsing phase are used to help partition the XML document for data parallel processing. Our parallel parsing phase is a modification of the libxml2 [1] XML parser, which shows that our approach applies to real-world, production quality parsers.”
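The two-phase idea quoted above (a light preparse to find the document structure, followed by a parallel parse of the resulting partitions) can be sketched as follows. This is an illustration only, not the implementation of Ref. 6, which modifies libxml2. The names preparse and parallel_parse are hypothetical, and the sketch assumes input with no XML declaration, comments, CDATA sections, processing instructions, namespace prefixes, or ">" characters inside attribute values.

```python
import re
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Matches a start tag, end tag, or self-closing tag (simplified; see assumptions).
TAG = re.compile(r"<(/?)([^\s<>/]+)[^<>]*?(/?)>")


def preparse(xml: str) -> list[tuple[int, int]]:
    """Preparse phase: return (start, end) offsets of each child of the root."""
    spans, depth, start = [], 0, 0
    for m in TAG.finditer(xml):
        if m.group(1) == "/":          # end tag
            depth -= 1
            if depth == 1:             # just closed a child of the root
                spans.append((start, m.end()))
        elif m.group(3) == "/":        # self-closing tag
            if depth == 1:
                spans.append((m.start(), m.end()))
        else:                          # start tag
            if depth == 1:             # opening a child of the root
                start = m.start()
            depth += 1
    return spans


def parallel_parse(xml: str, workers: int = 4) -> list[ET.Element]:
    """Parse phase: fully parse each partition in a worker, keeping document order."""
    chunks = [xml[s:e] for s, e in preparse(xml)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ET.fromstring, chunks))


DOC = (
    '<orders><order id="1"><item>pen</item></order>'
    '<order id="2"/>'
    '<order id="3"><item>ink</item><item>pad</item></order></orders>'
)

if __name__ == "__main__":
    for element in parallel_parse(DOC):
        print(element.get("id"), [child.text for child in element])
```

Threads are used here only to show the shape of the approach; in CPython the global interpreter lock limits the speedup for CPU-bound parsing, so a process pool or a native parser would be needed for real gains.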
Yunfei Pan et al., “A Static Load-Balancing Scheme for Parallel XML Parsing on Multicore CPUs,” CCGRID, Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid, 2007, pp. 351-362, ISBN: 0-7695-2833-3 (Non-patent Ref. 3; Ref. 7) discloses the background art as follows:
“We introduce a new static partitioning and load-balancing mechanism. By using a static, global approach, we reduce synchronization and load-balancing overhead, thus improving performance over dynamic schemes for a large class of XML documents. Our approach leverages libxml2 without modification, which reduces development effort and shows that our approach is applicable to real-world, production parsers.”
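To make the notion of static partitioning concrete, the following sketch assigns subtrees to workers once, before any parsing starts, so that no load balancing is needed at run time. It uses a generic greedy heuristic (largest subtree first, to the least-loaded worker) chosen for illustration; it is not the scheme of Ref. 7. The function name static_partition and the use of subtree sizes as a cost estimate are assumptions.

```python
import heapq


def static_partition(sizes: list[int], workers: int) -> list[list[int]]:
    """Assign subtree indices to workers up front.

    Subtrees are taken largest first and each is given to the worker
    with the smallest total load so far.
    """
    heap = [(0, w) for w in range(workers)]   # (current load, worker id)
    heapq.heapify(heap)
    assignment: list[list[int]] = [[] for _ in range(workers)]
    for idx in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        load, w = heapq.heappop(heap)
        assignment[w].append(idx)
        heapq.heappush(heap, (load + sizes[idx], w))
    return assignment


if __name__ == "__main__":
    subtree_sizes = [50, 10, 30, 20, 40]      # hypothetical sizes, e.g. in bytes
    plan = static_partition(subtree_sizes, workers=2)
    for w, indices in enumerate(plan):
        print(w, indices, sum(subtree_sizes[i] for i in indices))
```

For the sizes shown, the two workers receive total loads of 80 and 70. Because the plan is fixed in advance, the workers need no synchronization while parsing, at the cost of relying on the size estimate being a reasonable proxy for parsing time.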
Additional references disclosing background art are as follows:
Tim Bray et al., “Extensible Markup Language (XML) 1.0 (Fifth Edition),” The World Wide Web Consortium (W3C), W3C Recommendation 26 Nov. 2008, URL: http://www.w3.org/TR/xml/ (Retrieved Jun. 9, 2010) (Non-patent Ref. 4; Ref. 8).
Henry S. Thompson et al., “W3C XML Schema Definition Language (XSD) 1.1 Part 1: Structures,” The World Wide Web Consortium (W3C), W3C Working Draft 3 Dec. 2009, URL: http://www.w3.org/TR/xmlschema11-1/ (Retrieved Jun. 9, 2010) (Non-patent Ref. 5; Ref. 9).
David Peterson et al., “W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes,” The World Wide Web Consortium (W3C), W3C Working Draft 3 Dec. 2009, URL: http://www.w3.org/TR/xmlschema11-2/ (Retrieved Jun. 9, 2010) (Non-patent Ref. 6; Ref. 10).
None of the aforementioned examples of the background art resolves the problem of reduced efficiency in processing a structured document.