1. Field of the Invention
The present invention relates to a process for searching a structured document written with a set of hierarchical elements, and more particularly, to an apparatus converting the structure of a document in order to search for an element of the structured document.
2. Description of the Related Art
Typical description formats for structured documents include SGML (Standard Generalized Markup Language), intended for large-scale databases; HTML (Hyper Text Markup Language), which has a simple configuration and is intended for the WWW (World Wide Web); and XML (Extensible Markup Language), obtained by simplifying SGML for the Internet. HTML has become popular worldwide as the content format of the WWW. In recent years, XML in particular has been attracting attention as a complement to HTML. XML is used not only to describe documents on the Internet, but is also becoming a medium through which all types of information appliances, such as cellular phones and car navigation systems, communicate.
An overview of an XML document is given, for example, in “Extensible Markup Language (XML) 1.0 (Second Edition)”. An XML document is composed of three major portions: an XML declaration 11, a DTD (Document Type Definition) 12, and an XML implementation value (instance) 13, as shown in FIG. 1A. The XML implementation value is written as a set of hierarchical elements, and tags are used as marks for identifying the elements.
FIG. 1B shows how to write a tag indicating one element. In FIG. 1B, the “element content” portion, which is written between a start tag 21 and an end tag 22 that include an element name, is the content of the element. An empty element tag 23 is the tag of an element whose content is empty. FIG. 1C shows how to write tags for a hierarchical structure in which text and a lower-level element coexist as the content of an element. In this figure, an element b is inserted between contents 1 and 2 of an element a, and the element b exists below the element a. In this case, the elements a and b have a parent-child relationship.
Furthermore, if attributes are given to an element, attribute names and attribute values are written in the start tag of the element as follows.
<element name attribute name 1=“attribute value 1” attribute name 2=“attribute value 2” . . . >
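As a minimal sketch of the notation above, the following Python fragment parses a small instance in which an element b sits between two pieces of content of an element a and carries two attributes. The attribute names and values and the sample text are hypothetical, and Python's standard `xml.dom.minidom` module is used only for illustration.

```python
from xml.dom import minidom

# Hypothetical instance: element b is a child of element a, placed
# between two pieces of character content, and carries two attributes.
SAMPLE = '<a>content 1<b id="1" kind="note">inner</b>content 2</a>'

doc = minidom.parseString(SAMPLE)
a = doc.documentElement
b = a.getElementsByTagName("b")[0]

# Children of a in document order: text, element b, text.
child_kinds = [node.nodeName for node in a.childNodes]

# Attributes written in the start tag of b, as name -> value pairs.
b_attributes = {name: b.getAttribute(name) for name in b.attributes.keys()}
```

Here `b.parentNode` is the element a, which reflects the parent-child relationship described for FIG. 1C.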
From a processing viewpoint, an XML document is one of two types: well-formed or valid. FIG. 1D shows the relationship between the configuration of a structured document, including these two XML document types, and its processing category. This figure shows whether or not ((∘) or (⊃)) a declaration, a document type definition, and an implementation value are indispensable for each of a well-formed XML document, a valid XML document, an SGML document, and an HTML document. By way of example, for the well-formed XML document, only an implementation value is indispensable, and a declaration and a document type definition are not always required.
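The well-formed case can be illustrated with a short sketch. It assumes Python's standard `xml.dom.minidom` parser, which checks well-formedness only and does not validate against a DTD; the helper name `is_well_formed` and the sample strings are hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML, without validating it."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


# No XML declaration and no DTD: the instance alone is still well-formed.
instance_only = "<a>content 1<b/>content 2</a>"

# Start and end tags are improperly nested: not well-formed.
bad_nesting = "<a><b></a></b>"
```

With these inputs, `is_well_formed(instance_only)` is true and `is_well_formed(bad_nesting)` is false.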
Software that acts as an intermediary, parsing an XML document and passing the parsed document to other application software such as a browser, is called an XML processor (parser). An overview of the XML processor is given in “Document Object Model (DOM) Level 2 Core Specification Version 1.0 W3C Recommendation Nov. 13, 2000” and “SAX2.0: The Simple API for XML”.
FIG. 1E exemplifies the process performed by an XML processor. In this figure, an XML processor 32 checks a given XML document 31, and passes an XML document 33 represented by a tree structure to application software 34. If a document type definition is included in the XML document 31 at this time, only the tagging form of an XML implementation value is checked.
For such an XML processor, there are two types of API (Application Programming Interface) for manipulating an XML document in the Java (TM) language: SAX (Simple API for XML) and DOM (Document Object Model). SAX is an event-driven API that, while reading an XML document, notifies application software of events such as the start or end of the document or of an element, or the appearance of a character string.
DOM, in contrast, is a general-purpose API for operating on XML. DOM expands an XML document in memory as a DOM tree structure. Application software then accesses the XML document by operating on the DOM object. The original XML document can also be restored from the DOM object.
For example, the DOM tree structure shown in FIG. 1G is generated from the XML document shown in FIG. 1F. In FIG. 1G, each arrow indicates a method (function) for calling the corresponding node, and a Document 41 corresponds to an interface representing the whole of the XML document.
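The event-driven behavior of SAX can be sketched as follows. The text refers to the Java API; this example instead assumes Python's standard `xml.sax` module, which exposes the same kinds of callbacks. The class name `EventLogger` and the sample input are hypothetical.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each SAX event in the order the parser reports it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("startDocument",))

    def endDocument(self):
        self.events.append(("endDocument",))

    def startElement(self, name, attrs):
        self.events.append(("startElement", name))

    def endElement(self, name):
        self.events.append(("endElement", name))

    def characters(self, content):
        # A parser may deliver one run of text in several chunks.
        self.events.append(("characters", content))


logger = EventLogger()
xml.sax.parseString(b"<a>hello<b/></a>", logger)
```

After parsing, `logger.events` holds the notifications in reading order; no tree of the document is kept in memory.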
Additionally, a NodeList 42 is used to manage lower elements and character data, which belong to a certain node, in their order of appearance within the XML document, and has instances such as an Element 43, a Text 44, etc. as lower nodes. A NamedNodeMap 45 is a collection for accommodating nodes whose arrangement order has no meaning, but whose values must be referenced by using their names as keys. Attributes (Attr 46) and the like are stored in the NamedNodeMap 45.
A typical application of an XML document is a tag search that treats the XML document as a database. In this process, the portion of the XML document corresponding to a given search key is located, and the search result is output.
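The node types named above can be seen in a small sketch. It assumes Python's standard `xml.dom.minidom` implementation of DOM rather than the Java API, and the sample document and its element names are hypothetical.

```python
from xml.dom import minidom

# Hypothetical sample document.
SAMPLE = '<person id="7"><name>Taro</name><hometown>Kawasaki</hometown></person>'

doc = minidom.parseString(SAMPLE)       # Document: the whole XML document
root = doc.documentElement              # Element: <person>

children = root.childNodes              # NodeList: child nodes in document order
child_names = [child.nodeName for child in children]

attrs = root.attributes                 # NamedNodeMap: nodes looked up by name
id_attr = attrs["id"]                   # Attr node
name_text = children[0].firstChild      # Text node inside <name>

# For this simple sample, serializing the root restores the original markup.
restored = root.toxml()
```

The NodeList preserves order (`name` before `hometown`), whereas the NamedNodeMap is accessed by the attribute name `id`.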
FIG. 1H is a flowchart showing a tag search process using DOM. A process program first inputs an XML document (step S1), and inputs a search key (step S2). Next, the process program generates an instance of an XML processor (step S3), and executes the instance (step S4). As a result, the tag structure of the XML document is parsed, and the DOM tree structure is configured.
Next, the process program traces the tree structure from the root, detects the portion corresponding to the search key, and deletes an unnecessary portion of the tree structure (step S5). In this way, the number of nodes of the tree structure is reduced, and a subtree is generated. The obtained subtree is output as a search result (step S6), and the process is terminated.
If a large-scale database is built with XML, the tag search shown in FIG. 1H is effective in that the search can be made at considerably high speed. For example, in a residents' card database, a hometown can be input as a search key, the DOM tree structure searched, and the subtree of the corresponding personal data retained and output.
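The flow of FIG. 1H can be sketched roughly as follows, assuming Python's standard `xml.dom.minidom` in place of a Java DOM processor. The residents' data, the element names `person` and `hometown`, the helper names, and the exact matching rule (the key element's text equals the search key) are all assumptions made for illustration.

```python
from xml.dom import minidom

# Hypothetical residents' card data; element names are assumptions.
RESIDENTS = (
    "<residents>"
    "<person><name>A</name><hometown>Tokyo</hometown></person>"
    "<person><name>B</name><hometown>Osaka</hometown></person>"
    "<person><name>C</name><hometown>Tokyo</hometown></person>"
    "</residents>"
)


def text_of(element):
    """Concatenate the character data directly under an element."""
    return "".join(
        node.data for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )


def dom_tag_search(xml_text, record_tag, key_tag, key_value):
    # Steps S3-S4: run the XML processor and build the DOM tree.
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    # Step S5: visit the records under the root and delete every
    # record whose key element does not match the search key.
    for record in list(root.getElementsByTagName(record_tag)):
        keys = record.getElementsByTagName(key_tag)
        if not any(text_of(key) == key_value for key in keys):
            record.parentNode.removeChild(record)
    # Step S6: the remaining subtree is the search result.
    return root.toxml()


result = dom_tag_search(RESIDENTS, "person", "hometown", "Tokyo")
```

Note that the whole tree is built in memory before any pruning starts, and the text of every key element is compared with the search key.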
FIG. 1I is a flowchart showing a tag search process using SAX. The process program inputs an XML document (step S11), and inputs a search key (step S12). Then, the process program generates an instance of a handler (step S13), generates an instance of an XML processor (step S14), and executes the XML processor (step S15).
The XML processor parses the tag structure of the XML document, executes the handler at each detection of a tag, and detects the portion corresponding to the search key (step S16). The XML processor then outputs the obtained search result (step S17). Here, the process is terminated.
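A rough sketch of the flow of FIG. 1I follows, assuming Python's standard `xml.sax` module in place of a Java SAX processor. The handler class `HometownHandler`, the sample data, and the element names `person`, `name`, and `hometown` are hypothetical.

```python
import io
import xml.sax

# Hypothetical residents' card data; element names are assumptions.
RESIDENTS = (
    b"<residents>"
    b"<person><name>A</name><hometown>Tokyo</hometown></person>"
    b"<person><name>B</name><hometown>Osaka</hometown></person>"
    b"<person><name>C</name><hometown>Tokyo</hometown></person>"
    b"</residents>"
)


class HometownHandler(xml.sax.ContentHandler):
    """Collects the names of persons whose hometown equals the search key."""

    def __init__(self, key_value):
        super().__init__()
        self.key_value = key_value
        self.matches = []
        self._record = None   # fields of the person currently being read
        self._field = None    # tag whose text is currently being collected

    def startElement(self, name, attrs):
        if name == "person":
            self._record = {}
        elif self._record is not None:
            self._field = name
            self._record[name] = ""

    def characters(self, content):
        if self._field is not None:
            self._record[self._field] += content

    def endElement(self, name):
        if name == "person":
            if self._record.get("hometown") == self.key_value:
                self.matches.append(self._record.get("name"))
            self._record = None
        else:
            self._field = None


handler = HometownHandler("Tokyo")       # step S13: handler instance
parser = xml.sax.make_parser()           # step S14: XML processor instance
parser.setContentHandler(handler)
parser.parse(io.BytesIO(RESIDENTS))      # steps S15-S16: parse, run the handler per tag
matches = handler.matches                # step S17: search result
```

Only the record currently being read is held in memory, in contrast to the DOM approach, which first builds the whole tree.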
However, the above-described conventional tag search using DOM has the following problems.
As the scale of a DOM tree structure increases, a character string identical to the search key must be detected in the content of each element, so a large amount of processing time is required. Additionally, with DOM, a long fixed-length memory region is secured for writing data, in anticipation of a long character string appearing in each field. The larger the tree structure, the more working memory is required.