1) Field of the Invention
The present invention relates to a device for interpreting XML documents, a device for retrieving XML documents, methods of interpreting and retrieving XML documents, and a computer product.
2) Description of the Related Art
XML (Extensible Markup Language) is a markup language that delimits information with tags. Because documents written in XML keep document-structure information strictly separate from format information, XML is increasingly replacing HTML.
An XML parser is normally used to interpret XML documents, which have a simple document structure. The XML parser is a module that reads an XML document and interprets what kind of document structure it has.
An XML parser exposes an application programming interface (API). Two such APIs have been standardized: the Document Object Model (DOM) and the Simple API for XML (SAX). XML parsers that implement these APIs are predominant.
A DOM-based XML parser (DOM parser) reads an entire XML document and interprets the elements, attributes, and character data contained in it as a tree structure. A SAX-based XML parser (SAX parser), unlike the DOM parser, does not read the entire XML document before interpretation; instead, it interprets the elements, attributes, and character data sequentially from the head of the document, i.e. in the order in which the elements appear.
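The contrast between the two parser styles can be sketched with Python's standard-library `xml.dom.minidom` (DOM) and `xml.sax` (SAX) modules. This is an illustrative sketch only, not the parser of the invention; the sample document and the helper names `dom_element_names`, `sax_events`, and `_EventRecorder` are hypothetical. The DOM function builds the whole tree before walking it, while the SAX handler receives callbacks in the order the elements appear.

```python
from __future__ import annotations

import xml.dom.minidom
import xml.sax

# Hypothetical sample document, used only for illustration.
SAMPLE = b"<book id='1'><title>XML Basics</title><author>Sato</author></book>"


def dom_element_names(data: bytes) -> list[str]:
    """DOM style: parse the whole document into a tree, then walk the tree."""
    doc = xml.dom.minidom.parseString(data)  # entire document is now in memory
    names = []

    def walk(node):
        if node.nodeType == node.ELEMENT_NODE:
            names.append(node.tagName)
        for child in node.childNodes:
            walk(child)

    walk(doc.documentElement)
    return names


class _EventRecorder(xml.sax.ContentHandler):
    """SAX style: record callbacks in the order elements appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs)))

    def characters(self, content):
        if content.strip():  # ignore whitespace-only character data
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


def sax_events(data: bytes) -> list[tuple]:
    """Parse `data` with SAX and return the recorded event sequence."""
    handler = _EventRecorder()
    xml.sax.parseString(data, handler)
    return handler.events
```

Both functions see the same three elements (`book`, `title`, `author`); the difference is that the DOM version holds the full tree before any inspection, whereas the SAX version could act on each event as it arrives.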
Japanese Patent Application Laid-open Publication No. 2000-250938 proposes an XML document retrieving device that allows a user to efficiently search XML databases whose document type definitions (DTDs) are semantically similar, without having to take differences between the DTDs into account.
However, when XML documents are to be extracted or retrieved from a database, the matching XML documents can be extracted or retrieved only after all the XML documents have been interpreted. A general DOM parser must read every XML document in its entirety for this interpretation, so interpreting all the XML documents in the database takes a long time.
A general SAX parser does not need to read an entire XML document before interpretation begins, but it must still read all of the information in the tags and all of the information between the tags. Consequently, interpreting all the XML documents in the database again takes a long time.
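The cost described above can be illustrated with a naive SAX-based search. This is a minimal sketch under stated assumptions: the document collection is modeled as a hypothetical dictionary mapping a document id to its XML bytes, and `naive_retrieve` and `_ElementFinder` are illustrative names, not part of any cited system. Every document is parsed from start to end, and every start tag is read whether or not it matches.

```python
import xml.sax


class _ElementFinder(xml.sax.ContentHandler):
    """Reads every start tag of one document, noting whether the target appears."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.found = False
        self.tags_read = 0  # counts every start tag, matching or not

    def startElement(self, name, attrs):
        self.tags_read += 1
        if name == self.target:
            self.found = True


def naive_retrieve(documents, target):
    """Return (ids of documents containing `target`, total start tags read).

    Hypothetical helper: `documents` maps a document id to its XML bytes.
    Each document is parsed in full, so the work grows with the total
    size of the collection rather than with the number of matches.
    """
    hits, total_tags = [], 0
    for doc_id, data in documents.items():
        handler = _ElementFinder(target)
        xml.sax.parseString(data, handler)
        total_tags += handler.tags_read
        if handler.found:
            hits.append(doc_id)
    return hits, total_tags
```

For example, searching two small documents for `title` reads all five start tags across both documents, even though only the first contains a match.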
In the XML document retrieving device disclosed in Japanese Patent Application Laid-open Publication No. 2000-250938, the following steps are performed according to a retrieval expression (formula) input by a database client:
    1) an input analyzer extracts an element name;
    2) a synonym extractor supplies synonyms for the element name;
    3) the synonyms are compared with the element names stored in a category analogic section; and
    4) a matching element name is selected.
Because every stored element name is subjected to this comparison, retrieval takes time.
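The four steps listed above can be sketched as follows to show why the comparison cost grows with the number of stored element names. This is a hypothetical illustration, not the implementation of the cited publication: the query syntax `element=value`, the synonym table `SYNONYMS`, the stored list `CATEGORY_ELEMENT_NAMES`, and all function names are assumptions made for this example.

```python
# Hypothetical synonym dictionary and category store, for illustration only.
SYNONYMS = {"author": {"author", "writer", "creator"}}
CATEGORY_ELEMENT_NAMES = ["title", "writer", "publisher", "creator", "price"]


def extract_element_name(query):
    """Step 1 (input analyzer): assumes a query of the form 'element=value'."""
    return query.split("=", 1)[0].strip()


def get_synonyms(element_name):
    """Step 2 (synonym extractor): falls back to the name itself."""
    return SYNONYMS.get(element_name, {element_name})


def select_matching_names(query):
    """Steps 3 and 4: compare the synonyms against every stored element name.

    Returns (matching stored names, number of stored names compared).
    """
    synonyms = get_synonyms(extract_element_name(query))
    matches, comparisons = [], 0
    for stored in CATEGORY_ELEMENT_NAMES:  # every stored name is visited
        comparisons += 1  # one membership test per stored name
        if stored in synonyms:
            matches.append(stored)
    return matches, comparisons
```

With the sample data, the query `author=Sato` selects `writer` and `creator` but still compares all five stored names, which is the linear scan that makes this approach slow on large category stores.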