Japanese Patent Laid-open Official Gazette No. Hei. 9-319632 (hereinafter referred to as “Patent Literature 1”) discloses an edition management method that displays information on the contents of each edition when a structured document, such as an SGML (Standard Generalized Markup Language) document, is edited by deletion, insertion, revision and the like. To show specifically which part was edited in each edition while reducing the amount of data to be stored, the method stores the entire contents of the structured document for the first edition, whereas for the second and each subsequent edition it stores only information concerning the difference between that edition and the previous one. In a conventional mode of displaying difference information, the contents of an edition before and after a revision were each displayed as tagged text in separate sub-areas into which the display area was divided, as shown in FIG. 4(b) of Patent Literature 1, which made the two versions hard to compare visually. By contrast, in the edition management according to Patent Literature 1, deleted, inserted and revised contents are compared for each structured part, as shown in FIG. 23 of Patent Literature 1, which improves the visibility of the comparison. In the invention disclosed in Patent Literature 1, the difference information itself is stored as a structured document, as shown in FIGS. 6 and 18 of Patent Literature 1.
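The storage scheme described above, in which the first edition is kept in full and each later edition is kept only as a difference from its predecessor, can be illustrated by the following minimal Python sketch. It is an illustration under stated assumptions, not the method of Patent Literature 1: it treats each edition as a list of text lines and computes line-level differences with Python's standard difflib module, whereas Patent Literature 1 records differences per structured part as a structured document. The class EditionStore and its methods are hypothetical names.

```python
from difflib import SequenceMatcher


class EditionStore:
    """Hypothetical edition store: edition 1 in full, later editions as deltas.

    Assumption: documents are compared line by line, not per structured part.
    """

    def __init__(self, first_edition):
        self.base = list(first_edition)  # full contents of edition 1
        self.deltas = []                 # deltas[k] turns edition k+1 into edition k+2

    def latest(self):
        return self.get(len(self.deltas) + 1)

    def add_edition(self, new_lines):
        prev = self.latest()
        delta = []
        for tag, i1, i2, j1, j2 in SequenceMatcher(None, prev, new_lines).get_opcodes():
            if tag == "equal":
                delta.append(("copy", i1, i2))  # reuse a range of the previous edition
            elif j2 > j1:
                delta.append(("new", list(new_lines[j1:j2])))  # inserted or revised lines
            # Pure deletions need no record: the deleted lines are simply not copied.
        self.deltas.append(delta)

    def get(self, edition):
        if not 1 <= edition <= len(self.deltas) + 1:
            raise ValueError(f"no such edition: {edition}")
        lines = self.base
        for delta in self.deltas[: edition - 1]:  # replay deltas up to the edition
            out = []
            for op in delta:
                if op[0] == "copy":
                    out.extend(lines[op[1]:op[2]])
                else:
                    out.extend(op[1])
            lines = out
        return list(lines)
```

For example, after store = EditionStore(["<doc>", "<p>a</p>", "</doc>"]) and store.add_edition(["<doc>", "<p>a</p>", "<p>b</p>", "</doc>"]), store.get(1) still returns the first edition, while the delta stored for edition 2 holds only the inserted line and references to unchanged ranges. Retrieving a late edition requires replaying every earlier delta, which is the usual trade-off of forward-delta storage.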
According to Japanese Patent Laid-open Official Gazette No. 2004-62716 (hereinafter referred to as “Patent Literature 2”), syntactic analysis of a structured document is made faster as follows. For a single structured document whose analysis is requested repeatedly by one application program, or requested in common by a plurality of different application programs, information on the set of events produced by the syntactic analysis of that document is stored in advance. When an application program subsequently requests syntactic analysis of the same structured document again, the stored event-set information is read out instead of the document being analyzed again. The series of events is then reproduced from that information and posted to the application program.
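The record-and-replay idea of Patent Literature 2 can be sketched with Python's standard xml.sax module as follows. This is an illustration under stated assumptions rather than the device of Patent Literature 2: the cache key (a SHA-256 digest of the document bytes), the recorded event types (only start tags, end tags and character data) and the names EventRecorder, CachingParser and TagCollector are all hypothetical. Replayed start-tag attributes are delivered as plain dictionaries, not as SAX Attributes objects.

```python
import hashlib
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Records SAX events as (method name, arguments) tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Copy the attributes, since the parser's Attributes object is not ours to keep.
        self.events.append(("startElement", (name, dict(attrs.items()))))

    def endElement(self, name):
        self.events.append(("endElement", (name,)))

    def characters(self, content):
        self.events.append(("characters", (content,)))


class CachingParser:
    """Hypothetical parser front end: parse once per document, then replay events."""

    def __init__(self):
        self.cache = {}        # assumed key: SHA-256 of the raw document bytes
        self.parse_count = 0   # how many times the real SAX parser actually ran

    def parse(self, xml_bytes, handler):
        key = hashlib.sha256(xml_bytes).hexdigest()
        events = self.cache.get(key)
        if events is None:
            recorder = EventRecorder()
            xml.sax.parseString(xml_bytes, recorder)  # real syntactic analysis
            self.parse_count += 1
            events = self.cache[key] = recorder.events
        for method, args in events:  # post the stored events to the application
            getattr(handler, method)(*args)


class TagCollector:
    """Example application handler that collects start-tag names."""

    def __init__(self):
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)

    def endElement(self, name):
        pass

    def characters(self, content):
        pass
```

Parsing the same bytes twice through one CachingParser, each time with a fresh TagCollector, invokes the SAX parser only once; the second call is served entirely from the stored events. A document that differs by even one byte produces a different key and is parsed from scratch.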
Meanwhile, several methods have been proposed for detecting highly similar documents among ordinary text documents at high speed, as described, for example, in “A System for Approximate Tree Matching,” (online), available from US CiteSeer.IST (Scientific Literature Digital Library) <http://citeseer.ist.psu.edu/tsong-li92system.html>, (accessed 2004-9-1) (hereinafter referred to as “Non-patent Literature 1”), and in “On the Editing Distance between Undirected Acyclic Graphs and Related Problems,” (online), available from US CiteSeer.IST (Scientific Literature Digital Library) <http://citeseer.ist.psu.edu/zhang-li95editing.html>, (accessed 2004-9-1) (hereinafter referred to as “Non-patent Literature 2”).
In addition, the adaptive use of automata has been studied in the field of learning automata, as described, for example, in Tsetlin, M. L., “Automaton Theory and the Modeling of Biological Systems,” New York and London, Academic Press, 1973 (hereinafter referred to as “Non-patent Literature 3”).
Furthermore, there is the SIA (System Integrated Automaton for SAX) parser, described in “System Integrated Automaton for SAX,” (online), available from <http://www.geocities.com/siaparser/resources/siaidea.html>, (accessed 2004-9-1) (hereinafter referred to as “Non-patent Literature 4”).
An obvious and simple way to obtain the difference between an XML document and a highly similar XML document that has already been analyzed is to compute the difference between their byte strings or character strings. Various methods of this kind of difference analysis have long been proposed, for example in Heckel, P., “A Technique for Isolating Differences between Files,” Communications of the ACM, April 1978 (hereinafter referred to as “Non-patent Literature 5”).
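To make concrete what such a string-level difference involves, the following Python sketch renders, in simplified form, the line-matching passes of the technique in Non-patent Literature 5: lines that occur exactly once in both sequences serve as anchors, and each match is then extended forward and backward to identical neighboring lines. As an assumption for brevity, the sketch omits the sentinel lines that, as we understand it, the original technique uses to anchor the beginning and end of each file. The function names heckel_diff and changes are hypothetical.

```python
from collections import Counter


def heckel_diff(old, new):
    """Simplified Heckel-style matching of two line sequences (no sentinels).

    Returns (na, oa): na[i] is the index in `old` matched to new[i], or None;
    oa[j] is the index in `new` matched to old[j], or None.
    """
    old_count, new_count = Counter(old), Counter(new)
    old_index = {line: j for j, line in enumerate(old)}  # exact for lines unique in old
    na, oa = [None] * len(new), [None] * len(old)

    # Anchor pass: a line occurring exactly once in each sequence is a match.
    for i, line in enumerate(new):
        if new_count[line] == 1 and old_count[line] == 1:
            j = old_index[line]
            na[i], oa[j] = j, i

    # Forward pass: extend each match to the following identical line.
    for i in range(len(new) - 1):
        j = na[i]
        if (j is not None and j + 1 < len(old) and na[i + 1] is None
                and oa[j + 1] is None and new[i + 1] == old[j + 1]):
            na[i + 1], oa[j + 1] = j + 1, i + 1

    # Backward pass: extend each match to the preceding identical line.
    for i in range(len(new) - 1, 0, -1):
        j = na[i]
        if (j is not None and j > 0 and na[i - 1] is None
                and oa[j - 1] is None and new[i - 1] == old[j - 1]):
            na[i - 1], oa[j - 1] = j - 1, i - 1

    return na, oa


def changes(old, new):
    """Lines present only in `new` (inserted) and only in `old` (deleted)."""
    na, oa = heckel_diff(old, new)
    inserted = [line for line, j in zip(new, na) if j is None]
    deleted = [line for line, i in zip(old, oa) if i is None]
    return inserted, deleted
```

For instance, changes(["a", "b", "c", "d"], ["a", "x", "c", "d"]) returns (["x"], ["b"]). Because matching is driven by unique lines rather than by a longest common subsequence, a moved block that contains a unique line is still matched rather than reported as deleted and reinserted. The sketch compares opaque lines and knows nothing about XML tags or nesting.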
Patent Literature 1 discloses an edition management device that uses difference information to reduce the amount of information stored for edition management. However, it makes no reference to any specific technique for performing syntactic analysis of a structured document at a higher speed.
The structured-document processing device according to Patent Literature 2 can speed up syntactic analysis of a structured document that has already been analyzed once, when an application program requests analysis of that document again. However, the structured-document processing device cannot handle a request for syntactic analysis of a structured document that differs from the one already analyzed.
The conventional techniques for retrieving similar XML documents disclosed in Non-patent Literatures 1 and 2 all judge the similarity of documents that have already been parsed. They therefore cannot be used to make the parsing process itself more efficient.
A straightforward application of an adaptive automaton (Non-patent Literature 3) to a document would not take the structure or form of XML into account, and would therefore still require time-consuming operations such as checking whether the document is well-formed. Such a straightforward application thus poses a significant problem in terms of efficiency.
The SIA parser described in Non-patent Literature 4 is designed to recognize the grammar of the XML tree structure itself and to process it with an automaton driven by SAX events. For this reason, the SIA parser cannot be applied as it is to text that has not yet been parsed (syntactically analyzed).
Non-patent Literature 5 makes no suggestion as to how an XML document that has not yet been parsed could be parsed at a higher speed.