1. Field of the Invention
The present invention relates to a processing system for handling a structured document written in, for example, standard generalized markup language (SGML), extensible markup language (XML), or hypertext markup language (HTML), and more particularly to a structured document processing system that treats the structured document as a tree structure when processing it.
2. Description of the Related Art
In recent years, multiple systems, enterprises and individuals have been connected through the Internet so as to exchange data widely for electronic data interchange (EDI), electronic commerce (EC), mobile phone services, digital TV services, web services and the like. In response to this situation, there has been a trend toward unifying the format of data handled by computers. This allows data whose format previously depended on a particular computer or application to be used by a different computer or application. The standard for this unification, XML, was officially recommended by the World Wide Web Consortium (W3C) in February 1998. XML is a subset of the SGML standard. The document object model (DOM), a standard interface for handling an XML document as objects, was also recommended by the W3C in October 1998. Hereinafter, following the XML standard, a character string enclosed by “<” and “>” is called a tag; “<character string>” is called a start tag; “</character string>” is called an end tag; a character string sandwiched between a start tag and an end tag is called an element; the name of an element described in a tag is called an element name; and additional information for an element is called an attribute.
An XML document describes its data structure by means of tags embedded in the document itself. Embedding the tags in the document ensures high flexibility and extensibility of the data structure. Writing the tags as text that is meaningful to a human reader allows data that until now has been handled by an independent system to be handled easily by other systems. The DOM processor has been widely used as an XML processor that acquires an element name, element content, attribute, character string and the like, transfers them to a user application, and changes, adds and deletes content.
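As an illustration of the operations attributed to a DOM processor above (acquiring an element name, attribute and character string, then changing, adding and deleting content), the following is a minimal sketch using the `xml.dom.minidom` module of the Python standard library. The sample document and the function name `edit_order` are hypothetical and do not come from the system described here.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document used only for illustration.
SAMPLE = '<order id="42"><item>pen</item><item>ink</item></order>'


def edit_order(xml_text):
    """Acquire, change, add and delete content through the DOM interface."""
    doc = parseString(xml_text)
    root = doc.documentElement

    # Acquire: element name, attribute value, and character string.
    name = root.tagName
    order_id = root.getAttribute("id")
    items = root.getElementsByTagName("item")
    first_text = items[0].firstChild.data

    # Change: overwrite the character string of the first <item>.
    items[0].firstChild.data = "pencil"

    # Add: append a new <item> element containing a text node.
    new_item = doc.createElement("item")
    new_item.appendChild(doc.createTextNode("paper"))
    root.appendChild(new_item)

    # Delete: remove the original second <item>.
    root.removeChild(items[1])

    return name, order_id, first_text, root.toxml()


result = edit_order(SAMPLE)
```

The user application never scans the tags itself; it works only with element and text nodes, which is the convenience the DOM interface is meant to provide.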
FIG. 1 is an explanatory diagram of a conventional example. The conventional structured document processing system includes a structured document storage unit 1, an expanding unit 2, an object holding unit 3, a processing unit 4 and a user application 6. The structured document storage unit 1 stores a structured document, such as an XML document, on a disk. The expanding unit 2 analyzes the structure of the structured document and expands it into the object holding unit 3 as an object. This is called DOM tree expansion.
The object holding unit 3 is a memory that holds the object expanded by the expanding unit 2. The processing unit 4 is a program (DOM processor) that provides the group of APIs specified by the W3C and processes the object in the object holding unit 3 following an instruction from the user application 6. The user application 6 handles the expanded structured document.
For processing by the processing unit (DOM processor) 4, the structure of the XML document, which is the structured document, is analyzed by the expanding unit (DOM tree expanding unit) 2, and an object (DOM tree) is expanded into the object holding unit (memory) 3. The XML document is serial text, whereas the DOM tree expanded as an object is separated into individual elements, and those elements are stored according to the data structure described by the tags. Hereinafter, what is expanded in the memory by analyzing the structure of a structured document such as an XML document is called an “object”. Because the expanding unit 2 and the processing unit 4 make it unnecessary for the user application 6 to perform structural analysis itself, the user application 6 can be developed easily.
FIG. 2 is a flow chart of the conventional processing, which will be described through steps S11 and S12.
S11: When an instruction for any of various kinds of processing is issued from the user application 6, the expanding unit 2 expands the entire structured document into the object holding unit (memory) 3, and the processing proceeds to step S12.
S12: The processing unit 4 carries out the processing following the instruction from the user application 6 and terminates the processing.
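The two-step flow of S11 and S12 can be sketched as follows. This is only an illustrative sketch that assumes Python's `xml.dom.minidom` as a stand-in for the expanding unit and the processing unit; the function names and the generated sample document are hypothetical.

```python
import io
from xml.dom.minidom import parse


def expand_entire_document(storage):
    """S11: parse the whole stored document into an in-memory DOM tree."""
    return parse(storage)


def process(dom_tree, instruction):
    """S12: run the application's instruction against the expanded tree."""
    return instruction(dom_tree)


def count_nodes(node):
    """Count every node held in memory beneath (and including) this node."""
    return 1 + sum(count_nodes(child) for child in node.childNodes)


# Hypothetical stored document containing 1000 <entry> elements.
entries = "".join(f"<entry n='{i}'>e{i}</entry>" for i in range(1000))
storage = io.BytesIO(f"<log>{entries}</log>".encode("utf-8"))

tree = expand_entire_document(storage)
# The instruction needs only the first entry ...
first = process(tree, lambda t: t.getElementsByTagName("entry")[0].firstChild.data)
# ... yet every node of the document has already been expanded in memory.
total = count_nodes(tree)
```

Even though the instruction reads a single element, the whole document has been parsed and held as objects before S12 begins.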
However, as shown in FIG. 2, the conventional expanding unit 2 used with the DOM processor expands the entire XML document in the memory. Consequently, the load on the central processing unit (CPU) during expansion is high and a large amount of main memory is required, which is a problem to be solved. The persistent document object model (PDOM) is available for solving this problem.
FIG. 3 is an explanatory diagram of a conventional persistent document object model (PDOM). The conventional structured document processing system includes a structured document storage unit (disk) 1, an expanding unit 2, an object holding unit 3, a processing unit 4, and a user application 6. After the structured document 10 is expanded as an object, this system converts the object into a tree structure document 10B composed of serialized binary data and stores it in the structured document storage unit 1. A partial tree 11, which is part of this tree structure document 10B, is cached in the memory serving as the object holding unit 3 so that the object (DOM tree) can be processed.
FIG. 12 is a flow chart of the conventional PDOM processing, which will be described through steps S21 to S23.
S21: When an instruction for any of various kinds of processing is received from the user application 6, the expanding unit 2 expands the entire structured document 10 in the structured document storage unit 1 into the tree structure document 10B and stores it in the structured document storage unit 1, and the processing proceeds to step S22.
S22: The processing unit 4 reads the partial tree 11, which is the part of the tree structure document 10B needed for the processing instructed by the user application 6, from the structured document storage unit 1 into the memory serving as the object holding unit 3, and the processing proceeds to step S23.
S23: The processing unit 4 carries out the processing according to the instruction from the user application 6 and terminates the processing.
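The flow of S21 to S23 can be sketched roughly as follows. This is not the actual PDOM file format: as an assumption made purely for illustration, each child subtree of the root is written as a length-prefixed record with a separate offset index, and all function names and the sample document are hypothetical.

```python
import io
import struct
from xml.dom.minidom import parseString


def build_binary_store(xml_text):
    """S21: expand the entire document, then write one record per subtree.

    Returns the store contents and the byte offset of each record.
    The record layout is an illustrative assumption, not the PDOM format.
    """
    root = parseString(xml_text).documentElement  # full expansion happens here
    store = io.BytesIO()
    offsets = []
    for child in root.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue
        record = child.toxml().encode("utf-8")
        offsets.append(store.tell())
        store.write(struct.pack(">I", len(record)))  # 4-byte length prefix
        store.write(record)
    return store.getvalue(), offsets


def load_partial_tree(store_bytes, offset):
    """S22: read only the record at the given offset into memory."""
    store = io.BytesIO(store_bytes)
    store.seek(offset)
    (length,) = struct.unpack(">I", store.read(4))
    return parseString(store.read(length)).documentElement


# Hypothetical sample document.
DOC = "<catalog><part id='a'>bolt</part><part id='b'>nut</part></catalog>"
binary_store, index = build_binary_store(DOC)

partial = load_partial_tree(binary_store, index[1])
# S23: process only the partial tree that was loaded.
result = (partial.getAttribute("id"), partial.firstChild.data)
```

The sketch shows both properties noted for this approach: S22 touches only one subtree, but S21 still parses the whole document once, and the resulting store is a second artifact kept apart from the original text.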
However, the above-described conventional systems have the following problems. If the structured document, such as an XML document, is large, expansion into the DOM tree accounts for most of the processing time. The DOM tree requires five to ten times the capacity of the original structured document, so that a structured document larger than several tens of megabytes cannot be handled by the conventional DOM processor. Although the currently available PDOM has solved this memory problem, its CPU load for the initial DOM tree expansion is also high. Further, because binary data that is managed separately from the structured document is generated, integrated management of the data is impossible.