XML (Extensible Markup Language) is receiving attention as a means of general-purpose data exchange. For XML, a document type definition can be provided in a schema language such as DTD, XML Schema, or RELAX, and an XML document that conforms to its document type definition is referred to as valid XML. Valid XML differs from well-formed XML, which is merely grammatically well-formed. Its validity is assured with respect to the order of elements, whether an element may be omitted, the repetition of an element, whether the hierarchical relationship among elements is correct, and so on. Handling XML documents whose validity is assured in this way makes the design and development of applications easier and enables more reliable data exchange.
Validation of an XML document is performed by XML document analysis software (a validator), also called an XML parser or an XML processor. A general validator operates roughly as follows. First, the document type definition is input to the validator, and a table conforming to the content model of each element in the document type definition is generated. Each of these tables represents a character string automaton for one content model, that is, for the sibling relationships among elements. In addition, a tree automaton representing the parent-child relationships among elements is prepared. Next, the XML document to be validated is input to the validator. The validator then validates the XML document using the per-content-model tables and the tree automaton, with the tree automaton reading the results of running each character string automaton.
The document type definition is illustrated here using a DTD. For a tag name σ appearing in the DTD, the character string automaton Mσ corresponding to σ is the five-tuple Mσ = <Σσ, Qσ, δσ, Iσ, Fσ>. Here, Σσ is the set (alphabet) of all element types σ′ that can appear as children of the element type σ. Qσ is the set of states q. δσ is a table representing the transition relation over the child element types σ′ ∈ Σσ and the states q; it prescribes a transition from a combination (σ′, q1) to a state q2. Iσ is the set of initial states, and Fσ is the set of final states. This is described concretely below, using the following sample DTD as an example.

(Sample DTD)
<!ELEMENT title (#PCDATA)>
<!ELEMENT para (#PCDATA)>
<!ELEMENT doc (title, section+)>
<!ELEMENT section (title, (section+|para))>
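As a minimal Python sketch (an illustration, not part of the described method), the five-tuple can be held in a small data structure. The class name `StringAutomaton`, its field names, and the consistency checks are assumptions. δ is stored as a dictionary keyed by (child element type, state), where a missing key means "no transition". The example instance encodes the doc content model (title, section+) from the sample DTD, using the state numbering of Table 1.

```python
from dataclasses import dataclass


@dataclass
class StringAutomaton:
    """Five-tuple M = <Sigma, Q, delta, I, F> for one content model (illustrative)."""
    sigma: frozenset    # alphabet: element types that may appear as children
    states: frozenset   # Q: set of states
    delta: dict         # (child element type, q1) -> q2; a missing key means no transition
    initial: frozenset  # I: set of initial states
    final: frozenset    # F: set of final states

    def __post_init__(self):
        # Consistency checks implied by the definition (an assumption, not from the source).
        if not (self.initial <= self.states and self.final <= self.states):
            raise ValueError("I and F must be subsets of Q")
        for (child, q1), q2 in self.delta.items():
            if child not in self.sigma or q1 not in self.states or q2 not in self.states:
                raise ValueError(f"bad transition ({child}, {q1}) -> {q2}")

    def accepts(self, children):
        """True if the row of child element types leads from a state in I into F."""
        current = set(self.initial)
        for child in children:
            current = {self.delta[(child, q)] for q in current if (child, q) in self.delta}
            if not current:
                return False
        return bool(current & self.final)


# Content model of doc, (title, section+), with the state numbering of Table 1.
M_doc = StringAutomaton(
    sigma=frozenset({"title", "section"}),
    states=frozenset({0, 1, 2}),
    delta={("title", 0): 1, ("section", 1): 2, ("section", 2): 2},
    initial=frozenset({0}),
    final=frozenset({2}),
)
print(M_doc.accepts(["title", "section", "section"]))  # True
print(M_doc.accepts(["title"]))                         # False
```

Keeping the current states as a set lets `accepts` work for any I, although every automaton in the sample DTD has a single initial state.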
The character string automaton Mdoc, which represents the content model of the element doc, is the five-tuple <Σdoc, Qdoc, δdoc, Idoc, Fdoc>, where the alphabet Σdoc = {title, section}, the state set Qdoc = {0, 1, 2}, the initial state set Idoc = {0}, and the set of final states Fdoc = {2}. The transition table δdoc is shown in Table 1.
TABLE 1 (transition table δdoc)
State      title      section
0 (I)      1          —
1          —          2
2 (F)      —          2
Specifically, validation of the row of child elements of a doc element starts from state 0. For each child tag name in turn, the next state is looked up in Table 1 and tracked. The row is valid if the state reached after the last child element is a final state (here, state 2); it is invalid if a lookup hits an empty entry (—) or if the last state reached is not final.
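The state tracking described above can be sketched in Python as follows. This is an illustration under assumptions: the hypothetical helper `run` takes the transition table as a dictionary keyed by (child tag name, state), a missing key plays the role of an empty (—) entry in Table 1, and the function also returns the states visited so that the tracking is visible.

```python
# Transition table delta_doc from Table 1; empty entries are simply absent.
DELTA_DOC = {("title", 0): 1, ("section", 1): 2, ("section", 2): 2}
INITIAL_DOC = 0
FINAL_DOC = {2}


def run(delta, initial, finals, children):
    """Track the state over a row of child tag names.

    Returns (valid, path), where path lists the states visited so far.
    """
    state = initial
    path = [state]
    for child in children:
        if (child, state) not in delta:  # empty entry: no transition, so invalid
            return False, path
        state = delta[(child, state)]
        path.append(state)
    return state in finals, path


print(run(DELTA_DOC, INITIAL_DOC, FINAL_DOC, ["title", "section", "section"]))
# (True, [0, 1, 2, 2])
print(run(DELTA_DOC, INITIAL_DOC, FINAL_DOC, ["title", "title"]))
# (False, [0, 1])
```

The row title, section, section passes through states 0, 1, 2, 2 and ends in the final state 2, while title, title stops at state 1 because Table 1 has no entry for (title, 1).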
Likewise, the character string automaton Msection, which represents the content model of the element section, is the five-tuple <Σsection, Qsection, δsection, Isection, Fsection>, where the alphabet Σsection = {title, section, para}, the state set Qsection = {0, 1, 2, 3}, the initial state set Isection = {0}, and the set of final states Fsection = {2, 3}. The transition table δsection is shown in Table 2.
TABLE 2 (transition table δsection)
State      title      section      para
0 (I)      1          —            —
1          —          2            3
2 (F)      —          2            —
3 (F)      —          —            —
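A similar sketch for Msection, with Table 2 written as a dictionary (the names `DELTA_SECTION` and `accepts` are hypothetical), shows how the choice (section+|para) is captured: after title (state 1), section leads to state 2, which loops on further section children, while para leads to state 3, which has no outgoing transitions.

```python
# Transition table delta_section from Table 2 (content model: title, (section+|para)).
DELTA_SECTION = {
    ("title", 0): 1,
    ("section", 1): 2,
    ("para", 1): 3,
    ("section", 2): 2,
}
INITIAL_SECTION = 0
FINAL_SECTION = {2, 3}


def accepts(children):
    """True if the row of child tag names is accepted by M_section."""
    state = INITIAL_SECTION
    for child in children:
        nxt = DELTA_SECTION.get((child, state))
        if nxt is None:  # empty entry in Table 2
            return False
        state = nxt
    return state in FINAL_SECTION


for row in (["title", "para"], ["title", "section", "section"],
            ["title", "para", "para"], ["title", "section", "para"], ["title"]):
    print(row, accepts(row))
```

Of the five rows tried, title, para and title, section, section are accepted; title, para, para and title, section, para are rejected for lack of a transition, and title alone is rejected because state 1 is not final.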
For Mtitle and Mpara, Q = I = F = {0} and Σ = { } (the empty set), because title and para contain only #PCDATA and no child elements. Their transition table δ is therefore empty, as shown in Table 3.
TABLE 3 (transition table δ of Mtitle and Mpara; no child element types)
State
0 (I, F)
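Collecting one automaton per tag name for the whole sample DTD might look like the following sketch. The dictionary layout (transition table, initial state, set of final states) and the function `children_valid` are assumptions for illustration. Character data (#PCDATA) is assumed to be ignored, since the automata only see child element tags.

```python
# One automaton per tag name for the sample DTD: (delta, initial state, final states).
# title and para have Sigma = {} and Q = I = F = {0}, so their delta is empty (Table 3).
AUTOMATA = {
    "doc": ({("title", 0): 1, ("section", 1): 2, ("section", 2): 2}, 0, {2}),
    "section": ({("title", 0): 1, ("section", 1): 2, ("para", 1): 3,
                 ("section", 2): 2}, 0, {2, 3}),
    "title": ({}, 0, {0}),
    "para": ({}, 0, {0}),
}


def children_valid(tag, children):
    """Check one element's child tag names against the automaton for its tag."""
    delta, state, finals = AUTOMATA[tag]
    for child in children:
        if (child, state) not in delta:
            return False
        state = delta[(child, state)]
    return state in finals


print(children_valid("title", []))        # True: no child elements
print(children_valid("para", ["title"]))  # False: the empty table has no transitions
```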
Thus, an automaton Mσ is prepared for each element (tag name) σ. In addition, a pushdown memory (stack) and an element reader are prepared, and the XML document is validated according to the following Procedure 1. FIG. 13 is a flowchart showing Procedure 1.
(Procedure 1)
At the start of the procedure, the stack is empty, [ ]. First, a tag σ is read (step 201), and it is determined whether σ is an open tag or a close tag (step 202). If it is an open tag, the head of the stack s0 is read (step 203), and it is determined whether the stack is empty (step 204). If it is empty, (Mσ, Iσ) is pushed onto the stack (step 205), and the procedure returns to step 201. If the stack is not empty in step 204, let the head of the stack be s0 = (Mσ′, q), as shown in step 206. The table δσ′ of the element σ′ is looked up with (σ, q) (step 207), and it is determined whether a transition exists (step 208). If there is a transition (here, from q to q′, as shown in step 209), the head of the stack is replaced with s0 = (Mσ′, q′) (step 210). Thereafter, (Mσ, Iσ) is pushed onto the stack (step 205), and the procedure returns to step 201. If there is no transition in step 208, the validation fails (step 211).
If the tag σ read is determined in step 202 to be a close tag, the head of the stack s0 is read (step 212); here, s0 = (Mσ, q), as shown in step 213. It is determined whether q is included in Fσ (step 214), and the validation fails if it is not (step 215). If q is determined in step 214 to be included in Fσ, the head of the stack s0 = (Mσ, q) is deleted (step 216). Thereafter, it is determined whether the XML document has ended (step 217). If it has not ended, the procedure returns to step 201; if it has ended, the processing is finished.
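Procedure 1 can be sketched in Python as below. This is a hedged illustration, not the patent's implementation. Tags arrive as a hypothetical list of ("open" | "close", tag name) pairs, and each stack entry stores the tag name σ (standing for Mσ) together with its current state. A separate well-formedness check is assumed to have verified that close tags match open tags. Rejecting undeclared elements and stray close tags, and requiring an empty stack at the end, are also assumptions. The step numbers in the comments refer to FIG. 13.

```python
# One automaton per tag name for the sample DTD: (delta, initial state, final states).
AUTOMATA = {
    "doc": ({("title", 0): 1, ("section", 1): 2, ("section", 2): 2}, 0, {2}),
    "section": ({("title", 0): 1, ("section", 1): 2, ("para", 1): 3,
                 ("section", 2): 2}, 0, {2, 3}),
    "title": ({}, 0, {0}),
    "para": ({}, 0, {0}),
}


def validate(events):
    """Procedure 1 over ("open" | "close", tag name) pairs; True if the document is valid."""
    stack = []  # entries (sigma, q): automaton M_sigma and its current state q
    for kind, sigma in events:                      # step 201: read tag sigma
        if kind == "open":                          # step 202: open tag
            if sigma not in AUTOMATA:               # assumption: undeclared element is invalid
                return False
            if stack:                               # steps 203-204: stack not empty
                parent, q = stack[-1]               # step 206: s0 = (M_sigma', q)
                delta = AUTOMATA[parent][0]         # step 207: examine delta_sigma'
                if (sigma, q) not in delta:         # step 208: no transition
                    return False                    # step 211: validation fails
                stack[-1] = (parent, delta[(sigma, q)])  # steps 209-210: q -> q'
            stack.append((sigma, AUTOMATA[sigma][1]))    # step 205: push (M_sigma, I_sigma)
        else:                                       # close tag
            if not stack:                           # assumption: stray close tag is invalid
                return False
            top, q = stack[-1]                      # steps 212-213: s0 = (M_sigma, q)
            if q not in AUTOMATA[top][2]:           # step 214: is q in F_sigma?
                return False                        # step 215: validation fails
            stack.pop()                             # step 216: delete s0
    return not stack                                # step 217: document has ended


DOC_EVENTS = [("open", "doc"), ("open", "title"), ("close", "title"),
              ("open", "section"), ("open", "title"), ("close", "title"),
              ("open", "para"), ("close", "para"), ("close", "section"),
              ("close", "doc")]
NO_SECTION = [("open", "doc"), ("open", "title"), ("close", "title"), ("close", "doc")]
print(validate(DOC_EVENTS), validate(NO_SECTION))  # True False
```

Without the section element, validation fails at </doc> because state 1 of Mdoc is not a final state.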
The above validation steps are described in detail below, taking the following XML document as an example.
(XML document)
<doc><title></title><section><title></title><para></para></section></doc>

i) Read the open tag <doc>. Push the automaton Mdoc and its initial state 0 onto the stack. The stack becomes [(Mdoc, 0)].
ii) Read the open tag <title>. Look up (title, 0) in the table δdoc. The state of Mdoc transitions from 0 to 1 ((Mdoc, 0) at the head of the stack changes to (Mdoc, 1)). Push the automaton Mtitle and its initial state 0 onto the stack. The stack becomes [(Mtitle, 0), (Mdoc, 1)].
iii) Read the close tag </title>. Since state 0 of Mtitle is a final state, it is accepted. (Mtitle, 0) is deleted from the stack. The stack becomes [(Mdoc, 1)].
iv) Read the open tag <section>. Look up (section, 1) in the table δdoc. The state of Mdoc transitions from 1 to 2 ((Mdoc, 1) at the head of the stack changes to (Mdoc, 2)). Push the automaton Msection and its initial state 0 onto the stack. The stack becomes [(Msection, 0), (Mdoc, 2)].
v) Read the open tag <title>. Look up (title, 0) in the table δsection. The state of Msection transitions from 0 to 1 ((Msection, 0) at the head of the stack changes to (Msection, 1)). Push the automaton Mtitle and its initial state 0 onto the stack. The stack becomes [(Mtitle, 0), (Msection, 1), (Mdoc, 2)].
vi) Read the close tag </title>. Since state 0 of Mtitle is a final state, it is accepted. (Mtitle, 0) is deleted from the stack. The stack becomes [(Msection, 1), (Mdoc, 2)].
vii) Read the open tag <para>. Look up (para, 1) in the table δsection. The state of Msection transitions from 1 to 3 ((Msection, 1) at the head of the stack changes to (Msection, 3)). Push the automaton Mpara and its initial state 0 onto the stack. The stack becomes [(Mpara, 0), (Msection, 3), (Mdoc, 2)].
viii) Read the close tag </para>. Since state 0 of Mpara is a final state, it is accepted. (Mpara, 0) is deleted from the stack. The stack becomes [(Msection, 3), (Mdoc, 2)].
ix) Read the close tag </section>. Since state 3 of Msection is a final state, it is accepted. (Msection, 3) is deleted from the stack. The stack becomes [(Mdoc, 2)].
x) Read the close tag </doc>. Since state 2 of Mdoc is a final state, it is accepted. (Mdoc, 2) is deleted from the stack, which becomes empty, [ ], and the processing finishes because the document has ended.
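The trace i) to x) can be reproduced with Python's standard `xml.etree.ElementTree.XMLPullParser`, which reports a "start" and an "end" event for each element. In this sketch (the function name `trace` and the error messages are assumptions), the stack is recorded head first after every tag, matching the notation used in the trace.

```python
import xml.etree.ElementTree as ET

# One automaton per tag name for the sample DTD: (delta, initial state, final states).
AUTOMATA = {
    "doc": ({("title", 0): 1, ("section", 1): 2, ("section", 2): 2}, 0, {2}),
    "section": ({("title", 0): 1, ("section", 1): 2, ("para", 1): 3,
                 ("section", 2): 2}, 0, {2, 3}),
    "title": ({}, 0, {0}),
    "para": ({}, 0, {0}),
}


def trace(xml_text):
    """Replay Procedure 1 and record the stack after each tag, head of the stack first."""
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    parser.close()  # events not yet retrieved can still be read after close()
    stack, snapshots = [], []
    for event, elem in parser.read_events():
        sigma = elem.tag
        if event == "start":  # open tag
            if stack:
                parent, q = stack[-1]
                delta = AUTOMATA[parent][0]
                if (sigma, q) not in delta:
                    raise ValueError(f"no transition for ({sigma}, {q}) in delta_{parent}")
                stack[-1] = (parent, delta[(sigma, q)])
            stack.append((sigma, AUTOMATA[sigma][1]))
        else:  # close tag
            top, q = stack[-1]
            if q not in AUTOMATA[top][2]:
                raise ValueError(f"state {q} of M{top} is not final")
            stack.pop()
        snapshots.append([(f"M{name}", state) for name, state in reversed(stack)])
    return snapshots


DOC = "<doc><title></title><section><title></title><para></para></section></doc>"
for step, snapshot in enumerate(trace(DOC), start=1):
    print(step, snapshot)
```

The ten snapshots correspond to steps i) to x); for instance, the second is [('Mtitle', 0), ('Mdoc', 1)] and the last is the empty stack.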
In this way, the above XML document is verified to satisfy the content models of the sample DTD, that is, to be valid.
Moreover, because the content models of a DTD correspond to tree languages of the so-called local class, the automaton M of each content model corresponds one-to-one to a tag name σ. For a document type definition subject to the single-type constraint, such as XML Schema, however, the automaton MX of a content model X does not correspond one-to-one to a tag name σ. For this reason, applying a document type definition that corresponds to a language class wider than that of DTD, such as XML Schema or RELAX, requires an algorithm more complicated than the DTD validation described above. For instance, XML Schema requires computing a table η that gives, from a parent content model X and the tag name σ of a child, the content model Y of that child. Once the table η is obtained, Procedure 1 is extended as follows. The tag name σ is stored in the stack in addition to the automaton M and the state q. The content model Y is obtained with the table η from the tag name σ read in step 201 and the content model X corresponding to the automaton MX currently at the head of the stack. The table δX is then looked up with (Y, q) to obtain the next state q′. The rest is the same as Procedure 1. Thus, obtaining the next state q′ requires consulting the table η and then the table δ, one after the other.
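The extension using the table η can be sketched as follows. Everything in it is hypothetical: the content model names (Root, List, Menu, ListItem, MenuItem, Para), the use of None as a pseudo-parent for the root element, and the tables themselves. They were chosen so that the tag name item denotes different content models under list and under menu. Each stack entry holds (σ, X, q); η is consulted first to find Y, and δX is then looked up with (Y, q). The standard `XMLPullParser` is used only to produce open and close events.

```python
import xml.etree.ElementTree as ET

ROOT = None  # hypothetical pseudo-parent used to look up the root element's model

# Hypothetical table eta: (parent content model X, child tag name sigma) -> child model Y.
# The tag name "item" maps to different content models under "list" and under "menu".
ETA = {
    (ROOT, "root"): "Root",
    ("Root", "list"): "List",
    ("Root", "menu"): "Menu",
    ("List", "item"): "ListItem",
    ("Menu", "item"): "MenuItem",
    ("ListItem", "para"): "Para",
}

# Hypothetical content models: (delta_X keyed by (child model Y, q), initial state, finals).
MODELS = {
    "Root": ({("List", 0): 1, ("Menu", 1): 2}, 0, {2}),          # list, menu
    "List": ({("ListItem", 0): 1, ("ListItem", 1): 1}, 0, {1}),  # item+
    "Menu": ({("MenuItem", 0): 1, ("MenuItem", 1): 1}, 0, {1}),  # item+
    "ListItem": ({("Para", 0): 0}, 0, {0}),                      # para*
    "MenuItem": ({}, 0, {0}),                                    # empty
    "Para": ({}, 0, {0}),                                        # empty
}


def validate(xml_text):
    """Procedure 1 extended with eta; stack entries are (sigma, X, q)."""
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    parser.close()
    stack = []
    for event, elem in parser.read_events():
        sigma = elem.tag
        if event == "start":
            parent_model = stack[-1][1] if stack else ROOT
            child_model = ETA.get((parent_model, sigma))  # first: Y = eta(X, sigma)
            if child_model is None:
                return False
            if stack:
                tag, model, q = stack[-1]
                q_next = MODELS[model][0].get((child_model, q))  # then: delta_X(Y, q)
                if q_next is None:
                    return False
                stack[-1] = (tag, model, q_next)
            stack.append((sigma, child_model, MODELS[child_model][1]))
        else:
            _, model, q = stack.pop()
            if q not in MODELS[model][2]:
                return False
    return not stack


OK = "<root><list><item><para/></item></list><menu><item/></menu></root>"
BAD = "<root><list><item/></list><menu><item><para/></item></menu></root>"
print(validate(OK), validate(BAD))  # True False
```

Because an item under menu may not contain para in this made-up schema, the first document is valid and the second is not, even though both use the same tag names.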
Furthermore, validation against a document type definition corresponding to tree languages of a still wider class, such as RELAX, requires an even more complicated operation, such as consulting several tables in parallel.
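One way, among others, to realize "consulting several tables in parallel" is to keep, for every open element, the set of candidate (content model, state) pairs that are still possible, as a nondeterministic tree automaton would. The sketch below does not implement RELAX itself; the grammar, the model names, and the pseudo-parent Top are hypothetical. The tag item may denote either ItemA (one or more para children) or ItemB (no children), and a group must contain items of only one kind, so the content model of an item cannot be fixed when its open tag is read.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar in which one tag name can denote several content models.
LABELS = {"group": {"Group"}, "item": {"ItemA", "ItemB"}, "para": {"Para"}}
MODELS = {  # model -> (delta keyed by (child model, q), initial state, final states)
    "Top": ({("Group", 0): 1}, 0, {1}),  # pseudo-parent of the root element
    "Group": ({("ItemA", 0): 1, ("ItemA", 1): 1,
               ("ItemB", 0): 2, ("ItemB", 2): 2}, 0, {1, 2}),  # ItemA+ | ItemB+
    "ItemA": ({("Para", 0): 1, ("Para", 1): 1}, 0, {1}),        # para+
    "ItemB": ({}, 0, {0}),                                      # empty
    "Para": ({}, 0, {0}),                                       # empty
}


def validate(xml_text):
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    parser.close()
    # Each frame is a set of candidate (content model, state) pairs followed in parallel.
    stack = [{("Top", 0)}]
    for event, elem in parser.read_events():
        if event == "start":
            top = stack[-1]
            # Start every model the tag may denote that some parent candidate can accept.
            frame = {(y, MODELS[y][1]) for y in LABELS.get(elem.tag, ())
                     if any((y, q) in MODELS[x][0] for x, q in top)}
            if not frame:
                return False
            stack.append(frame)
        else:
            frame = stack.pop()
            accepted = {y for y, q in frame if q in MODELS[y][2]}
            parent = stack.pop()
            # Advance every parent candidate by every child model that was accepted.
            advanced = {(x, MODELS[x][0][(y, q)])
                        for x, q in parent for y in accepted if (y, q) in MODELS[x][0]}
            if not advanced:
                return False
            stack.append(advanced)
    return len(stack) == 1 and any(q in MODELS[x][2] for x, q in stack[0])


ALL_A = "<group><item><para/></item><item><para/><para/></item></group>"
ALL_B = "<group><item/><item/></group>"
MIXED = "<group><item><para/></item><item/></group>"
print(validate(ALL_A), validate(ALL_B), validate(MIXED))  # True True False
```

When the first item of MIXED closes, only ItemA is accepted, so the group is committed to ItemA+; the second, empty item can then only be ItemA, which is not in a final state, and validation fails.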
As described above, the background art switches among a plurality of transition tables δ, so the validation process is complicated and intricate. As a result, the program size of the validator becomes larger, and more hardware resources (high computation speed, large memory capacity, and so on) are required to cope with the increased validation workload. In addition, to provide general versatility, the automaton for each content model is generated from the document type definition every time validation is performed. Thus, the program size inevitably becomes larger and the processing time becomes longer.
Meanwhile, there is a demand for obtaining XML documents from the Internet or elsewhere and validating them on small processing apparatuses such as mobile telephones or PDAs (personal digital assistants). However, because conventional validators involve the complicated processing described above, the load is excessive for such small apparatuses, whose hardware resources are scarce. Moreover, as described above, the processing is more complicated for XML Schema and RELAX than for DTD, so the problem is even more serious in those cases.