1. Field of the Invention
The present invention relates to a document conversion system for converting a first structured document formed according to a first document schema to a second structured document formed according to a second document schema, a document conversion method, and a computer-readable recording medium storing a document conversion program.
2. Description of the Related Art
Conventionally, structured documents have been proposed which not only handle the text data of a document file as a mere character string but are also capable of expressing the logical structure, layout, attributes, etc. of the document. For example, SGML, specified by International Organization for Standardization (ISO) standard 8879, and XML, specified by the World Wide Web Consortium (W3C), are currently available. In SGML and XML, the logical structure of a document is specified by a document type definition (DTD), and the roles of document components such as the title, author's name, preface and body text can be expressed using identifiers for structural elements, called document tags.
In a structured document, a specific meaning, role, etc. may need to be assigned to an identifier, and additional information (attributes) can be added to the identifier to express such characteristics.
Further, stylesheet formats have been proposed for describing the style of a document, which is required for displaying the structured document on a screen and printing it on paper. Examples of such formats include the Document Style Semantics and Specification Language (DSSSL) of ISO standard 10179 and the Extensible Stylesheet Language (XSL) specified by the W3C.
DSSSL and XSL describe the document style by specifying a pattern, which expresses a condition on the identifiers constituting an SGML or XML document, and an action to be applied to each identifier that satisfies the pattern.
The stylesheet not only provides the document style but also converts the structure of the document. The specification in XSL for extracting a particular pattern from the structured document is called XSL Transformations (XSLT). The use of XSLT enables an XML document to be converted according to predetermined conditions and output in a different format, such as HTML.
A structured document is produced by dividing document data (text) into structurally meaningful units and marking up these units using elements and attributes. In XML, the definition of the structure of the document data is called a schema, and a document type definition (DTD) is generally used for defining the schema. The schema defines which elements the document must contain, in what order and how many times, and which attributes those elements must have. Since the structured document itself contains no definition of its data, an error cannot be detected automatically even if data is missing for some reason. Thus, a document type definition must be provided in order to display or exchange data, and the document must be described in accordance with that definition.
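The role of the schema described above can be illustrated with a minimal sketch. It assumes a hypothetical vocabulary (doc, title, author, para) and expresses each content model as a regular expression over the sequence of child element names, roughly mirroring a DTD declaration such as `<!ELEMENT doc (title, author?, para+)>`. It is not a DTD validator; it only shows how order and occurrence constraints can be checked.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models (not taken from any real DTD). Each model is a
# regular expression over the child element names, each followed by a space.
CONTENT_MODELS = {
    "doc": r"title (author )?(para )+",  # title, optional author, one or more para
    "title": r"",                        # no child elements allowed
    "author": r"",
    "para": r"",
}


def validate(element):
    """Return a list of error messages; an empty list means the tree is valid."""
    errors = []
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        errors.append(f"undeclared element <{element.tag}>")
    else:
        children = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, children) is None:
            found = children.strip() or "(empty)"
            errors.append(f"<{element.tag}> has invalid content: {found}")
    for child in element:
        errors.extend(validate(child))
    return errors


good = ET.fromstring("<doc><title>T</title><author>A</author><para>p</para></doc>")
bad = ET.fromstring("<doc><para>p</para><title>T</title></doc>")

print(validate(good))
print(validate(bad))
```

The second document contains only declared elements, yet it is rejected because the children appear in the wrong order, which is the kind of error that cannot be detected without a schema.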
FIG. 1 shows an example flow of a conventional document conversion process for a structured document F1 described in XML. As shown in the figure, the conversion process for the structured document generally comprises two main steps: conversion of document structure S101 and validity verification process S102.
The conversion of document structure S101 is a step of generating a new document by extracting elements and attributes using a pattern matching technique and replacing them with new elements and attributes, or by adding new elements, attributes and text. This process is performed based on a structure conversion rule described in a conversion template T1, which is generated in advance as an XSL file. Existing software (e.g., Xalan-C++) can be utilized as the XSLT conversion engine for the conversion of document structure S101.
The validity verification process S102 is a step of verifying whether the output of the XSLT conversion process (structured document F2) conforms to the document type definition D2 of the conversion target, and is performed using that document type definition D2. The validity verification process S102 can be performed by existing software (e.g., XML4C). If the result of the validity verification process S102 is acceptable, a new structured document F3 is generated. If it is not acceptable, a document structure correction process S104 is performed on the structured document F2 based on the error content, and the validity verification process S102 is performed again on the corrected structured document F2.
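The control flow of FIG. 1 can be sketched as follows. This is a simplified illustration only: the function names, the toy representation of a document as a list of tag names, and the `ALLOWED` vocabulary are all assumptions made for this example, and the correction step is automated here merely so that the loop can run.

```python
def convert_with_verification(document, convert, validate, correct, max_rounds=10):
    """Run S101 once, then alternate S102 and S104 until the result is accepted."""
    candidate = convert(document)                # S101: produces F2
    for corrections in range(max_rounds):
        errors = validate(candidate)             # S102: check against D2
        if not errors:
            return candidate, corrections        # accepted: this is F3
        candidate = correct(candidate, errors)   # S104: fix based on error content
    raise ValueError("document still invalid after max_rounds corrections")


# Toy stand-ins (hypothetical): a document is just a list of tag names.
ALLOWED = {"ul", "li"}  # assumed vocabulary of the target definition D2


def convert(tags):
    return [tag.lower() for tag in tags]


def validate(tags):
    return [tag for tag in tags if tag not in ALLOWED]


def correct(tags, errors):
    # Remove only the first reported offender, so several rounds are needed.
    index = tags.index(errors[0])
    return tags[:index] + tags[index + 1:]


result, rounds = convert_with_verification(
    ["UL", "LI", "FONT", "LI", "BLINK"], convert, validate, correct
)
print(result, rounds)
```

Each failed verification triggers one more correction followed by another full verification pass, so the number of verification passes is one greater than the number of corrections.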
FIG. 2A is a diagram showing a conventional example of converting the structured document F1, defined by the document type definition D1, to the structured document F3 based on the conversion template T1. The figure shows the structured document F2 after a first conversion (i), which contradicts the document type definition D2, and the structured document F3, in which the contradictions are corrected. In the document example of FIG. 2A, the UL element and the ul element each define an unnumbered list of items (an unordered list), and each list item is defined by an LI element or li element, which are child elements of the UL and ul elements, respectively.
As elements after the conversion, the ul element and the li element correspond to the UL element and the LI element, respectively. The structured document F1 describes a list comprising three items. In the structured document F2, which contains contradictions, the elements are simply replaced by their corresponding elements.
If the document type definition D2 specifies a rule that only one li element can be defined under a ul element, each li element of the structured document F2 must become the sole sub-element of a ul element (that is, each li element is enclosed by its own ul tags). Consequently, the structured document F2 is corrected to an appropriate structured document F3 which satisfies the document type definition D2.
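The example of FIG. 2A can be reproduced in outline as follows. The sketch assumes a hypothetical enclosing BODY/body element (so that the corrected document has a single root) and hypothetical item texts, since the figure content is not reproduced here. The rule "exactly one li per ul" stands in for the document type definition D2.

```python
import xml.etree.ElementTree as ET

# Hypothetical F1: a list of three items inside an assumed BODY wrapper.
F1 = "<BODY><UL><LI>first</LI><LI>second</LI><LI>third</LI></UL></BODY>"

RENAME = {"BODY": "body", "UL": "ul", "LI": "li"}


def rename_elements(source):
    """First conversion (i): replace each element by its counterpart only."""
    target = ET.Element(RENAME[source.tag])
    target.text = source.text
    for child in source:
        target.append(rename_elements(child))
    return target


def violations(root):
    """Return every ul that breaks the assumed D2 rule 'exactly one li per ul'."""
    return [ul for ul in root.iter("ul") if len(ul.findall("li")) != 1]


def split_lists(element):
    """Correction: enclose each li of an over-full ul in its own ul.

    Assumes a ul holds only li children; anything else in it is dropped.
    """
    new_children = []
    for child in list(element):
        if child.tag == "ul" and len(child.findall("li")) > 1:
            for li in child.findall("li"):
                wrapper = ET.Element("ul")
                wrapper.append(li)
                new_children.append(wrapper)
        else:
            split_lists(child)
            new_children.append(child)
    for child in list(element):
        element.remove(child)
    element.extend(new_children)
    return element


f2 = rename_elements(ET.fromstring(F1))
f2_text = ET.tostring(f2, encoding="unicode")
f2_violation_count = len(violations(f2))

f3 = split_lists(f2)  # note: modifies the tree in place
f3_text = ET.tostring(f3, encoding="unicode")

print(f2_text, f2_violation_count)
print(f3_text, len(violations(f3)))
```

The first conversion keeps the shape of F1 and therefore produces one ul with three li children, which violates the assumed rule. Only the separate correction step yields a document in which every li has its own ul.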
FIG. 2B is an example of a description of a conventional conversion template T1. As shown in the figure, the conversion template T1 acts as a conversion rule for the conversion from the structured document F1 to the structured document F2 (i) containing contradictions.
The conversion template T1 comprises a pattern assigning part and a template assigning part.
In the conversion process, a document pattern (tag) defined by the pattern assigning part is extracted from the structured document. Further, addition, deletion and replacement are performed on the extracted document pattern according to the template assigning part in order to generate a new document.
In the conventional conversion template T1, each of <xsl:template match>, <xsl:apply-templates> and <xsl:value-of> is an element defined by the XSL specification.
(1) and (3), which use <xsl:template match>, specify patterns: (1) extracts the UL element, while (3) extracts the LI element. (2) and (4) specify templates. The UL element is extracted according to the pattern specification of (1), and then the template of (2) is applied.
The template specification of (2) means writing the start tag of ul, applying the template rules to the LI elements, and then writing the end tag of ul. The template rules for the LI element are (3) and (4), and the LI element is extracted according to the pattern specification of (3). Further, according to the template specification of (4), the start tag of li is written, the content under the LI element is converted to text, and finally the end tag of li is written. Since there are three LI elements in the structured document F1, three portions matching the pattern specification of (3) are extracted. The template specification of (4) is applied to each of them, and the process is then complete.
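The order of operations in rules (1) to (4) can be mimicked by the following sketch. It does not run XSLT; it models each pattern/template pair as an entry in a Python dictionary, and the function names, the `RULES` table and the item texts are assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def apply_templates(nodes, rules):
    """Rough counterpart of xsl:apply-templates: run the matching rule per node."""
    return "".join(rules[node.tag](node, rules) for node in nodes if node.tag in rules)


def ul_template(node, rules):
    # (2): start tag of ul, rules applied to the LI children, end tag of ul
    return "<ul>" + apply_templates(node.findall("LI"), rules) + "</ul>"


def li_template(node, rules):
    # (4): start tag of li, content under LI converted to text, end tag of li
    return "<li>" + escape("".join(node.itertext())) + "</li>"


# (1) and (3): the dictionary keys play the role of the match patterns.
RULES = {"UL": ul_template, "LI": li_template}

# Hypothetical F1 with three list items.
f1 = ET.fromstring("<UL><LI>first</LI><LI>second</LI><LI>third</LI></UL>")
output = apply_templates([f1], RULES)
print(output)
```

Rule (3)/(4) fires once per LI element, so three li elements are written between a single pair of ul tags. The rules only rename and copy, which is why the output keeps the structure of F1 regardless of what the target document type definition permits.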
However, as described above, in a case where the document type definition D1 conflicts with the document type definition D2 (e.g., D1 permits a structure that is prohibited in D2), merely extracting elements/attributes according to the conversion template T1 and replacing them with, or adding, the corresponding elements/attributes leaves a contradiction with the document type definition D2.
According to the conventional structured document conversion method, both the document structure conversion process S101 and the validity verification process S102 search the elements/attributes of the document data from the root element to the last element. Therefore, there is a problem that document conversion takes longer as the number of times the document correction process S104 is required increases.
Further, there is a problem that, if the result of the validity verification process S102 is not acceptable, an operator must manually perform the document correction process S104 off-line based on that result.