The present disclosure relates to a technique of transforming a structured document. More specifically, the present disclosure relates to an apparatus, method, and program for supporting generation of a transformation rule, capable of compact graphical representation of a transformation rule for transforming a structured document having a hierarchical structure based on a physical disposition into a structured document having a hierarchical structure based on a logical structure of data content.
Analyzing a design document or a specification document written as a general-format document such as an Office document involves preprocessing in which a document file to be analyzed is dumped to obtain a structured document having a hierarchical structure based on a physical disposition (syntax), which is then transformed into a structured document having a hierarchical structure based on data content (semantics). Accordingly, analyzing a project-specific specification document or design document requires defining transformation rules adapted to the document to be analyzed.
Such a transformation is essentially a transformation from a document that does not have a schema into a document that has a schema, and this characteristic imposes the following requirements on the transformation.
(1) It is desired that information described at different locations or in different manners in the original document should be output as information of logically the same type.
(2) It is desired that information described at one location in the original document should be output separately as logically different pieces of information.
These requirements arise in particular when the document to be transformed is a word processor file written in project-specific format and notation, or a spreadsheet file in which the document is laid out in spreadsheet cells. The requirements are represented as a transformation rule for outputting a plurality of different elements in the transformation-source structured document as a plurality of elements of the same type in the transformation-target structured document, and a transformation rule for outputting one element in the transformation-source structured document as a plurality of different elements in the transformation-target structured document. Writing such rules leads to redundant description.
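The two requirements can be illustrated with a minimal sketch. It is not the method of the present disclosure; the element names (sheet, row, cell, paragraph, screen, name, widget, label, description), the sample content, and the function transform are all hypothetical and chosen only for illustration. The sketch assumes a physical-layout dump in which the same kind of information appears once in a table row and once in a paragraph, and in which a single cell carries two logically distinct pieces of information.

```python
import xml.etree.ElementTree as ET

# Hypothetical physical-layout dump of a spreadsheet-style specification.
# All element names and values here are assumptions made for illustration.
SOURCE = """
<sheet>
  <row><cell>Screen name</cell><cell>Login</cell></row>
  <row><cell>Login button: submits the form</cell></row>
  <paragraph>Title: Login</paragraph>
</sheet>
"""


def transform(source_xml):
    """Transform the layout-based tree into a content-based tree."""
    src = ET.fromstring(source_xml)
    out = ET.Element("screen")

    # Requirement (1): information described at different locations and in
    # different manners is output as elements of the same logical type.
    for row in src.findall("row"):
        cells = row.findall("cell")
        if len(cells) == 2 and cells[0].text == "Screen name":
            ET.SubElement(out, "name").text = cells[1].text
    for para in src.findall("paragraph"):
        text = para.text or ""
        if text.startswith("Title:"):
            ET.SubElement(out, "name").text = text[len("Title:"):].strip()

    # Requirement (2): information described at one location is output
    # separately as logically different elements.
    for row in src.findall("row"):
        cells = row.findall("cell")
        if len(cells) == 1 and ":" in (cells[0].text or ""):
            label, description = cells[0].text.split(":", 1)
            widget = ET.SubElement(out, "widget")
            ET.SubElement(widget, "label").text = label.strip()
            ET.SubElement(widget, "description").text = description.strip()

    return out


result = transform(SOURCE)
print(ET.tostring(result, encoding="unicode"))
```

Even in this small sketch, the rule producing the name element has to be written twice, once per source location, which is the kind of redundancy noted above.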
Various techniques exist for transforming a structured document such as a document written in XML (Extensible Markup Language). For example, XSLT (Extensible Stylesheet Language Transformations), which is a standard language for transforming an XML document into another XML document, is used to describe transformation rules for transforming the structure of an XML document into another form (for more details, see http://www.w3.org/TR/xslt20/). MOF (Meta Object Facility) QVT (Query/View/Transformation), which is a model transformation standard in a model-driven architecture, defines a standard technique for transformation from a source model into a target model.
The above existing techniques are defined as transformation languages for describing transformation procedures and rules, and are capable of describing transformation in various manners. However, defining a transformation in such a language is effectively a kind of programming, which is difficult to master for those who are not expert engineers. In order to address this, techniques (graphical transformation languages) and tools exist for graphically describing transformation procedures and rules. Examples of such techniques and tools include UMLX, and xsl:easy from SoftProject GmbH. UMLX is a graphical description technique for model transformation (for more details, see http://www.eclipse.org/gmt/umlx/doc/), and xsl:easy is a tool for visually designing transformation of an XML document (for more details, see http://xsl-easy.com/4.0/). Advantageously, such techniques and tools are easy to understand intuitively, thereby lowering the skill barrier for users.
Other conventional art found in prior-art investigation for the present disclosure includes the following.
JP2006-139441A discloses a document transformation apparatus for transforming information in an untransformed document A into information in a transformed document B, the apparatus including: an input device that reads the document A and the document B; a user interface device that displays items in the document A and items in the document B to manipulate mapping between the items; and a transformation device that reads information on the mapped items, transforms the information in the document A into the information in the document B, and outputs the transformed document (see claim 1 in JP2006-139441A). JP2006-139441A also discloses that one of the documents A and B is a text document and the other is a structured document (see claim 2 in JP2006-139441A). JP2006-139441A further discloses that the mapping between the items in the document A and the items in the document B may be one-to-one, one-to-many, many-to-one, or many-to-many mapping (see claim 6 in JP2006-139441A).
JP2001-344230A discloses a multimedia presentation generation system including: style editing means and mapping rule editing means, serving as a mechanism by which a template description format is separated into a style that specifies a presentation method and a mapping rule that sets mapping between the style and a logical document, and the style and the mapping rule are individually edited; and generating means for generating a presentation from the style and the mapping rule generated by the respective editing means. JP2001-344230A also discloses, for the mapping between the logical document and the style, notation that maps one logical document element to a plurality of style elements (see FIG. 5 in JP2001-344230A). As a processing method for the system, JP2001-344230A discloses searching for style elements specified for each logical document element and mapping them (see FIGS. 9 and 7 and paragraph [0023] in JP2001-344230A), and substituting values based on the search result (see FIG. 10 in JP2001-344230A).