1. Field of the Invention
The present invention relates to a document structure composing apparatus for composing document structure, and particularly relates to a document structure composing apparatus for composing document structure that meets the structural constraints of a specific document class.
2. Description of Related Art
A document having a logical structure composed of components such as chapters, sections, and paragraphs is called a structured document, and it is known that introducing such structure into a document facilitates sharing and converting it. International standards such as the Standard Generalized Markup Language (SGML) and the Office Document Architecture (ODA) have become widespread, and structured documents are becoming the mainstream form of electronic documents.
A structured document is normally structured according to a classification called a document class, which defines the structural constraints on the structure and components of a document. In ODA, the generic logical structure corresponds to the document class; in SGML, the document type definition (DTD) plays the role of the document class.
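As a rough illustration of what a document class constrains, the following sketch models a document class as a mapping from each node type to a regular expression over the sequence of its children's types, loosely in the spirit of a DTD content model. The class `REPORT_CLASS`, its node types, and the `Node` and `conforms` names are hypothetical; real SGML DTDs and ODA generic logical structures are considerably richer.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a document tree: a type name plus ordered children."""
    type: str
    children: list["Node"] = field(default_factory=list)


# Hypothetical document class: each node type maps to a regular expression
# over the space-separated sequence of its children's types.
REPORT_CLASS = {
    "report": r"title (chapter )+",
    "chapter": r"title (section |paragraph )+",
    "section": r"title (paragraph )+",
    "title": r"",       # leaf: no children allowed
    "paragraph": r"",   # leaf: no children allowed
}


def conforms(node: Node, doc_class: dict[str, str]) -> bool:
    """Return True if node and all its descendants satisfy doc_class."""
    model = doc_class.get(node.type)
    if model is None:
        return False  # node type is not defined in this document class
    child_seq = "".join(child.type + " " for child in node.children)
    if re.fullmatch(model, child_seq) is None:
        return False
    return all(conforms(child, doc_class) for child in node.children)


doc = Node("report", [Node("title"),
                      Node("chapter", [Node("title"), Node("paragraph")])])
bad = Node("report", [Node("chapter", [Node("title"), Node("paragraph")])])
print(conforms(doc, REPORT_CLASS))  # True
print(conforms(bad, REPORT_CLASS))  # False: the report lacks a title
```

Encoding the constraint per node type keeps the check a simple recursive walk; a layout rule or database schema that presupposes the class can rely on `conforms` returning True.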
It is important that the document structure of a structured document comply with the constraints of its document class. For example, rules for laying out structured documents often presuppose that the document structure and its components meet the constraints of a specific document class. Therefore, if the document structure deviates from those constraints, a correct layout cannot be produced.
Many programs that process large numbers of structured documents, such as a program that extracts abstracts from a collection of reports, rely on the target documents being composed according to a specific document class. When such a program is used, a document that does not comply with that document class may prevent the program from executing correctly. Further, in a database that handles structured documents, the document class is often used as a schema, and data that deviates from the schema greatly degrades the responsiveness of the database.
From this viewpoint, there is demand for adapting the structure of a document to a document class so that the advantages of structured documents can be used effectively. To reuse documents, there is also demand for converting a structured document conforming to a document class A into one conforming to another document class B, and for converting a document composed of flat text without any specific document structure into a structured document conforming to a specific document class.
Paper is still widely used as a medium for conveying documents. There is therefore also considerable need to gain the benefits of structuring by converting a paper document, in the form of a document image, into a structured document conforming to a specific document class.
However, adapting the structure of a document to a specific document class can impose a heavy burden on the user.
Creating a new document according to a specific document class is not particularly difficult. Converting an existing document into a structured document conforming to a specific document class, however, poses a major problem.
Converting document data and document images is time-consuming. The quantity of data is often enormous, making purely manual conversion impractical. Moreover, because this conversion produces no new information, it does not justify large costs.
Several techniques have been proposed to address this problem by automatically converting an arbitrary document into a structured document. They are described below on the assumption that the structure of any document can be expressed as a tree.
First, converting a structured document conforming to document class A into a structured document conforming to another document class B can be facilitated by applying the concept of a "fall back class" described by Fred Cole and Heather Brown in "Editing Structured Documents: Problems and Solutions," Electronic Publishing, Vol. 5, No. 4, pp. 209-216. That is, rules are defined that specify, for each node type defined in document class A, the node type defined in document class B to which it is converted. During conversion, each node in the document structure conforming to document class A is converted one by one, according to these predefined rules, into a corresponding node of a document structure conforming to document class B.
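The node-by-node conversion described above can be sketched as a rule table from class-A node types to class-B node types, applied recursively. The `FALLBACK_RULES` table, the type names, and the `convert` function are hypothetical and do not reproduce Cole and Brown's notation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a document tree: a type name plus ordered children."""
    type: str
    children: list["Node"] = field(default_factory=list)


# Hypothetical fall back rules: node type in class A -> node type in class B.
FALLBACK_RULES = {
    "article": "report",
    "part": "chapter",
    "heading": "title",
    "para": "paragraph",
}


def convert(node: Node, rules: dict[str, str]) -> Node:
    """Convert a class-A tree to class-B node types, one node at a time.

    Each output node corresponds to exactly one input node; the tree shape
    is preserved and no new nodes are created. A type with no rule raises
    KeyError in this sketch.
    """
    return Node(rules[node.type], [convert(c, rules) for c in node.children])


def count_nodes(node: Node) -> int:
    """Count the nodes in a tree."""
    return 1 + sum(count_nodes(c) for c in node.children)


src = Node("article", [Node("part", [Node("heading"), Node("para")])])
dst = convert(src, FALLBACK_RULES)
print(dst.type, [c.type for c in dst.children])  # report ['chapter']
print(count_nodes(src) == count_nodes(dst))      # True
```

Because `convert` only renames nodes, the converted tree always has exactly as many nodes as the source tree, which is the limitation of fall back class conversion pointed out below.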
A method of converting a document composed of flat text into a structured document is disclosed in Japanese Published Unexamined Patent Application No. Sho 63-286963. In this method, chapter and section titles and paragraphs are identified from the characteristics of character strings in the flat text, and a structured document is composed from them.
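A toy version of identifying chapter titles, section titles, and paragraphs from characteristic character strings might key on numbering patterns such as "1." and "1.1". The patterns and the `classify` and `structure` functions below are assumptions made for illustration only; they are not the rules of the cited publication.

```python
import re

# Hypothetical patterns: "1 Title" or "1. Title" marks a chapter title,
# "1.2 Title" marks a section title; any other line is paragraph text.
CHAPTER = re.compile(r"^\d+\.?\s+\S")
SECTION = re.compile(r"^\d+\.\d+\.?\s+\S")


def classify(line: str) -> str:
    """Guess the role of one line from its leading character string."""
    if SECTION.match(line):
        return "section-title"
    if CHAPTER.match(line):
        return "chapter-title"
    return "paragraph"


def structure(text: str) -> list[dict]:
    """Group non-blank lines into nested chapter/section/paragraph dicts.

    Text before the first chapter title is ignored in this sketch.
    """
    chapters: list[dict] = []
    current = None  # innermost container currently receiving paragraphs
    for line in (raw.strip() for raw in text.splitlines()):
        if not line:
            continue
        kind = classify(line)
        if kind == "chapter-title":
            current = {"title": line, "sections": [], "paragraphs": []}
            chapters.append(current)
        elif kind == "section-title" and chapters:
            current = {"title": line, "paragraphs": []}
            chapters[-1]["sections"].append(current)
        elif current is not None:
            current["paragraphs"].append(line)
    return chapters


SAMPLE = """1. Introduction
Structured documents are common.
1.1 Background
SGML and ODA are standards.
2. Method
We convert flat text."""

doc = structure(SAMPLE)
print([ch["title"] for ch in doc])  # ['1. Introduction', '2. Method']
```

Heuristics of this kind can only recover nodes whose text carries a recognizable marker; a paragraph that happens to start with a number would be misread as a chapter title.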
To convert a document image into a structured document, the method disclosed in Japanese Published Unexamined Patent Application No. Hei 6-214983 can be used. In this method, chapter and section titles, paragraphs, headers, and footers are identified from layout characteristics, and a structured document is composed from them.
However, each of the above techniques has the problem that, when conversion to various document classes must be performed, it is difficult to supplement the nodes and subtrees required by the structural constraints of the desired document class.
In conversion using a fall back class, the structure of the converted document does not always comply with the constraints of document class B. The document structure after conversion contains only nodes that correspond to nodes in the document structure before conversion. That is, a node that has no counterpart in the structure before conversion cannot be generated, and is therefore missing from the structure after conversion. This shortage of nodes often causes the constraints of the document class to be violated, so this method is not practical.
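To make the shortage of nodes concrete, suppose (hypothetically) that class B requires every `report` to begin with a `title`, while class A has no node type that maps to `title`. A rename-only conversion then cannot produce a class-B-conforming tree. `CLASS_B`, `RULES`, and the helper functions are illustrative assumptions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a document tree: a type name plus ordered children."""
    type: str
    children: list["Node"] = field(default_factory=list)


# Hypothetical class B: a report needs a title followed by one or more chapters.
CLASS_B = {
    "report": r"title (chapter )+",
    "chapter": r"(paragraph )+",
    "title": r"",
    "paragraph": r"",
}
# Hypothetical fall back rules; no class-A node type becomes "title".
RULES = {"memo": "report", "item": "chapter", "line": "paragraph"}


def convert(node: Node, rules: dict[str, str]) -> Node:
    """Rename node types one by one; never creates new nodes."""
    return Node(rules[node.type], [convert(c, rules) for c in node.children])


def violations(node: Node, doc_class: dict[str, str], path: str = "/") -> list[str]:
    """Collect paths of nodes whose children do not match their content model."""
    here = path + node.type
    seq = "".join(c.type + " " for c in node.children)
    found = [] if re.fullmatch(doc_class[node.type], seq) else [here]
    for c in node.children:
        found += violations(c, doc_class, here + "/")
    return found


source = Node("memo", [Node("item", [Node("line")])])
print(violations(convert(source, RULES), CLASS_B))  # ['/report']
```

The only way to clear the violation is to add a `title` node that has no counterpart in the source, which is exactly what node-by-node conversion cannot do.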
In the document conversion disclosed in Japanese Published Unexamined Patent Application No. Sho 63-286963, the only nodes that can be created are nodes that exist in the text and have a characteristic character string usable for determining the structure, and predetermined nodes required to connect those nodes into a structure (that is, nodes coded in the processing program whose types and complemented positions are fixed in advance).
In the document conversion disclosed in Japanese Published Unexamined Patent Application No. Hei 6-214983, the only nodes that can be created are nodes that exist in the document image and have a characteristic layout usable for determining the structure, and predetermined nodes required to connect those nodes (that is, nodes coded in the processing program whose types and complemented positions are fixed in advance).
That is, with the above conversion methods, flat text and document images can be converted only into structured documents that conform to the document class presupposed by the program used for the conversion.
A method of complementing nodes and subtrees in accordance with a desired document class is disclosed in Japanese Published Examined Patent Application No. Hei 6-12542. That publication discloses a function that, when part of a structured document is copied to another location, matches the copied part against the definition of the document class and, on finding that a node is missing, automatically complements the required node.
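A local complementation step of the kind described might compare a copied subtree's children against a content model and insert a default node for each missing required child type. The sketch below assumes a simplified content model given as an ordered list of (child type, required) slots; `MODELS` and `complement` are hypothetical and are not the procedure of the cited publication. Note that it inspects only the copied subtree, never the rest of the document.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a document tree: a type name plus ordered children."""
    type: str
    children: list["Node"] = field(default_factory=list)


# Hypothetical simplified content models: ordered (child type, required) slots.
# A slot may hold any number of children of its type; a required slot needs
# at least one. Assumes there is no cycle of required types.
MODELS = {
    "section": [("title", True), ("paragraph", True), ("figure", False)],
    "title": [],
    "paragraph": [],
    "figure": [],
}


def complement(node: Node, models: dict[str, list[tuple[str, bool]]]) -> Node:
    """Return a copy of node with an empty default child inserted for every
    required slot that has no child, working bottom-up within this subtree.

    Children are regrouped in slot order; children whose types the model
    does not list are dropped in this sketch.
    """
    kids = [complement(c, models) for c in node.children]
    result: list[Node] = []
    for child_type, required in models[node.type]:
        present = [k for k in kids if k.type == child_type]
        if required and not present:
            present = [complement(Node(child_type), models)]
        result.extend(present)
    return Node(node.type, result)


copied = Node("section", [Node("paragraph")])  # copied part lacks a title
fixed = complement(copied, MODELS)
print([c.type for c in fixed.children])  # ['title', 'paragraph']
```

The inserted `title` is an empty default, and the decision is made from the copied subtree alone; neither its effect on other parts of the document nor any user choice among alternatives enters into it.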
However, automatically complementing nodes and subtrees in a document structure by matching against the definition of a document class has the following serious problems:
The first problem is that this method does not take into account that complementation in one area of the document structure affects whether another area conforms to the document class, and what complementation must be performed there, and vice versa. In other words, complementation should be based not only on a specific local area of the document structure but on the structure of the whole document.
The second problem is that the user has no way to choose among the multiple (sometimes infinitely many) ways of complementing that satisfy the document class. Because the user cannot select how complementation is performed, the user has no means of preventing an undesirable complementation result from being output.
One natural way to solve the above problems would be for users to edit the document structure themselves and adapt it to the constraints of the desired document class. However, grasping the structure of the whole document and then determining where nodes must be supplemented is laborious. Even when a location to be complemented is found, it is very difficult to complement it appropriately so that the constraints of the document class are met. In practice, this approach cannot be applied to long documents or to large numbers of documents.