OCR systems are under development that process the image of a printed document and encode the document contents by recognizing its characters. For this type of OCR, a known method analyzes the document image, segments it into objects such as character strings, graphics, and tables, and structures the data for the objects as a tree that hierarchically expresses the layout relationship between the objects. For example, the official gazette of PUPA No. 2-59880 discloses a method that structures the objects constituting a document, and the layout relationship between them, as a tree structure in accordance with the inputted document image, displays the layout in accordance with that tree structure, and reads the characters in a desired area of the document image when the corresponding object area is specified on the displayed layout.
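As an illustration only, the tree-structured layout described above might be sketched as follows. The names `LayoutObject`, `contains`, and `read_area`, the bounding-box representation, and the sample page are assumptions made for this sketch and are not taken from the cited gazette.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Hypothetical box format: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class LayoutObject:
    kind: str                 # e.g. "page", "text", "graphic", "table"
    box: Box
    text: str = ""            # recognized characters, if any
    children: List["LayoutObject"] = field(default_factory=list)

    def walk(self) -> Iterator["LayoutObject"]:
        """Yield this object and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def contains(outer: Box, inner: Box) -> bool:
    """Return True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def read_area(root: LayoutObject, area: Box) -> List[str]:
    """Collect the text of every object lying inside the specified area."""
    return [o.text for o in root.walk() if o.text and contains(area, o.box)]


# Made-up sample page: a title, a graphic, and a table with one text cell.
page = LayoutObject("page", (0, 0, 600, 800), children=[
    LayoutObject("text", (50, 40, 550, 80), text="Quarterly Report"),
    LayoutObject("graphic", (50, 100, 300, 400)),
    LayoutObject("table", (320, 100, 550, 400), children=[
        LayoutObject("text", (330, 110, 540, 140), text="Sales"),
    ]),
])

print(read_area(page, (300, 90, 560, 410)))  # → ['Sales']
```

Specifying the area around the table returns only the text held by objects inside that area, which corresponds to reading characters in a desired area by selecting an object area of the layout.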
The official gazette of PUPA No. 3-17771 discloses a method in which a document processor generates a document whose character information and image information are laid out as a tree structure of layout objects ordered, from the bottom, as block, frame, page, and page set. This method makes it possible to edit a document across different objects by specifying an area to be edited, generating a frame equivalent to the specified area, detecting the layout objects in the specified area, generating a new layout object equivalent to the area combined with the specified area, and connecting the new object at the rank below the generated frame. These methods lay out each object of a document image by using a hierarchical tree structure. However, the types of document they can handle are restricted to those to which a prepared tree structure or layout form can be directly applied. Forming a new tree structure or layout model each time requires defining a complex hierarchical structure, and the hierarchical structure is difficult to understand intuitively. These methods are therefore not easy for general users.
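The frame-based edit attributed above to PUPA No. 3-17771 can be read in more than one way. The following is a minimal sketch of one possible reading, assuming that "detecting" means finding blocks that overlap the specified area and that the "combined" area is the bounding box of the specified area and those blocks. The names `Node`, `overlaps`, `union`, `blocks_in`, and `edit_area` and the sample page are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box format: (left, top, right, bottom).
Box = Tuple[int, int, int, int]


@dataclass
class Node:
    level: str                # "page_set", "page", "frame", or "block"
    box: Box
    children: List["Node"] = field(default_factory=list)


def overlaps(a: Box, b: Box) -> bool:
    """Return True if the two boxes share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(boxes: List[Box]) -> Box:
    """Return the smallest box enclosing all of the given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def blocks_in(node: Node) -> List[Node]:
    """Return every block-level node at or below `node`."""
    if node.level == "block":
        return [node]
    return [b for child in node.children for b in blocks_in(child)]


def edit_area(page: Node, area: Box) -> Node:
    """Assumed reading of the edit: new frame, merged block beneath it."""
    frame = Node("frame", area)                           # frame for the area
    hits = [b for b in blocks_in(page) if overlaps(b.box, area)]
    merged = Node("block", union([area] + [b.box for b in hits]))
    frame.children.append(merged)                         # connect below frame
    page.children.append(frame)
    return frame


# Made-up sample page with two frames holding three blocks.
page = Node("page", (0, 0, 600, 800), [
    Node("frame", (0, 0, 600, 400), [
        Node("block", (50, 50, 250, 150)),
        Node("block", (300, 50, 550, 150)),
    ]),
    Node("frame", (0, 400, 600, 800), [
        Node("block", (50, 450, 550, 700)),
    ]),
])

frame = edit_area(page, (200, 100, 350, 500))
print(frame.children[0].box)  # → (50, 50, 550, 700)
```

Even in this small sketch, the user must reason about four ranks of nodes and their parent-child links to perform a single edit, which illustrates why defining such hierarchies is difficult for general users.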