1. Field of the Invention
This invention relates to the field of document image processing systems, and more particularly to optical character recognition (OCR) systems. The invention further relates to a method and system which store characters, tables, and figures in separate files. The invention further relates to the rearrangement of sections of an article so that the information in the article can be utilized more easily and paper can be used efficiently.
2. Discussion of the Background
Various aspects of recognizing features of a document image by a computer-implemented process are known. For example, a method of extracting character images from a document image input as a bit-mapped image is proposed in Japanese patent application 4-17086, which is incorporated herein by reference. Also, a method of recognizing handwritten characters and a method of vectorizing figures are proposed in Japanese application 4-165477. As another method of recognizing character images, Japanese application 4-287,168, which is incorporated herein by reference, proposes a method which separately recognizes the areas made up of characters, photographs, figures, and table images, respectively, and which identifies lines of characters which appear above table images and constitute the title of the table, or which appear below figures or photographs and constitute explanations of the figures and photographs. However, if one wants to utilize data from a particular document obtained through a character recognition process, different types of software are needed to handle the respective data, such as word processing software for handling character data and other software for processing figures. Therefore, even if the data of an original document can be recognized as containing characters, figures, and table images, there remains a problem that this data is not easy to utilize with different types of software if it is stored in only one file.
Further, when the articles processed by the optical character recognition software are from a newspaper or magazine, the articles, which are often made up of blocks of sub-articles, extend over multiple columns. In such cases, the shapes of the areas containing an article are rather complex, and there exists a problem that data arranged in this way cannot be used efficiently and also wastes paper space.