This invention relates to an image processing apparatus and method for performing image processing such as character recognition based upon an input image.
There is increasing use of software that allows a personal computer to perform optical character recognition (OCR) on image data, including document image data, entered from an image reading device such as an image scanner or facsimile machine. An example of such OCR software known in the art is OmniPage Pro 6.0J, which Caere Corporation made available for sale in November of 1996. This OCR software runs under Windows 3.1 or Windows 95 and supports documents in both English and Japanese.
By running this OCR software on a personal computer and applying character recognition processing to image data that includes document image data, the text contained in the image data can be converted to character codes. Specifically, the original image data, including its text, is divided in its entirety into areas such as a text area, an image area, a table area and a line-drawing area. Character recognition processing is applied to the image data within the text and table areas, whereby the textual portions of the image data are converted to character code data, while the other areas are left in the form of bitmap image data. A file holding the layout of the text and image areas together with format information is then generated, for example by using the Rich Text Format.
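The area-by-area processing described above can be sketched as follows. This is a minimal illustration only, assuming a hypothetical `Area` record (area type, bounding box, cropped bitmap) and a caller-supplied `recognize` function standing in for the OCR engine; it is not the implementation of the software referred to above. Text and table areas are converted to character codes, and all other areas are kept as bitmaps, each entry retaining its position so that the page layout can later be written out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple, Union


class AreaType(Enum):
    TEXT = "text"
    TABLE = "table"
    IMAGE = "image"
    LINE_DRAWING = "line_drawing"


@dataclass
class Area:
    # Hypothetical record produced by the area-partitioning step.
    kind: AreaType
    bbox: Tuple[int, int, int, int]  # (x, y, width, height) on the page
    bitmap: bytes                    # pixels cropped from the page image


@dataclass
class LayoutEntry:
    bbox: Tuple[int, int, int, int]
    content: Union[str, bytes]       # character codes (str) or bitmap (bytes)


# Only these area types are subjected to character recognition.
RECOGNIZED_KINDS = {AreaType.TEXT, AreaType.TABLE}


def convert_page(areas: List[Area],
                 recognize: Callable[[bytes], str]) -> List[LayoutEntry]:
    """Recognize text/table areas; keep every other area as bitmap data."""
    layout = []
    for area in areas:
        if area.kind in RECOGNIZED_KINDS:
            content: Union[str, bytes] = recognize(area.bitmap)
        else:
            content = area.bitmap
        # The bounding box is kept so the original layout can be reproduced.
        layout.append(LayoutEntry(area.bbox, content))
    return layout


# Usage with a stub recognizer (a real OCR engine would go here).
areas = [
    Area(AreaType.TEXT, (0, 0, 100, 20), b"\x01"),
    Area(AreaType.IMAGE, (0, 30, 100, 50), b"\x02"),
]
layout = convert_page(areas, recognize=lambda bm: "Hello")
```

Passing the recognizer in as a function keeps the layout handling independent of any particular OCR engine.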
More specifically, the original image data containing text, as captured by the image reading device, is displayed on the monitor of the personal computer, using, for example, the left half of the display screen.
Next, the personal computer performs processing to partition the image into prescribed areas and causes the monitor to display a document image in which each area (block) is enclosed by a border.
Next, the personal computer subjects the text areas and table areas to character recognition and displays the resulting text data on the other half of the monitor screen, e.g., the right half. The text displayed in this text window can generally be edited, but this editing differs from ordinary text editing. Specifically, when the user clicks on a character in the text window with a mouse, the corresponding character image is displayed together with the character candidates obtained by character recognition that are ranked second and lower. By selecting one of these candidates, the user can replace the character currently displayed, namely the first-ranked candidate, with the selected character. This function constitutes an editor, referred to as an OCR editor, that permits revisions specific to OCR.
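The candidate-selection behavior of the OCR editor described above can be sketched as a small data structure. This is a hypothetical illustration only: the class name `RecognizedChar`, the `image_id` handle, and the choice to keep the former first-ranked candidate in second place after a selection are assumptions, not details given for the software referred to above.

```python
from typing import List


class RecognizedChar:
    """One recognized character with its ranked candidates (hypothetical)."""

    def __init__(self, image_id: int, candidates: List[str]) -> None:
        if not candidates:
            raise ValueError("at least one candidate is required")
        self.image_id = image_id            # handle to the cropped character image
        self.candidates = list(candidates)  # ordered by rank, first-ranked first

    @property
    def displayed(self) -> str:
        # The text window shows the first-ranked candidate.
        return self.candidates[0]

    def alternatives(self) -> List[str]:
        # Shown when the character is clicked: second-ranked onward.
        return self.candidates[1:]

    def select(self, choice: str) -> None:
        # Promote the chosen candidate to first rank. Assumption: the former
        # first-ranked candidate moves to second place so it can be restored.
        if choice not in self.candidates:
            raise ValueError(f"{choice!r} is not a candidate")
        self.candidates.remove(choice)
        self.candidates.insert(0, choice)


# Usage: "O" was recognized first, but the user selects "0" instead.
ch = RecognizedChar(image_id=7, candidates=["O", "0", "D"])
alts = ch.alternatives()
ch.select("0")
```

Keeping all candidates in one ranked list means the display, the alternatives shown on a click, and any later reversal are all derived from the same data.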
After character recognition processing is completed, the text can be saved in a Rich Text Format (RTF) file. At this time, image areas other than the text areas can also be saved in the RTF file. These areas are stored in a data structure representing a layout substantially the same as that of the original image containing the text. If an RTF file having this data structure is read into document processing software such as Microsoft Word, the document, in which the textual portions have been converted to character codes, can be edited on a screen that displays it in a layout substantially the same as that of the original image.
With the OCR software described above, however, the original image is displayed on the screen only one page at a time. The image of the page to be edited is displayed on the left half of the screen, and the text window is displayed on the right half. Consequently, the user edits the text while viewing a display in which a single page of interest occupies both sides of the screen, as an image on the left and as recognized text on the right.
This editing operation presents no difficulty if the document image that has been read consists of a single page. However, when a plurality of pages are subjected to OCR processing, and particularly when editing is performed while cross-referencing several pages, the inability to view these pages on the screen simultaneously places an excessive burden upon the user and results in considerable inconvenience. Problems also arise in that pages cannot easily be moved or copied on a per-page basis.