OCR (Optical Character Recognition) is a technology that acquires image data by reading a document with a scanner or similar device and recognizes characters by processing the acquired image data. OCR must recognize characters not only in documents containing only characters but also in documents containing a mixture of characters, pictures, photographs, etc. Because characters must be recognized with high accuracy from many kinds of documents, OCR processing is becoming increasingly complex and time-consuming.
Japanese Laid-open Patent Publication No. 2011-191903 discloses an information processing apparatus comprising a CPU, a sequential processing unit, and a parallel processing unit. The information processing apparatus determines in advance, for each specific operation to be executed in image processing, which of the three processing units (the CPU, the sequential processing unit, or the parallel processing unit) can perform the operation fastest, and prestores a table indicating which operation is to be assigned to which processing unit. Then, for each operation to be executed, the CPU refers to the prestored table and selects the processing unit to which the operation is to be assigned.
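The table-based assignment described above can be illustrated with a minimal sketch. It is not the implementation of the cited publication: the unit names, the operation names, the timing values, and the helpers `build_assignment_table` and `dispatch` are all hypothetical, and the processing units are modelled as plain Python callables.

```python
from typing import Callable, Dict

# Hypothetical labels for the three processing units named in the text.
UNITS = ("cpu", "sequential", "parallel")


def build_assignment_table(timings: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each operation, record the unit with the smallest measured time."""
    return {op: min(per_unit, key=per_unit.get) for op, per_unit in timings.items()}


def dispatch(op: str, table: Dict[str, str],
             executors: Dict[str, Callable[[str], str]]) -> str:
    """Look up the prestored table and run the operation on the chosen unit."""
    unit = table[op]
    return executors[unit](op)


# Made-up timings in seconds, standing in for the advance measurement step.
EXAMPLE_TIMINGS = {
    "binarize": {"cpu": 0.9, "sequential": 0.4, "parallel": 0.1},
    "label_regions": {"cpu": 0.3, "sequential": 0.2, "parallel": 0.5},
    "parse_header": {"cpu": 0.01, "sequential": 0.05, "parallel": 0.2},
}

# Stand-in executors that only report where the operation would run.
EXECUTORS = {unit: (lambda op, unit=unit: f"{op} on {unit}") for unit in UNITS}

# The table is built once, ahead of time, and only read at dispatch time.
TABLE = build_assignment_table(EXAMPLE_TIMINGS)
```

In this sketch the measurement cost is paid once when the table is built, so selecting a unit for each operation at run time is a single dictionary lookup.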
Japanese Laid-open Patent Publication No. 08-315159 discloses an image processing apparatus which performs processing for character recognition and processing for compression. The image processing apparatus divides a document containing a mixture of text, graphics, and pictures into a plurality of regions by recognizing the attributes of image data (title, text, graphics, and pictures). Character recognition is then performed on the regions containing characters, such as title and text, by using binary data obtained by simple binarization. The publication further describes that, for the regions containing characters such as title and text, compression is applied to the binary data obtained by simple binarization, while for halftone regions such as pictures and photographs, compression is applied to halftoned binary data obtained by binarization using an error diffusion method.
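The difference between the two binarization methods mentioned above can be shown with a short sketch. The publication names only "an error diffusion method"; the Floyd-Steinberg weights used here are an assumption chosen for illustration, as are the threshold of 128, the function names, and the choice to route every non-character region to error diffusion.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, pixel values 0..255


def simple_binarize(img: Image, threshold: int = 128) -> Image:
    """Fixed-threshold binarization: 1 for bright pixels, 0 for dark ones."""
    return [[1 if p >= threshold else 0 for p in row] for row in img]


def error_diffusion_binarize(img: Image, threshold: int = 128) -> Image:
    """Binarize while pushing each pixel's quantization error onto its
    unprocessed neighbours (Floyd-Steinberg weights, assumed here)."""
    h = len(img)
    w = len(img[0]) if h else 0
    work = [[float(p) for p in row] for row in img]  # copy; input is not mutated
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = work[y][x]
            new = 255.0 if old >= threshold else 0.0
            out[y][x] = 1 if new else 0
            err = old - new
            if x + 1 < w:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    work[y + 1][x + 1] += err * 1 / 16
    return out


def binarize_region(img: Image, attribute: str) -> Image:
    """Choose the method by region attribute (hypothetical routing rule)."""
    if attribute in ("title", "text"):
        return simple_binarize(img)
    return error_diffusion_binarize(img)
```

On a uniform mid-gray region, simple binarization yields a single flat value, whereas error diffusion yields a mix of 0s and 1s whose density approximates the gray level, which is why it suits halftone regions such as pictures and photographs.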