Conventional techniques related to document image processing include: copy machines for optically inputting a document image and printing the entire image; document database systems for optically inputting document data and storing it; facsimile apparatuses for optically inputting a document image and outputting it via a network or communication line; and optical character readers (OCR) for optically inputting a document image, recognizing its characters, and outputting text codes.
However, the conventional techniques are not well suited to digitized or networked machines. More specifically, because the input apparatus and the output apparatus are connected through a network and because these apparatuses handle color documents, the following problems arise:
(1) The amount of data is too large when an input document image is stored or transmitted without any processing;
(2) Image quality suitable for reuse cannot be maintained if a document image is uniformly compressed;
(3) Quality of an output image may deteriorate depending on whether the output device is a black-and-white printer or a color printer;
(4) If only text is transmitted after performing optical character recognition (OCR) processing, data such as drawings or photographs is lost; and
(5) If the optical character reader (OCR) recognizes characters erroneously, the resulting document may not make sense.
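To give a sense of scale for problem (1), the following is a minimal sketch that estimates the size of an unprocessed scanned page. The page size (A4, 210 mm x 297 mm), the resolution (300 dpi), and the bit depths (24-bit color and 1-bit black-and-white) are illustrative assumptions and are not taken from the text above; the function name `raw_image_bytes` is likewise hypothetical.

```python
MM_PER_INCH = 25.4


def raw_image_bytes(width_mm, height_mm, dpi, bits_per_pixel):
    """Estimate the size in bytes of an uncompressed raster image.

    All arguments are illustrative assumptions: the physical page size in
    millimetres, the scan resolution in dots per inch, and the pixel depth.
    """
    # Convert the physical page dimensions to whole pixels.
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    # Total bits, rounded up to a whole number of bytes.
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8


# Assumed A4 page scanned at 300 dpi.
color_bytes = raw_image_bytes(210, 297, 300, 24)    # 24-bit color
bilevel_bytes = raw_image_bytes(210, 297, 300, 1)   # 1-bit black-and-white

print(f"24-bit color: {color_bytes / 1e6:.1f} MB")
print(f"1-bit bilevel: {bilevel_bytes / 1e6:.2f} MB")
```

Under these assumptions, the color page is 24 times larger than the black-and-white page, which suggests why storing or transmitting color document images without any processing becomes a burden on a network.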