1. Field of the Invention
The present invention relates to document image acquisition, and particularly to ensuring that the acquired image data will be of high quality and of a resolution suitable for the content of the image, even if the image contains text together with halftone (grayscale) or color image content, or both.
2. Description of the Related Art
As increasingly larger storage devices have become available, it has become possible to store a document not simply as ASCII text but also as a full facsimile image of the document. More specifically, it is now commonplace to convert a document into a computer-readable bit map image of the document and to store the bit map image. Accordingly, whereas ASCII text storage permitted storage and display of only text portions of documents, it is now possible to store a document in computer readable form and to display not only the text but also pictures, line art, graphs, tables and other non-text objects in the document, as well as to show the text in the actual font and style used in the original document. Likewise, it is possible to store and display documents such that text attributes, such as size, position, etc., are preserved.
FIG. 3 shows a page of a representative document. In FIG. 3, a document page 40 is arranged in a two-column format. The page includes title blocks 41, 42, 47 which include text information of large font size suitable for titles, text blocks 43, 44, 48, which include lines of text data, graphics blocks 45, 46 which include graphic images which are not text (in this example, they are a line drawing and a full-color image), a table block 49 which includes a table of text or numerical information, and a caption block 50 which includes small text data and which is a caption associated with blocks of graphic or tabular information.
Despite the technical advances mentioned above, however, it is still difficult to store document images in computer memory efficiently, because of the large amount of information required for even one page. For example, at 300 dots-per-inch resolution, an ordinary 8½ by 11 inch black and white document requires approximately 8.4 million bits to store a full document image (assuming that only one bit is used per dot, which is possible with monochrome text and line drawings, but not with images containing grayscale image or color image portions). Adding grayscale image or color to the document, or increasing the resolution at which the image is stored, can easily increase storage requirements to many tens of millions of bits per page. Moreover, the time required to retrieve those bits from storage and to create and display the resulting image is significant, even with current high speed computing equipment. The time is lengthened even further in situations where the document image is retrieved from storage in a first computer and electronically transmitted, by modem, for example, to a second computer for display on the second computer.
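The storage figure above can be checked with a short calculation. The following is a minimal sketch only; the function name `page_storage_bits` is hypothetical, and the 8-bit grayscale and 24-bit color depths are illustrative assumptions that are not stated in the text.

```python
def page_storage_bits(width_in, height_in, dpi, bits_per_dot=1):
    """Bits needed for an uncompressed bit map image of one page."""
    dots_across = round(width_in * dpi)
    dots_down = round(height_in * dpi)
    return dots_across * dots_down * bits_per_dot


# One bit per dot, as assumed in the text for monochrome text and line art.
bilevel = page_storage_bits(8.5, 11, 300)

# Assumed depths for illustration only: 8 bits per dot for grayscale,
# 24 bits per dot (8 per primary color) for color.
gray8 = page_storage_bits(8.5, 11, 300, bits_per_dot=8)
color24 = page_storage_bits(8.5, 11, 300, bits_per_dot=24)

print(bilevel)  # 8415000, i.e. approximately 8.4 million bits
print(gray8)    # 67320000
print(color24)  # 201960000
```

Under these assumed depths, a single grayscale or color page already reaches tens or hundreds of millions of bits, which is consistent with the magnitude described above.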
It has been conventional to scan a document combining black and white text with color image or grayscale image, or both, in a PC-based document management system using only a black and white (bi-level) scanner. Many disadvantages are attendant upon this approach, however.
First, scanning a color or grayscale image in black and white scanning mode not only loses all the hue information of a color original and the gradations in density of both color and grayscale images, but in many cases results in a mere conglomeration of black blobs. Text and line drawings scanned in a grayscale or color mode, on the other hand, become very blurry, and characters scanned in that fashion are not legible to optical character recognition processing ("OCR processing").
Moreover, even color scanning a grayscale image often produces unacceptable results. Although a color scanner should pick up the densities in a grayscale image well, inadequacies in the scanner may result in some "tint tainting" of the grayscale image data. That is, although the grayscale image is made up entirely of black, white and shades of gray and so has no chrominance or hue, the scanner may erroneously detect a slight hue in the grayscale image. This is because the color scanner cannot directly detect a gray value as such, but can only detect three predetermined primary colors, typically red, green and blue. When scanning an achromatic point, such as a point that is pure black, white or gray, the color scanner should detect exactly equal values for these three color components. In practice, however, slightly different values for the three color components may be detected, due to scanner inadequacies. Upon display or reproduction, the point will have a slight hue instead of being achromatic as it should be.
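The effect described above can be illustrated numerically. The following sketch is illustrative only: the function names, the pixel values and the tolerance are all hypothetical, and the spread between the largest and smallest color component is merely one simple way to measure departure from an achromatic value.

```python
def chroma_spread(rgb):
    """Difference between the largest and smallest color component."""
    r, g, b = rgb
    return max(r, g, b) - min(r, g, b)


def is_achromatic(rgb, tolerance=0):
    """True if the three components are equal to within the tolerance."""
    return chroma_spread(rgb) <= tolerance


ideal_gray = (128, 128, 128)    # what an ideal scanner would report
scanned_gray = (131, 127, 124)  # hypothetical reading from a real scanner

print(is_achromatic(ideal_gray))                 # True
print(is_achromatic(scanned_gray))               # False: a slight hue appears
print(is_achromatic(scanned_gray, tolerance=8))  # True: spread of 7 is tolerated
```

With a strict comparison, the hypothetical scanned point is treated as colored even though the original was gray, which corresponds to the slight hue seen upon display or reproduction.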
Thus, using one type of scanning for an entire document that includes color, grayscale or both, in addition to text, is not a viable approach.
Also, with document images (as opposed to text documents created locally in ASCII code using a word-processing program to begin with), it has been proposed to subject text portions of a document image to optical character recognition processing and to store the character information so obtained in ASCII form, greatly reducing the amount of storage required for the text portions. This technique, however, does not preserve any information regarding the type font used in the original document, and obviously is not applicable to non-textual portions of a document, or even to textual portions which are not in a font recognizable by the particular OCR process being employed.
The growing importance of desktop publishing in the business world only makes the problems described above more urgent. Desktop publishing has come to depend more and more heavily on scanning as a way of capturing material, that is, of converting text, color images and grayscale images into a form usable in a desktop publishing system.
It is one object of the present invention to provide an apparatus and method for processing a document so as to capture or acquire the contents of the document and to store those contents for future retrieval, with reduced memory capacity requirements.
It is another object of the invention to provide an apparatus and method for processing a document to capture and store the document contents in such a manner as will permit convenient and quick retrieval of the document for display or other processing at a later time.
It is still another object of the invention to provide an apparatus and method for processing a document to capture the document contents in such a manner that text, line drawing, grayscale and color portions are each treated in a way suitable for each of these image types, and such as to prevent degradation in image quality resulting from the processing and storage of the information.
It is yet another object of the invention to provide a single-board document scanner which meets the foregoing objects, and a document image management system using such a scanner.
It is another object of the invention to provide a method and apparatus, and in particular a scanner, which meet the foregoing objects and are suitable for use in connection with, or as part of, a desktop publishing system.
In a first aspect, the foregoing objects are achieved by providing an image scanning method and apparatus, which may be either an individual scanner by itself or a more elaborate apparatus or document image management system including the scanner, using first and second sensors, and a control system, and in which the control system effects a first scan of an image, using the first sensor, and then a second scan, using the second sensor.
In another aspect, the foregoing objects are achieved by providing an image scanning method and apparatus, which may be either an individual scanner by itself or a more elaborate apparatus or document image management system including the scanner, using a sensor system, which may be either one or plural sensors, and a control system, and in which the control system effects plural successive scans of an image, to provide successively a combination of bi-level, grayscale and color data as needed.
In another aspect, the foregoing objects are achieved by providing a scanning method and scanner or larger apparatus including such scanner, using first and second sensors, a detector which detects image type based on the image data itself, and a control system. In this aspect of the invention, the control system causes a first scan of the image to be carried out using the first sensor, and then a second scan, responsive to detection that image content of a particular type is present in the image. The second scan is carried out using the second sensor.
In another aspect of the invention, these objects are achieved by providing a method and a scanner and an apparatus or system incorporating the scanner, using first and second sensors, a memory and an analysis and control system. In this aspect of the invention, the analysis and control system itself detects image type based on image data obtained using the first sensor.
Upon detection of image content of a particular type in at least one portion of the document, the image is scanned using the second sensor. Additionally, the information obtained from the first scan is stored in the memory, after which information from the second scan is stored in the memory, only for those portions of the image identified as being of the particular image type.
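The control flow of this aspect can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of the invention: the scan and classification routines are hypothetical stand-ins passed in as callables, the memory is modeled as a dictionary keyed by region name, and the region names and image type labels are invented for the example.

```python
def acquire_page(scan_first, scan_second, classify, target_type="color"):
    """First scan, type detection, then a second scan only if needed."""
    first = scan_first()   # region name -> data from the first sensor
    memory = dict(first)   # first-scan information is stored initially

    # Detect the image type of each portion from the first-scan data.
    types = {region: classify(data) for region, data in first.items()}
    flagged = [region for region, kind in types.items() if kind == target_type]

    if flagged:            # second scan only when the particular type is present
        second = scan_second()
        for region in flagged:
            # Second-scan information is stored only for the flagged portions.
            memory[region] = second[region]
    return memory, types


# Hypothetical stand-ins for the two sensors and the detector.
def fake_bilevel_scan():
    return {"text_block": "bilevel:text_block", "picture_block": "bilevel:picture_block"}


def fake_color_scan():
    return {"text_block": "color:text_block", "picture_block": "color:picture_block"}


def fake_classifier(data):
    return "color" if "picture" in data else "text"


memory, types = acquire_page(fake_bilevel_scan, fake_color_scan, fake_classifier)
print(memory["text_block"])     # bilevel:text_block
print(memory["picture_block"])  # color:picture_block
```

In this sketch the text portion keeps its first-scan data, and only the portion identified as being of the particular type is overwritten with data from the second sensor.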
According to another aspect of the invention, these objects are achieved by providing a method and scanner and a system and apparatus incorporating such scanner, using first and second sensors, a display and a control system, in which information obtained by scanning the image using the first sensor is displayed, after which a second scan is performed using the second sensor, responsive to entry of an instruction by an operator for such second scan.
In another aspect of the invention, the foregoing objects are achieved by providing a method and a scanner and an apparatus and system incorporating the scanner, using first and second sensors, and an analysis and control system in which image information obtained from a first scan of the document using the first sensor is analyzed to identify portions of the image as having various image types. Also, according to this aspect of the invention, a determination is made that image content of first and second types is present in at least first and second respective portions of the document, and a second scan is performed, in which the second sensor is used. In addition, in this aspect of the invention, the information obtained in the first scan is initially displayed, and after the second scan, information from that scan is used in the display, but only for those portions of the image identified as being of the second image type.
According to still another aspect of the invention, these objects are achieved by providing a method and a scanner and an apparatus or system incorporating the scanner, using first and second sensors, a memory and an analysis and control system, in which data obtained by scanning the image with the first sensor is used to identify portions of the image as being of various image types. A second scan is performed, using the second sensor, responsive to a determination that image content of a particular type is present in at least one portion of the document. Moreover, image data obtained by the first sensor is stored in the memory initially, and thereafter information obtained by the second sensor is stored in the memory, only for those portions of the image identified as being of the particular image type. According to this aspect of the invention, the image data is stored in the memory in the form of respective bit maps for respective portions of the image, and those bit maps are linked in the memory.
According to still another aspect of the invention, these objects are achieved by provision of a method and a scanner, and of an apparatus and system including the scanner, using a color sensor, a memory and an analysis and control system, in which a scan of the image is performed using the color sensor, after which, responsive to detection that the image contains grayscale image in at least one portion, the color image data obtained for that portion is converted to grayscale data. Also, according to this aspect of the invention, information obtained by the color sensor is stored in the memory for non-grayscale portions of the image while the grayscale image data is stored for those portions identified as being grayscale image.
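The conversion step in this aspect can be illustrated as follows. The text does not specify how color data is converted to grayscale data; this sketch assumes the commonly used ITU-R BT.601 luma weights (0.299, 0.587, 0.114) purely for illustration, and the function names, region names and pixel values are hypothetical.

```python
def to_gray(rgb):
    """Convert one RGB pixel to a single gray level (assumed BT.601 weights)."""
    r, g, b = rgb
    # Integer arithmetic with rounding to the nearest gray level.
    return (299 * r + 587 * g + 114 * b + 500) // 1000


def store_page(regions):
    """regions maps a region name to (image_type, list of RGB pixels)."""
    memory = {}
    for name, (image_type, pixels) in regions.items():
        if image_type == "grayscale":
            # Grayscale portions: store converted data, discarding any tint.
            memory[name] = [to_gray(p) for p in pixels]
        else:
            # Non-grayscale portions: store the color sensor data as obtained.
            memory[name] = list(pixels)
    return memory


page = {
    "photo_gray": ("grayscale", [(131, 127, 124), (255, 255, 255)]),
    "photo_color": ("color", [(200, 30, 30)]),
}
stored = store_page(page)
print(stored["photo_gray"])   # [128, 255]
print(stored["photo_color"])  # [(200, 30, 30)]
```

Because each grayscale pixel is reduced to a single value, any small inequality among the three detected components is removed from the stored data for those portions.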
According to another aspect of the invention, the foregoing objects are achieved by providing a method and a scanner and apparatus or system incorporating the scanner, in which portions of a document are identified as being of respective image types, and image data representing the document is stored in a memory, and in which the image data is organized in a set of linked bit maps each containing information of only one image type and pertaining to only one of the identified portions of the document.
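One possible way to picture such a set of linked bit maps is a singly linked list of per-portion records. The following is an illustrative sketch only; the class name `RegionBitmap`, its fields, the helper functions and the sample data are all hypothetical, and the text does not prescribe this particular linking scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegionBitmap:
    """Bit map for one identified portion, holding one image type only."""
    region_id: int
    image_type: str      # e.g. "bilevel", "grayscale" or "color"
    bits_per_pixel: int
    width: int
    height: int
    data: bytes
    next: Optional["RegionBitmap"] = None  # link to the next bit map


def link(bitmaps):
    """Chain the bit maps in order and return the head of the chain."""
    head = None
    for bitmap in reversed(bitmaps):
        bitmap.next = head
        head = bitmap
    return head


def walk(head):
    """Yield each bit map in the chain, following the links."""
    while head is not None:
        yield head
        head = head.next


# Hypothetical page: one bi-level text portion and one color portion.
text = RegionBitmap(1, "bilevel", 1, 16, 2, bytes(4))  # 16 x 2 x 1 bit = 4 bytes
photo = RegionBitmap(2, "color", 24, 2, 2, bytes(12))  # 2 x 2 x 24 bits = 12 bytes
head = link([text, photo])

print([(b.region_id, b.image_type) for b in walk(head)])
# [(1, 'bilevel'), (2, 'color')]
```

Keeping one image type per record lets each portion use the bit depth appropriate to it, so that text portions need not carry the storage cost of color data.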
These and other objects, features and advantages of the invention will be more fully understood from the following detailed description of the preferred embodiments, taken in conjunction with the accompanying drawings. In the drawings, it is to be understood that like elements are indicated by like reference characters.