1. Field of the Invention
The present inventions relate to methods and apparatus for analyzing images, for example scanned images, for purposes such as identifying text, enhancing images, compressing image data and increasing data throughput.
2. Related Art
Electronic processing of images has become commonplace. Images fixed on permanent media such as newspapers, magazines, books and photographic paper are used in many aspects of daily life. Images created electronically, such as by computer animation, digital cameras, word processors and other devices for creating graphics, are also very common. Additionally, images fixed on permanent media can be converted to electronic form in a number of ways, including scanning, digital photographic imaging, and the like.
Images converted from permanent media to electronic form are not always converted so that they are reproduced identically in digital form, that is, so that any display of the electronic image is identical to the original on the permanent medium. Even slight differences are often noticeable to the human eye. Additionally, digitally created images may sometimes be processed in such a way that information is lost or modified. In some situations, it may be desirable to process electronic images to improve their appearance, to change how they are handled by various processors or peripheral equipment, or to change how they are stored.
When digital images are produced by scanning, such as with a flatbed or sheet-fed scanner, the scanned images can be stored in any of a number of formats, such as bitmaps, JPEG files, GIF files, and the like. The storage format is often determined by the ultimate destination of the information. For example, information incorporated into a Web page may be stored in a different format than information incorporated into a word processing document, which in turn may differ from the format used in an audiovisual presentation. Additionally, information received only as text, or as text combined with graphical or pictorial images, may be sent to a word processing application for editing.
In many instances, the destination for a scanned image determines how the image is initially scanned, for example which scan settings are used. If an image is text only, the scan can be set to a low bit depth and high resolution so that the image is best suited for Optical Character Recognition (OCR), reproduction and printing. For a graphical or pictorial image, the scan is more often set to a high bit depth and lower resolution. Therefore, when a text-only document is put into electronic or digital form for subsequent editing, the scan settings should be a low bit depth and high resolution. Before a preview scan of the image, and at least before any final scan, the scanner should be set to 300 dpi and black and white. The resulting image can then be processed, for example by de-skewing, auto-cropping and OCR.
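The dependence of scan settings on content type can be sketched as a small lookup plus a suitability check. This is a minimal sketch under stated assumptions, not the method of the invention: the names `ScanSettings`, `choose_scan_settings` and `suitable_for_ocr` are hypothetical, the 300 dpi black-and-white text preset follows the text above, and the 150 dpi, 24-bit pictorial preset and the OCR thresholds are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanSettings:
    dpi: int        # scan resolution in dots per inch
    bit_depth: int  # bits per pixel: 1 = black and white, 24 = RGB color


# Hypothetical presets. The 300 dpi black-and-white text preset follows the
# text; the pictorial preset (150 dpi, 24-bit) is an illustrative assumption.
PRESETS = {
    "text": ScanSettings(dpi=300, bit_depth=1),
    "pictorial": ScanSettings(dpi=150, bit_depth=24),
}


def choose_scan_settings(content_type):
    """Return the preset for 'text' or 'pictorial' content (hypothetical helper)."""
    if content_type not in PRESETS:
        raise ValueError(f"unknown content type: {content_type!r}")
    return PRESETS[content_type]


def suitable_for_ocr(settings, min_dpi=300, max_bit_depth=8):
    """True if resolution is high enough and bit depth low enough for OCR.

    The 300 dpi floor and 8-bit ceiling are assumed thresholds, not fixed rules.
    """
    return settings.dpi >= min_dpi and settings.bit_depth <= max_bit_depth
```

With these assumed presets, `suitable_for_ocr(choose_scan_settings("pictorial"))` returns `False`, which is the kind of mismatch that leaves a text document poorly suited to OCR when it is scanned with pictorial settings.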
Many image scanners include a user interface through which the user can select the desired settings. If the necessary settings are known and can be easily applied, the desired image data should be successfully received for later processing. However, if the proper settings are not made, the resulting digital data most likely will not be in the appropriate format for the intended end use. For example, an image that is ultimately intended to be retrieved as an editable text document, but is scanned at a low resolution and a high bit depth, will not produce a data file that can be suitably processed by OCR.
Scanned images are often processed after scanning to make them look more like the original document. For example, a scanned text document that is intended to be displayed only as a picture or graphic depiction of the original may show the text on a light gray or slightly yellow background, because the digital data representing the background is not always assigned a zero value or other numerical value representing 100 percent white. As a result, the image does not look like the original. To improve its appearance, the image data file is processed to bring the background closer to white. The image data file may also be processed to make the text appear sharper. However, if the correct settings are not applied to the scanner, or if the proper destination for the digital data is not selected, the desired processing may not be carried out on the image.
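One simple way to bring a grayish or yellowish background closer to white is a linear stretch that maps the estimated background level to full white. The sketch below is a hypothetical illustration, not the processing described in this document: it assumes an 8-bit grayscale image stored as a list of rows, with 0 as black and 255 as white (the opposite of the zero-as-white convention mentioned above), and it assumes the background is the most frequent pixel value, which is typical of text pages but not guaranteed. The function name `whiten_background` is invented for illustration, and sharpening of the text is not shown.

```python
from collections import Counter


def whiten_background(img, background_level=None):
    """Linearly stretch an 8-bit grayscale image so the background becomes white.

    img is a list of rows of ints in 0..255 (0 = black, 255 = white).
    If background_level is None, the most frequent pixel value is assumed
    to be the background (reasonable for mostly blank text pages).
    """
    pixels = [p for row in img for p in row]
    if not pixels:
        return [list(row) for row in img]
    if background_level is None:
        background_level = Counter(pixels).most_common(1)[0][0]
    if background_level <= 0:
        # A black "background" cannot be stretched to white; leave unchanged.
        return [list(row) for row in img]
    # Map background_level to 255, scale other values proportionally, and
    # clip anything brighter than the background to pure white.
    return [
        [min(255, round(p * 255 / background_level)) for p in row]
        for row in img
    ]
```

For example, on a page whose background was digitized at 230, that value is mapped to 255 while a text pixel at 40 becomes 44, so the text stays dark and the background turns white. A practical implementation would also need to guard against images whose most common value is not the background.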
Different scanners and other hardware, and different operating environments, produce different scan results for a given image. For example, different digital values can be assigned to all-black and all-white pixels. Consequently, depending on the points at which a pixel is treated as white or as black, some pixels may be identified as black or white while other pixels are identified as a shade of gray. With color scanners, the detected colors, including black and white, may vary as a function of temperature and ambient light. An image that is entirely black and white may be converted to digital data that is displayed with a light gray or light yellow background, and parts of the black text may be depicted as dark shades of gray. Consequently, if the image is not properly characterized as black text, it may not be processed properly and will not be displayed in a way that looks like the original.
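Device-to-device variation in black and white levels is one reason a fixed threshold can misclassify pixels. A common workaround, swapped in here for illustration and not stated in the text, is to estimate each scan's own black and white levels from low and high percentiles of its pixel values and to place the black, gray and white cut points relative to those levels. The function names, the 5th/95th percentile choice and the 15 percent tolerance are all assumptions; the image is again assumed to be 8-bit grayscale with 0 as black.

```python
def estimate_levels(pixels, low_pct=0.05, high_pct=0.95):
    """Estimate (black_level, white_level) from low/high percentiles.

    Uses a simple index-based percentile with no interpolation.
    """
    ordered = sorted(pixels)
    last = len(ordered) - 1
    return ordered[int(low_pct * last)], ordered[int(high_pct * last)]


def classify_pixels(img, tolerance=0.15):
    """Label each pixel 'black', 'gray' or 'white' relative to this scan's own levels."""
    pixels = [p for row in img for p in row]
    if not pixels:
        return [[] for _ in img]
    black, white = estimate_levels(pixels)
    if white <= black:
        # No measurable contrast: fall back to a mid-scale split (assumption).
        only = "white" if black >= 128 else "black"
        return [[only for _ in row] for row in img]
    span = white - black
    black_cut = black + tolerance * span  # at or below: treat as black
    white_cut = white - tolerance * span  # at or above: treat as white

    def label(p):
        if p <= black_cut:
            return "black"
        if p >= white_cut:
            return "white"
        return "gray"

    return [[label(p) for p in row] for row in img]
```

Under these assumptions, a scan whose text came out at about 35 on a 225 background and a scan whose text came out at about 60 on a 200 background receive the same black, gray and white labels, even though their raw values differ.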