1. Field of the Invention
The present invention relates to systems and methods for scanning documents.
2. Description of the Related Art
Performing Optical Character Recognition (OCR) on scanned document images is important for many tasks, such as digital library construction and high-quality document display. Because a document often consists of both images and text, OCR requires extracting the text from the mixed document before the recognition process can begin. The prior art decomposes the document into areas of text and areas of images, referred to as the layout, and then performs pattern recognition in the individual areas. These prior art layout extraction methods typically assume that the background of the scanned document is clean and homogeneous in color. Under this assumption, the structural information of the text and image areas can be computed directly. However, a scanned document image does not necessarily have a clean background. For example, a magazine page may have a colorful, patterned background. Even if the background appears homogeneous, it may include halftone textures that are an artifact of the printing method. Therefore, it may be difficult to extract text layouts from document images of these types. It is the goal of the present invention to address these issues with the prior art.