1. Field of the Invention
The present invention relates to the field of optical character recognition systems and, more specifically, to a method and apparatus for performing optical character recognition of printed text.
2. Prior Art
A number of optical character recognition (OCR) systems are known in the art. Typically, such systems comprise apparatus for scanning a page of printed text and performing a character recognition process on a bit-mapped image of the text. The characters may then be stored in a file on a computer system for processing by a word processor or the like.
Some known OCR systems comprise a hand-held scanner for scanning a page. In such systems, the individual performing the scan sweeps the hand-held device over printed text on the page and will normally avoid scanning graphics or non-text portions of the page. The page is typically scanned in the order in which it is read (i.e., the scan is performed down each column, proceeding across columns from left to right).
Other known systems comprise a ruler apparatus which may be utilized for measuring or indicating portions of the text which are to be processed by the OCR system. Some such systems are capable of discriminating graphic portions of the indicated page areas from text portions. However, such systems still require manual intervention to mark off text in the order in which it is normally read and to mark off graphics portions.
Other systems utilize a registration mark to indicate the beginning of columns of text. These systems still require manual intervention to add registration marks.
Therefore, as one object of the present invention, it is desired to develop an optical character recognition method and apparatus which allows for scanning of a page of text without requiring manual intervention to mark off columns or otherwise indicate the normal order in which the text will be read. Further, it is an object of the present invention to develop an optical character recognition system which allows a page of mixed text and non-text images to be scanned, and which recognizes and distinguishes between text and non-text for purposes of processing.
Known optical character recognition systems may be generally divided into two categories. Optical character recognition systems in the first category recognize either a single font or a limited number of fonts and their input is usually restricted to monospaced type of a specific point size. Optical character recognition systems in the second category are typically termed omnifont systems. Such systems are capable of recognizing a large number of typefaces in a wide range of point sizes, either monospaced or proportionally spaced. In general, optical character recognition systems which recognize a large number of typefaces are not capable of processing documents as quickly as systems which recognize a limited number of specific fonts.
It is another object of the present invention to develop an optical character recognition system which allows for recognition of any number of typefaces while still allowing for the rapid processing of documents.
These and other objects of the present invention will be described in more detail with reference to the Detailed Description of the Present Invention and the accompanying drawings.