The present invention relates to optical character readers (OCR) and, more particularly, to OCR page readers.
Various OCR systems have been devised in the past for reading alphanumeric characters fixed on a page. Typically, the page is scanned by moving it past an optically-sensitive device, e.g. a photodiode array. As a result, sophisticated, heavy, and expensive paper-handling equipment is needed, especially if the pages are to be fed automatically into the OCR system from a stack.
Since an OCR system usually has a lens for focusing the image of the page on the optically-sensitive device, there is a problem with obtaining a uniform amount of light at that device from various portions of the page because of the cos⁴ θ law, under which the illumination reaching the image plane falls off as the fourth power of the cosine of the off-axis angle θ. One method of overcoming this problem is to have the illumination move across the page as the scanning of the page proceeds. An alternative method tries to correct the problem by varying the amount of light over the page to compensate for the light intensity variations caused by the lens. Neither method has solved the problem adequately. The moving light solution also increases the mechanical complexity of the machine.
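The size of the falloff described above can be illustrated with a short calculation. The following is only a sketch: the function names and the sample angles are assumptions chosen for illustration and are not taken from any particular OCR system.

```python
import math


def relative_illumination(field_angle_deg: float) -> float:
    """Relative image-plane illumination at an off-axis angle (cos^4 law).

    Returns 1.0 on the optical axis and falls toward 0 as the angle grows.
    """
    theta = math.radians(field_angle_deg)
    return math.cos(theta) ** 4


def compensation_gain(field_angle_deg: float) -> float:
    """Hypothetical gain that would restore uniform response at this angle."""
    return 1.0 / relative_illumination(field_angle_deg)


# Sample angles are illustrative only.
for angle in (0, 10, 20, 30):
    print(angle, round(relative_illumination(angle), 4),
          round(compensation_gain(angle), 4))
```

At 30 degrees off axis the relative illumination is 0.5625, so a point near the edge of a wide page can receive little more than half the light of a point on the axis.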
When photodiode arrays are used in OCR systems, it is necessary to compensate for the variation in sensitivity between the individual photodiodes of the array. This has been done by calibrating the array at the factory and supplying it with a memory circuit holding a correction factor for each cell. However, such an arrangement in no way compensates for illumination variations caused by the lens and lighting arrangement of the OCR system in which the array is used, or for variations in the reflectivity and color of the paper used in such a system. It has been proposed to compensate for paper changes by storing in a memory a number representing the maximum white level (i.e. paper background level) obtained by the entire photodiode array. This number is then incremented or decremented depending on whether the maximum signal received from the photodiodes of the array, within a particular period, represents a white level greater or less than that represented by the stored number. These techniques, however, have not proved satisfactory in overcoming the various illumination problems in OCR systems, especially where the illumination is not uniform across the entire page.
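The two prior compensation schemes described above can be sketched as follows. This is a minimal illustration under stated assumptions: larger readings are taken to mean whiter paper, the adjustment step is one count, and the function names `correct_cells` and `update_white_level` are hypothetical.

```python
def correct_cells(raw, factors):
    """Apply stored per-cell correction factors to one scan of raw readings."""
    if len(raw) != len(factors):
        raise ValueError("one correction factor is required per photodiode")
    return [reading * factor for reading, factor in zip(raw, factors)]


def update_white_level(stored, readings, step=1):
    """Move the stored white level one step toward the peak reading.

    `readings` holds the signals received from the whole array during one
    period. Assumes larger values are whiter; `step` is an assumed size.
    """
    peak = max(readings)
    if peak > stored:
        return stored + step
    if peak < stored:
        return stored - step
    return stored


print(correct_cells([100, 100], [0.5, 2.0]))
print(update_white_level(200, [180, 210, 190]))
print(update_white_level(200, [150, 160]))
```

Because a single white level is kept for the entire array, the sketch also shows the limitation noted above: one stored number cannot follow illumination that differs from one part of the page to another.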
In recognizing a character, a signal from the optically-sensitive device is compared to a threshold in order to determine whether that signal represents white, i.e. background, or black, i.e. part of a character. This information is then stored in a memory as rows of data bits corresponding to the scans of the page being read. In a typical system, a section of the resulting memory array is analyzed to determine what character it represents. This is accomplished by a technique known as video framing, in which a rectangular box is considered as being around the character under examination. This box has dimensions large enough to enclose the largest character expected. Once the character has been framed by locating the perimeter of the character bits in the memory that form it, the contents of the frame are passed to a second stage of recognition, which identifies the contents of the box as a character. The determination of what character is in the memory section being analyzed is made by a plurality of separate circuits, each looking for the particular pattern of data bits that represents a unique character. In more sophisticated systems, e.g. those which can recognize hand-printed characters, patterns of interconnected and properly positioned character features are looked for in order to determine the character in the frame under examination. With a video framing system, however, additional logic is needed to locate the character in the memory section and to establish the frame about it.
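The thresholding and framing steps described above can be sketched in software, although the systems discussed perform them in dedicated circuits. The sketch assumes that a lower signal means a darker point and that the memory section holds a single character; the names `binarize` and `frame_character` and the sample values are hypothetical.

```python
def binarize(scan, threshold):
    """Convert one scan of signals to data bits: 1 = black, 0 = white.

    Assumes a signal below the threshold represents part of a character.
    """
    return [1 if value < threshold else 0 for value in scan]


def frame_character(bits):
    """Locate the perimeter of the black bits in a memory section.

    Returns (top, left, bottom, right) as inclusive row and column
    indices, or None if the section contains no black bits.
    """
    rows = [r for r, row in enumerate(bits) if any(row)]
    if not rows:
        return None
    cols = [c for row in bits for c, bit in enumerate(row) if bit]
    return rows[0], min(cols), rows[-1], max(cols)


# Illustrative signals for four scans of a tiny page region.
scans = [
    [200, 200, 200, 200],
    [200, 50, 60, 200],
    [200, 200, 40, 200],
    [200, 200, 200, 200],
]
bits = [binarize(scan, 128) for scan in scans]
print(frame_character(bits))
```

The search for the perimeter in `frame_character` corresponds to the additional logic that a video framing system needs before the second stage of recognition can begin.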