Optical character recognition (OCR) is the process of converting scanned, photographed or other bitmap-formatted images of text (printed, handwritten, typewritten or otherwise) into machine-encoded text that can be read and manipulated by a computer. OCR is a common method of digitizing printed texts so that they can be electronically edited, searched and stored more compactly. OCR is used in various fields, including machine translation, text-to-speech synthesis, data entry and extraction, text mining, book scanning, and assistive technology for low-vision and blind individuals. In particular, OCR technology allows low-vision and blind individuals to access textual content in images by means of magnification devices and devices providing audio or Braille output.
Low vision generally refers to a condition in which ordinary eyeglasses, lens implants or contact lenses are not sufficient to provide sharp sight. In developed countries, the fastest-growing segment of the low-vision population is expected to be people aged 65 years and older. This is mainly due to age-related eye diseases such as macular degeneration, glaucoma, diabetic retinopathy, cataracts, retinal detachment and retinitis pigmentosa. Some people are also born with low vision. Low-vision individuals often find it difficult, if not impossible, to read small writing or to discern small objects without high levels of magnification. This limits their ability to lead an independent life, because reading glasses and magnifying glasses typically cannot provide sufficient magnification for them. For legally blind individuals, access to textual content in an image can be provided by adaptive technology devices that produce speech or Braille output. Various devices and systems are known in the art for assisting low-vision and blind individuals in performing daily tasks.
Among such devices and systems, desktop video magnifiers generally include a video monitor mounted on a gooseneck stand. A camera with a large optical zoom is installed on the stand over a working area on which the user places an object to be magnified, typically a document whose textual content the user wishes to access. The camera supplies a video signal of a portion of the working area to a video processor, which increases the sharpness and enhances the contrast of this signal before feeding it to the video monitor. Conventional video magnifiers can be provided with OCR capabilities to allow low-vision individuals to access textual information. Once extracted from the image, the machine-encoded text may be displayed to the user as suitably magnified text on a monitor, read aloud by a text-to-speech system, or presented as Braille content by a Braille display system.
While the OCR methods and systems employed in conventional video magnifiers have certain advantages, they also have drawbacks and limitations. For example, because the cameras used in such video magnifiers generally have a relatively narrow field of view that covers only a portion of a standard-size paper document, OCR can only be performed on the correspondingly narrow portion of the document. In particular, OCR cannot be performed in advance on portions of the document that have not yet been presented to the user. Instead, it must be performed anew every time the user brings a new portion of the document within the camera's field of view, which makes reading the textual content slower, less smooth and less efficient.
There is therefore a need in the art for OCR methods and systems that make reading the textual content of an entire image more fluid and convenient, while also alleviating at least some of the drawbacks of the prior art.