The present invention involves a method and system for the detection and recognition of text in images. The quality of these images varies due to improper focus, motion blur, lighting variations or noise.
Printed text can be detected and recognized by using an optical character recognition (OCR) engine. The OCR technology currently in use is designed for images of clear text in modern fonts printed against clean backgrounds. In addition, the images are assumed to be created by a scanner at high resolution. The scanning process produces high-quality, sharp images of text under uniform illumination. The same is true when high-resolution cameras are used for scanning under uniform illumination. However, commercial and other conventional OCR technologies do not work well when the fonts are unusual or when the text is printed against a non-uniform image background. Commercial OCR technology also does not work well when the images are taken with hand-held cameras, because the viewpoint is no longer fronto-parallel to the text and because lighting changes or ambient illumination may affect the results. In a fronto-parallel view, a rectangle is imaged as a rectangle, and the world rectangle and its image have the same aspect ratio.
Images produced by cameras on mobile computational devices such as cell phones and personal digital assistants (PDAs) are often of poor quality because these cameras have inexpensive optics, small apertures, slow shutters and, in many cases, fixed-focus lenses. Such images often exhibit blur (both motion blur and focus blur) and noise.
Moreover, when the lens is close to the object, the depth of field is shallow, and the blur problem is compounded because different lenses introduce varying amounts of blur into the images they produce.
Illumination variations are an additional problem and cannot easily be rectified using the flash on cell phone cameras, since the flash on these devices is usually not strong enough and itself tends to create illumination variations. Current OCR technology often performs poorly on the text in such images.
Some efforts have been made to detect text against a general image background and then extract and clean the text to create black text against a white background, which is then passed to an OCR engine for recognition. Examples of such efforts can be seen in Wu et al. (V. Wu, R. Manmatha, and E. M. Riseman, “TextFinder: An Automatic System to Detect and Recognize Text In Images,” IEEE PAMI, vol. 21, no. 11, pp. 1224-1229, Nov. 1999) and more recently in Chen and Yuille (X. Chen and A. Yuille, “Detecting and Reading Text in Natural Scenes,” CVPR 2004, vol. 2, pp. 366-373). The work of Wu et al. was designed mainly for scanned images, while the more recent work of Chen and Yuille was designed for images of street signs taken with high-quality cameras. Neither is designed to handle poor-quality images with problems such as blur.
There is, therefore, a need for a method and system for detecting and extracting text in images of varying quality, such as those produced by mobile computational devices such as cell phones and PDAs.