1. Field of the Invention
The present invention relates generally to digital imaging technology, and more specifically it relates to optical character recognition performed by an imaging device which has wireless data transmission capabilities. This optical character recognition operation is done by a remote computational facility, or by dedicated software or hardware resident on the imaging device, or by a combination thereof. The character recognition is based on an image, a set of images, or a video sequence taken of the characters to be recognized. Throughout this patent, “character” is a printed marking or drawing, “characters” refers to “alphanumeric characters”, and “alphanumeric” refers to representations which are alphabetic, or numeric, or graphic (typically with an associated meaning, including, for example, traffic signs in which shape and color convey meaning, or the smiley picture, or the copyright sign, or religious markings such as the Cross, the Crescent, the Star of David, and the like) or symbolic (for example, signs such as +, −, =, $, or the like, which represent some meaning but which are not in themselves alphabetic or numeric, or graphic marks or designs with an associated meaning), or some combination of the alphabetic, numeric, graphic, and symbolic.
2. Description of the Related Art
Technology for automatically recognizing alphanumeric characters from fixed fonts using scanners and high-resolution digital cameras has been in use for years. Such systems, generally called OCR (Optical Character Recognition) systems, are typically comprised of:
1. A high-resolution digital imaging device, such as a flatbed scanner or a digital camera, capable of imaging printed material with sufficient quality.
2. OCR software for converting an image into text.
3. A hardware system on which the OCR software runs, typically a general purpose computer, a microprocessor embedded in a device or on a remote server connected to the device, or a special purpose computer system such as those used in the machine vision industry.
4. Proper illumination equipment or setting, including, for example, the setup of a line scanner, or illumination by special lamps in machine vision settings.
Such OCR systems appear in different settings and are used for different purposes. Several examples may be cited. One example of such a purpose is conversion of page-sized printed documents into text. These systems are typically comprised of a scanner and software running on a desktop computer, and are used to convert single or multi-page documents into text which can then be digitally stored, edited, printed, searched, or processed in other ways.
Another example of such a purpose is the recognition of short printed numeric codes in industrial settings. These systems are typically comprised of a high end industrial digital camera, an illumination system, and software running on a general purpose or proprietary computer system. Such systems may be used to recognize various machine parts, printed circuit boards, or containers. The systems may also be used to extract relevant information about these objects (such as the serial number or type) in order to facilitate processing or inventory keeping. The VisionPro™ optical character verification system made by Cognex™ is one example of such a product.
A third example of such a purpose is recognition of short printed numeric codes in various settings. These systems are typically comprised of a digital camera, a partial illumination system (in which “partial” means that for some parts of the scene illumination is not controlled by this system, such as, for example, where outdoor lighting may exist in the scene), and software for performing the OCR. A typical application of such systems is License Plate Recognition, which is used in contexts such as parking lots or tolled highways to facilitate vehicle identification. Another typical application is the use of dedicated handheld scanning devices for performing scanning, OCR, and processing (e.g., translation to a different language), such as the Quicktionary™ OCR Reading pen manufactured by Seiko, which is used for the primary purpose of translating from one language to another language.
A fourth example of such a purpose is the translation of various sign images taken by a wireless PDA, where the processing is done by a remote server (such as, for example, the Infoscope™ project by IBM™). In this application, the image is taken with a relatively high quality camera utilizing well-known technology such as a Charge-Coupled Device (CCD) with variable focus. With proper focusing of the camera, the image may be taken at long range (for a street sign, for example, since the sign is physically much larger than a printed page, allowing greater distance between the object and the imaging device), or at short range (such as for a product label). The OCR processing operation is typically performed by a remote server, and is typically reliant upon standard OCR algorithms. Standard algorithms are sufficient where the obtained imaging resolution for each character is high, similar to the quality of resolution achieved by an optical scanner.
Although OCR is used in a variety of different settings, all of the systems currently in use rely upon some common features. These features would include the following:
First, these systems rely on a priori known geometry and setting of the imaged text. This known geometry affects the design of the imaging system, the illumination system, and the software used. These systems are designed with implicit or explicit assumptions about the physical size of the text, its location in the image, its orientation, and/or the illumination geometry. For example, OCR software using input from a flatbed scanner assumes that the page is oriented parallel to the scanning direction, and that letters are uniformly illuminated across the page as the scanner provides the illumination. The imaging scale is fixed since the camera/sensor is scanning the page at a very precise fixed distance from the page, and the focus is fixed throughout the image. As another example, in industrial imaging applications, the object to be imaged typically is placed at a fixed position in the imaging field (for example, where a microchip to be inspected is always placed in the middle of the imaging field, resulting in fixed focus and illumination conditions). A third example is that license plate recognition systems capture the license plate at a given distance and horizontal position (due to car structure), and license plates themselves are at a fixed size with small variation. A fourth example is the street sign reading application, which assumes imaging at distances of a couple of feet or more (due to the physical size and location of a street sign), and hence assumes implicitly that images are well focused on a standard fixed-focus camera.
Second, the imaging device is a “dedicated one” (which means that it was chosen, designed, and placed for this particular task), and its primary or only function is to provide the required information for this particular type of OCR.
Third, the resulting resolution of the image of the alphanumeric characters is sufficient for traditional OCR methods of binarization, morphology, and/or template matching to work. Traditional OCR methods may use any combination of these three types of operations and criteria. These technical terms mean the following:
“Binarization” is the conversion of a gray scale or color image into a binary one. Each pixel, whatever its gray level, becomes a pixel whose value is exclusively (0) or (1). Under the current art, grayscale images captured by mobile cameras from short distances are too fuzzy to be processed by binarization. Algorithms and hardware systems that would allow binarization processing for such images, or an alternative method, would be an improvement in the art, and these are one object of the present invention.
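As a rough illustration of binarization in its simplest form, the following Python sketch applies a single fixed threshold to a grayscale image stored as a list of rows. The function name `binarize`, the default threshold of 128, and the sample pixel values are assumptions made for illustration only and are not taken from any particular OCR product; practical systems often select the threshold adaptively.

```python
def binarize(gray, threshold=128):
    """Convert a grayscale image (rows of 0-255 values) into a binary image.

    Pixels at or above the threshold map to 1, all others to 0.
    The default threshold of 128 is an illustrative assumption.
    """
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


# Hypothetical sample data: a sharp edge and a defocused version of it.
sharp_edge = [[0, 0, 255, 255]]
fuzzy_edge = [[90, 115, 140, 165]]

if __name__ == "__main__":
    print(binarize(sharp_edge))
    # On the fuzzy edge the position of the 0/1 boundary moves with the
    # threshold, which is one reason blurred images binarize unreliably.
    print(binarize(fuzzy_edge, threshold=110))
    print(binarize(fuzzy_edge, threshold=150))
```

For the sharp edge, any threshold between the two gray levels gives the same binary image, whereas for the fuzzy edge the apparent stroke boundary shifts as the threshold changes.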
“Morphology” is a kind of operation that uses morphological data known about the image to decode that image. Most of the OCR methods in the current art perform part or all of the recognition phase using morphological criteria. For example, consecutive letters are identified as separate entities using the fact that they are not connected by contiguous blocks of black pixels. Another example is that letters can be recognized based on morphological criteria such as the existence of one or more closed loops as part of a letter, and the location of loops in relation to the rest of the pixels comprising the letter. For example, the numeral “0” (or the letter O) could be defined by the existence of a closed loop and the absence of any protruding lines from this loop. When the images of characters are small and fuzzy, which happens frequently in current imaging technology, morphological operations cannot be reliably performed. Algorithms and hardware systems that would allow morphology processing for such images, or an alternative method, would be an improvement in the art, and these are one object of the present invention.
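The two morphological criteria mentioned above, separating letters that are not connected by black pixels and detecting closed loops, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the image is already binary (1 for ink, 0 for background), 4-connectivity is used for both ink and background, and the names `components`, `count_glyphs`, `count_loops`, and the sample image are hypothetical.

```python
from collections import deque


def components(grid, value):
    """Return the 4-connected regions of cells equal to `value` as sets of (row, col)."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    regions = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or (r, c) in seen:
                continue
            region = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == value and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def count_glyphs(grid):
    """Number of separate ink blobs, i.e. candidate letters."""
    return len(components(grid, 1))


def count_loops(grid):
    """Number of background regions fully enclosed by ink (closed loops)."""
    rows, cols = len(grid), len(grid[0])

    def touches_border(region):
        return any(r in (0, rows - 1) or c in (0, cols - 1) for r, c in region)

    return sum(1 for region in components(grid, 0) if not touches_border(region))


# Hypothetical binary image: a ring resembling "0" next to a bar resembling "1".
ZERO_AND_BAR = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    print(count_glyphs(ZERO_AND_BAR), count_loops(ZERO_AND_BAR))
```

Both counts depend on a clean binary image. When strokes are blurred enough to merge neighboring letters or to fill in a loop, the same code returns different counts, which mirrors the unreliability described above.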
“Template Matching” is a process of mathematically comparing a given image piece to a scaled version of an alphanumeric character (such as, for example, the letter “A”) and giving the match a score between 0 and 1, where 1 would mean a perfect fit. These methods are used in some License Plate Recognition (LPR) systems, where the binarization and morphology operations are not useful due to the small number of pixels for the character. However, if the image is blurred, which may be the case if the image has alternate light and shading, or where the number of pixels for a character is very small, template matching will also fail, given current algorithms and hardware systems. Conversely, algorithms and hardware systems that would allow template matching in cases of blurred images or few pixels per character would be an improvement in the art, and these are one object of the present invention.
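One simple way to obtain a match score between 0 and 1 is sketched below in Python. The scoring rule used here (one minus the mean absolute gray-level difference divided by 255) is an assumption chosen for brevity; real systems may use normalized correlation or other measures. The sketch also assumes the template has already been scaled to the size of the image piece, and the names `match_score`, `best_match`, and the 3 by 3 templates are hypothetical.

```python
def match_score(patch, template):
    """Return a score in [0, 1]; 1 means the patch equals the template.

    Assumed rule: 1 - (mean absolute difference / 255) over 0-255 pixels.
    Both images must be non-empty and have the same shape.
    """
    if len(patch) != len(template) or any(
            len(p) != len(t) for p, t in zip(patch, template)):
        raise ValueError("patch and template must have the same shape")
    diffs = [abs(p - t)
             for prow, trow in zip(patch, template)
             for p, t in zip(prow, trow)]
    return 1.0 - sum(diffs) / (255.0 * len(diffs))


def best_match(patch, templates):
    """Return (label, score) for the template that scores highest."""
    scored = ((label, match_score(patch, tpl)) for label, tpl in templates.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical 3x3 templates (255 = ink, 0 = background).
TEMPLATES = {
    "I": [[0, 255, 0], [0, 255, 0], [0, 255, 0]],
    "L": [[255, 0, 0], [255, 0, 0], [255, 255, 255]],
}

# A blurred, low-contrast rendering of "I" (illustrative values).
BLURRED_I = [[60, 180, 60], [60, 180, 60], [60, 180, 60]]

if __name__ == "__main__":
    print(best_match(TEMPLATES["I"], TEMPLATES))
    print(best_match(BLURRED_I, TEMPLATES))
```

In this toy case the blurred patch still selects the correct template, but with a lower score and a smaller margin over the wrong one; with heavier blur or fewer pixels the margin can vanish, which is the failure mode described above.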
Fourth, typically the resolution required by current systems is on the order of 16 or more pixels on the vertical side of the characters. For example, the technical specifications of a current product such as the “Camreader”™ by Mediaseek™ indicate a requirement for the imaging resolution to provide at least 16 pixels at the letter height for correct recognition. It should be stressed that the minimum number of pixels required for recognition is not a hard limit. Some OCR systems, in some cases, may recognize characters with pixel heights below this limit, while other OCR systems, in other cases, will fail to recognize characters even above this limit. Although the point of degradation to failure is not clear in all cases, current art may be characterized such that almost all OCR systems will fail in almost all cases where the character height of the image is on the order of 10 pixels or less, and almost all OCR systems will succeed in almost all cases where the character height of the image is on the order of 25 pixels or more. Where text is relatively condensed, character heights are relatively short, and OCR systems in general will have great difficulty decoding the images. Alternatively, when the image suffers from fuzziness due to de-focusing (which can occur in, for example, imaging from a small distance using a fixed focus camera) and/or imager movement during imaging, the effective pixel resolution would also decrease below the threshold for successful OCR. Thus, when the smear of a point object is larger than one pixel in the image, the point smear function (PSF) should replace the term pixel in the previous threshold definitions.
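The thresholds above, together with the substitution of the PSF width for the pixel when the smear exceeds one pixel, can be expressed as a small Python sketch. The cut-offs of 10 and 25 are taken from the approximate figures in the text and are not hard limits; the function names, the treatment of the PSF as a single width in pixels, and the outcome labels are assumptions made for illustration.

```python
# Approximate figures from the text ("on the order of"), not hard limits.
FAIL_AT_OR_BELOW = 10      # character height where almost all OCR systems fail
SUCCEED_AT_OR_ABOVE = 25   # character height where almost all OCR systems succeed


def effective_height(char_height_px, psf_width_px=1.0):
    """Character height measured in resolvable units.

    If the smear of a point object is wider than one pixel, the PSF width
    replaces the pixel as the unit; otherwise the pixel count is used as is.
    """
    return char_height_px / max(1.0, psf_width_px)


def expected_outcome(char_height_px, psf_width_px=1.0):
    """Classify the likely OCR outcome under the assumed thresholds."""
    height = effective_height(char_height_px, psf_width_px)
    if height <= FAIL_AT_OR_BELOW:
        return "likely failure"
    if height >= SUCCEED_AT_OR_ABOVE:
        return "likely success"
    return "uncertain"


if __name__ == "__main__":
    print(expected_outcome(30))        # sharp image, 30-pixel characters
    print(expected_outcome(30, 3.0))   # same characters with a 3-pixel smear
    print(expected_outcome(16))        # the commonly quoted 16-pixel minimum
```

Under these assumptions, a 30-pixel character imaged with a 3-pixel smear has an effective height of only 10 units, so it falls in the range where recognition is expected to fail even though its nominal pixel height is well above the threshold.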
Fifth, current OCR technology typically does not, and cannot, take into consideration the typical severe image de-focusing and JPEG compression artifacts which are frequently encountered in a wireless environment. For example, the MediaSeek™ product runs on a cell phone's local CPU (and not on a remote server). Hence, such a product can access the image in its non-transmitted, pre-encoded, and pristine form. Wireless transmission to a remote server (whether or not the image will be retransmitted ultimately to a remote location) creates the vulnerabilities of de-focusing, compression artifacts, and transmission degradation, which are very common in a wireless environment.
Sixth, current OCR technology works badly, or not at all, on what might be called “active displays” showing characters, that is, for example, LED displays, LCD displays, CRTs, plasma displays, and cell phone displays, which are not fixed but which have changing information due to the type and nature of the display technology used.
Seventh, even apart from the difficulties already noted above, particularly the difficulties of wireless de-focusing and inability to deal with active displays, OCR systems typically cannot deal with the original images generated by the digital cameras attached to wireless devices. Among other problems, such digital cameras in most cases suffer from the following difficulties. First, their camera optics are fixed focus, and cannot image properly at distances of less than approximately 20 centimeters. Second, the optical components are often minimal or of low quality, which causes inconsistency of image sharpness, which makes OCR according to current technology very difficult. For example, the resolution of the imaging sensor is typically very low, with resolutions ranging from 1.3 Megapixel at best down to VGA image size (that is, 640 by 480 or roughly 300,000 pixels) in most models. Some models even have CIF resolution sensors (352 by 288, or roughly 100,000 pixels). Even worse, the current existing standard for 3G (Third Generation cellular) video-phones dictates a transmitted imaging resolution of QCIF (176 by 144 pixels). Third, due to the low sensitivity of the sensor and the lack of a flash (or insufficient light emitted by the existing flash), the exposure times required in order to yield a meaningful image in indoor lighting conditions are relatively long. Hence, when an image is taken indoors, the hand movement/shake of the person taking the image typically generates motion smear in the image, further reducing the image's quality and sharpness.
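The arithmetic behind these sensor formats can be checked with a short Python sketch. The pixel dimensions for VGA, CIF, and QCIF are those given above; the 16-pixel character height is the approximate minimum cited earlier in this section, and the calculation of how many character rows fit in a frame ignores line spacing and margins, which is a simplifying assumption. The names `SENSOR_FORMATS`, `pixel_count`, and `max_text_rows` are hypothetical.

```python
# Sensor formats named in the text, as (width, height) in pixels.
SENSOR_FORMATS = {
    "VGA": (640, 480),
    "CIF": (352, 288),
    "QCIF": (176, 144),
}


def pixel_count(width, height):
    """Total number of pixels in a frame."""
    return width * height


def max_text_rows(frame_height_px, char_height_px=16):
    """Upper bound on rows of text that fit in the frame.

    Assumes each row needs exactly `char_height_px` pixels and ignores
    line spacing and margins, so real capacity is lower.
    """
    return frame_height_px // char_height_px


if __name__ == "__main__":
    for name, (width, height) in SENSOR_FORMATS.items():
        print(name, pixel_count(width, height), max_text_rows(height))
```

Even under this optimistic bound, a QCIF frame can hold only a handful of text rows at 16 pixels per character, so imaging a full printed page at such resolutions leaves each character far below the heights at which current OCR methods work.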