Optical character recognition (OCR) technology refers to the process whereby electronic equipment (e.g., a scanner or digital camera) examines printed images by detecting patterns of darkness and brightness in the images, determining their shapes, and then using a character recognition technique to translate the shapes into computer-recognizable characters. At a high level, OCR refers to the process of obtaining text materials and processing the images of the text materials to determine character and layout information. Obtaining text materials may include scanning, photographing, and other such forms of optical input. Various pattern recognition algorithms may be used to analyze the morphological features of text and determine the standard codes of Chinese characters. For example, a text source is scanned by a device and an image file is generated. The image file undergoes image processing, during which the layout of the image file is demarcated (e.g., layout analysis is performed, including layout interpretation, character segmentation, normalization, etc.) to separate the image file into different sections for separate, sequential processing. For example, the image file is separated into various portions, where each portion includes a character to be recognized. A character is then recognized from each portion of the image file. For example, the characters may comprise Chinese characters or English letters. The recognized text may be edited and revised.
The objective of an OCR system is to convert images into computer text by recognizing the text characters in the images. Once the characters have been recognized, the images can be converted into computer text, which decreases the amount of image information that must be stored and thus reduces storage space. It also enables re-use, analysis, etc. of the recognized text while conserving human effort and input time.
FIG. 1 is a diagram showing an example of a conventional text recognition system. In the example, system 100 includes image conversion and input module 101, image pre-processing module 103, character feature extraction module 105, comparison recognition module 107, and correction and results output module 109.
Image conversion and input module 101 is configured to capture the target subject as an image file. An image file may comprise a black-and-white binary image or a greyscale or color image. The target subject may be captured by an optical device such as a scanner, fax machine, or any photography equipment. The captured image file(s) are input into a computer. As technology advances, scanners, cameras, and other input devices are becoming more finely crafted, more compact, of higher quality, and easier to install. Moreover, the high resolution of such input devices has led to sharper images and higher speeds, which are spurring improvements in OCR processing efficiency.
Image pre-processing module 103 is configured to process the input image files into individual and independent images, where each image comprises a character. Image pre-processing module 103 is also configured to perform image normalization, noise elimination, image correction, and other image processing, as well as file pre-processing entailing image-text analysis and text line and character separation. Image processing is already a mature technology, but the various kinds of image file pre-processing have their advantages and disadvantages. Generally, the picture, form, and text regions of an image are first separated from each other. Even the article's layout direction, the article outline, and its main body of text may be separated. Moreover, character fonts and sizes are determined.
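The text line and character separation described above can be illustrated with a minimal sketch based on projection profiles, which is one common way to perform such separation and is not necessarily the technique used by module 103. The function names (`binarize`, `find_runs`, `segment_characters`), the fixed threshold of 128, and the representation of an image as nested lists are all assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed representation: 1 = black (ink) pixel, 0 = white pixel


def binarize(gray: List[List[int]], threshold: int = 128) -> Image:
    """Map greyscale values (0-255) to 1 for dark pixels and 0 for light pixels."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def find_runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    runs = []
    start = None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i                      # a run of ink begins
        elif count == 0 and start is not None:
            runs.append((start, i))        # a blank row/column ends the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_characters(image: Image) -> List[Image]:
    """Split a binary image into text lines, then each line into character images."""
    characters = []
    row_profile = [sum(row) for row in image]          # ink count per pixel row
    for top, bottom in find_runs(row_profile):
        line = image[top:bottom]
        # Ink count per pixel column within this text line.
        col_profile = [sum(row[x] for row in line) for x in range(len(line[0]))]
        for left, right in find_runs(col_profile):
            characters.append([row[left:right] for row in line])
    return characters


# Hypothetical 5x5 greyscale input: two narrow glyphs on one line, one wide glyph below.
gray = [
    [255, 255, 255, 255, 255],
    [255,   0, 255,   0, 255],
    [255,   0, 255,   0, 255],
    [255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255],
]
characters = segment_characters(binarize(gray))
```

This sketch assumes that text lines are separated by fully blank pixel rows and that characters are separated by fully blank pixel columns. Touching or overlapping characters, skewed scans, and text laid over graphics would all defeat it.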
Character feature extraction module 105 is configured to extract features of the image file. Which features are extracted and how they are used directly affect the quality of character recognition. The “features” are used to distinguish among characters to be recognized. For example, two different types of features may be used. The first type of features is statistical features, e.g., the black-white pixel ratio in an image. When a text region is divided into several portions, the combination of the black-white pixel ratios corresponding to the different portions of the image file constitutes a spatial, numerical value vector. The second type of features is structural features. For example, after converting a portion of an image associated with a character into thin lines, the system obtains quantities and locations of the character stroke endpoints and intersection points (e.g., stroke segments) to use as features.
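The statistical features described above can be sketched as follows. The sketch divides a character image into a grid of portions and computes one value per portion; as an assumption made here to avoid division by zero, it uses the fraction of black pixels in each portion rather than a literal black-to-white ratio. The function name `zone_features` and the default 2x2 grid are illustrative and are not taken from the system described.

```python
from typing import List

Image = List[List[int]]  # assumed representation: 1 = black pixel, 0 = white pixel


def zone_features(glyph: Image, rows: int = 2, cols: int = 2) -> List[float]:
    """Return the fraction of black pixels in each zone, in row-major zone order."""
    height, width = len(glyph), len(glyph[0])
    features = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            area = (bottom - top) * (right - left)
            black = sum(
                glyph[y][x] for y in range(top, bottom) for x in range(left, right)
            )
            # An empty zone (image smaller than the grid) contributes 0.0.
            features.append(black / area if area else 0.0)
    return features


# Hypothetical 4x4 character image.
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
vector = zone_features(glyph)  # [1.0, 0.0, 0.0, 0.75]
```

The resulting list is the numerical value vector for the character: one entry per portion, ordered by the position of the portion in the image.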
After features have been extracted from the input text, these (statistical and/or structural) features are compared with features stored in a comparison database or a feature database. The database includes feature information associated with characters to be recognized from the image file. The feature information included in the database was determined by the same extraction method that was performed on the image file.
Comparison recognition module 107 is configured to make comparisons between the features extracted from the image file and feature information of characters stored in the comparison database. If a match is found between the features extracted for a portion of the image file and feature information stored in the database, then that portion of the image is recognized as the character corresponding to the matching stored set of feature information. In some instances, mathematical operations may be performed on the features during the comparison. For example, a mathematical distance may be determined between the features extracted from the image file and a set of feature information corresponding to a character stored in the database. The following are some example types of comparison techniques: Euclidean space comparative technique, relaxation comparative technique, dynamic programming (DP) comparative technique, a database establishment and contrasting technique according to analogue neural networks, and the HMM (Hidden Markov Model) technique. Additionally, human experts may assist in the comparison. One or more types of comparison techniques may be used to achieve the optimum result.
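As one illustration of a distance-based comparison, the following minimal sketch matches an extracted feature vector against a small feature database using Euclidean distance and nearest-neighbor selection. The function names, the dictionary-based database, the sample feature values, and the `max_distance` rejection threshold are all assumptions made for illustration; they are not details of module 107.

```python
import math
from typing import Dict, List, Optional, Tuple


def euclidean_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(
    features: List[float],
    database: Dict[str, List[float]],
    max_distance: float = 0.5,
) -> Tuple[Optional[str], float]:
    """Return the closest stored character and its distance.

    The character is None when even the closest entry is farther away than
    max_distance, i.e. no stored character is considered a match.
    """
    best_char, best_dist = None, math.inf
    for char, stored in database.items():
        dist = euclidean_distance(features, stored)
        if dist < best_dist:
            best_char, best_dist = char, dist
    if best_dist > max_distance:
        return None, best_dist
    return best_char, best_dist


# Hypothetical database mapping characters to stored feature vectors.
database = {
    "I": [0.5, 0.5, 0.5, 0.5],
    "L": [1.0, 0.0, 1.0, 1.0],
}
matched_char, matched_dist = recognize([0.9, 0.1, 1.0, 0.9], database)
rejected_char, rejected_dist = recognize([0.0, 1.0, 0.0, 0.0], database)
```

Here the first query lies close to the stored vector for "L" and is recognized as that character, while the second query is farther than `max_distance` from every stored vector and is rejected. As noted above, the stored vectors must be produced by the same extraction method as the query vectors for the distances to be meaningful.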
Correction and results output module 109 is configured to present the character(s) recognized from the image file at a user interface to a user.
For example, when a product is being purchased, the bar code on the product cannot be recognized by the naked eye but only by specialized reading equipment. In another example, people typically use a numeric or text code to index products or other subject matter and to form catalogues, series, codes/numbers, and other such labels. For example, products have product codes (which may also be called product numbers, etc.). A product code may include characters such as numbers and text, as well as images (e.g., the manufacturer's logo). A product code may be used to uniquely identify a certain product. The product code may appear around, near, or on top of common product images that are displayed in Internet or mobile phone advertisements or on marketed physical products. Conventional systems are unable to directly recognize the product codes included in images. A demand has also arisen for capturing and recognizing product codes using mobile platforms such as mobile phones or other mobile devices. Generally, with regard to an image that is based on a scanned or photographed advertisement, it is possible to use conventional OCR technology to recognize some of the characters. However, conventional image pre-processing module 103 generally uses a layout analysis technique to differentiate image and text sections in the image, and this technique applies only to text layouts with a certain degree of structural information. In images such as product advertisements, where a product code appears near, around, or over one or more graphics, the text and graphics are more difficult to separate using conventional OCR technology, and it therefore becomes more difficult to recognize the text from the image.