Technical Field
The present disclosure generally relates to computer vision, and in particular relates to optical character recognition.
Description of the Related Art
Optical character recognition (OCR) is the mechanical or electronic conversion of scanned or photographed images of alphanumeric or other characters into machine-encoded/computer-readable alphanumeric or other characters. OCR is used as a form of data entry from some sort of original data source, such as product packaging (e.g., a bag of chips, a box), books, receipts, business cards, mail, or any other object having characters printed or inscribed thereon. OCR is a common method of digitizing printed characters so that the characters can be electronically identified, edited, searched, stored more compactly, displayed on-line, or used in machine processes such as machine translation, text-to-speech, verification, key data extraction and text mining.
Typically, the OCR process can be viewed as a combination of two main sub-processes: (1) localization or segmentation and (2) recognition or classification. The segmentation sub-process locates and “isolates” the individual characters. The recognition or classification sub-process classifies the characters in question and assigns to each character a corresponding alpha-numerical or other character or symbol. The OCR process is typically divided into two sub-processes because the classification sub-process is computationally expensive, and therefore it is advantageous that the classification sub-process not be done throughout an entire image, but rather only at select locations where the segmentation sub-process has detected a potential character. For high quality images, characters are well separated and segmentation sub-process becomes relatively straightforward. However, often images suffer from perspective distortion, curved surfaces, low contrast, high degree of noise, character variation, character skew, variable spacing, background variation, or other non-ideal factors. These factors complicate the segmentation sub-process, and segmentation errors lead to the failure in the recognition sub-process.
Machine-readable symbol readers or scanners may be employed to capture images or representations of characters appearing on various surfaces. One commonly used machine-readable symbol reader is an imager- or imaging-based machine-readable symbol reader. Imaging-based machine-readable symbol readers typically employ flood illumination to simultaneously illuminate the characters, either from dedicated light sources, or in some instances using ambient light.
Machine-readable symbol readers may be fixed, for example, readers may be commonly found at supermarket checkout stands or other point of sale locations. Machine-readable symbol readers may also be handheld (e.g., handheld readers or even smartphones), or mobile (e.g., mounted on a vehicle such as a lift vehicle or a forklift).
Imaging-based machine-readable symbol readers typically include solid-state image circuitry, such as charge-coupled devices (CCDs) or complementary metal-oxide semiconductor (CMOS) devices, and may be implemented using a one-dimensional or two-dimensional imaging array of photosensors (or pixels) to capture an image of the characters or symbols to be recognized. One-dimensional CCD or CMOS readers capture a linear cross-section of the machine-readable symbol, producing an analog waveform whose amplitude represents the relative darkness and lightness of the machine-readable symbol. Two-dimensional CCD or CMOS readers may capture an entire two-dimensional image. The image is then processed to find and decode characters or machine-readable symbols.