Optical character recognition (OCR) hand-held scanners are known. They convert the image of a printed text, barcode or picture into machine-readable code. Often, the image acquired by the hand-held scanning device is transferred to a computer (PC or Mac), which then performs the following steps: processing the image to improve its quality, performing OCR on the text, and exporting the recognized text to an application. An example of an OCR hand-held scanner known in the art is a pen scanner. A pen scanning device is a scanning device shaped like a pen and usually connected to a computer. The pen scanning device is operated by hand and allows a line of text to be entered into a computer application by sliding the pen over the document.
The OCR hand-held scanners comprise a one-dimensional optical sensor for acquiring image information, which is managed by a processing unit and stored in a memory. The hand-held scanner is passed over a printed text by the user such that there is relative movement between the optical sensor and the printed text to be acquired. During this relative movement, a series of images is acquired, each acquired image corresponding to a small portion of the printed text to be scanned. When the scanned image is reconstructed by combining the acquired images, the result is a distorted image.
Since a one-dimensional sensor is used, the problem arises of how to calculate the instantaneous scanning speed, which is needed to rebuild the two-dimensional image. Solutions to correct this distortion in the scanned image are known in the art.
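To illustrate why the instantaneous scanning speed is needed, the following is a minimal sketch, not taken from any of the cited documents, of rebuilding a two-dimensional image from one-dimensional acquisitions. It assumes the speed at each acquisition is already known and that acquisitions occur at a fixed time interval. The function name `rebuild_columns` and the parameters `dt` and `pitch` are hypothetical and chosen for illustration only.

```python
from bisect import bisect_right


def rebuild_columns(columns, speeds, dt, pitch):
    """Resample line-scan columns onto a uniform spatial grid.

    columns: list of 1-D columns (lists of pixel values), one per acquisition
    speeds:  assumed-known instantaneous speed at each acquisition
    dt:      assumed fixed time between two acquisitions
    pitch:   desired distance between two output columns
    """
    if len(columns) != len(speeds):
        raise ValueError("columns and speeds must have the same length")
    if not columns:
        return []

    # Integrate the speed to get the position of each acquired column.
    positions = [0.0]
    for v in speeds[:-1]:
        positions.append(positions[-1] + v * dt)

    # For each uniformly spaced output position, take the nearest acquired column.
    n_out = int(positions[-1] / pitch) + 1
    rebuilt = []
    for k in range(n_out):
        x = k * pitch
        i = bisect_right(positions, x) - 1
        if i + 1 < len(positions) and positions[i + 1] - x < x - positions[i]:
            i += 1
        rebuilt.append(columns[i])
    return rebuilt


# Four acquisitions; the scanner moves twice as fast from the third one on.
columns = [[0], [1], [2], [3]]
rebuilt = rebuild_columns(columns, speeds=[1.0, 1.0, 2.0, 2.0], dt=1.0, pitch=1.0)
```

In this sketch, a column acquired while the scanner moves quickly is repeated in the output, which stretches the part of the image that would otherwise appear compressed. Nearest-neighbour selection is used only for brevity; interpolating between neighbouring columns would be an equally valid choice.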
Some solutions are based on mechanical structures, such as small wheels in contact with the paper, which allow the speed to be calculated. U.S. Pat. No. 5,083,218 discloses a hand-held image reading apparatus in which a rubber roller moves over the surface of the image to determine the relative movement between the hand-held scanning device and the image being scanned or acquired.
In another solution, disclosed in U.S. Pat. No. 5,023,922, a two-dimensional optical sensor is used for calculating the speed of the relative movement based on the time interval required for an image to traverse the sensor.
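The relation between transit time and speed can be sketched as follows. This is only an illustrative reading of the principle stated above, not the implementation of the cited patent. The function name `estimate_speed` and the parameter `sensor_length`, meaning the extent of the sensor along the scanning direction, are assumptions introduced here.

```python
def estimate_speed(sensor_length, transit_time):
    """Estimate the relative speed from the time an image feature needs to cross the sensor.

    sensor_length: assumed extent of the sensor along the scanning direction
    transit_time:  measured time for a feature to travel across that extent
    """
    if transit_time <= 0:
        raise ValueError("transit_time must be positive")
    # Speed is distance travelled divided by the time taken.
    return sensor_length / transit_time


# A feature crossing a 2.0 mm sensor in 0.5 s corresponds to 4.0 mm/s.
speed = estimate_speed(2.0, 0.5)
```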
In still another solution, U.S. Pat. No. 6,965,703 discloses correcting the distortion caused by the variability of the instantaneous scanning speed by applying a compensation. This solution utilizes the character height and font ratio for each font in the text to obtain a local correction factor at each location in the text image. The local correction factor is subsequently used to correct the distorted image.
Although the above solutions provide, in many cases, more than reasonable results, the resulting OCR accuracy is, in a number of situations, still too low, especially because the hand-held scanning device is operated by hand. The user thereby introduces various kinds of distortions in the scanned images which are not caused by changes in the speed of the hand-held scanning device.
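As an illustration of how a local correction factor of the kind mentioned for the compensation approach might be obtained, the following minimal sketch assumes that the font ratio is the expected width-to-height ratio of a character in a given font. That interpretation, the function name `local_correction_factor` and its parameters are assumptions made here, and the sketch is not the method of the cited patent. It relies on the observation that a variation in scanning speed distorts the image along the scanning direction only, so that the measured character height can serve as a reference.

```python
def local_correction_factor(char_height, measured_width, font_ratio):
    """Return the horizontal scale factor that restores a character's expected width.

    char_height:    measured character height, assumed unaffected by speed variation
    measured_width: measured (distorted) character width
    font_ratio:     assumed expected width-to-height ratio for the font
    """
    if char_height <= 0 or measured_width <= 0 or font_ratio <= 0:
        raise ValueError("all arguments must be positive")
    # Width the character would have had without speed distortion.
    expected_width = char_height * font_ratio
    # A factor above 1 stretches the local image; below 1 compresses it.
    return expected_width / measured_width


# A character 10 units high with ratio 0.5 should be 5 units wide;
# it was measured 4 units wide, so the local image is stretched by 1.25.
factor = local_correction_factor(10.0, 4.0, 0.5)
```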
Further, paper documents are often scanned to have their information extracted and transferred to data management software, in which the relevant part of this information can be handled automatically. A known method for doing so is to scan the full paper document, extract the information from the full paper document, then select the relevant information and transfer it to the data management software. This method is inefficient because the full document has to be scanned and information has to be extracted from the full document. Moreover, selecting the relevant information can be difficult because it can be hard to locate within the information extracted from the full document.
OCR applications can be run on mobile devices, but mobile devices typically do not have enough computing power to perform fast and accurate OCR. Moreover, mobile devices typically do not have enough memory to perform OCR in many languages.