Optical character recognition (OCR) is the mechanical or electronic conversion of images of typewritten or printed text into machine-encoded text. It is widely used for data entry from printed paper records, including invoices, bank statements, mail, and other documents. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed or entered online, and used in machine processes such as machine translation, text-to-speech, key data and text mining, online forms, and mobile applications.
Traditionally, OCR systems rely on optical scanners such as flatbed scanners or document feeders, as well as handheld scanners, that are adapted to capture a single high-resolution digital image of a document. These optical scanners typically require the document or sheet of paper to be secured in a flat configuration. Thereafter, the entire sheet of paper is illuminated uniformly so that its edges can be identified. The edges of the paper must be parallel with the rows or columns of pixels of a screen space so that the paper appears substantially rectangular in shape and fits a standard paper size. Thus, the captured image of the paper may be rotated until the edges of the paper are parallel with the rows or columns of the pixels. The optical scanners can then detect and recognize lines of text and/or other graphical elements in the captured image. Finally, the text and/or other graphical elements may be assembled into paragraphs to reconstruct a digital version of the document.
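By way of illustration only, the rotation step described above can be sketched as follows. This is a minimal sketch and not the method of any particular scanner: it assumes the four corners of the paper have already been located as (x, y) pixel coordinates, and the function names skew_angle_degrees, rotate_point, and deskew_corners are hypothetical.

```python
import math


def skew_angle_degrees(edge_start, edge_end):
    """Angle between a detected paper edge and the nearest pixel axis.

    The result is folded into [-45, 45) degrees, so an edge that is
    nearly vertical is measured against the pixel columns rather than
    the pixel rows.
    """
    dx = edge_end[0] - edge_start[0]
    dy = edge_end[1] - edge_start[1]
    angle = math.degrees(math.atan2(dy, dx))
    return (angle + 45.0) % 90.0 - 45.0


def rotate_point(point, center, degrees):
    """Rotate a point about a center by the given angle in degrees."""
    theta = math.radians(degrees)
    x = point[0] - center[0]
    y = point[1] - center[1]
    return (
        center[0] + x * math.cos(theta) - y * math.sin(theta),
        center[1] + x * math.sin(theta) + y * math.cos(theta),
    )


def deskew_corners(corners):
    """Rotate paper corners so the edges align with pixel rows and columns.

    Assumption: corners are listed in order around the sheet, so that
    corners[0] and corners[1] share one edge of the paper.
    """
    angle = skew_angle_degrees(corners[0], corners[1])
    cx = sum(x for x, _ in corners) / len(corners)
    cy = sum(y for _, y in corners) / len(corners)
    # Rotating by the negative of the skew angle cancels the skew.
    return [rotate_point(c, (cx, cy), -angle) for c in corners]


# Example: a 200 x 100 sheet that was captured 10 degrees askew.
flat = [(0.0, 0.0), (200.0, 0.0), (200.0, 100.0), (0.0, 100.0)]
skewed = [rotate_point(p, (100.0, 50.0), 10.0) for p in flat]
aligned = deskew_corners(skewed)
```

In practice the same angle would be applied to every pixel of the captured image rather than to the corners alone; the corners are used here only to keep the sketch short.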
However, if the lines of text and/or other graphical elements cannot be recognized, or if the recognition accuracy is not satisfactory, the entire process is restarted so that the image can be recaptured. Generally, the recognition accuracy is affected by various factors such as a shadow cast on the image or the orientation of the paper. Existing OCR systems, therefore, are limited in that they require a single focused and sharp high-resolution image to detect and recognize lines of text and/or other graphical elements in the image. This requires an entire sheet of paper or document to be scanned at one time, with the paper secured in a flat configuration and illuminated in a uniform manner. Additionally, the paper must be rectangular in shape. If the result is not satisfactory, the entire process must be restarted. Therefore, existing OCR systems are not suitable where the foregoing technical constraints cannot be met, and there is a need for an improved OCR system.
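The restart behavior described above can be pictured as a simple capture-and-retry loop. The following is a minimal sketch under stated assumptions: capture and recognize are hypothetical callables standing in for the scanner and the recognition engine, recognize is assumed to return a (text, confidence) pair, and the 0.9 threshold and three-attempt limit are arbitrary illustrative values.

```python
def scan_until_satisfactory(capture, recognize, min_confidence=0.9, max_attempts=3):
    """Recapture the whole page until recognition is good enough.

    capture:   hypothetical callable returning one full-page image.
    recognize: hypothetical callable returning (text, confidence),
               with confidence assumed to lie between 0.0 and 1.0.

    Returns (text, attempts_used), or (None, max_attempts) if no
    attempt reached the confidence threshold.
    """
    for attempt in range(1, max_attempts + 1):
        image = capture()  # the entire sheet is captured again each time
        text, confidence = recognize(image)
        if confidence >= min_confidence:
            return text, attempt
    return None, max_attempts


# Example with stand-in functions: the first capture is shadowed,
# the second is clean.
_captures = iter(["shadowed page", "clean page"])


def fake_capture():
    return next(_captures)


def fake_recognize(image):
    if image == "clean page":
        return "Invoice 2041", 0.97
    return "lnv0ice 2O41", 0.42


result = scan_until_satisfactory(fake_capture, fake_recognize)
```

The sketch makes the cost of the conventional approach visible: every unsatisfactory result discards the previous capture entirely and repeats the full scan.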