A digital image is formed by an array of rows and columns of pixels. For a grayscale image, each pixel has one value representing the average luminance of the corresponding area. For a color image, each pixel has a red, green and blue value representing the average color of the corresponding area.
A pixel color can also be expressed in the YUV (or YCbCr) representation, where Y is the luminance and U and V are the blue and red chrominance channels, respectively. It is possible to convert a color from the RGB representation into the YUV representation and vice versa.
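The conversion between the two representations can be sketched as follows. This is only an illustration: the text does not state which coefficients are used, so the sketch assumes the common ITU-R BT.601 weights, with U derived from B - Y and V derived from R - Y. The function names rgb_to_yuv and yuv_to_rgb are hypothetical.

```python
# Assumed BT.601 luma weights; the source text does not specify coefficients.
KR, KG, KB = 0.299, 0.587, 0.114
# Assumed analog YUV scaling factors for the two color-difference signals.
U_SCALE, V_SCALE = 0.492, 0.877


def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to (Y, U, V) under the assumptions above."""
    y = KR * r + KG * g + KB * b   # luminance: weighted sum of R, G, B
    u = U_SCALE * (b - y)          # blue-difference chrominance
    v = V_SCALE * (r - y)          # red-difference chrominance
    return y, u, v


def yuv_to_rgb(y, u, v):
    """Invert rgb_to_yuv by solving the same three equations for R, G, B."""
    b = y + u / U_SCALE
    r = y + v / V_SCALE
    g = (y - KR * r - KB * b) / KG
    return r, g, b
```

Under these assumptions a neutral gray pixel, where R, G and B are equal, has chrominance values of zero, which is why a grayscale image needs only the Y channel.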
A digital image is characterized by its resolution, which is the number of pixels per inch in each direction. An image with a resolution of 300 dpi (dots per inch) has 300 rows and 300 columns of pixels per inch.
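The relation between physical size, resolution and pixel count can be illustrated with a short calculation. The helper name pixel_dimensions and the 3.5 in x 2 in card size used in the comments are assumptions chosen for illustration; they do not come from the text.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (columns, rows) of pixels for a document scanned at `dpi`.

    Each inch of the original maps to `dpi` pixels in each direction.
    """
    columns = round(width_in * dpi)
    rows = round(height_in * dpi)
    return columns, rows


def pixel_count(width_in, height_in, dpi):
    """Total number of pixels the scanner must determine and transfer."""
    columns, rows = pixel_dimensions(width_in, height_in, dpi)
    return columns * rows


# Example with an assumed 3.5 in x 2 in card:
# pixel_dimensions(3.5, 2.0, 200) gives 700 columns and 400 rows,
# pixel_dimensions(3.5, 2.0, 400) gives 1400 columns and 800 rows.
```

Doubling the resolution doubles the pixel count in each direction, so the total number of pixels grows by a factor of four.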
A document is a set of pages that contains text but can also contain graphics, pictures, logos, drawings and so on. A document can be, for example, a letter, a business card, an invoice, a form, or a magazine or newspaper article. Documents are converted into digital images by a device called a scanner. They can also be converted into digital images by a digital camera. Documents are scanned so that they can be kept electronically and further processed by a computer.
The main processing application is text recognition or OCR (Optical Character Recognition) that allows the further processing of the recognized text.
For example, a business card is scanned into a color image. The text is recognized and then interpreted and decomposed into different fields like the name, the firm, the title, the address, etc. This information is kept in a database along with the color image. Users can consult the database and display the business card color images.
OCR accuracy is of course very important. It depends on the quality of the printing and the quality of the scanning. The quality of the scanning depends in particular on the resolution of the scanner. Higher resolutions yield digital images that are closer to the original, with more detail preserved. Unfortunately, scanning at high resolution takes more time, as more pixels must be determined by the scanner and transferred to the computer that will further process the digital image. High-resolution scanners are also more costly.
It is estimated that OCR gives good accuracy for normal text (10 pt and above) at a minimum of 300 dpi. However, many scanners are limited to 200 dpi, and most scanners give their optimal throughput at 200 dpi. For business cards, a minimum resolution of 400 dpi is preferred for OCR, as the text is very often printed in a small point size (e.g. 8 pt).
There are well-known techniques for up-scaling a digital image. These use bilinear and bicubic interpolation. In those techniques, the grid of the destination image is mapped onto the grid of the source image, and each destination pixel value is estimated by interpolation from the source pixel values in its neighborhood. Bilinear interpolation takes into account the 4 nearest neighbors. Bicubic interpolation takes into account the 16 nearest neighbors. Generally, interpolation introduces two main reconstruction errors: blurring and ringing.
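The bilinear case can be sketched as follows for a grayscale image stored as a list of rows. This is a minimal illustration, not a reference implementation: the function name upscale_bilinear is hypothetical, and the pixel-center alignment and edge clamping used to map the destination grid onto the source grid are assumptions, since the text does not fix a convention.

```python
def upscale_bilinear(src, scale):
    """Up-scale a grayscale image (list of rows of numbers) by `scale`."""
    src_h, src_w = len(src), len(src[0])
    dst_h, dst_w = round(src_h * scale), round(src_w * scale)
    dst = []
    for dy in range(dst_h):
        # Map the destination row onto the source grid (pixel centers
        # aligned, an assumed convention), clamped to the image border.
        sy = min(max((dy + 0.5) / scale - 0.5, 0.0), src_h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        fy = sy - y0
        row = []
        for dx in range(dst_w):
            sx = min(max((dx + 0.5) / scale - 0.5, 0.0), src_w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            fx = sx - x0
            # Interpolate horizontally on the two nearest source rows,
            # then vertically between them: 4 neighbors in total.
            top = src[y0][x0] * (1 - fx) + src[y0][x1] * fx
            bottom = src[y1][x0] * (1 - fx) + src[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        dst.append(row)
    return dst
```

For instance, upscale_bilinear(image, 2) would turn a 200 dpi scan into an image with the pixel dimensions of a 400 dpi scan. Because each new pixel is a weighted average of its neighbors, sharp character edges are smoothed, which is the blurring mentioned above.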
Current business card reading systems operate in two steps: Optical Character Recognition (OCR) and identification of the fields (field parsing). They use OCR engines able to recognize characters from a reduced set of languages, one language at a time. They use a field parsing module tailored to one country, with field identification rules specific to that country alone. As a result, current business card reading solutions are only able to recognize business cards from a very limited number of countries, for instance 6 to 10.