Data processing and exchange are essential for a variety of business and personal transactions. For example, small businesses use accounting and inventory data to obtain and share reports regarding inventory sales, customer invoices, or cash flow. Similarly, healthcare providers examine medical records to view patient information related to insurance providers, medical conditions, or office visits.
In addition, data exchange frequently relies on the exchange of documents, including electronic documents such as word-processing documents, spreadsheets, or Portable Document Format (PDF) documents, as well as paper documents (which may themselves be generated electronically). For example, a business may manage business transactions with a set of customers by creating a set of bills, invoices, or other types of documents containing data associated with the business transactions and sending the documents to the respective customers. The customers use the data in the documents to pay the bills or invoices, respond to the business, or update their records of the transactions. Similarly, employers, banks, and mortgage companies may provide several tax documents (e.g., W-2, 1099-INT, etc.) to employees and customers, who need them to file their tax returns, for example, by using commercially available income tax preparation software.
Optical character recognition (OCR) systems are generally used to detect text present in an image of a document (e.g., a tax document) and to convert the detected text into a machine-readable representation. Acquiring document images with digital cameras and mobile devices is becoming increasingly common in optical character recognition and text recognition. To recognize text accurately, a conventional OCR engine typically needs a high-quality image. However, images captured with digital cameras and other mobile devices may include many distortions and may be of poor quality. The quality of an image depends on various factors, including the quality of the camera used to produce the image, the power of the lens, the resolution, the light intensity, relative motion between the camera and the text document, the level of focus, the background (including back lighting), and the like, in addition to the quality of the text document itself. Thus, an image produced using such a device may include various forms of distortion, including blur, skew, rotation, and shadow marks.
As a result, a computing device performing optical character recognition on an image of a document may have difficulty extracting information from that document for use by other applications (e.g., a tax preparation application). Instead, the recipient or document owner may have to manually enter data from the document into an application.