Field
Embodiments presented herein generally relate to optical character recognition, and more specifically to performing optical character recognition on a document using video frames.
Description of the Related Art
Data processing is essential for a variety of business and personal transactions. For example, businesses use accounting and inventory data to generate and share reports related to various business metrics, such as sales, invoices, cash flow, or balance sheet information. In another example, individuals use income data from various sources (e.g., employers, passive investments, active investments, retirement plans) to determine tax liabilities (or entitlements to tax refunds) and prepare and file tax returns with the relevant tax authorities.
In many cases, individuals receive paper documents including the data needed to complete a business or personal transaction. For example, individuals may receive a variety of tax documents (e.g., W-2 forms reporting employment income, 1099-DIV forms reporting dividend income, 1099-INT forms reporting interest income, K-1 forms reporting partnership income, and so on) as paper documents, the contents of which are input into a computer to determine tax liabilities or eligibility for tax refunds and to generate an individual tax return. Businesses may receive invoices from a variety of suppliers and generate invoices for goods or services rendered to customers. The received and generated invoices may be subsequently provided as input to a computer to generate, for example, a cash flow statement for a predetermined time period. Often, the documents used in these data processing operations do not have a consistent format. For example, while different W-2 forms generally include the same types of data (e.g., employer identification, taxable income, taxes withheld, and so on), the locations of the data on a given form or document may vary across documents received from different sources.
To extract data from such documents, a computer may use an optical character recognition (OCR) system to convert an image of a document into machine-encoded text. The OCR system may extract text from the image, for example, on a field-by-field basis for a structured or semi-structured document or on an ad-hoc basis for an unstructured document. If the OCR system is unable to extract text from at least a portion of the image (e.g., due to low image quality, such as low resolution or blur), the OCR system can request that a user provide additional images to use in extracting text from the document.
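By way of illustration only, the field-by-field extraction and the request for additional images described above may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular OCR system: the function extract_fields, the confidence threshold min_confidence, the stand-in engine fake_ocr, and the sample field names and values are all hypothetical and do not come from any real OCR library.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical OCR interface (an assumption for this sketch): given an image
# and a field name, return the recognized text (or None) and a confidence
# score between 0.0 and 1.0.
OcrFn = Callable[[dict, str], Tuple[Optional[str], float]]


def extract_fields(
    images: Iterable[dict],
    fields: List[str],
    ocr: OcrFn,
    min_confidence: float = 0.8,  # illustrative threshold, not a standard value
) -> Tuple[Dict[str, str], List[str]]:
    """Extract each field from the first image that yields a confident result.

    Returns the extracted text per field and the list of fields that are
    still missing. A non-empty list signals that the caller should request
    an additional image from the user.
    """
    results: Dict[str, str] = {}
    missing = list(fields)
    for image in images:
        still_missing = []
        for name in missing:
            text, confidence = ocr(image, name)
            if text is not None and confidence >= min_confidence:
                results[name] = text
            else:
                still_missing.append(name)
        missing = still_missing
        if not missing:
            break  # every field was extracted; no further images are needed
    return results, missing


def fake_ocr(image: dict, field: str) -> Tuple[Optional[str], float]:
    """Stand-in OCR engine: each 'image' is a dict of precomputed results."""
    return image.get(field, (None, 0.0))


# Made-up sample data: a blurry capture and a sharper retake of the same form.
blurry = {"employer_id": ("12-3456789", 0.95), "taxable_income": ("5O,OOO", 0.40)}
sharp = {"taxable_income": ("50,000", 0.93)}
wanted = ["employer_id", "taxable_income"]

first_pass, need_more = extract_fields([blurry], wanted, fake_ocr)
second_pass, still_needed = extract_fields([blurry, sharp], wanted, fake_ocr)
```

In this sketch, the first call leaves taxable_income unresolved because the low-confidence reading is rejected, which corresponds to the case in which the user is asked for an additional image. The second call resolves that field from the sharper image.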