Data processing and exchange are essential for a variety of business and personal transactions. For example, small businesses use accounting and inventory data to obtain and share reports regarding inventory sales, customer invoices, or cash flow. Similarly, healthcare providers examine medical records to view patient information related to insurance providers, medical conditions, or office visits.
In addition, data exchange frequently relies on document exchange, including electronic versions of documents, such as word-processing documents, spreadsheets, or Portable Document Format (PDF) documents, and paper documents (which may themselves be generated electronically). For example, a business may manage business transactions with a set of customers by creating a set of bills, invoices, or other types of documents containing data associated with the business transactions and sending the documents to the respective customers. The customers use the data in the documents to pay the bills or invoices, respond to the business, or update their records of the transactions. Similarly, employers, banks, and mortgage companies may provide several tax documents (e.g., W-2, 1099-INT, etc.) to employees and customers as needed to file their tax returns, for example, by using commercially available income tax preparation software.
However, variations in the layouts or designs of documents can disrupt the process of extracting data from them. For example, a customer may receive bills, invoices, or other semi-structured documents from a variety of businesses. While the documents may include many of the same types of data, the locations of the data on a given document (e.g., a form) often vary across documents from different sources. As a result, a computing device performing optical character recognition on an electronic version of a document may have difficulty extracting information from that document for use by other applications (e.g., a tax preparation application). Instead, the recipient or document owner may have to manually enter data from the document into an application.
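The layout problem described above can be illustrated with a minimal sketch. It assumes that optical character recognition has already produced text tokens with page coordinates. The `Token` type, the `extract_in_region` helper, the region coordinates, and the two sample documents are all hypothetical and chosen only for illustration; they do not correspond to any particular OCR product or to any specific extraction method.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    """A piece of recognized text and its position on the page (hypothetical)."""
    text: str
    x: float
    y: float


# Hypothetical bounding box (x0, y0, x1, y1) tuned to where one sender
# prints the invoice total. This is an assumption for illustration only.
TOTAL_REGION_SENDER_A: Tuple[float, float, float, float] = (400.0, 700.0, 550.0, 730.0)


def extract_in_region(tokens: List[Token],
                      region: Tuple[float, float, float, float]) -> Optional[str]:
    """Return the first token whose position falls inside the region, if any."""
    x0, y0, x1, y1 = region
    for token in tokens:
        if x0 <= token.x <= x1 and y0 <= token.y <= y1:
            return token.text
    return None


# The same kind of data (a total of $120.00) placed differently by two senders.
sender_a = [Token("Total", 320.0, 715.0), Token("$120.00", 450.0, 715.0)]
sender_b = [Token("Amount due", 60.0, 120.0), Token("$120.00", 200.0, 120.0)]

result_a = extract_in_region(sender_a, TOTAL_REGION_SENDER_A)
result_b = extract_in_region(sender_b, TOTAL_REGION_SENDER_A)

print(result_a)  # the total is found for the layout the region was tuned to
print(result_b)  # nothing is found for the other layout
```

In this sketch, a rule tied to fixed positions works only for the layout it was written against, so the second document yields no value even though it contains the same data. This is the situation in which a recipient may fall back to manual entry.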