The exemplary embodiment relates to the information arts. It relates especially to a method and an apparatus for populating a form with textual information extracted from a physical document, and will be described with particular reference thereto.
Many business-related forms are now in electronic format. In some cases, the information for filling in these forms is present in a printed document. For example, an electronic form for inputting information relating to a book may include fields such as author, title, ISBN, publisher, and dates. Forms also exist for inputting information from business cards, IDs, correspondence, and other physical documents so that the information is available in electronic format.
Optical character recognition (OCR) techniques employ software that extracts textual information from scanned images. Such techniques have been applied to extract textual information from books, business cards, and the like. Once text is extracted, each text line can be tagged as to data type. The extracted information can be used to pre-fill corresponding fields in an electronic form, while other information may be input manually. For example, when capturing a business card or an ID, personal data may be extracted by tagging text lines as “personal name,” “job title,” “entity affiliation,” and so forth. The tagged personal data can be used to populate a form, such as a new contact form, which can then be incorporated into a contacts database. Corporate mail rooms may also use such techniques to build a database by completing forms with information extracted from incoming mail.
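The tagging and pre-filling flow described above can be sketched as follows. This is an illustration only, not the method of any particular OCR product: the regular-expression rules, the field names "email" and "phone", and the functions tag_line and prefill_form are all hypothetical, and the input is assumed to be text lines already produced by an OCR step.

```python
import re

# Hypothetical tagging rules (assumption): the text does not say how
# lines are tagged, so two simple patterns stand in for a real tagger.
TAG_RULES = [
    ("email", re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")),
    ("phone", re.compile(r"^\+?[\d\s().-]{7,}$")),
]


def tag_line(line):
    """Return a data-type tag for one OCR text line, or None if no rule matches."""
    text = line.strip()
    for tag, pattern in TAG_RULES:
        if pattern.match(text):
            return tag
    return None


def prefill_form(lines, fields):
    """Pre-fill form fields from tagged lines.

    Lines that receive no tag, or whose field is already filled,
    are returned separately so they can be entered manually.
    """
    form = {field: "" for field in fields}
    unassigned = []
    for line in lines:
        tag = tag_line(line)
        if tag in form and not form[tag]:
            form[tag] = line.strip()
        else:
            unassigned.append(line.strip())
    return form, unassigned


# Example input: text lines as they might come from a scanned business card.
ocr_lines = ["Jane Doe", "jane.doe@example.com", "+1 555 010 0199"]
form, unassigned = prefill_form(ocr_lines, ["personal name", "email", "phone"])
```

In this sketch the "personal name" field stays empty because no rule recognizes it, so the line "Jane Doe" is left for manual entry. This mirrors the split between pre-filled fields and manually input information.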
OCR techniques used for form population invariably result in some errors, both in the recognition of individual characters in the digital document and in the association of the extracted information with the correct fields of the form (tagging). Manual input of information, however, is time-consuming and is itself generally prone to error.
It would be desirable to provide a method for populating an electronic form that aims to minimize the time required to complete the form while also taking into account the potential for errors in the process.