Data entry from physical forms such as order forms and invoices is an essential step in digitizing data in business process outsourcing. Digitization of essential data from such semi-structured forms is usually performed manually, although some software tools enable automation of the data entry exercise. To reduce this manual effort, automation of the data entry process in the Business Process Outsourcing (BPO) industry relies heavily on optical character recognition (OCR) technology for converting images to text. After the text data is generated, text enrichment and enhancement techniques are applied to refine the OCR output so that the required key fields are correctly detected and recognized. This type of automation platform saves cost by eliminating the large workforce otherwise needed for data entry. Automatically extracted data are manually verified and corrected if necessary. Conventionally, automated extraction is achieved through the use of well-defined templates. Templates are created by technically trained users, either with a GUI-based tool or programmatically. Each template contains a set of general constructs or rules for recognizing textual data with an OCR engine and mapping the recognized data to the essential fields for extraction.
Creating a useful template can take one to two hours, which can be a significant roadblock where a large volume of forms, requiring thousands of templates, is processed on a daily basis. Accordingly, template creation can be very resource intensive.
The success of such systems and methods depends mainly on the accuracy of the OCR process used in the platform. Existing OCR tools provide options to create templates through a user interface and to configure them for the best capture and recognition of fields in document images. In the majority of document images, such as invoice images, there is some structure in place that specifies several keys and values. Template creation tools take advantage of this structure in terms of the spatial alignment and coordinates of the keys and values with respect to the invoice image coordinates. Users manually annotate the key field location and the value field location in a reference image, and the created template is applied to subsequent invoice images to correctly capture those fields. This template creation process is time consuming and requires experience to configure the templates correctly and use the tool's capabilities to the full extent. Different approaches and techniques have been proposed to automate the template creation process.
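The coordinate-based template described above can be illustrated with a minimal sketch. It assumes that an OCR engine has already produced words with bounding boxes normalized to the page size; the names Box, Word, and extract_fields, the field names, and all sample coordinates are hypothetical and are not taken from any particular OCR tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates normalized to [0, 1]."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    """One OCR result: recognized text plus its bounding box (assumed input)."""
    text: str
    box: Box


def extract_fields(template: Dict[str, Box], words: List[Word]) -> Dict[str, str]:
    """Apply a template that maps each field name to its annotated value region.

    A word belongs to a field if the center of its box lies inside the region.
    Matching words are joined in reading order (top first, then left), which is
    a simplification that assumes words on one line share the same top value.
    """
    result: Dict[str, str] = {}
    for field_name, region in template.items():
        hits = [
            w for w in words
            if region.contains((w.box.left + w.box.right) / 2,
                               (w.box.top + w.box.bottom) / 2)
        ]
        hits.sort(key=lambda w: (w.box.top, w.box.left))
        result[field_name] = " ".join(w.text for w in hits)
    return result


# Hypothetical template, as if annotated once on a reference invoice image.
template = {
    "invoice_number": Box(0.60, 0.05, 0.95, 0.10),
    "total": Box(0.60, 0.85, 0.95, 0.90),
}

# Hypothetical OCR output for a subsequent invoice with the same layout.
words = [
    Word("Invoice", Box(0.40, 0.06, 0.50, 0.09)),
    Word("No.", Box(0.51, 0.06, 0.56, 0.09)),
    Word("INV-1042", Box(0.62, 0.06, 0.78, 0.09)),
    Word("Total", Box(0.45, 0.86, 0.55, 0.89)),
    Word("USD", Box(0.62, 0.86, 0.68, 0.89)),
    Word("1,250.00", Box(0.70, 0.86, 0.85, 0.89)),
]

print(extract_fields(template, words))
```

The sketch also shows why such templates are brittle: the value regions are fixed page coordinates, so a form with a different layout needs its own annotated template.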