As part of a document capture and extraction solution, a user scans a hard copy of a page or document to generate a digital copy (or image) and profiles the page or document to recognize and extract the text and the specific zones that the text occupies on that page. Zones may be described as locations on the page (e.g., based on x, y coordinates).
For example, in an enterprise that operates in the insurance domain, a user may create fields and map them to specific zones of a page so that, when a similar insurance document is later received, the document capture and extraction solution automatically identifies the text in each mapped zone and assigns that text to the corresponding field.
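The zone-to-field mapping described above can be illustrated with a minimal sketch. It is not the implementation of any particular document capture product. The names `Zone`, `Word`, and `extract_fields` are hypothetical, the recognized words are assumed to already carry bounding boxes from an OCR step, and the rule that a word belongs to a zone when its center point falls inside the zone is an assumption chosen for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Rectangular page region given by its top-left corner and size (hypothetical)."""
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        # True when the point (px, py) lies inside or on the edge of the zone.
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


@dataclass(frozen=True)
class Word:
    """One recognized word with its bounding box (assumed OCR output shape)."""
    text: str
    x: float
    y: float
    width: float
    height: float


def extract_fields(words: list[Word], field_zones: dict[str, Zone]) -> dict[str, str]:
    """Assign to each field the text of the words whose centers fall in its zone."""
    result = {}
    for field_name, zone in field_zones.items():
        inside = [
            w for w in words
            if zone.contains(w.x + w.width / 2, w.y + w.height / 2)
        ]
        # Simplification: order by top edge, then left edge. Real pages may
        # need line grouping that tolerates small vertical offsets.
        inside.sort(key=lambda w: (w.y, w.x))
        result[field_name] = " ".join(w.text for w in inside)
    return result


# Illustrative data only: coordinates and field names are made up.
field_zones = {
    "policy_number": Zone(x=100, y=40, width=150, height=30),
    "insured_name": Zone(x=100, y=90, width=150, height=30),
}
words = [
    Word("Doe", 155, 100, 30, 12),
    Word("POL-12345", 120, 50, 80, 12),
    Word("Jane", 120, 100, 30, 12),
    Word("Total", 400, 400, 40, 12),  # outside every zone, so it is ignored
]
fields = extract_fields(words, field_zones)
```

Once the zones for a document type have been defined and mapped, a later page of the same type can be processed with the same `field_zones` dictionary and no further manual drawing.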
However, when a specific type of document is scanned and profiled for the first time, the document capture and extraction solution requires an operator to manually create these zones for a specific page/document and map the zones to their specific fields.
Currently, the document capture and extraction solution allows users to recognize and extract text and its positions from a digital or hard copy of a page or document using Optical Character Recognition (OCR) and Intelligent Character Recognition (ICR) technologies. However, to create zones and map them to fields, a user has to 1) manually draw a box around the text to record or register that text and its zone and 2) map each zone to a specific field assigned to that document or page.
The current document capture and extraction solution is labor-intensive, since a user has to use the drag feature of the user interface to create zones accurately. In most cases, the text in the displayed document image is not easily legible, and the user has to zoom in and out repeatedly to get a clear view of the text. This process tends to be error-prone. Its accuracy also depends on the user's ability to distinguish different color gradations and to control mouse movement and positioning over an image precisely. That is, the current document capture and extraction solution requires the user to draw zones around the text identified in the documents/images, and each zone is displayed as a red or blue box around the text. As more zones are drawn close to each other, it becomes difficult for some end users to identify the boundaries of the different boxes/zones.