Digital images depicting objects, including documents such as letters, checks, bills, invoices, etc., have conventionally been captured and processed using a scanner or multifunction peripheral (MFP) coupled to a computer workstation such as a laptop or desktop computer. Methods and systems capable of performing such capture and processing are well known in the art and well adapted to the tasks for which they are employed.
More recently, the conventional scanner-based and MFP-based image capture and processing applications have shifted toward mobile platforms, e.g. as described in the related patent applications noted above with respect to capturing and processing images using mobile devices (U.S. Pat. No. 8,855,375), classifying objects depicted in images captured using mobile devices (U.S. Pat. No. 9,355,312, e.g. at column 9, line 9-column 15, line 28), extracting data from images captured using mobile devices (U.S. Pat. No. 9,311,531, e.g. at column 18, line 25-column 27, line 16), and even generating an electronic form based on knowledge obtained from analyzing an image of a document in light of a learn-by-example knowledge base (U.S. Pat. No. 9,275,281, e.g. at column 25, lines 56-61).
While these capture, processing, classification and extraction engines and methods are capable of reliably extracting information from certain objects or images, and of generating electronic forms therefrom, these techniques rely on a large number of training examples from which to generate sufficient a priori knowledge regarding different object types, the types of information represented therein, and the location of such information relative to the object.
The learn-by-example training process, and more importantly the exemplars of the training set, are necessary to enable robust object classification and data extraction despite the inherent variations in appearance of even the exact same object across different images. Skilled artisans will appreciate that factors such as capture angle, motion of the capture device during image capture, capture resolution, illumination conditions, capture distance, etc. all contribute to variations in the appearance of an object. To accommodate these variations, a learn-by-example training set representing all such variations within tolerable limits is generally employed, and subsequent test images are classified, and data are extracted therefrom (including determination of fields), using the trained classification/extraction model.
In practice, the above training-based automated approach frequently fails to identify all desired information, e.g. due to variations in the image extending beyond tolerable limits. For example, a corner of the image may be cast in shadow, frustrating the identification of fields in the shadowed region, or distortions may be too severe to permit detecting and bounding a particular field for data extraction or optical character recognition. Similarly, even when fields are properly located, the type of data expected or suitable for entry in such fields may be difficult or impossible to discern, e.g. where text is missing or depicted according to an unexpected format.
To address these shortcomings, conventional solutions typically employ a human curator to review and correct the field determination and data type identification processes. For example, a classification and/or extraction result obtained by processing a particular image using learn-by-example classification and/or extraction models may be output and passed to a human user for validation of the identified field locations, field types, data types, etc. The human may provide input indicating the location of a field, the field label, and the data type associated with the field. This input information may be associated with the image as metadata, and the electronic form generation process may proceed with the added information provided by the human user.
However, this solution is both imperfect (inherently, as with all human-driven processes) and costly, both in terms of overall processing time and in terms of the economic cost of employing human curators to review a potentially vast volume of processing results.
And while it is possible to derive the necessary information (e.g. field location, field label, data type, etc.) with high accuracy and recall from a standardized form such as an electronic form, such information would not be useful for deriving similar information from other images of the same type of object (e.g. a physical representation of the electronic form), because such information is rigidly applicable only to the standardized representation of the form. Variations arising from capture angle, illumination, etc. are not accounted for in the standardized representation, and these variations severely limit the extent to which the standardized representation is applicable to subsequent analysis of images.
Therefore, it would be highly beneficial to provide new techniques, systems and/or computer program product technology configured to process an electronic form and utilize information derived from such electronic form to build classification and/or extraction models suitable for classifying other similar forms, and extracting information therefrom in an efficient and reliable manner that is robust to variations between images of the same type of form.