Digital images having depicted therein a document such as a letter, a check, a bill, an invoice, etc. have conventionally been captured and processed using a scanner or multifunction peripheral coupled to a computer workstation such as a laptop or desktop computer. Methods and systems capable of performing such capture and processing are well known in the art and well adapted to the tasks for which they are employed.
However, in an era where day-to-day activities, computing, and business are increasingly performed using mobile devices, it would be greatly beneficial to provide analogous document capture and processing systems and methods for deployment and use on mobile platforms, such as smart phones, digital cameras, tablet computers, etc.
A major challenge in transitioning conventional document capture and processing techniques to mobile platforms is the limited processing power and image resolution achievable using hardware currently available in mobile devices. These limitations present a significant challenge because it is impossible or impractical to process, using conventional techniques, images captured at resolutions typically much lower than those achievable by a conventional scanner. Further, since the capture environment is not “controlled” as in a conventional scanner, images captured using mobile devices tend to include problematic artifacts such as blur, uneven illumination, shadows, distortions, etc. As a result, conventional scanner-based processing algorithms typically perform poorly on digital images captured using a mobile device.
In addition, the limited battery life, processing power, and memory available on mobile devices make conventional image processing algorithms employed for scanners prohibitively expensive in terms of computational cost and power consumption. Executing a conventional scanner-based image processing algorithm takes far too much time to be practical on modern mobile platforms. As a result, attempts to implement conventional image processing techniques on mobile devices have not met with success, because the mobile devices are incapable of processing the data with sufficiently low latency/processing time to benefit the underlying application of the processing algorithm, and/or the processing consumes too much battery power to be useful.
For example, network-mediated processing tasks may experience a timeout because the processing time is longer than a maximum window of time permitted for performing a particular operation or network transaction. This is a common obstacle to implementing traditional image processing algorithms on mobile platforms, especially in useful financial transaction workflows such as mobile deposit, mobile bill pay, mobile invoicing, mobile loan applications, etc., as well as business process management workflows such as customer onboarding, claims processing, expense report submission, etc.
With specific respect to document processing, it is well known as advantageous in the art of digital document image processing to extract information from the imaged document, e.g. extracting alphanumeric information utilizing an optical character recognition (OCR) technique. Conventionally, the document image may be pre-processed, for instance to improve image quality, reduce color depth (e.g. from color to grayscale or binary), crop the image (e.g. to remove background textures), resize and/or reorient the document as depicted in the captured image, detect the presence of artifacts such as shadows, tears, foreign objects, etc., and/or measure and ensure sufficient illumination to perform downstream processing.
In some approaches, the image may be subjected to a resolution reduction (also known as “downsampling”) to reduce the amount of data the OCR engine must process to generate a character prediction. However, downsampling can be problematic because the OCR engine still must be provided sufficient data to reliably generate accurate character predictions, so there is an inherent limit to the amount of processing advantage that can be obtained from downsampling. Further, downsampling is known to reduce the accuracy of the OCR engine's predictions, so it is not a preferred solution to the problem of exceptionally high computational cost imposed by OCR on mobile platforms.
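The tradeoff described above can be illustrated with a minimal sketch. The function names `downsample` and `pixel_count`, the block-averaging method, and the tiny test images are assumptions chosen for illustration only; they are not drawn from any particular OCR engine or from the techniques discussed herein. The sketch shows that reducing resolution by a factor f cuts the pixel count by roughly f squared, while a one-pixel-wide stroke loses contrast against the background.

```python
def downsample(image, factor):
    """Reduce resolution by averaging non-overlapping factor x factor blocks.

    `image` is a list of equal-length rows of grayscale values (0-255).
    Rows/columns that do not fill a complete block are discarded.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    height = len(image) - len(image) % factor
    width = len(image[0]) - len(image[0]) % factor
    result = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block = [image[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


def pixel_count(image):
    """Number of pixel values the downstream engine would have to process."""
    return sum(len(row) for row in image)


# Hypothetical 2 x 4 image containing a one-pixel-wide bright stroke.
thin_stroke = [
    [0, 255, 0, 0],
    [0, 255, 0, 0],
]

reduced = downsample(thin_stroke, 2)
print(pixel_count(thin_stroke), "->", pixel_count(reduced))
print(reduced)  # the stroke is blended with its dark neighbors
```

In this sketch the data volume falls by a factor of four, but the stroke that was at full intensity is averaged with background pixels, which is one simple way to see why aggressive downsampling can degrade character prediction.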
In other approaches, since mobile devices are advantageously capable of connecting to a network and harnessing other processing resources available via the network (e.g. cloud computing), many existing content recognition technologies utilize the mobile device to capture the image of the critical document, and transmit this image (perhaps with preprocessing performed on the mobile device prior to transmission) to a network resource having much more available processing power, memory, etc. The processed image (said processing often being accomplished in a time one or more orders of magnitude less than if the identical processing operation had been performed using the mobile device alone) is then transmitted back to the mobile device for subsequent processing and/or use. However, this approach requires the use of a data plan, and can quickly consume periodic data allocations, increasing the cost of the overall processing. In addition, not all mobile devices are necessarily connected to network resources, or capable of connecting to network resources, at all times. Accordingly, approaches that rely on the use of external processing resources are limited by virtue of the very reliance on an active network connection, restricting the temporal and geographic extent of these techniques' utility.
A still further challenge is presented by the nature of mobile capture components (e.g. cameras on mobile phones, tablets, etc.). Where conventional scanners are capable of faithfully representing the physical document in a digital image, critically maintaining aspect ratio, dimensions, and shape of the physical document in the digital image, mobile capture components are frequently incapable of producing such results.
Specifically, images of documents captured by a camera present a new line of processing issues not encountered when dealing with images captured by a scanner. This is in part due to the inherent differences in the way the document image is acquired, as well as the way the devices are constructed. The way that some scanners work is to use a transport mechanism that creates a relative movement between paper and a linear array of sensors. These sensors create pixel values of the document as it moves by, and the sequence of these captured pixel values forms an image. Accordingly, there is generally a horizontal or vertical consistency up to the noise in the sensor itself, and it is the same sensor that provides all the pixels in the line.
In contrast, cameras have many more sensors in a nonlinear array, e.g., typically arranged in a rectangle. Thus, all of these individual sensors are independent, and render image data that is not typically of horizontal or vertical consistency. In addition, cameras introduce a projective effect that is a function of the angle at which the picture is taken. For example, with a linear array like in a scanner, even if the transport of the paper is not perfectly orthogonal to the alignment of sensors and some skew is introduced, there is no projective effect like in a camera. Additionally, with camera capture, nonlinear distortions may be introduced because of the camera optics.
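The difference between scanner skew and the projective effect of a camera may be sketched as follows. The matrices, the page dimensions, and the helper names `apply_homography` and `side_edges_parallel` are hypothetical values chosen for illustration, not measurements from any actual device. Under these assumptions, a skew (modeled here as a shear, which is an affine transform) keeps the left and right page edges parallel, whereas a projective transform with a nonzero perspective term does not.

```python
def apply_homography(h, point):
    """Map a 2-D point through a 3 x 3 matrix using homogeneous coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def side_edges_parallel(h, corners, tol=1e-9):
    """Return True if the mapped left and right page edges remain parallel.

    `corners` is ordered top-left, top-right, bottom-right, bottom-left.
    """
    tl, tr, br, bl = [apply_homography(h, c) for c in corners]
    left = (bl[0] - tl[0], bl[1] - tl[1])
    right = (br[0] - tr[0], br[1] - tr[1])
    cross = left[0] * right[1] - left[1] * right[0]  # zero when parallel
    return abs(cross) <= tol


# Hypothetical letter-size page, in inches.
PAGE = [(0.0, 0.0), (8.5, 0.0), (8.5, 11.0), (0.0, 11.0)]

# Assumed scanner-style skew: a shear with no perspective terms.
SKEW = [[1.0, 0.1, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]

# Assumed camera-style projection: nonzero perspective term in the last row.
PROJECTIVE = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.05, 1.0]]

print("skew keeps edges parallel:", side_edges_parallel(SKEW, PAGE))
print("projection keeps edges parallel:", side_edges_parallel(PROJECTIVE, PAGE))
```

The sketch models only the projective component; the nonlinear distortions introduced by camera optics are not captured by a single 3 x 3 matrix and would need a separate lens model.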
In the context of document image capture and content extraction using, e.g. OCR techniques, this distortion can be particularly problematic because, in the image, straight lines of character strings may appear not only linearly slanted according to a skew angle, but may also be characterized by more complex polynomial functions of second or higher degree, producing a curve with a decreasing/increasing slope, a curve with changes in slope that reverse direction over the length of the curve, etc. Even many of the preprocessing techniques alluded to above cannot adequately resolve these distortive effects to allow precise and accurate character prediction via OCR.
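To make the distinction between a skew angle and a higher-degree curve concrete, the following sketch fits both a first-degree and a second-degree polynomial to a synthetic text baseline. The baseline coefficients, the sample positions, and the helper names `fit_polynomial`, `evaluate`, and `max_residual` are assumptions made for illustration; the least-squares fit via normal equations is a generic textbook method, not a technique attributed to any approach discussed herein.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns [c0, c1, ...] for c0 + c1*x + ..."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y*x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - known) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial c0 + c1*x + c2*x^2 + ... at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def max_residual(xs, ys, coeffs):
    """Largest vertical distance between the samples and the fitted curve."""
    return max(abs(y - evaluate(coeffs, x)) for x, y in zip(xs, ys))


# Synthetic baseline of a text line that curves (assumed second-degree shape).
xs = [float(x) for x in range(11)]
ys = [2.0 + 0.5 * x + 0.1 * x * x for x in xs]

skew_only = fit_polynomial(xs, ys, 1)   # models a skew angle only
curved = fit_polynomial(xs, ys, 2)      # models the curvature as well

print("max error, skew-only model:", max_residual(xs, ys, skew_only))
print("max error, second-degree model:", max_residual(xs, ys, curved))
```

Under these assumptions, the skew-only model leaves a visible residual along the line while the second-degree model follows the baseline closely, which is why a correction limited to a single skew angle does not straighten such text.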
In view of the challenges presented above, it would be beneficial to provide an image capture and processing algorithm and applications thereof that compensate for and/or correct problems associated with image capture and processing, particularly content recognition, using a mobile device. It is critical that the solution address these problems while maintaining a low computational cost, even when processing resources are restricted to those hardware and software components physically located on the mobile device, so as to remove the temporal and geographic restrictions inherent to techniques that leverage external (e.g. network-connected) processing resources beyond those physically present on the mobile device itself.