1. Field of the Invention
Various embodiments disclosed herein relate generally to the field of image processing. More specifically, various embodiments disclosed herein are directed to systems and methods for processing an image of a form so that it can be properly compared with a template image for extraction of relevant content.
2. Related Art
Certain software applications enable a user to electronically transmit an image of a document to a processing party rather than delivering the actual document itself. For example, a user can place a check in a flatbed scanner to create a scanned image of the check, and then upload this scanned image to a remote server associated with the user's bank. Once received by the remote server, the uploaded image can be algorithmically analyzed to identify specific content that has been written or printed on the check (e.g., the payor, the payee, the date, the bank to draw funds from, the total amount to be paid, etc.). In this manner, a user can essentially deposit a check by uploading an image of it to the bank, rather than having to travel to a bank teller or to an ATM for physical delivery.
Note that the above example describes only one exemplary document image processing application (DIPA), and this application relates specifically to checks. However, a range of other services exist for processing other types of documents as well.
Due to the variety of services offered, when an electronic image is first received, it is often necessary to determine what type of document it is before attempting to engage in subsequent processing operations. Obviously, different types of documents exhibit different characteristics. For example, a check is typically going to have a much different presentation than a store rebate, in terms of its size, spacing, content, textual arrangement, and possibly also its orientation. Similarly, a store rebate will have different characteristics than a money order.
Note that even documents of the same type can have different formats. Consider, for example, an application designed to process images of W-2 tax forms. There are dozens of different types of W-2 forms presently in existence, and these forms can differ in ways ranging from slight variations (e.g., having essentially the same form layout, but printed using different software; compare, for example, the W-2 form of FIG. 1 with the W-2 form of FIG. 2) to entirely different layouts (compare, for example, the W-2 form of FIG. 3 with those depicted in FIGS. 1 and 2). In order to extract content from the document correctly, it is often first necessary to match the received image with an appropriate format/layout. Each possible format/layout is typically represented as a separate template stored in a database that is accessible by the server.
For example, a first template may be used to represent a first type of business check from a specific bank, while a second template can be used to represent a second type of business check from the same bank. In some cases, if enough similarity exists between two types of forms (e.g., as in the W-2 forms depicted in FIGS. 1 and 2), a single template can be used to represent both forms.
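The idea of selecting one stored template per format/layout can be illustrated with a minimal sketch. It is purely illustrative and does not describe how any conventional system or the disclosed embodiments perform matching: the `Template` record, the use of recognized field labels as a layout signature, the Jaccard similarity measure, the 0.5 threshold, and the example template names are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical template record: a name plus the field labels it expects."""
    name: str
    labels: frozenset


def jaccard(a, b):
    """Similarity of two label sets: |intersection| / |union| (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_template(found_labels, templates, threshold=0.5):
    """Return the most similar template, or None if nothing reaches the threshold."""
    found = frozenset(found_labels)
    best = max(templates, key=lambda t: jaccard(found, t.labels), default=None)
    if best is None or jaccard(found, best.labels) < threshold:
        return None
    return best


# Hypothetical template database (assumed contents, for illustration only).
TEMPLATES = [
    Template("w2_layout_a", frozenset({"wages", "federal tax withheld", "employer id", "ssn"})),
    Template("business_check", frozenset({"pay to the order of", "date", "amount", "memo"})),
]

match = best_template({"wages", "ssn", "employer id"}, TEMPLATES)
print(match.name if match else "no matching template")
```

Because two slightly different forms can share most of their labels, a single template can clear the threshold for both, which mirrors the case in which one template represents two similar forms.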
Conventional technologies that read an image and match it to a corresponding template, however, rely on a number of assumptions. The first assumption is that the received image will have a fixed or predetermined size. The second assumption is that there will be a low level of distortion in the received image. Such assumptions, however, tend to hold only when the document is scanned with a flatbed scanner.
When dealing with images captured by a digital camera or mobile device (for example, a snapshot of a document taken with a smartphone), these assumptions can no longer be relied on. In these cases, the image acquired by the camera is often angularly distorted because the photographer has targeted the document slightly off-axis (i.e., not perfectly aligned in an overhead position). Also, since the same document can be photographed at different distances, it can no longer be assumed that a document of a specific type will have a fixed or predetermined size. For these reasons, conventional technologies used to process electronic images do not tend to operate effectively (or even work at all) with images captured by mobile devices. Even in those applications that have the capability of processing mobile images, it is often the case that a separate “training” process must be employed in order to enable the application to have improved success with recognizing certain types of input images.
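The effect of camera distance and off-axis capture on apparent document size can be illustrated with a short calculation. The sketch below assumes an idealized pinhole camera and a document that is small relative to its distance from the camera; the function name, the focal length of 3000 pixels, and the 152 mm document width are hypothetical values chosen only for illustration.

```python
import math


def apparent_width_px(doc_width_mm, distance_mm, focal_length_px, tilt_deg=0.0):
    """Approximate on-image width of a flat document under a pinhole model.

    Width scales with focal_length / distance. Tilting the document about its
    vertical axis foreshortens it by roughly cos(tilt), which is an
    approximation that holds when the document is small relative to distance.
    """
    if distance_mm <= 0:
        raise ValueError("distance_mm must be positive")
    return focal_length_px * doc_width_mm * math.cos(math.radians(tilt_deg)) / distance_mm


# Same hypothetical 152 mm wide document, three different capture conditions.
for distance_mm, tilt_deg in [(300, 0), (600, 0), (300, 30)]:
    width = apparent_width_px(152, distance_mm, 3000, tilt_deg)
    print(f"distance={distance_mm} mm, tilt={tilt_deg} deg: {width:.0f} px wide")
```

Under these assumptions, doubling the distance halves the apparent width, and tilting the document shrinks it further, so the same document type can no longer be expected to arrive at a single predetermined size.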