Digital images having depicted therein a document such as a letter, a check, a bill, an invoice, a credit card, a driver license, a passport, a social security card, etc. have conventionally been captured and processed using a scanner or multifunction peripheral coupled to a computer workstation such as a laptop or desktop computer. Methods and systems capable of performing such capture and processing are well known in the art and well adapted to the tasks for which they are employed.
However, in an era where day-to-day activities, computing, and business are increasingly performed using mobile devices, it would be greatly beneficial to provide analogous document capture and processing systems and methods for deployment and use on mobile platforms, such as smart phones, digital cameras, tablet computers, etc.
A major challenge in transitioning conventional document capture and processing techniques to mobile platforms is the limited processing power and image resolution achievable using hardware currently available in mobile devices. These limitations present a significant challenge because it is impossible or impractical to process images captured at resolutions typically much lower than those achievable by a conventional scanner. As a result, conventional scanner-based processing algorithms typically perform poorly on digital images captured using a mobile device.
In addition, the limited processing power and memory available on mobile devices make conventional image processing algorithms employed for scanners prohibitively expensive in terms of computational cost. Attempting to execute a conventional scanner-based image processing algorithm takes far too much time to be practical on modern mobile platforms.
A still further challenge is presented by the nature of mobile capture components (e.g. cameras on mobile phones, tablets, etc.). Where conventional scanners are capable of faithfully representing the physical document in a digital image, critically maintaining aspect ratio, dimensions, and shape of the physical document in the digital image, mobile capture components are frequently incapable of producing such results.
Specifically, images of documents captured by a camera present a new line of processing issues not encountered when dealing with images captured by a scanner. This is in part due to the inherent differences in the way the document image is acquired, as well as the way the devices are constructed. Some scanners use a transport mechanism that creates relative movement between the paper and a linear array of sensors. These sensors generate pixel values of the document as it moves by, and the sequence of these captured pixel values forms an image. Accordingly, there is generally a horizontal or vertical consistency up to the noise in the sensor itself, because the same sensor provides all the pixels in a given line.
In contrast, cameras have many more sensors, typically arranged in a two-dimensional (e.g., rectangular) array rather than in a single line. These individual sensors are all independent, and therefore render image data that typically lacks horizontal or vertical consistency. In addition, cameras introduce a projective effect that is a function of the angle at which the picture is taken. For example, with a linear array such as that in a scanner, even if the transport of the paper is not perfectly orthogonal to the alignment of the sensors and some skew is introduced, there is no projective effect such as that in a camera. Additionally, with camera capture, nonlinear distortions may be introduced because of the camera optics.
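By way of illustration only, the distinction between scanner skew and the projective effect of a camera may be sketched as follows. The sketch assumes that scanner skew can be modeled as a simple shear and that the camera's projective effect can be modeled as a 3x3 homography. The matrix values, the page dimensions, and the helper names `apply_homography` and `sides_parallel` are hypothetical and chosen solely for demonstration.

```python
# Illustrative sketch (assumed models, hypothetical values): a sheared page
# keeps its opposite sides parallel, while a projectively mapped page does not.

def apply_homography(h, point):
    """Map a 2D point through a 3x3 matrix using homogeneous coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def cross(a, b, c, d):
    """Z-component of the cross product of directions (b - a) and (d - c)."""
    return (b[0] - a[0]) * (d[1] - c[1]) - (b[1] - a[1]) * (d[0] - c[0])

def sides_parallel(quad, tol=1e-9):
    """True if both pairs of opposite sides of the quadrilateral are parallel.

    quad is ordered: top-left, top-right, bottom-right, bottom-left.
    """
    tl, tr, br, bl = quad
    top_bottom = abs(cross(tl, tr, bl, br)) < tol
    left_right = abs(cross(tl, bl, tr, br)) < tol
    return top_bottom and left_right

# Corners of a letter-size page, in inches (assumed for illustration).
page = [(0.0, 0.0), (8.5, 0.0), (8.5, 11.0), (0.0, 11.0)]

# Assumed scanner skew: an affine shear (last row is 0, 0, 1).
skew = [[1.0, 0.1, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]

# Assumed camera tilt: a homography with a non-trivial last row.
tilt = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.02, 0.01, 1.0]]

skewed = [apply_homography(skew, p) for p in page]
projected = [apply_homography(tilt, p) for p in page]

print("skewed page is a parallelogram:", sides_parallel(skewed))
print("projected page is a parallelogram:", sides_parallel(projected))
```

Under these assumptions, the sheared page remains a parallelogram, whereas the projectively mapped page becomes a general quadrilateral whose opposite sides converge, which is the kind of distortion that scanner-oriented processing does not need to correct.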
Distortions and blur are particularly challenging when attempting to detect objects represented in video data, as the camera typically moves with respect to the object during the capture operation, and video data are typically characterized by a relatively low resolution compared to still images captured using a mobile device. Moreover, the motion of the camera may be erratic and occur within three dimensions, meaning the horizontal and/or vertical consistency associated with linear motion in a conventional scanner is not present in video data captured using mobile devices. Accordingly, reconstructing an object to correct for distortions, e.g. due to changing camera angle and/or position, within a three-dimensional space is a significant challenge.
Further still, as mobile applications increasingly rely on or leverage image data to provide useful services to customers, e.g. mobile banking, shopping, applying for services such as loans, opening accounts, authenticating identity, acquiring or renewing licenses, etc., capturing relevant information within image data is a desirable capability. However, the detection of objects within mobile image data is often a challenging task, particularly where the object's edges are missing, obscured, etc. within the captured image/video data. Since conventional detection techniques rely on detecting objects by locating edges of the object (i.e. boundaries between the object, typically referred to as the image “foreground,” and the background of the image or video), missing or obscured object edges present an additional obstacle to consistent and accurate object detection.
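By way of illustration only, the dependence of edge-based detection on visible object boundaries may be sketched as follows. The sketch assumes a single row of grayscale pixel intensities and a simple difference threshold as a stand-in for an edge detector; it is not a description of any particular conventional technique. The function name `find_edges`, the threshold, and the intensity values are hypothetical.

```python
# Illustrative sketch (assumed, simplified detector): locate edges in one row
# of grayscale pixels by thresholding the difference between neighbors.

def find_edges(row, threshold=50):
    """Return indices i where |row[i] - row[i - 1]| exceeds the threshold."""
    return [i for i in range(1, len(row))
            if abs(row[i] - row[i - 1]) > threshold]

# Bright document (foreground) on a dark background: both boundaries are found.
clear_row = [30, 30, 200, 200, 200, 30, 30]

# Same document, but the background to the right is nearly as bright as the
# document, so the right-hand boundary falls below the threshold.
obscured_row = [30, 30, 200, 200, 200, 190, 190]

print("edges with clear boundaries:", find_edges(clear_row))
print("edges with one obscured boundary:", find_edges(obscured_row))
```

In the second row only one of the two document boundaries is located, so a detector that requires both boundaries in order to delineate the object would fail to do so.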
In view of the challenges presented above, it would be beneficial to provide an image capture and processing algorithm, and applications thereof, that compensate for and/or correct problems associated with using a mobile device to capture and/or detect objects within image and/or video data, and that reconstruct such objects within a three-dimensional coordinate space.