This disclosure relates generally to the field of image processing and, more particularly, to various techniques for object detection and recognition within digital images using a split processing pipeline operating in both high-resolution and low-resolution modes concurrently.
The advent of portable integrated computing devices has caused a widespread proliferation of digital cameras. These integrated computing devices commonly take the form of smartphones or tablets and typically include general purpose computers, cameras, sophisticated user interfaces including touch-sensitive screens, and wireless communications abilities through Wi-Fi, LTE, HSDPA and other cell-based or wireless technologies. The wide proliferation of these integrated devices provides opportunities to use the devices' capabilities to perform tasks that would otherwise require dedicated hardware and software. For example, as noted above, integrated devices such as smartphones and tablets typically have one or two embedded cameras. These cameras comprise lens/camera hardware modules that may be controlled through the general purpose computer using system software and/or downloadable software (e.g., “Apps”) and a user interface including, e.g., programmable buttons placed on the touch-sensitive screen and/or “hands-free” controls such as voice controls.
One opportunity for using the features of an integrated device is to capture and evaluate images. The device's camera(s) allow the capture of one or more images, and the general purpose computer provides processing power to perform analysis. In addition, any analysis that is performed for a network service computer can be facilitated by transmitting the image data or other data to a service computer (e.g., a server, a website, or other network-accessible computer) using the communications capabilities of the device.
These abilities of integrated devices allow for recreational, commercial and transactional uses of images and image analysis. For example, images may be captured and analyzed to decipher information from the images such as characters, symbols, and/or other objects of interest located in the captured images. The characters, symbols, and/or other objects of interest may be transmitted over a network for any useful purpose such as for use in a game, or a database, or as part of a transaction such as a credit card transaction. For these reasons and others, it is useful to enhance the abilities of these integrated devices and other devices for deciphering information from images.
In particular, when trying to read a credit card with a camera, there are multiple challenges that a user may face. Because the distance between the credit card and the camera may vary widely while the user is attempting to read the credit card, one particular challenge is the difficulty in focusing the camera properly on the credit card. Another challenge is the difficulty of reading characters that require perspective correction, which forces the user to hold the card in a plane parallel to the camera to limit any potential perspective distortions. One solution to these problems available today is to guide the user (e.g., via the user interface on the device possessing the camera) to frame the credit card (or other object-of-interest) in a precise location and orientation, usually very close to the camera, so that sufficient image detail may be obtained. This is challenging and often frustrating to the user, and may even result in a more difficult and time-consuming user experience than simply manually typing in the information of interest from the credit card. It would therefore be desirable to have a system that detects the credit card (or other object-of-interest) in three-dimensional space, utilizing scaling and/or perspective correction on the image, thus allowing the user more freedom in how the credit card (or other object-of-interest) may be held in relation to the camera during the detection process.
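By way of illustration only, the perspective correction referred to above may be sketched as a planar homography that maps the four corners of a card, as observed in the image, onto an upright rectangle. The sketch below is not drawn from this disclosure: the corner coordinates, the function names `solve_linear`, `homography`, and `warp_point`, and the 856 x 540 target rectangle (chosen to approximate the 85.60 mm x 53.98 mm ID-1 card format) are all assumptions made for the example.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 homography (h33 fixed to 1) from four point correspondences."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h, x, y):
    """Apply the homography to one point, including the projective divide."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical card corners as seen by the camera (tilted card), listed
# clockwise from top-left, and the upright rectangle they should map to.
SRC_CORNERS = [(120, 80), (520, 110), (500, 370), (100, 300)]
DST_CORNERS = [(0, 0), (856, 0), (856, 540), (0, 540)]

H = homography(SRC_CORNERS, DST_CORNERS)
```

Once such a mapping is available, every pixel of the tilted card can in principle be resampled into the upright rectangle before any character reading is attempted, which is what relieves the user of holding the card parallel to the camera.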
Another challenge often faced comes from the computational costs of credit card recognition (or other object-of-interest recognition) algorithms, which scale in complexity as the resolution of the camera increases. Therefore, in prior art implementations, the camera is typically running in a low resolution mode, which necessitates the close framing of the card by the user in order for the camera to read sufficient details on the card for the recognition algorithm to work successfully with sufficient regularity. However, placing the card in such a close focus range also makes it more challenging for the camera's autofocus functionality to handle the situation correctly. A final shortcoming of prior art optical character recognition (OCR) techniques, such as those used in credit card recognition algorithms, is that they rely on single-character classifiers, which require that the incoming character sequence data be segmented before each individual character may be recognized, a requirement that is difficult, if not impossible, to satisfy in the credit card recognition context.
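The trade-off between resolution and framing described above may be made concrete with a short calculation. The following is a rough sketch only; it assumes that per-frame processing cost grows in proportion to pixel count, and the two resolutions and the 600-pixel detail requirement are hypothetical values chosen for the example rather than figures from this disclosure.

```python
def pixel_count(width, height):
    """Number of pixels in one frame."""
    return width * height


def relative_cost(low_res, high_res):
    """Ratio of per-frame work, ASSUMING cost is proportional to pixel count."""
    return pixel_count(*high_res) / pixel_count(*low_res)


def min_frame_fraction(required_card_px, frame_width):
    """Fraction of the frame width the card must span to reach the
    required number of pixels across the card."""
    return required_card_px / frame_width


LOW_RES = (640, 480)      # hypothetical low-resolution mode
HIGH_RES = (1920, 1080)   # hypothetical high-resolution mode
REQUIRED_CARD_PX = 600    # hypothetical detail needed across the card width

ratio = relative_cost(LOW_RES, HIGH_RES)
low_fraction = min_frame_fraction(REQUIRED_CARD_PX, LOW_RES[0])
high_fraction = min_frame_fraction(REQUIRED_CARD_PX, HIGH_RES[0])

print(f"{ratio:.2f}x more pixels per frame")             # 6.75x more pixels per frame
print(f"low-res: card must span {low_fraction:.4f}")     # low-res: card must span 0.9375
print(f"high-res: card must span {high_fraction:.4f}")   # high-res: card must span 0.3125
```

Under these assumed numbers, the low-resolution mode requires the card to fill nearly the entire frame width, whereas the high-resolution mode tolerates a card spanning less than a third of the frame at the price of several times more pixels to process per frame.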
The inventors have realized new and non-obvious ways to make it easier for the user's device to detect and/or recognize the credit card (or other object-of-interest) by overcoming one or more of the aforementioned challenges. As used herein, the term “detect” in reference to an object-of-interest refers to an algorithm's ability to determine whether the object-of-interest is present in the scene; whereas the term “recognize” in reference to an object-of-interest refers to an algorithm's ability to extract additional information from a detected object-of-interest in order to identify the detected object-of-interest from among the universe of potential objects-of-interest.