Digital images having depicted therein an object inclusive of documents such as a letter, a check, a bill, an invoice, etc. have conventionally been captured and processed using a scanner or multifunction peripheral (MFP) coupled to a computer workstation such as a laptop or desktop computer. Methods and systems capable of performing such capture and processing are well known in the art and well adapted to the tasks for which they are employed.
More recently, the conventional scanner-based and MFP-based image capture and processing applications have shifted toward mobile platforms, e.g. as described in the related patent applications noted above with respect to capturing and processing images using mobile devices (U.S. Pat. No. 8,855,375), classifying objects depicted in images captured using mobile devices (U.S. Pat. No. 9,355,312, e.g. at column 9, line 9-column 15, line 28), and extracting data from images captured using mobile devices (U.S. Pat. No. 9,311,531, e.g. at column 18, line 25-column 27, line 16).
While these capture, processing, classification and extraction engines and methods are capable of reliably extracting information from certain objects or images, they cannot dynamically extract information from other objects, particularly objects characterized by a relatively complex background and/or overlapping regions of foreground (e.g. text) and background. In practice, while it may be possible to reliably extract information from a simple document having a plain white background with dark foreground text and/or images imposed thereon, it is not currently possible to reliably extract information from a document with a more complex background, e.g. a document depicting one or more graphics (such as pictures, logos, etc.) as the background with foreground text and/or images imposed thereon. The difficulty is especially acute where background and foreground overlap so as to create a “dual background” having some portions lighter than the foreground element and other portions darker than the foreground element.
This problem arises primarily because foreground and background become significantly more difficult to distinguish in such images, especially given that digital images are conventionally converted to bitonal (black/white) or grayscale color depth prior to attempting extraction. In converting color channel information into grayscale intensity information or bitonal information, tonal differences between background and foreground are suppressed.
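As a simple illustration of how grayscale conversion can suppress tonal differences, the following sketch maps two visibly different colors to gray using the ITU-R BT.601 luma weights (0.299, 0.587, 0.114). The choice of weights, the helper names, and the example colors are assumptions for illustration only; a given capture or extraction pipeline may use a different color-to-gray mapping.

```python
# Minimal sketch: grayscale conversion using ITU-R BT.601 luma weights
# (an assumed convention; other pipelines may weight channels differently).

def luma(rgb):
    """Map an (R, G, B) triple with 0-255 channels to one gray intensity."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def rgb_distance(a, b):
    """Euclidean distance between two colors in RGB space."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Hypothetical colors: red foreground text on a green background.
text_color = (255, 0, 0)
background_color = (0, 130, 0)

# The colors are far apart in RGB space...
print(f"RGB distance: {rgb_distance(text_color, background_color):.1f}")
# ...but land on nearly the same gray level after conversion.
print(f"Gray text: {luma(text_color):.2f}, gray background: {luma(background_color):.2f}")
```

Text and background that are easy to tell apart in color differ by well under one gray level here, so any later threshold assigns them the same value.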
This limitation is undesirable because it prevents users from applying powerful extraction technology to an increasingly diverse array of modern documents, many of which are useful or necessary to complete various mobile device-mediated transactions or business processes.
For example, it is common for financial documents such as checks, credit cards, etc. to include graphics, photographs, or other imagery and/or color schemes as background upon which important financial information is displayed. The font and color of the foreground financial information may also vary from “standard” business fonts and/or colors, making it more likely that discriminating between the foreground and background will be difficult or impossible.
Similarly, identifying documents such as driver's licenses, passports, employee identification, etc. frequently depict watermarks, holograms, logos, seals, pictures, etc. over which important identifying information may be superimposed in the foreground. To the extent these background and foreground elements overlap, difficulties are introduced into the discrimination process, frustrating or defeating the ability to extract those important elements of information.
Moreover, many documents depict text or other elements of interest according to different “polarizations” or using different “polarity.” While the most common polarity involves representing dark foreground elements (e.g. text, logo, symbol, picture, etc.) on a bright/light background, it is increasingly common for a single document to also use an inverse polarity, in which bright/light foreground elements are represented on a dark background. Worse still, some images may contain elements of interest that are significantly similar, in gray intensity and/or color, to the background upon which the elements are superimposed.
Conventional extraction techniques rely on suppressing color depth, typically to the point of binary image data, meaning it is not currently possible to reliably extract information represented according to different polarities. The conventional color suppression process essentially maps particular color channel values (or combinations thereof) to corresponding shades of gray.
Conventionally, binarization includes defining a threshold intensity value, assigning one binary pixel value (e.g. 0) to pixels with an intensity below the threshold, and assigning the other binary pixel value (e.g. 1) to pixels with an intensity above the threshold. This results in a black/white bitonal image and may be accomplished using a single, global binarization threshold applied to the entire image or, in more advanced cases, by evaluating a portion of the image and defining a local threshold that takes into account the distribution of grayness within the evaluated portion of the image (also known as adaptive thresholding).
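The two binarization approaches described above may be sketched as follows, assuming a grayscale image stored as a list of rows of 0-255 intensities. The adaptive variant uses the mean of a square neighborhood minus a constant offset as the local threshold, which is one common form of adaptive thresholding rather than the only one; the function names, window radius, offset, and sample image are hypothetical. Pixels exactly equal to the threshold are assigned 1 here, a detail the description leaves open.

```python
# Minimal sketch of global vs. adaptive (local-mean) binarization.
# Convention: intensity below the threshold -> 0 (black), otherwise -> 1 (white).

def binarize_global(img, threshold):
    """Apply one threshold to every pixel of the image."""
    return [[0 if p < threshold else 1 for p in row] for row in img]

def binarize_adaptive(img, radius=1, offset=0):
    """Threshold each pixel against the mean of its (2*radius+1)^2 neighborhood,
    clipped at the image border, minus a constant offset."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [img[j][i] for j in ys for i in xs]
            local_threshold = sum(window) / len(window) - offset
            row.append(0 if img[y][x] < local_threshold else 1)
        out.append(row)
    return out

# Hypothetical image: a light region (200) with a mid-gray mark (150) on the
# left, and a dark region (60) with a darker mark (20) on the right.
image = [
    [200, 200, 200, 60, 60, 60],
    [200, 150, 200, 60, 20, 60],
    [200, 200, 200, 60, 60, 60],
]

for row in binarize_global(image, 128):
    print(row)
print()
for row in binarize_adaptive(image):
    print(row)
```

With a global threshold of 128, each mark receives the same value as its surroundings. With the local-mean thresholds, both marks become 0 while most of the surrounding background becomes 1; dark pixels along the boundary with the bright region also become 0, since they are dark relative to their neighborhood.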
In both cases, the mapping of color/grayscale values proceeds according to a single convention that emphasizes information characterized by intensity values on one end of the intensity spectrum (typically darker elements), while suppressing information characterized by intensity values on the other end of the intensity spectrum (typically brighter or more “white” elements). Accordingly, it is not possible to reliably identify and/or extract all the information from the binarized image. Indeed, it is often impossible to retrieve information represented according to at least one of the polarizations (typically light foreground elements on dark background) using conventional approaches.
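The polarity limitation can be illustrated with a minimal sketch that applies a single thresholding convention (intensity below the threshold maps to 0, treated as foreground) to two hypothetical patches with the same layout but opposite polarity. The threshold value, patch contents, and helper names are assumptions for illustration, and the inversion step is included only to show that flipping the convention merely swaps which polarity is lost; it is not a technique described in this document.

```python
# Minimal sketch of the polarity problem under one binarization convention:
# intensity < THRESHOLD -> 0 ("foreground"/ink), otherwise 1 (background).

THRESHOLD = 128

def binarize(patch, threshold=THRESHOLD):
    return [[0 if p < threshold else 1 for p in row] for row in patch]

def invert(patch):
    """Reverse intensities of a 0-255 grayscale patch."""
    return [[255 - p for p in row] for row in patch]

def foreground_fraction(binary):
    """Fraction of pixels the convention treats as foreground (value 0)."""
    flat = [p for row in binary for p in row]
    return flat.count(0) / len(flat)

# Dark text (30) on a light background (220).
dark_on_light = [
    [220, 220, 220, 220],
    [220, 30, 30, 220],
    [220, 220, 220, 220],
]
# Light text (230) on a dark background (40): same layout, inverse polarity.
light_on_dark = [
    [40, 40, 40, 40],
    [40, 230, 230, 40],
    [40, 40, 40, 40],
]

for name, patch in [("dark_on_light", dark_on_light), ("light_on_dark", light_on_dark)]:
    direct = foreground_fraction(binarize(patch))
    flipped = foreground_fraction(binarize(invert(patch)))
    print(f"{name}: foreground fraction {direct:.2f} direct, {flipped:.2f} inverted")
```

Under the direct convention, the two text pixels are the foreground in the dark-on-light patch, but in the light-on-dark patch the dark background becomes the "foreground" and the text is left as a gap. Inverting first simply reverses the outcome, so a single convention applied to the whole image favors one polarity at the expense of the other.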
Therefore, it would be highly beneficial to provide new techniques, systems and/or computer program product technology for identifying regions of a digital image depicting elements of interest, particularly text, especially where such elements of interest are represented according to different polarities/polarizations. It is also desirable to improve recall and accuracy of extracting information from such images.