Digital images can include various types of content such as landscapes, people, buildings, and other objects. Digital images can be originally captured using various input devices such as cameras, video recorders, etc. Textual objects (e.g., alphanumeric characters, symbols, etc.) are often included in a digital image, where a textual object can be a part of the original content of the digital image or can be introduced and provided over the content after the digital image is captured.
Textual objects can include text of varying size, orientation, and/or typeface. Textual objects in a digital image can include information associated with graphical objects in the digital image. For example, when the textual object is a part of the original content of the digital image, the textual object can include text on street signs, building names, address numbers, etc. When the textual object is introduced after the digital image is captured, the textual object can include identifiers or descriptions associated with the graphical objects included in the digital image, or any other textual content.
Character recognition techniques, such as optical character recognition (OCR), have been used to extract text from digital documents to generate searchable text. However, conventional OCR techniques assume that the digital document is a text document having a high resolution (e.g., over 300 dots per inch (dpi)) and a background of uniform contrast. Digital images, by comparison, generally have non-uniform contrast caused by various factors including image quality, noise, uneven lighting, compression, distortion, number and/or type of graphical objects within the image, etc. In addition, digital images can have a lower resolution (e.g., 50 dpi or lower) than a digital document.
Various image processing techniques, such as edge-based methods, texture-based methods, and region-based text localization methods, have been attempted to extract text from a digital image that includes graphical objects. However, these techniques can use significant processing resources and can generate inaccurate text strings that require significant user review.
Digital images can also include sensitive information such as proprietary or confidential information. For example, digital images can include sensitive customer data including credit card numbers, social security numbers, account numbers, etc. Techniques to prevent sensitive information from being publicly accessed or disseminated often do not adequately protect and/or classify sensitive textual information within digital images having textual and graphical objects.
Therefore, a need exists for an improved system and method of image processing that extracts text from a digital image including a graphical object. Moreover, a need exists for an improved system and method of data loss prevention that includes a workflow approval process based on text extracted from a digital image including a graphical object.