Currently, there are no efficient solutions for extracting text from graphical user interface (GUI) content rendered on the display screen of a computing system. Related solutions either are not accurate enough for use with typical optical character recognition (OCR) schemes or produce ambiguous results. Disambiguating those results can be prohibitively resource intensive, particularly for real-time applications.
OCR methods are currently available that can be applied to a gray-scale image to reduce it to a binary format by calculating the threshold that maximally separates two assumed modes in the gray-scale histogram of the image. If such methods are applied to GUI content displayed on a screen, such as that shown in GUI environment 110 of FIG. 1, an output such as that shown in FIG. 2A may be produced, once using a global threshold and once using an adaptive threshold.
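The histogram-based threshold described above corresponds to what is commonly known as Otsu's method. The following is a minimal pure-Python sketch, assuming 8-bit gray levels and a flattened list of pixel values. The names `otsu_threshold` and `binarize` are hypothetical, and the convention that pixels above the threshold become 1 is an assumption. This is not presented as the specific OCR method referred to above.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return the gray level t that maximizes the between-class variance
    when pixels <= t form one class and pixels > t form the other."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1  # assumes every pixel is an int in [0, levels)

    total = len(pixels)
    weighted_sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of gray levels at or below the candidate threshold
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so there is no separation to score
        mu0 = sum0 / w0
        mu1 = (weighted_sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(pixels: Sequence[int], t: int) -> List[int]:
    """Map each pixel to 1 if it lies above the threshold t, else 0."""
    return [1 if p > t else 0 for p in pixels]


# Two well-separated modes: the chosen threshold falls between them.
pixels = [50, 52, 54] * 10 + [180, 182, 184] * 10
t = otsu_threshold(pixels)
print(t)  # 54
```

Because a single threshold is computed for the whole histogram, this is a global method. It works well when the histogram really is bimodal, which is exactly the assumption a background gradient violates.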
As shown in FIG. 2A, the text regions in the example illustration are indeed separated, but artifacts introduced by the OCR algorithm are also visible in the background. Using a global threshold can cause parts of gradients in the background to merge with the text foreground. Referring to FIG. 2B, when the adaptive threshold is used, the text is segmented, but again, because of the gradient, the resulting image is cluttered with foreground objects. Methods that can produce better results are therefore desirable.
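The difference between a global and an adaptive threshold on a background gradient can be illustrated on a synthetic one-row image. This is a minimal sketch under stated assumptions: foreground means "darker than the threshold" (dark text on a lighter background), and the adaptive variant is the common "local mean minus an offset" form with the window clipped at the border. The names `global_binarize`, `adaptive_binarize`, `radius`, and `c` are hypothetical. This is not necessarily the adaptive method used to produce FIG. 2B.

```python
from typing import List

Image = List[List[int]]


def global_binarize(image: Image, t: int) -> Image:
    """Foreground (1) = pixel darker than one fixed threshold t for the whole image."""
    return [[1 if p < t else 0 for p in row] for row in image]


def adaptive_binarize(image: Image, radius: int = 1, c: float = 10) -> Image:
    """Foreground (1) = pixel darker than the mean of its (2*radius+1)^2
    neighbourhood minus the offset c. The window is clipped at the image border."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                image[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if image[y][x] < local_mean - c else 0
    return out


# Synthetic one-row "screen": a background gradient rising from 20 to 160,
# with a single dark text pixel (value 30) at index 3.
row = [[20, 40, 60, 30, 100, 120, 140, 160]]
print(global_binarize(row, 50))       # [[1, 1, 0, 1, 0, 0, 0, 0]]
print(adaptive_binarize(row, 1, 10))  # [[0, 0, 0, 1, 0, 0, 0, 0]]
```

With the fixed cut-off, the dark end of the gradient (indices 0 and 1) is labeled as foreground together with the text pixel, mirroring the merging described for the global threshold. In this toy row, the local threshold isolates the text pixel. On real GUI backgrounds with nonlinear gradients, however, the window size and offset determine whether background pixels survive, which is consistent with the clutter described for FIG. 2B.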