An electronic document may be produced by scanning or otherwise acquiring an image of a paper document and performing optical character recognition to produce the text associated with the document. The document may contain more than text. It may also contain tables, images, and screenshots, which, unlike ordinary images, contain some unevenly distributed text. Identifying screenshots during the recognition process can be problematic. A screenshot may easily be confused with a table, or may be erroneously divided into several individual parts (for example, a few text blocks comprising the text of the screenshot, an image block comprising a window's header, etc.).
The present invention makes it possible to distinguish screenshots from other types of structures within a document image. As a result, the system does not perform the optical character recognition process on the portion of the document image corresponding to an identified screenshot; instead, this portion is saved as an image.
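The routing described above may be illustrated by the following minimal sketch. It assumes that the blocks of the document image have already been classified; the names `Block`, `Output`, and `route_blocks`, the block labels, and the `ocr` callable are hypothetical and are introduced only for illustration. The sketch does not show how a screenshot is identified.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Bounding box as (left, top, right, bottom) in pixels (assumed convention).
BBox = Tuple[int, int, int, int]


@dataclass
class Block:
    kind: str  # hypothetical labels, e.g. "text", "table", "screenshot"
    bbox: BBox


@dataclass
class Output:
    bbox: BBox
    saved_as: str  # "text" or "image"
    text: str = ""


def route_blocks(blocks: List[Block], ocr: Callable[[BBox], str]) -> List[Output]:
    """Run OCR on every block except those labeled as screenshots."""
    results = []
    for block in blocks:
        if block.kind == "screenshot":
            # The screenshot region bypasses OCR and is kept as an image.
            results.append(Output(block.bbox, "image"))
        else:
            results.append(Output(block.bbox, "text", ocr(block.bbox)))
    return results
```

In this sketch the `ocr` callable is never invoked for a block labeled as a screenshot, so the text inside the screenshot is neither recognized nor split into separate text blocks.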