Many documents are stored and archived in a hardcopy format (e.g., paper, microfiche, microfilm). Although it is easy to create an image of a hardcopy document (HD) by scanning the HD, the text in the image cannot be easily edited.
Many different character recognition (CR) algorithms (e.g., optical CR algorithms) exist that can generate an editable electronic version of the HD from the image of the HD. However, it is challenging for these algorithms to correctly identify different layout objects (e.g., main body, header, footer, linked textboxes) within the image of the HD. Accordingly, the markup of these editable electronic versions tends to include incorrect layout objects, which are then incorrectly interpreted by a word processing application. This reduces the editing functionality of the word processing application. Regardless, users are still interested in generating electronic versions of HDs for editing, modifying, and/or archiving.