Printed, typewritten, and handwritten documents have long been used for recording and storing information. Despite current trends towards paperless offices, printed documents continue to be widely used in commercial, institutional, and home environments. With the development of modern computer systems, the creation, storage, retrieval, and transmission of electronic documents has evolved, in parallel with continued use of printed documents, into an extremely efficient and cost-effective alternative information-recording and information-storage medium. Because of advantages in efficiency and cost effectiveness enjoyed by modern electronic-document-based information storage and information transactions, printed documents are routinely converted into electronic documents by various methods and systems, including conversion of printed documents into digital scanned-document images using electro-optico-mechanical scanning devices, digital cameras, and other devices and systems followed by automated processing of the scanned-document images to produce electronic documents encoded according to one or more of various different electronic-document-encoding standards. As one example, it is now possible to employ a desktop scanner and sophisticated optical-character-recognition (“OCR”) programs running on a personal computer to convert a printed-paper document into a corresponding electronic document that can be displayed and edited using a word-processing program.
While modern OCR programs have advanced to the point that complex printed documents, which include pictures, frames, line boundaries, and other non-text elements as well as text symbols of any of many common alphabet-based languages, can be automatically converted to electronic documents, challenges remain with respect to conversion of printed documents containing Arabic text and text in other languages in which symbols are joined to together, in continuous fashion, to produce words and portions of words.