Printed, typewritten, and handwritten documents have long been used for recording and storing information. Despite current trends towards paperless offices, printed documents continue to be widely used in commercial, institutional, and home environments. With the development of modern computer systems, the creation, storage, retrieval, and transmission of electronic documents has evolved, in parallel with continued use of printed documents, into an extremely efficient and cost-effective alternative information-recording and information-storage medium. Because of overwhelming advantages in efficiency and cost effectiveness enjoyed by modern electronic-document-based information storage and information transactions, printed documents are routinely converted into electronic documents by various methods and systems, including conversion of printed documents into digital scanned-document images using electro-optico-mechanical scanning devices, digital cameras, and other devices and systems followed by automated processing of the scanned-document images to produce electronic documents encoded according to one or more of various different electronic-document-encoding standards. As one example, it is now possible to employ a desktop scanner and sophisticated optical-character-recognition (“OCR”) control programs that control a personal computer to convert a printed-paper document into a corresponding electronic document that can be displayed and edited using a word-processing program.
While modern OCR systems have advanced to the point that complex printed documents that include pictures, frames, line boundaries, and other non-text elements as well as text symbols of any of many common alphabet-based languages can be automatically converted to electronic documents, challenges remain with respect to conversion of printed documents containing Chinese and Japanese characters or Korean morpho-syllabic blocks.