Printed, typewritten, and handwritten documents have long been used for recording and storing information. Despite current trends towards paperless offices, printed documents continue to be widely used in commercial, institutional, and home environments. With the development of modern computer systems, the creation, storage, retrieval, and transmission of electronic documents has evolved, in parallel with continued use of printed documents, into an extremely efficient and cost-effective alternative information-recording and information-storage medium. Because of overwhelming advantages in efficiency and cost effectiveness enjoyed by modern electronic-document-based information storage and information transactions, printed documents are routinely converted into electronic documents by various methods and systems, including conversion of printed documents into digital scanned-document images using electro-optico-mechanical scanning devices, digital cameras, and other devices and systems followed by automated processing of the scanned-document images to produce electronic documents encoded according to one or more of various different electronic-document-encoding standards. As one example, it is now possible to employ a desktop scanner and sophisticated optical-character-recognition (“OCR”) programs running on a personal computer to convert a printed-paper document into a corresponding electronic document that can be displayed and edited using a word-processing program. Document images are also contained in web pages and various additional sources, and document images obtained from these sources are also converted to electronic documents using OCR methods.
While modern OCR programs have advanced to the point that complex document images that include pictures, frames, line boundaries, and other non-text elements as well as text symbols of any of many common alphabet-based languages can be automatically converted to electronic documents, challenges remain with respect to conversion of document images containing mathematical expressions.