Printed natural-language documents continue to represent a widely used communications medium among individuals, within organizations, and for distribution of information among information consumers. With the advent of ubiquitous and powerful computational resources, including personal computational resources embodied in smart phones, pads, tablets, laptops, and personal computers, as well as larger-scale computational resources embodied in cloud-computing facilities, data centers, and higher-end servers within various types of organizations and commercial entities, natural-language information is, with increasing frequency, encoded and exchanged in electronic documents. Printed documents are essentially images, while electronic documents contain sequences of numerical encodings of natural-language symbols and characters. Because electronic documents provide advantages over printed documents in cost, transmission and distribution efficiencies, ease of editing and modification, and robust storage, an entire industry supporting methods and systems for transforming printed documents into electronic documents has developed over the past 50 years. Electronic scanners provide reliable and cost-effective imaging of printed documents, and computational optical-character-recognition methods and systems process the resulting digital images of text-containing documents to generate electronic documents corresponding to the printed documents.
In the past, electronic scanners were large desktop, tabletop, and free-standing electronic appliances. However, with the advent of camera-containing smart phones and other mobile, processor-controlled imaging devices, digital images of text-containing documents can be generated by a wide variety of ubiquitous hand-held devices, including smart phones, inexpensive digital cameras, inexpensive video surveillance cameras, and imaging devices included in mobile computational appliances, including tablets and laptops. Digital images of text-containing documents produced by these hand-held devices and appliances can then be processed by computational optical-character-recognition systems, including optical-character-recognition applications in smart phones, to produce corresponding electronic documents.
Unfortunately, text-containing images produced by hand-held document imaging are often distorted by noise, optical blur, curved-page-surface-induced and perspective-induced curvature of linear text lines, and other defects and deficiencies. Even images generated by dedicated document-scanning appliances may suffer from perspective-induced curvature of linear text lines when a book is imaged by opening the book and placing it face down on a transparent scanning surface. These defects and deficiencies can seriously degrade the performance of computational optical-character recognition, greatly increasing the frequency of erroneous character recognition and of failures of optical-character-recognition methods and systems to produce accurate text encoding for text contained in digital images. For this reason, designers and developers of imaging devices, imaging appliances, and optical-character-recognition methods and systems, as well as users of the devices, appliances, and optical-character-recognition systems, continue to seek methods and systems to ameliorate the defects and deficiencies that are inherent in many text-containing digital images, including mobile-device-captured text-containing digital images, and that frustrate subsequent computational image processing of those images.