Printed natural-language documents continue to represent a widely used communications medium among individuals and within organizations, both for distribution of information among information consumers and for various types of information-exchange transactions. With the advent of ubiquitous and powerful computational resources, including personal computational resources embodied in smart phones, pads, tablets, laptops, and personal computers, as well as larger-scale computational resources embodied in cloud-computing facilities, data centers, and higher-end servers within various types of organizations and commercial entities, natural-language information is, with increasing frequency, encoded and exchanged in electronic documents. Printed documents are essentially images, while electronic documents contain sequences of numerical encodings of natural-language symbols and characters. Because electronic documents provide advantages over printed documents in cost, transmission and distribution efficiencies, ease of editing and modification, and robustness of storage, an entire industry supporting methods and systems for transforming printed documents into electronic documents has developed over the past 50 years. Together, electronic scanners and computational optical-character-recognition methods and systems provide reliable and cost-effective imaging of printed documents and computational processing of the resulting digital images of text-containing documents to generate electronic documents corresponding to the printed documents.
In the past, electronic scanners were large desktop, table-top, and free-standing electronic appliances. However, with the advent of camera-containing smart phones and other mobile, processor-controlled imaging devices, digital images of text-containing documents can be generated by a wide variety of ubiquitous, hand-held devices, including smart phones, inexpensive digital cameras, inexpensive video surveillance cameras, and imaging devices included in mobile computational appliances, including tablets and laptops. Digital images of text-containing documents produced by these hand-held devices and appliances can then be processed by computational optical-character-recognition systems, including optical-character-recognition applications in smart phones, to produce corresponding electronic documents.
Unfortunately, in comparison with imaging by dedicated document-scanning appliances, hand-held document imaging is associated with increased noise, optical blur, non-standard position and orientation of the imaging device with respect to the document being imaged, interfering lighting and contrast effects generated by irregular backgrounds, and other defects and deficiencies in the resulting text-containing digital images. These defects and deficiencies can seriously degrade the performance of computational optical-character recognition, greatly increasing the frequency of erroneous character recognition and of failures of optical-character-recognition methods and systems to produce text encodings for all or large regions of text-containing digital images. Thus, while hand-held document-imaging devices and appliances have great advantages in cost and user accessibility, they are associated with disadvantages and drawbacks that can frustrate and prevent generation of electronic documents from text-containing digital images captured by hand-held devices and appliances. Even text-containing digital images obtained by stationary imaging systems may be associated with defects and deficiencies that can render the results of subsequently applied image-processing methods unsatisfactory. For this reason, designers and developers of imaging devices and appliances and of optical-character-recognition methods and systems, as well as users of the devices, appliances, and optical-character-recognition systems, continue to seek methods and systems to ameliorate the defects and deficiencies inherent in many text-containing digital images, including mobile-device-captured text-containing digital images, that frustrate subsequent computational image processing of those images.