People with disabilities, such as impaired vision or dyslexia, may have difficulty reading printed material. Automatic systems are needed either to display the documents at higher resolution or to render them as audio recordings.
It is known to provide a mobile print digitizer for the visually impaired. One known device captures printed documents and reads them to the user. A camera or scanner captures an image of a printed page, and optical character recognition (OCR) is then run on the image. The OCR output is fed to a speech synthesizer such as a text-to-speech (TTS) system. A recognized problem with known reading machines is that a noisy image and/or a complex document layout may cause OCR errors.
Speech synthesis is the artificial production of human speech. A computer system used for this purpose is commonly called a speech synthesizer, and can be implemented in software or hardware. A text-to-speech system is a type of speech synthesizer that converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech.
Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely “synthetic” voice output.
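The concatenative approach described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the unit names, the `UNIT_DB` table, and the `synthesize` function are hypothetical, and short integer lists stand in for recorded audio samples. A practical synthesizer would also smooth the joins between units, which this sketch omits.

```python
from typing import Dict, List, Sequence

# Hypothetical unit database: each unit name maps to its recorded samples.
# Real systems store audio waveforms; short integer lists stand in here.
UNIT_DB: Dict[str, List[int]] = {
    "h-e": [1, 2],               # diphone-sized units
    "e-l": [3, 4],
    "l-o": [5, 6],
    "hello": [10, 11, 12, 13],   # whole-word unit for a specific domain
}


def synthesize(units: Sequence[str], db: Dict[str, List[int]]) -> List[int]:
    """Concatenate the stored recordings for the requested units, in order."""
    samples: List[int] = []
    for unit in units:
        if unit not in db:
            raise KeyError(f"no recording stored for unit {unit!r}")
        samples.extend(db[unit])
    return samples


# The same word assembled from small units or taken from one stored word.
diphone_output = synthesize(["h-e", "e-l", "l-o"], UNIT_DB)
word_output = synthesize(["hello"], UNIT_DB)
```

Small units can be recombined into words that were never recorded as a whole, whereas a whole-word unit is played back exactly as recorded but covers only the words stored in the database.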
The problem with known devices is that a “noisy” image or a complex document layout may cause recognition errors. For instance, a magazine may have several blocks of text, text over photos, articles spanning several pages, etc. Moreover, multiple users may want to read the same content, and each would rescan a document that has already been processed.
Presently, the majority of printed material found in kiosks or libraries already exists in digital form, as both text and high-resolution images. Known publishing processes begin with text, to which a layout is added. A high-resolution image of the formatted text is created and then printed on paper. Most publishers have databases that include the text, the layout, and the high-resolution image. Even when the only available version of an article or publication is a paper copy, the associated text, layout, and high-resolution image can be generated by the publisher or another party and stored in a database. The text can then be rendered as needed, for example converted into audio either by a TTS system or by having a person read it aloud. Retrieving the text or the high-resolution image, as needed, would provide a simpler and more accurate method of rendering the desired information.
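The retrieval idea in the preceding paragraph can be sketched as a database lookup that is tried before any scanning. This is a minimal sketch under stated assumptions, not a description of an actual publisher system: the `PublishedDocument` record, the `PublisherDatabase` class, the `get_text` function, and the notion of a document identifier are all hypothetical, and the OCR fallback is passed in as a plain function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class PublishedDocument:
    """Hypothetical record holding what a publisher is said to store."""
    text: str
    layout: str
    image_ref: str  # reference to the high-resolution image


class PublisherDatabase:
    """In-memory stand-in for a publisher's document database."""

    def __init__(self) -> None:
        self._records: Dict[str, PublishedDocument] = {}

    def store(self, doc_id: str, doc: PublishedDocument) -> None:
        self._records[doc_id] = doc

    def lookup(self, doc_id: str) -> Optional[PublishedDocument]:
        return self._records.get(doc_id)


def get_text(doc_id: str, db: PublisherDatabase,
             ocr_fallback: Callable[[str], str]) -> str:
    """Return stored text when available; otherwise fall back to OCR."""
    record = db.lookup(doc_id)
    if record is not None:
        return record.text       # exact text, no recognition errors
    return ocr_fallback(doc_id)  # assumed fallback: scan and recognize


db = PublisherDatabase()
db.store("magazine-001", PublishedDocument(
    text="Original article text.",
    layout="two-column",
    image_ref="magazine-001.png",
))

ocr_calls = []


def fake_ocr(doc_id: str) -> str:
    # Stand-in for a real OCR step; records that it was invoked.
    ocr_calls.append(doc_id)
    return "recognized text"


stored = get_text("magazine-001", db, fake_ocr)
scanned = get_text("unknown-doc", db, fake_ocr)
```

Because the stored text is returned directly, a second reader requesting the same document triggers no new scan, and the result can then be passed to a TTS system or displayed with the high-resolution image.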