The problem of giving the visually impaired access to the vast treasure of the printed word has long plagued mankind. Early attempts were nothing more than one person reading aloud to another. With the advent of the long-playing phonograph record, recordings of certain works became available. This idea has expanded to include cassette tapes and, more recently, optical disks. The recorded-book concept presupposes that there will be a broad audience for a particular work; production costs would otherwise be prohibitive. Another kind of difficulty is posed by technical material, including mathematical equations. A simple algebraic equation such as "a+b=c" presents no problem to even the least experienced reader, but complex expressions containing integrals, summations with limits and the like are considerably harder to read aloud. While reading matter containing complex expressions admittedly presents problems for the reader, the listener faces an even thornier problem in trying to create and store mental pictures of what is being read.
A recent approach to the problem of converting the printed word to the spoken word has been to use optical character recognition (OCR) techniques to scan printed matter into a computer. A text-to-voice device, typically a voice synthesizer, then "speaks" the text file to the visually impaired listener. To be successful, this technique requires that both the scanning and the speaking processes be relatively flawless. In addition, OCR reliability is influenced by the quality of the printed document being scanned. Most scanning programs are not adapted for handling changes of font or of character size and style (e.g., bold or italic), and such programs are easily confused by dirt, tears or other distortions of the original document. Scanning a bound volume also presents mechanical problems. Hence, a scanning accuracy rate of 80-90% is considered good, and a rate of 90-95% outstanding.
Two types of errors are encountered in OCR scanning: rejects and substitutions. A "reject" is defined as a character that the scanner cannot read at all, while a "substitution" is an incorrectly read character. Either error passed to a speech output device can lead to an unintelligible reading of the word. There are no scanning programs commonly available which can accurately scan a mathematical equation of even moderate complexity.
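The distinction between the two error types can be illustrated with a minimal sketch. It assumes, purely for illustration, that the scanner output is already aligned character-for-character with a known reference text and that the scanner emits a placeholder character for anything it cannot read; the placeholder `REJECT_MARK` and the function name `classify_ocr_errors` are hypothetical and do not correspond to any particular scanning program.

```python
# Hypothetical placeholder a scanner might emit for an unreadable character.
REJECT_MARK = "~"


def classify_ocr_errors(truth: str, scanned: str) -> dict:
    """Count rejects and substitutions in an aligned OCR result.

    Assumes (for this sketch only) that `truth` and `scanned` have the
    same length, so position i in one corresponds to position i in the other.
    """
    if len(truth) != len(scanned):
        raise ValueError("this sketch assumes aligned strings of equal length")

    rejects = 0
    substitutions = 0
    for expected, got in zip(truth, scanned):
        if got == expected:
            continue  # character was read correctly
        if got == REJECT_MARK:
            rejects += 1  # scanner could not read the character at all
        else:
            substitutions += 1  # scanner read the wrong character

    total = len(truth)
    correct = total - rejects - substitutions
    accuracy = correct / total if total else 1.0
    return {"rejects": rejects, "substitutions": substitutions, "accuracy": accuracy}


if __name__ == "__main__":
    print(classify_ocr_errors("integral", "int~gnal"))
```

In this example the word "integral" is scanned with one reject and one substitution, so only six of eight characters survive; a speech device given such a string would be unlikely to pronounce the word intelligibly.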
At the output end of the process, special features are required to recognize mathematical equations so that they can be "spoken" intelligibly. Self-contained reading machines embodying the aforementioned processes are known in the art, but generally suffer from the limitations hereinabove described.
When a person reads a complex mathematical equation, he or she is provided with several typographical features which aid in the understanding of the equation's meaning. For example, subscripts are generally positioned below the character to which they relate. In addition, a subscript is usually printed in a smaller size of type than the character to which it refers. Similar typographical conventions are generally applied to superscripts, to the limits of integrals or summations, and to other like operators. Even a reader having a relatively high level of mathematical understanding must use stilted language in order to read aloud even the simplest form of mathematical expression. Unfortunately, the listener must then construct a mental image of the expression based solely upon that spoken, stilted language.
What is needed to aid this conversion process is a technique for applying equivalents in the auditory domain of those typographical aids available to a reader.
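By way of a hedged illustration only, one conceivable auditory equivalent is to vary the pitch of the voice in the way that print varies the vertical position of a character: lower for a subscript, higher for a superscript. The following minimal sketch assumes that mapping, which is not taken from the description above; the names `Symbol`, `PITCH_STEP` and `to_audio_events` are hypothetical, and the output is simply a list of (spoken token, relative pitch) pairs rather than commands for any real speech synthesizer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical size of one pitch shift, in arbitrary relative units.
PITCH_STEP = 1


@dataclass
class Symbol:
    """A character together with its optional subscript and superscript."""
    base: str
    subscript: Optional["Symbol"] = None
    superscript: Optional["Symbol"] = None


def to_audio_events(sym: Symbol, pitch: int = 0) -> List[Tuple[str, int]]:
    """Flatten a symbol into (token, relative pitch) pairs.

    Assumed mapping for this sketch: a subscript is spoken one step lower
    than its base, a superscript one step higher; nesting accumulates.
    """
    events = [(sym.base, pitch)]
    if sym.subscript is not None:
        events.extend(to_audio_events(sym.subscript, pitch - PITCH_STEP))
    if sym.superscript is not None:
        events.extend(to_audio_events(sym.superscript, pitch + PITCH_STEP))
    return events


if __name__ == "__main__":
    # "x" with subscript "i" and superscript "2"
    x = Symbol("x", subscript=Symbol("i"), superscript=Symbol("2"))
    print(to_audio_events(x))
```

Under this assumed mapping, the listener would hear the base character at the normal pitch and each attached character at a shifted pitch, so that the change in voice stands in for the change in position and type size that a sighted reader sees.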
It is therefore an object of the present invention to provide an improved system for auditorially rendering (i.e., speaking) a digitized representation of textual material.
It is another object of the invention to provide a listener with a comprehensible audio output of a complex textual expression.
It is still a further object of the invention to provide an audio formatting language (AFL) with which to manipulate the analogical markers and to control an audio output device such as a speech synthesizer.
It is yet a further object of the present invention to provide for a browsing capability in an audio document, so as to allow a listener to easily locate his or her position therein.