1. Field Of The Invention
The present invention relates to a parser for text that is to be subjected to text-to-speech processing, and particularly to such a parser which detects non-spoken characters and which replaces those characters with spoken text-equivalent characters or which generates speech commands based on the non-spoken characters.
2. Description Of The Related Art
Recently, as the technology for electronically converting text to speech has advanced, direct text-to-speech conversion has been replacing other techniques where computer-generated speech is desired, for example, digitized speech techniques in which spoken words are digitized by high-speed analog-to-digital sampling and the digitized words are stored for selective playout. In comparison to direct text-to-speech, however, digitized speech techniques consume large amounts of memory because a different storage location is needed for each one of the high-speed digital samples. Direct text-to-speech techniques, on the other hand, need only store ASCII text, and consequently need only about 1/1000 of the memory needed by digitized speech techniques.
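The order of magnitude of the memory comparison above can be checked with a rough calculation. The following is only an illustrative sketch: the sampling rate, sample size, speaking rate, and average word length are assumed values chosen for the example and are not taken from this description.

```python
def digitized_bytes_per_second(sample_rate_hz: int, bytes_per_sample: int) -> int:
    """Storage consumed per second of digitized speech (one location per sample)."""
    return sample_rate_hz * bytes_per_sample


def text_bytes_per_second(words_per_minute: float, chars_per_word: float) -> float:
    """Storage consumed per second of speech when only ASCII text is stored."""
    return words_per_minute * chars_per_word / 60.0


# Assumed figures (hypothetical, for illustration only):
#   16 kHz sampling with 1 byte per sample for digitized speech,
#   150 spoken words per minute at 6 ASCII characters per word (including the space).
digitized = digitized_bytes_per_second(16_000, 1)
text = text_bytes_per_second(150, 6)
ratio = digitized / text

print(f"digitized: {digitized} bytes/s, text: {text} bytes/s, ratio: about {ratio:.0f}:1")
```

Under these assumptions the ratio comes out on the order of a thousand to one; other sampling rates or speaking rates shift the figure but not its general scale.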
Moreover, digitized speech techniques are unsuitable for many applications where computer-generated speech is needed. Digitized speech does not work when original audio signals are not available, for example, when it is desired to "speak" an incoming facsimile. With direct text-to-speech, on the other hand, optical character recognition can be performed on the incoming facsimile message and the resulting text channeled to a text-to-speech converter, whereby an incoming facsimile message may be spoken.
Despite the desirability of direct text-to-speech, conventional text-to-speech processes have only limited ability to adapt to the format of an incoming text stream. For example, an incoming text stream often contains characters that are not part of the message being communicated, such as new paragraph marks, printer control characters, and other "non-spoken" characters. Conventional text-to-speech processors attempt to speak each and every character that is fed to them, including the non-spoken characters embedded in the text, which results in garbled speech. Moreover, there have been no provisions for automatically varying the intonation of text-to-speech processors based on the context of the text. This results in long stretches of monotone speech.
Accordingly, it has heretofore not been possible to send arbitrary text files to a text-to-speech converter. Rather, it has been necessary to manually edit text files before text-to-speech processing to remove non-spoken characters and to insert speech commands (for example, loud, soft, fast or slow) that break the monotony.
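The kind of manual clean-up described above can be pictured with a small sketch. It is not the parser of the invention; it merely illustrates, under stated assumptions, what removing non-spoken characters and inserting a speech command might look like. The function name `prepare_for_speech` and the `{pause}` command token are hypothetical, since speech command syntax depends on the particular text-to-speech converter.

```python
import re

# Hypothetical speech command token; actual converters define their own syntax.
PAUSE_COMMAND = "{pause}"


def prepare_for_speech(raw: str) -> str:
    """Strip single ASCII control characters and mark paragraph breaks with a pause."""
    # Normalise line endings and treat a form feed as a paragraph break.
    text = raw.replace("\r\n", "\n").replace("\r", "\n").replace("\f", "\n\n")
    # Remove remaining ASCII control characters, keeping newline and tab.
    # Multi-character printer escape sequences are NOT handled by this sketch.
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", text)
    # A blank line marks a paragraph boundary; collapse other whitespace runs.
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text)]
    paragraphs = [p for p in paragraphs if p]
    return f" {PAUSE_COMMAND} ".join(paragraphs)


sample = "Hello\x07 world.\r\n\r\nSecond\tparagraph.\f Third."
cleaned = prepare_for_speech(sample)
print(cleaned)
```

For the sample string, the bell character is dropped and each paragraph boundary becomes a pause command, so the text handed to the converter contains only speakable words and commands.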