In many communication systems, speech synthesis provides information where it is inconvenient or uneconomical to use a visual display. For example, names, addresses or other information from a data processor store may be supplied to an inquiring subscriber via an electroacoustic transducer by converting text stored in a data processor into a speech message. A speech synthesizer for this purpose is adapted to convert a stream of text into a sequence of speech feature signals representing speech elements such as phonemes. The speech feature signal sequence is in turn applied to an electroacoustic transducer from which the desired speech message is obtained. The speech message may accurately reflect the stored text stream. It may not be intelligible, however, unless proper intonation or stress is used. Even where the speech message is intelligible, inappropriate intonation may result in misinterpretation of the spoken message.
As is well known in the art, intonation information is not normally included in printed or computer stored text and must be supplied from other sources. U.S. Pat. No. 4,455,615 issued June 19, 1984 to Tanimoto et al. discloses an intonation varying audio output device in an electronic translator wherein words are provided with different stress depending on the position of one or more words in a sentence and the syntax of the sentence. While word position and syntax can supply intonation, they are not particularly useful when the information for a message is obtained from several sources. For example, the paging announcement "Mr. (name), please call your (location) office" contains name and geographical location from one or more sources and directional information from another source. In synthesizing such a speech message, the stress pattern varies with the particular words selected from stored text.
According to another commonly used stress insertion technique, words are converted to phonemes by referring to a stored dictionary containing the required intonation information. It is apparent, however, that a dictionary may not include all the words in the speech message. Alternatively, the intonation can be obtained by spelling arrangements that in effect sound out the text words. Both the dictionary and spelling approaches have disadvantages: dictionary lookup fails for unknown words, and letter-to-sound rules fail for irregular words. A hybrid strategy adopted in most speech synthesizers uses a dictionary when possible and resorts to letter-to-sound rules in the absence of dictionary information. These systems rely primarily on letter-to-sound rules for text words such as surnames, which are not generally included in dictionary form. In the absence of either dictionary entries or spelling-to-sound rules, the unknown word may be synthesized as a series of letters. U.S. Pat. No. 4,443,856 issued Apr. 17, 1984 to Hashimoto et al. utilizes this technique of spelling some words or sentences where no verbal information is stored.
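The hybrid fallback order described above (dictionary first, then letter-to-sound rules, then spelling the word letter by letter) can be sketched as follows. This is only an illustration under stated assumptions: the dictionary entries, the phoneme symbols, the one-letter-per-sound rule table, and the function names are all hypothetical and are not taken from any of the cited patents.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy dictionary: word -> phoneme symbols (illustrative only).
DICTIONARY: Dict[str, List[str]] = {
    "please": ["P", "L", "IY", "Z"],
    "call": ["K", "AO", "L"],
}

# Hypothetical, drastically simplified letter-to-sound table.
# Real rule sets are context dependent; this one maps single letters only.
LETTER_TO_SOUND: Dict[str, str] = {
    "a": "AE", "b": "B", "d": "D", "e": "EH", "i": "IH", "k": "K",
    "l": "L", "m": "M", "n": "N", "o": "OW", "r": "R", "s": "S", "t": "T",
}


def letter_to_sound(word: str) -> Optional[List[str]]:
    """Apply the toy rules; return None if any letter has no rule."""
    phonemes: List[str] = []
    for ch in word:
        if ch not in LETTER_TO_SOUND:
            return None
        phonemes.append(LETTER_TO_SOUND[ch])
    return phonemes


def spell_out(word: str) -> List[str]:
    """Last resort: render the word as a series of letter names."""
    return [ch.upper() for ch in word if ch.isalpha()]


def pronounce(word: str) -> Tuple[str, List[str]]:
    """Return (method used, symbol sequence) following the fallback order."""
    key = word.lower()
    if key in DICTIONARY:
        return "dictionary", DICTIONARY[key]
    by_rule = letter_to_sound(key)
    if by_rule is not None:
        return "rules", by_rule
    return "spelled", spell_out(key)
```

In this sketch a surname such as "Rossi" is absent from the toy dictionary and therefore falls through to the rule table, which mirrors the reliance on letter-to-sound rules for surnames noted above. Neither of the last two stages supplies any stress information.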
With respect to a class of words including proper nouns such as the names of persons or places, it is known that names derived from French take final stress, names from Italian, Japanese and other vowel-final languages take main stress on the penultimate syllable (second syllable from the end), and that names from Greek and English take main stress on either the penultimate or antepenultimate (third syllable from the end) syllable, depending on other factors such as morphology and syllable weight. It is an object of the invention to provide an improved text-to-speech synthesis arrangement that is adapted to generate intonation patterns based on text etymology.
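The etymology-dependent stress tendencies listed above can be expressed as a small lookup. The sketch below is hypothetical: it assumes the language of origin and the syllable division are already known, and it stands in for the "other factors such as morphology and syllable weight" with a single assumed flag, `penult_is_heavy`, which is a simplification introduced here and not a rule stated in the text.

```python
from typing import List

# Language groups taken from the paragraph above.
FINAL_STRESS = {"french"}
PENULT_STRESS = {"italian", "japanese"}
PENULT_OR_ANTEPENULT = {"greek", "english"}


def main_stress_index(syllables: List[str], origin: str,
                      penult_is_heavy: bool = True) -> int:
    """Return the 0-based index of the syllable carrying main stress.

    penult_is_heavy is an assumed stand-in for morphology and syllable
    weight; it is consulted only for the Greek/English group.
    """
    n = len(syllables)
    if n == 0:
        raise ValueError("word has no syllables")
    lang = origin.lower()
    if lang in FINAL_STRESS:
        return n - 1                      # final syllable
    if lang in PENULT_STRESS:
        return max(n - 2, 0)              # penultimate syllable
    if lang in PENULT_OR_ANTEPENULT:
        if penult_is_heavy or n < 3:
            return max(n - 2, 0)          # penultimate syllable
        return n - 3                      # antepenultimate syllable
    raise ValueError(f"no stress rule for origin: {origin}")
```

For example, with an illustrative three-syllable split `["Ta", "na", "ka"]` and origin `"Japanese"`, the function selects index 1, the penultimate syllable. Words with fewer syllables than the rule requires fall back to the earliest available syllable.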