Methods for electronically generating spoken messages are known from, for example, car navigation systems, phone banking systems and flight information systems. These systems are all capable of generating a number of messages having a fixed part combined with variable information.
Consider for example a phone banking system. Such a system supplies the user with a spoken message indicating the balance of his bank account. For example: "Your bank account presents a balance of two thousand three hundred and fifteen dollars." The fixed part in the message of the example is: "Your bank account presents a balance of <NR> dollars." <NR> indicates the position of an open slot, i.e. a placeholder for information that varies over messages. In this case, <NR> has been filled with the numeral 2,315. In general, <NR> will be filled with a numerical argument corresponding to the balance of the user's bank account, and this argument will vary from one message to the next.
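The slot-filling step described above can be illustrated with a minimal Python sketch that spells out a numerical argument and substitutes it for <NR> in the fixed part. The function names, the British-style "and" after hundreds, and the supported range of 0 to 999,999 are assumptions made for illustration only; the source does not specify how the numeral is converted to words.

```python
# Minimal sketch: fill the open slot <NR> of a fixed message template with the
# spelled-out form of a numerical argument.
# Assumptions (not from the source): British-style "and" after hundreds,
# arguments limited to 0..999_999, and all names are hypothetical.

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def below_thousand(n: int) -> list[str]:
    """Spell out 1..999 as a list of words."""
    words: list[str] = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words += [ONES[hundreds], "hundred"]
        if rest:
            words.append("and")
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        words.append(TENS[tens] + ("-" + ONES[ones] if ones else ""))
    elif rest:
        words.append(ONES[rest])
    return words


def spell_number(n: int) -> str:
    """Spell out 0..999_999 in words."""
    if not 0 <= n <= 999_999:
        raise ValueError("argument outside the supported range")
    if n == 0:
        return "zero"
    thousands, rest = divmod(n, 1000)
    words: list[str] = []
    if thousands:
        words += below_thousand(thousands) + ["thousand"]
        if 0 < rest < 100:
            words.append("and")  # e.g. "two thousand and fifteen"
    if rest:
        words += below_thousand(rest)
    return " ".join(words)


def fill_slot(template: str, value: int) -> str:
    """Replace the open slot <NR> with the spelled-out argument."""
    return template.replace("<NR>", spell_number(value))


print(fill_slot("Your bank account presents a balance of <NR> dollars.", 2315))
# Your bank account presents a balance of two thousand three hundred and fifteen dollars.
```

Keeping the template and the number speller separate mirrors the distinction between the fixed part and the open slot: the template never changes, while only the argument passed to `fill_slot` varies from one message to the next.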
Such a system operates by concatenating chunks of recorded digitized speech. In the above example, the following chunks could have been recorded and stored:
"Your bank account presents a balance of"
"two thousand"
"three hundred"
"and"
"fifteen"
"dollars"
At run time, the announcement system could then read these chunks from memory and concatenate them to form a composite waveform representing, in digitized form, the spoken equivalent of the message. An audible speech signal can then be produced by passing this composite waveform to a digital-to-analog converter and feeding the resulting analog signal to a loudspeaker.
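The run-time concatenation can be sketched in Python as follows. The sketch assumes, without support from the source, that every chunk is stored as mono 16-bit PCM at one common sample rate (8 kHz here) and is keyed by its text; `concatenate_chunks`, `write_wav`, and `SAMPLE_RATE` are hypothetical names. Writing a WAV stream with Python's standard `wave` module stands in for the digital-to-analog stage, and the stored chunks are silent placeholders rather than real recordings.

```python
# Minimal sketch of run-time concatenation of pre-recorded speech chunks.
# Assumptions (not from the source): every chunk is stored as mono 16-bit PCM
# at one common sample rate, keyed by its text; all names are hypothetical.
import io
import sys
import wave
from array import array

SAMPLE_RATE = 8000  # Hz, assumed to be shared by all recordings


def concatenate_chunks(store: dict[str, array], keys: list[str]) -> array:
    """Read the chunks in message order and join them into one waveform."""
    waveform = array("h")
    for key in keys:
        if key not in store:
            # Only chunks recorded beforehand can be played back.
            raise KeyError(f"no recording for chunk {key!r}")
        waveform.extend(store[key])
    return waveform


def write_wav(waveform: array, target) -> None:
    """Write the composite waveform as a WAV stream (little-endian PCM)."""
    samples = array("h", waveform)  # copy so the caller's data is untouched
    if sys.byteorder == "big":
        samples.byteswap()
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(samples.tobytes())


# Usage with placeholder (silent) recordings instead of real speech.
store = {
    "Your bank account presents a balance of": array("h", [0] * 16000),
    "two thousand": array("h", [0] * 6000),
    "three hundred": array("h", [0] * 6000),
    "and": array("h", [0] * 2000),
    "fifteen": array("h", [0] * 5000),
    "dollars": array("h", [0] * 5000),
}
keys = list(store)  # message order, as listed above
composite = concatenate_chunks(store, keys)
buffer = io.BytesIO()
write_wav(composite, buffer)
```

Because the chunks are simply appended sample by sample, nothing smooths the joins between separately recorded pieces, which is one reason such output can sound unnatural.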
The known method has the following drawbacks:
The resulting speech output tends to sound unnatural due to the concatenation of separately recorded speech chunks.
For speech output to sound homogeneous, all speech chunks need to be recorded with the same speaker. If that speaker later becomes unavailable for additional recordings, the whole set may have to be recorded all over again with a different speaker.
Since such announcement systems can only play back recorded speech, open slots can only be filled with arguments that have been recorded beforehand. New recordings are necessary for any new information to be read out.
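The last drawback can be made concrete with a small Python sketch that checks which recorded chunks cover a message and which words have no recording at all. The greedy longest-match strategy, the handling of the trailing full stop, and the name `segment` are assumptions for illustration; the inventory is the chunk list from the example above.

```python
# Minimal sketch: check which recorded chunks cover a message and which words
# have no recording at all. Assumptions (not from the source): chunks are
# matched greedily, longest first, on whitespace-separated words, and a
# trailing full stop is ignored. Names are hypothetical.


def segment(message: str, inventory: set[str]) -> tuple[list[str], list[str]]:
    """Return (chunks to play back, words that no recorded chunk covers)."""
    words = message.rstrip(".").split()
    max_len = max((len(chunk.split()) for chunk in inventory), default=1)
    chunks: list[str] = []
    missing: list[str] = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in inventory:
                chunks.append(candidate)
                i += n
                break
        else:
            # No recorded chunk starts at this word: it cannot be spoken.
            missing.append(words[i])
            i += 1
    return chunks, missing


INVENTORY = {
    "Your bank account presents a balance of",
    "two thousand", "three hundred", "and", "fifteen", "dollars",
}

chunks, missing = segment(
    "Your bank account presents a balance of "
    "two thousand three hundred and sixteen dollars.", INVENTORY)
print(missing)  # ['sixteen']
```

A balance of 2,316 dollars already falls outside the recorded set: every chunk except "sixteen" is available, so that one argument cannot be read out until a new recording is made.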
An object of the present invention is to provide a method for electronically generating a spoken message in such a manner that said message sounds homogeneous and has a highly natural character.
Another object of the invention is to provide a method for electronically generating a spoken message which is not speaker dependent.