1. Field of the Invention
The present invention relates to a text-to-speech conversion system, and in particular, to a Japanese-text to speech conversion system for converting a text in Japanese into a synthesized speech.
2. Description of the Related Art
A Japanese-text to speech conversion system is a system in which a sentence written in both kanji (Chinese characters) and kana (the Japanese syllabary), of the kind Japanese native speakers routinely write and read, is inputted as an input text, the input text is converted into voices, and the resulting voices are outputted as a synthesized speech. FIG. 1 shows a block diagram of a conventional system by way of example. The conventional system is provided with a conversion processing unit 12 for converting a Japanese text inputted through an input unit 10 into a synthesized speech. The Japanese text is inputted to a text analyzer 14 of the conversion processing unit 12. In the text analyzer 14, a phoneme rhythm symbol string is generated from the inputted sentence in kanji and kana. The phoneme rhythm symbol string is a description (intermediate language) of the reading, accent, intonation, etc. of the inputted sentence, expressed in the form of a character string. The reading and accent of the respective words are registered in advance in a phonation dictionary 16, and the phoneme rhythm symbol string is generated by referring to the phonation dictionary 16. When, for example, a text reading 「猫がニャーと鳴いた (a cat mewed)」 is inputted, the text analyzer 14 divides the input text into respective words by use of the well-known longest string-matching method, that is, by selecting, while referring to the phonation dictionary 16, the longest word whose notation matches the input text. In this case, the input text is converted into a word string consisting of 「猫 (ne'ko)」, 「が (ga)」, 「ニャー (nya'-)」, 「と (to)」, 「鳴い (nai)」, and 「た (ta)」. What is shown in the respective round brackets is the information on the respective words registered in the dictionary, that is, the reading and accent of the respective words.
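The longest string-matching segmentation described above can be sketched as follows. This is an illustrative sketch only: it uses romanized surface forms and an invented miniature dictionary, not the actual contents of the phonation dictionary 16.

```python
# Illustrative miniature dictionary: each surface form (romanized here for
# simplicity) maps to its registered reading-with-accent information.
PHONATION_DICT = {
    "neko": "ne'ko",
    "ga": "ga",
    "nya-": "nya'-",
    "to": "to",
    "nai": "nai",
    "ta": "ta",
}

def segment(text: str) -> list[str]:
    """Greedy longest string-matching segmentation against the dictionary."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first, shrinking until a
        # dictionary entry matches at the current position.
        for j in range(len(text), i, -1):
            if text[i:j] in PHONATION_DICT:
                words.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no dictionary entry matches at position {i}")
    return words

print(segment("nekoganya-tonaita"))
# → ['neko', 'ga', 'nya-', 'to', 'nai', 'ta']
```

Because the match is greedy, "neko" is preferred over any shorter prefix such as "ne", which is the essence of the longest string-matching method the text analyzer 14 relies on.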
The text analyzer 14 generates a phoneme rhythm symbol string representing 「ne'ko ga, nya'-to, naita」 by use of the information on the respective words of the word string registered in the dictionary, that is, the information in the respective round brackets, and on the basis of this string, speech synthesis is executed by a rule-based speech synthesizer 18. In the phoneme rhythm symbol string, 「'」 indicates an accent position, and 「,」 indicates a boundary between respective accented phrases.
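The assembly of the intermediate-language string from per-word dictionary information might be sketched as below. The grouping of words into accented phrases and the joining convention (words concatenated within a phrase, phrases separated by ",") are simplifying assumptions for illustration; the patent does not specify these details.

```python
# Each inner list is one assumed accented phrase; each element is a word's
# dictionary reading, with "'" already marking the accent position.
def build_prosody_string(phrases: list[list[str]]) -> str:
    """Join readings within each accented phrase and separate phrases
    with ',' as the accented-phrase boundary symbol."""
    return ", ".join("".join(words) for words in phrases)

phrases = [["ne'ko", "ga"], ["nya'-", "to"], ["nai", "ta"]]
print(build_prosody_string(phrases))
# → ne'koga, nya'-to, naita
```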
The rule-based speech synthesizer 18 generates synthesized waveforms on the basis of the phoneme rhythm symbol string by referring to a memory 20 in which speech element data are stored. The synthesized waveforms are converted into a synthesized speech via a speaker 22 and outputted. The speech element data are the basic units of speech from which a synthesized waveform is formed by joining them together, and various types of speech element data corresponding to types of sound are stored in the memory 20, which is a ROM or the like.
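The joining of speech element data into a synthesized waveform can be reduced to the following sketch. The element names and the short dummy waveform arrays are invented stand-ins for the stored speech element data; real rule-based synthesis would also smooth the joins and apply the prosody, which is omitted here.

```python
# Invented stand-in for the speech element data stored in the memory 20:
# each phonetic unit maps to a short dummy waveform (lists of samples).
SPEECH_ELEMENTS = {
    "ne": [0.1, 0.3],
    "ko": [0.2, -0.1],
    "ga": [0.0, 0.4],
}

def synthesize(units: list[str]) -> list[float]:
    """Form a synthesized waveform by concatenating the stored
    speech element waveforms for the given units."""
    waveform: list[float] = []
    for u in units:
        waveform.extend(SPEECH_ELEMENTS[u])
    return waveform

print(synthesize(["ne", "ko", "ga"]))
# → [0.1, 0.3, 0.2, -0.1, 0.0, 0.4]
```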
With the Japanese-text to speech conversion system of the conventional type, using the method of speech synthesis described above, any text in Japanese can be read aloud in the form of a synthesized speech. However, a problem has been encountered in that the synthesized speech as outputted is poor in intonation, thereby giving the listener a feeling of monotony, with the result that the listener becomes bored or tired of listening to it.