In the related art, text-speech synthesis, a technology for artificially creating speech signals from a given sentence (text), has been proposed. A speech synthesizing apparatus which realizes such text-speech synthesis generally includes three units: a language processing unit, a prosody processing unit, and a speech synthesizing unit.
The speech synthesizing apparatus operates as follows.
First, morpheme analysis or syntax analysis of an entered text is carried out in the language processing unit to divide the text into units such as morphemes, words, or accent phrases, and to generate a phoneme sequence or a part-of-speech sequence for each unit.
Subsequently, accent and intonation are processed in the prosody processing unit to calculate information such as a basic frequency (that is, the fundamental frequency, or pitch) and a phonetic sound duration.
Lastly, in the speech synthesizing unit, characteristic parameters or speech waveforms, referred to as speech unit data, are connected on the basis of the basic frequency or the phonetic sound duration calculated in the prosody processing unit. The speech unit data is stored in advance for each unit of synthesis (for example, a phoneme or a syllable), which is the unit in which speech is connected when synthesized speech is generated.
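The three-stage operation described above may be illustrated by the following minimal sketch. It is not the implementation of any particular apparatus. The lexicon `LEXICON`, the linearly declining pitch rule, the sample rate, and the use of a sine tone in place of stored speech unit data are all assumptions introduced only for illustration, as are the function names.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE = 16000  # Hz; assumed value for illustration only

# Hypothetical toy lexicon mapping a word to its phoneme sequence.
LEXICON = {
    "hello": ["h", "e", "l", "o"],
    "world": ["w", "o", "r", "l", "d"],
}


@dataclass
class Prosody:
    phoneme: str
    f0_hz: float       # basic (fundamental) frequency
    duration_s: float  # phonetic sound duration


def language_processing(text):
    """Divide the text into words and generate a phoneme sequence for each."""
    words = text.lower().split()
    # Unknown words fall back to one "phoneme" per letter (toy assumption).
    return [(word, LEXICON.get(word, list(word))) for word in words]


def prosody_processing(units, base_f0=120.0, duration_s=0.08):
    """Assign a basic frequency and a duration to every phoneme.

    Toy rule (an assumption): pitch declines linearly by 20% over the
    utterance, and every phoneme has the same duration.
    """
    phonemes = [p for _, seq in units for p in seq]
    last = max(len(phonemes) - 1, 1)
    return [
        Prosody(p, base_f0 * (1.0 - 0.2 * i / last), duration_s)
        for i, p in enumerate(phonemes)
    ]


def speech_synthesis(prosody):
    """Connect one waveform per phoneme into a single sample list.

    A sine tone at the calculated frequency stands in for the stored
    speech unit data of a real apparatus.
    """
    samples = []
    for p in prosody:
        n = round(p.duration_s * SAMPLE_RATE)
        samples.extend(
            math.sin(2.0 * math.pi * p.f0_hz * t / SAMPLE_RATE)
            for t in range(n)
        )
    return samples


def synthesize(text):
    """Run the three units in order: language, prosody, synthesis."""
    return speech_synthesis(prosody_processing(language_processing(text)))
```

Calling `synthesize("hello world")` returns a list of floating-point samples in the range -1 to 1. In an actual apparatus the last stage would select and connect stored speech unit data rather than generate tones.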
The text-speech synthesis technology described above is used for outputting speech messages of characters in video games (see JP-A-2001-34282 (Kokai)). In related-art speech message output based on reproduction of recorded speech, only pre-recorded terms can be reproduced as speech. With text-speech synthesis, however, terms which cannot be recorded in advance, such as names entered by players, can also be output as speech.
As described above, text-speech synthesis may be used for speech messages of characters in video games, in particular characters that are humans or human-type robots.
However, there are characters for which it is not suitable to speak the same language as humans (for example, the Japanese language). For example, in the case of a character such as an “Intellectually gifted Alien”, it is not unnatural for the character to speak a language. However, if it speaks Japanese or another existing language, a problem of lack of authenticity arises.
In such a case, it is possible to use meaningless effect sounds instead of speech. However, effect sounds do not sound like a language, and hence a problem of lack of authenticity arises here as well.