1. Field of the Invention
The present invention relates to a speech synthesizing method, a dictionary organizing method for speech synthesis, a speech synthesis apparatus, and a computer-readable medium recording a speech synthesis program for video games, etc.
2. Description of the Related Art
Recently, the need to output speech messages from machines has grown with the spread of services that repeatedly supply a speech message (spoken human language), such as time announcements by telephone and voice guidance at bank ATMs, and with the growing demand to improve the man-machine interface of various electric appliances.
In a conventional method of outputting a speech message, a living person speaks predetermined words and sentences, which are stored in a storage device, and the stored data is reproduced and output as needed (hereinafter referred to as a "recording and reproducing method"). In another method of outputting a speech message, that is, a speech synthesizing method, speech data corresponding to various words forming a speech message is stored in a storage device, and the speech data is combined according to an optionally input character string (text).
In the above-mentioned recording and reproducing method, a high-quality speech message can be output. However, any speech message other than the predetermined words or sentences cannot be output. In addition, a storage device is required having a capacity proportional to the number of words and sentences to be output.
On the other hand, in the speech synthesizing method, a speech message corresponding to an optionally input character string, that is, an optional word, can be output, and the necessary storage capacity is smaller than that required in the above-mentioned recording and reproducing method. However, there has been a problem that speech messages do not sound natural for some character strings.
In recent video games, with the improved performance of game machines and the increasing capacity of storage media, an increasing number of games are organized to output a speech message from a character in the game together with BGM or sound effects.
At this time, a product having an element of entertainment, such as a video game, is required to output speech messages in different voices for respective game characters, and to output a speech message reflecting the emotion or situation at the time when the speech is made. Furthermore, there is also a demand to output the name (utterance) of a player character optionally input or set by a player as an utterance from a game character.
To realize the output of a speech message based on the above-mentioned demands in the recording and reproducing method, it is necessary to record and reproduce the speech for several thousand to several tens of thousands of words, including the names of player characters to be input or set by a player. Therefore, the time, cost, and storage medium capacity required to store the necessary data increase greatly. As a result, it is practically impossible to realize the process in the recording and reproducing method.
On the other hand, in the speech synthesizing method, it is relatively easy to utter the name of an optionally input/set player character. However, since the conventional speech synthesizing method only aims at generating a clear and natural speech message, it is quite impossible to synthesize a speech message depending on the personality of a speaker, the emotion and the situation at the time when a speech is made, that is, to output speech messages different in voice quality for each game character, or to output speech messages reflecting the emotion and the situation of a game character.
The present invention aims at providing a speech synthesizing method, a dictionary organizing method for speech synthesis, a speech synthesis apparatus, and a computer-readable medium recording a speech synthesis program which are capable of generating a speech message depending on the personality of a speaker, the emotion, the situation or various contents of a speech, and are applicable to a highly entertaining use such as a video game.
According to the present invention, to attain the above-mentioned object, in a speech synthesizing method of generating a speech message using a word dictionary, a prosody dictionary, and a waveform dictionary, a plurality of operation units of the speech synthesizing process (hereinafter referred to as tasks) are set, the tasks differing from one another in at least one of the speaker, the emotion or situation at the time when the speech is made, and the contents of the speech; at least a prosody dictionary and a waveform dictionary are organized for each task; and when a character string whose speech is to be synthesized is input with a task specified, the speech synthesizing process is performed by using the word dictionary, the prosody dictionary, and the waveform dictionary corresponding to that task.
According to the present invention, the speech synthesizing process is performed by dividing the process into tasks such as plural speakers, plural types of emotion or situation at the time when speeches are made, plural contents of the speeches, etc., and by organizing dictionaries for respective tasks. Therefore, a speech message can be easily generated depending on the personality of a speaker, the emotion or situation at the time when a speech is made, and the contents of the speech.
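The relationship between tasks and dictionaries described above can be illustrated by the following minimal sketch. It is not the implementation of the invention: the class name `TaskDictionaries`, the function `dictionaries_for`, the task labels, and the internal layout of each dictionary are all assumptions introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaskDictionaries:
    """The dictionary set organized for one task (hypothetical layout)."""
    word: dict = field(default_factory=dict)      # word -> accent type
    prosody: dict = field(default_factory=dict)   # (length, accent type) -> prosody model
    waveform: dict = field(default_factory=dict)  # synthesis unit -> recorded samples


# A task is identified here by (speaker, emotion or situation).
# Both labels and the accent values are invented example data.
TASKS = {
    ("hero", "calm"): TaskDictionaries(word={"yamada": 0}),
    ("hero", "angry"): TaskDictionaries(word={"yamada": 1}),
}


def dictionaries_for(task):
    """Return the dictionary set for the specified task.

    Raises KeyError when no dictionaries were organized for that task.
    """
    try:
        return TASKS[task]
    except KeyError:
        raise KeyError(f"no dictionaries organized for task {task!r}") from None
```

In this sketch the same word can carry different data under different tasks, which is the property the task division is meant to provide.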
In addition, each of the above mentioned dictionaries for respective tasks is organized by generating a word dictionary corresponding to each task, generating a speech recording scenario by selecting a character string which can be a model from all words in the word dictionary, recording the speech of a speaker based on the speech recording scenario, generating a prosody dictionary and a waveform dictionary from the recorded speech, and performing these operations on each task.
Alternatively, each of the above-mentioned dictionaries for respective tasks is organized by generating a word dictionary and word variation rules corresponding to each task, varying all words contained in the word dictionary corresponding to each task according to the word variation rules corresponding to each task, generating a speech recording scenario by selecting a character string which can be a model from all varied words in the word dictionary, recording the speech of a speaker based on the speech recording scenario, generating a prosody dictionary and a waveform dictionary from the recorded speech, and performing these operations on each task.
Alternatively, each of the above-mentioned dictionaries for respective tasks is organized by generating word variation rules corresponding to each task, varying all words contained in the word dictionary according to the word variation rules corresponding to each task, generating a speech recording scenario by selecting a character string which can be a model from all varied words in the word dictionary, recording the speech of a speaker based on the speech recording scenario, generating a prosody dictionary and a waveform dictionary from the recorded speech, and performing these operations on each task.
According to the present invention, a speech recording scenario can be easily generated corresponding to each task, each dictionary can be organized by recording a speech based on the speech recording scenario, and a speech message containing various contents can be easily generated without increasing the capacity of a dictionary by performing a character string varying process.
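The generation of a speech recording scenario from varied words may be sketched as follows. The function names, the representation of a variation rule as a function from string to string, and the example words are hypothetical. In particular, the criterion for "a character string which can be a model" is not defined in this section; the sketch assumes that one string is kept per combination of character count and accent type.

```python
def apply_variation_rules(words, rules):
    """Apply every rule to every word; a rule maps a string to a varied string."""
    varied = []
    for word in words:
        for rule in rules:
            varied.append(rule(word))
    return varied


def build_recording_scenario(words, accent_of):
    """Pick one model string per (character count, accent type) group.

    Assumption: the first string seen in each group serves as the model.
    The source does not state the actual selection criterion.
    """
    scenario = {}
    for word in words:
        key = (len(word), accent_of(word))
        scenario.setdefault(key, word)  # keep only the first string of each group
    return list(scenario.values())


# Invented example data: two rules (unchanged, and with a suffix appended).
RULES = [lambda w: w, lambda w: w + "kun"]
WORDS = ["taro", "jiro", "hanako"]

varied_words = apply_variation_rules(WORDS, RULES)
scenario = build_recording_scenario(varied_words, accent_of=lambda w: 0)
# scenario == ["taro", "tarokun", "hanako", "hanakokun"]
```

Only the strings in the scenario would need to be recorded by the speaker, so the recording effort grows with the number of groups rather than with the number of words.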
Furthermore, a speech synthesizing method using the dictionaries is realized by switching a word dictionary, a prosody dictionary, and a waveform dictionary according to the designation of a task to be input together with a character string to be synthesized, and by synthesizing a speech message corresponding to a character string to be synthesized by using the switched word dictionary, prosody dictionary, and waveform dictionary.
At this time, when the dictionaries are a word dictionary containing a number of words, each containing at least one character, together with their respective accent types, a prosody dictionary containing typical prosody model data selected from the prosody model data indicating the prosody of the words contained in the word dictionary, and a waveform dictionary containing recorded speech as speech data in synthesis units, the speech synthesizing process can be performed by determining the accent type of a character string to be synthesized from the word dictionary, selecting the prosody model data from the prosody dictionary based on the character string to be synthesized and the accent type, selecting waveform data corresponding to each character of the character string to be synthesized from the waveform dictionary based on the selected prosody model data, and connecting the selected pieces of waveform data with each other.
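The four steps above (accent type determination, prosody model selection, waveform selection, and connection) can be sketched as follows. The data layout is an assumption made for illustration: the prosody dictionary is keyed by character count and accent type, a prosody model is a list of per-character pitch labels, and the waveform dictionary is keyed by character and pitch label. The fallback accent type for a string not found in the word dictionary is also an assumption.

```python
# Invented toy dictionaries for a single task.
WORD_DICT = {"ab": 1}                      # word -> accent type
PROSODY_DICT = {(2, 1): ["H", "L"]}        # (character count, accent type) -> pitch labels
WAVEFORM_DICT = {                          # (character, pitch label) -> samples
    ("a", "H"): [1, 2],
    ("a", "L"): [3],
    ("b", "H"): [4],
    ("b", "L"): [5, 6],
}


def synthesize(text, word_dict, prosody_dict, waveform_dict, default_accent=0):
    """Return the concatenated samples for text (hypothetical sketch)."""
    # 1. Determine the accent type from the word dictionary.
    accent = word_dict.get(text, default_accent)
    # 2. Select prosody model data from the character string and accent type.
    model = prosody_dict[(len(text), accent)]
    # 3. Select waveform data for each character, guided by the prosody model.
    pieces = [waveform_dict[(char, pitch)] for char, pitch in zip(text, model)]
    # 4. Connect the selected pieces of waveform data with each other.
    samples = []
    for piece in pieces:
        samples.extend(piece)
    return samples


result = synthesize("ab", WORD_DICT, PROSODY_DICT, WAVEFORM_DICT)
# result == [1, 2, 5, 6]
```

Switching tasks then amounts to passing a different set of three dictionaries to the same procedure.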
Furthermore, another speech synthesizing method using the dictionaries is realized by switching a word dictionary, a prosody dictionary, a waveform dictionary, and word variation rules according to the designation of a task to be input together with a character string to be synthesized, varying the character string to be synthesized based on the word variation rules, and synthesizing a speech message corresponding to the varied character string by using the switched word dictionary, prosody dictionary, and waveform dictionary.
Furthermore, a further speech synthesizing method using the dictionaries is realized by switching a prosody dictionary, a waveform dictionary, and word variation rules according to the designation of a task to be input together with a character string to be synthesized, varying the character string to be synthesized based on the word variation rules, and synthesizing a speech message corresponding to the varied character string by using a word dictionary, and the switched prosody dictionary and waveform dictionary.
At this time, when the dictionaries are a word dictionary containing a number of words, each containing at least one character, together with their respective accent types, a prosody dictionary containing typical prosody model data selected from the prosody model data indicating the prosody of the words contained in the word dictionary, a waveform dictionary containing recorded speech as speech data in synthesis units, and word variation rules recording the variation rules of character strings, the speech synthesizing process can be performed by determining the accent type of a character string to be synthesized from the word dictionary or the word variation rules, selecting the prosody model data from the prosody dictionary based on the character string to be synthesized and the accent type, selecting waveform data corresponding to each character of the character string to be synthesized from the waveform dictionary based on the selected prosody model data, and connecting the selected pieces of waveform data with each other.
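The variation of a character string and the determination of its accent type "from the word dictionary or the word variation rules" may be sketched as follows. The form of a rule used here (a suffix to append plus an optional accent type that overrides the word dictionary) is an assumption; the section does not specify what a word variation rule contains. The names `VariationRule` and `vary_and_accent` and the task labels are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariationRule:
    """One word variation rule for a task (hypothetical form)."""
    suffix: str                  # text appended to the input character string
    accent_type: Optional[int]   # accent type given by the rule; None means
                                 # the word dictionary decides


# Invented example data: one rule per task.
RULES = {
    "polite": VariationRule(suffix="san", accent_type=None),
    "shout": VariationRule(suffix="o", accent_type=0),
}
WORD_DICT = {"taro": 1}  # word -> accent type


def vary_and_accent(text, task, word_dict, rules, default_accent=0):
    """Vary text by the rule of the given task and determine its accent type."""
    rule = rules[task]
    varied = text + rule.suffix
    if rule.accent_type is not None:
        accent = rule.accent_type                      # taken from the variation rule
    else:
        accent = word_dict.get(text, default_accent)   # taken from the word dictionary
    return varied, accent
```

For example, `vary_and_accent("taro", "polite", WORD_DICT, RULES)` yields `("tarosan", 1)`, while the `"shout"` task yields `("taroo", 0)`. The varied string and accent type would then be passed to prosody model selection and waveform selection in the same way as an unvaried string.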
A speech synthesis apparatus using the dictionaries comprises means for switching a word dictionary, a prosody dictionary, and a waveform dictionary according to the designation of a task input together with a character string to be synthesized, and means for synthesizing a speech message corresponding to the character string to be synthesized using the switched word dictionary, prosody dictionary, and waveform dictionary.
Another speech synthesis apparatus using the dictionaries comprises means for switching a word dictionary, a prosody dictionary, a waveform dictionary, and word variation rules according to the designation of a task input together with a character string to be synthesized, means for varying the character string to be synthesized according to the word variation rules, and means for synthesizing a speech message corresponding to the varied character string using the switched word dictionary, prosody dictionary, and waveform dictionary.
A further speech synthesis apparatus using the dictionaries comprises means for switching a prosody dictionary, a waveform dictionary, and word variation rules according to the designation of a task input together with a character string to be synthesized, means for varying the character string to be synthesized according to the word variation rules, and means for synthesizing a speech message corresponding to the varied character string using a word dictionary, and the switched prosody dictionary and waveform dictionary.
The above mentioned speech synthesis apparatus can be realized by a computer-readable storage medium storing a speech synthesis program used to direct a computer to perform the functions of a word dictionary, a prosody dictionary, and a waveform dictionary corresponding to each of the plurality of tasks of a speech synthesizing process in which at least one of speakers, emotion or situation at the time when speeches are made, and the contents of the speeches is different, means for switching the word dictionary, the prosody dictionary, and the waveform dictionary according to the designation of a task input together with a character string to be synthesized, and means for synthesizing a speech message corresponding to the character string to be synthesized using the switched word dictionary, prosody dictionary, and waveform dictionary.
The above mentioned speech synthesis apparatus can be realized by a computer-readable storage medium storing a speech synthesis program used to direct a computer to perform the functions of a word dictionary, a prosody dictionary, a waveform dictionary, and word variation rules corresponding to each of the plurality of tasks of a speech synthesizing process in which at least one of speakers, emotion or situation at the time when speeches are made, and the contents of the speeches is different, means for switching the word dictionary, the prosody dictionary, the waveform dictionary, and the word variation rules according to the designation of a task input together with a character string to be synthesized, means for varying the character string to be synthesized according to the word variation rules, and means for synthesizing a speech message corresponding to the varied character string using the switched word dictionary, prosody dictionary, and waveform dictionary.
The above mentioned speech synthesis apparatus can be realized by a computer-readable storage medium storing a speech synthesis program used to direct a computer to perform the function of a word dictionary and the function of prosody dictionaries, waveform dictionaries, and word variation rules corresponding to each of the plurality of tasks of a speech synthesizing process in which any of speakers, emotion at the time when speeches are made, and situation at the time when speeches are made are different from each other, means for switching the prosody dictionary, the waveform dictionary, and the word variation rules according to the designation of a task input together with a character string to be synthesized, means for varying the character string to be synthesized according to the word variation rules, and means for synthesizing a speech message corresponding to the varied character string using the word dictionary, the switched prosody dictionary and waveform dictionary.
The above-mentioned object, other objects, features, and advantages of the present invention will become apparent from the following description made with reference to the attached drawings.