1. Field of the Invention
This invention relates to a speech synthesis method, a speech synthesis apparatus, a program, and a recording medium for synthesizing a sentence or a singing voice as natural speech close to the human voice, and to a robot apparatus outputting the speech.
This application claims priority of Japanese Patent Application No. 2002-073385, filed on Mar. 15, 2002, the entirety of which is incorporated by reference herein.
2. Description of Related Art
A mechanical apparatus for performing movements simulating the movements of human beings (or other animate beings), using electrical or magnetic operation, is termed a “robot”. Robots came into wide use in this country towards the end of the 1960s. Most of the robots used were industrial robots, such as manipulators or transporting robots, aimed at automation or unmanned operation in plants.
Recently, progress has been made in the development of practically useful robots that support human life as partners, that is, that support human activities in various aspects of everyday life, such as in the living environment. In distinction from industrial robots, these practically useful robots are endowed with the ability to learn for themselves how to adapt to human beings with differing personalities, or to differing environments, in the varied aspects of everyday life. For example, pet-type robots, simulating the bodily mechanism or movements of quadrupeds such as dogs or cats, and so-called humanoid robots, simulating the bodily mechanism or movements of animals that stand erect and walk on two feet, such as human beings, are already being put to practical use.
As compared to industrial robots, the above-described robot apparatus are able to perform various entertainment-oriented operations, and hence are sometimes called entertainment robots. Among these robot apparatus, there are those that operate autonomously in response to external information or to the inner states of the robot apparatus.
The artificial intelligence (AI) used in these autonomously operating robot apparatus represents the artificial realization of intellectual functions, such as inference or decision. Attempts are also being made to realize the functions of feeling or instinct by artificial means. The artificial intelligence may be represented to the outside by means for visual or auditory representation. Speech is typical of such means for auditory representation.
Meanwhile, the synthesis system for the speech synthesis apparatus applied to such robot apparatus may be exemplified by a text-to-speech synthesis system. In conventional speech synthesis from text, the parameters necessary for speech synthesis are set automatically in response to the results of the textual analysis. Consequently, while it is possible to read a lyric aloud, albeit somewhat insipidly, it is difficult to take the note information into account, for example to change the voice pitch or the duration of the uttered speech in accordance with the notes.
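By way of illustration only, the kind of note information that conventional text-to-speech synthesis does not take into account may be sketched as follows. The sketch assumes, purely as a hypothesis not taken from this specification, that each syllable of a lyric is accompanied by a MIDI-style note number and a length in beats, and that a tempo in beats per minute is given. The names `Note`, `pitch_hz`, `duration_seconds` and `prosody_targets` are hypothetical, and the conversion shown is ordinary equal-temperament arithmetic rather than the method of the invention.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """One syllable of a lyric with its note information (hypothetical structure)."""
    syllable: str
    midi_number: int  # assumed MIDI-style note number, where 69 is A4 (440 Hz)
    beats: float      # assumed note length in beats


def pitch_hz(midi_number: int) -> float:
    """Convert a note number to a target pitch in Hz (equal temperament, A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_number - 69) / 12.0)


def duration_seconds(beats: float, tempo_bpm: float) -> float:
    """Convert a note length in beats to a target duration in seconds."""
    return beats * 60.0 / tempo_bpm


def prosody_targets(notes, tempo_bpm):
    """Return (syllable, pitch in Hz, duration in seconds) for each note."""
    return [
        (n.syllable, pitch_hz(n.midi_number), duration_seconds(n.beats, tempo_bpm))
        for n in notes
    ]


if __name__ == "__main__":
    # Hypothetical two-syllable lyric at an assumed tempo of 120 beats per minute.
    lyric = [Note("la", 69, 1.0), Note("la", 81, 2.0)]
    for syllable, hz, seconds in prosody_targets(lyric, tempo_bpm=120.0):
        print(f"{syllable}: {hz:.1f} Hz for {seconds:.2f} s")
```

In a conventional text-to-speech system, the pitch and duration of each syllable would instead be derived from the textual analysis alone, so that targets of this kind, taken from the notes, would have no route into the synthesis parameters.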