Conventionally, as disclosed in JP 11(1999)-95783 A, for example, a technology is known in which prosodic information included in speech data is clustered for each prosody controlling unit, such as an accent phrase, so as to generate representative patterns. Representative patterns are selected from among the generated representative patterns according to a selection rule, transformed according to a transformation rule, and connected, so that the prosody of a whole sentence can be generated. The selection rule and the transformation rule for the above-described representative patterns are generated through a statistical technique or a learning technique.
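The flow described above (generate representative patterns, select, transform, connect) can be illustrated by the following minimal sketch. It is not the method of the cited publication: grouping by mora count and accent type stands in for the clustering, the selection rule is a plain lookup, and the transformation rule is a constant pitch shift. All names and the pitch values in Hz are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical training data: (mora_count, accent_type, pitch contour in Hz, one value per mora).
TRAINING = [
    (3, 1, [220.0, 180.0, 160.0]),
    (3, 1, [240.0, 200.0, 180.0]),
    (4, 0, [150.0, 190.0, 190.0, 185.0]),
    (4, 0, [170.0, 210.0, 210.0, 195.0]),
]


def build_representative_patterns(samples):
    """Group contours by (mora count, accent type) and average each group.

    Assumption: attribute grouping is a simplified stand-in for clustering.
    """
    groups = defaultdict(list)
    for moras, accent, contour in samples:
        groups[(moras, accent)].append(contour)
    return {
        key: [mean(values) for values in zip(*contours)]
        for key, contours in groups.items()
    }


def select_pattern(patterns, moras, accent):
    """Assumed selection rule: exact lookup by attributes."""
    return patterns[(moras, accent)]


def transform_pattern(pattern, shift_hz):
    """Assumed transformation rule: shift the whole contour by a constant."""
    return [value + shift_hz for value in pattern]


def generate_sentence_prosody(patterns, phrases):
    """Select, transform, and connect one pattern per accent phrase."""
    sentence = []
    for moras, accent, shift_hz in phrases:
        selected = select_pattern(patterns, moras, accent)
        sentence.extend(transform_pattern(selected, shift_hz))
    return sentence


PATTERNS = build_representative_patterns(TRAINING)
SENTENCE = generate_sentence_prosody(PATTERNS, [(3, 1, 0.0), (4, 0, -10.0)])
```

In this sketch, a sentence made of two accent phrases yields a single contour of seven values, one per mora. In the conventional technology, the selection and transformation rules would be obtained through a statistical technique or a learning technique rather than fixed by hand as here.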
However, such a conventional prosody generation method has a problem in that the generated prosodic information is considerably distorted when an accent phrase has attributes, such as the number of moras and the accent type, that are not included in the speech data used for generating the representative patterns.
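The condition behind this problem can be made concrete with a short sketch that lists the attribute combinations of target accent phrases that never occur in the speech data. The function name and the attribute values are hypothetical, and the attributes are reduced to a (mora count, accent type) pair purely for illustration.

```python
def uncovered_attributes(training_attrs, target_attrs):
    """Return target (mora_count, accent_type) pairs absent from the training data."""
    seen = set(training_attrs)
    return sorted({attrs for attrs in target_attrs if attrs not in seen})


# Hypothetical attribute pairs observed in the speech data.
TRAINING_ATTRS = [(3, 1), (3, 1), (4, 0), (5, 2)]
# Hypothetical attribute pairs of accent phrases to be synthesized.
TARGET_ATTRS = [(3, 1), (6, 3), (4, 0), (2, 1), (6, 3)]

MISSING = uncovered_attributes(TRAINING_ATTRS, TARGET_ATTRS)
```

Every pair reported in `MISSING` corresponds to an accent phrase for which no representative pattern was generated from matching data, which is the situation in which the distortion described above arises.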