There is known a speech synthesis device that analyzes character string information indicating a character string and generates synthesis speech by regular synthesis from the speech information indicated by the character string. In such a device, prosody information on the synthesis speech (information on tone pitch (pitch frequency), tone length (prosodic duration), and sound magnitude (power)) is first generated based on an analysis result of the input character string information. Next, a plurality of optimum segments (waveform generation parameter sequences each having the length of a syllable or demisyllable) are selected from a segment dictionary based on the character string analysis result and the generated prosody information, thereby creating one optimum segment sequence. A waveform generation parameter sequence is then formed from the optimum segment sequence, and a speech waveform is generated from that parameter sequence, thereby obtaining the synthesis speech. The segments accumulated in the segment dictionary are extracted and generated from a large amount of natural speech by various methods.
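The segment selection step described above may be illustrated, purely as a sketch, by a dynamic-programming search over candidate segments. The text does not specify how "optimum" is measured, so the cost functions below (target_cost, join_cost), their weights, and the names Segment, Target, and select_segments are all hypothetical assumptions introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One dictionary entry (hypothetical layout)."""
    unit: str          # syllable or demisyllable label
    pitch_hz: float    # tone pitch
    duration_ms: float # tone length
    power_db: float    # sound magnitude


@dataclass(frozen=True)
class Target:
    """Generated prosody information for one unit of the input string."""
    unit: str
    pitch_hz: float
    duration_ms: float
    power_db: float


def target_cost(t: Target, s: Segment) -> float:
    # Assumed measure: normalized prosody mismatch between target and segment.
    return (abs(t.pitch_hz - s.pitch_hz) / t.pitch_hz
            + abs(t.duration_ms - s.duration_ms) / t.duration_ms
            + abs(t.power_db - s.power_db) / 10.0)


def join_cost(a: Segment, b: Segment) -> float:
    # Assumed measure: pitch discontinuity between adjacent segments.
    return abs(a.pitch_hz - b.pitch_hz) / 100.0


def select_segments(targets: list[Target], dictionary: list[Segment]) -> list[Segment]:
    """Return the minimum-total-cost segment sequence (Viterbi-style search)."""
    if not targets:
        return []
    cands = [[s for s in dictionary if s.unit == t.unit] for t in targets]
    for t, c in zip(targets, cands):
        if not c:
            raise ValueError(f"no segment in dictionary for unit {t.unit!r}")

    # best[k]: lowest cost of any path ending in candidate k of the current unit.
    best = [target_cost(targets[0], s) for s in cands[0]]
    back: list[list[int]] = []
    for i in range(1, len(targets)):
        new_best, ptr = [], []
        for s in cands[i]:
            costs = [best[j] + join_cost(p, s) for j, p in enumerate(cands[i - 1])]
            j = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[j] + target_cost(targets[i], s))
            ptr.append(j)
        best = new_best
        back.append(ptr)

    # Trace the best path backwards from the cheapest final candidate.
    k = min(range(len(best)), key=best.__getitem__)
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return [cands[i][k] for i, k in enumerate(path)]
```

In this sketch, the returned list corresponds to the one optimum segment sequence, from which the waveform generation parameter sequence would subsequently be formed.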
In such a speech synthesis device, when a synthesis speech waveform is generated from the selected segments, a speech waveform having prosody close to the generated prosody information is created from the segments in order to secure high sound quality. For example, the method described in Non-Patent Literature 1 is employed to generate both the synthesis speech waveform and the segments used for generating it. A waveform generation parameter generated by the method described in Non-Patent Literature 1 is cut out from a speech waveform in the time domain by use of a window function having a parameter (more specifically, a time width calculated from the pitch frequency). Therefore, processing such as frequency conversion, logarithm conversion, and filtering is not required for waveform generation, and a synthesis speech waveform can be generated with fewer calculations.
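The time-domain cutting described above may be illustrated by the following sketch. It is not asserted to be the method of Non-Patent Literature 1. The Hann window shape, the width of two pitch periods, and the function names window_width, cut_segment, and overlap_add are assumptions made for illustration. The sketch shows only that cutting and waveform generation can be carried out with per-sample multiplication and addition.

```python
import math


def window_width(pitch_hz: float, sample_rate: int, periods: float = 2.0) -> int:
    """Window width in samples, calculated from the pitch frequency.

    The factor of two pitch periods is an assumed value.
    """
    return max(1, round(periods * sample_rate / pitch_hz))


def hann(n: int) -> list[float]:
    """Symmetric Hann window of n samples (assumed window shape)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def cut_segment(waveform: list[float], center: int,
                pitch_hz: float, sample_rate: int) -> list[float]:
    """Cut one windowed piece of the waveform around sample index `center`."""
    width = window_width(pitch_hz, sample_rate)
    start = center - width // 2
    out = []
    for i, w in enumerate(hann(width)):
        idx = start + i
        # Samples outside the waveform are treated as silence.
        sample = waveform[idx] if 0 <= idx < len(waveform) else 0.0
        out.append(w * sample)
    return out


def overlap_add(pieces: list[list[float]], centers: list[int], length: int) -> list[float]:
    """Generate a waveform by adding windowed pieces at the given centers.

    Placing the centers closer together or further apart than in the
    original speech changes the pitch of the generated waveform.
    """
    out = [0.0] * length
    for piece, c in zip(pieces, centers):
        start = c - len(piece) // 2
        for i, v in enumerate(piece):
            idx = start + i
            if 0 <= idx < length:
                out[idx] += v
    return out
```

For example, at a sampling rate of 16000 Hz and a pitch frequency of 100 Hz, window_width gives a width of 320 samples under the two-period assumption.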
Patent Literature 1 describes a speech recognition device and Patent Literature 2 describes a speech segment generation device.