1. Field of the Invention
The present invention relates to a fundamental frequency pattern generation apparatus and fundamental frequency pattern generation method which generate a fundamental frequency pattern for text-to-speech synthesis.
2. Description of the Related Art
Text-to-speech synthesis systems, which artificially generate a speech signal from an arbitrary text, have recently been developed. A text-to-speech synthesis system generally includes three modules: a language processing unit, a prosody generation unit, and a speech signal generation unit.
Of these modules, the performance of the prosody generation unit determines much of the naturalness of synthesized speech. In particular, the fundamental frequency pattern, that is, the pattern of change in voice pitch (fundamental frequency), largely affects the naturalness of synthesized speech. In the fundamental frequency pattern generation method of conventional text-to-speech synthesis, the fundamental frequency pattern is generated using a relatively simple model. This method yields only mechanical-sounding synthesized speech with unnatural intonation.
A conventional fundamental frequency pattern generation apparatus addresses this problem in the following way (e.g., JP-A 2004-206144 (KOKAI)). First, a fundamental frequency pattern is selected from a fundamental frequency pattern database. Then, a section of the selected fundamental frequency pattern from “the second phoneme following the accent nucleus” to “the phoneme immediately before the accent phrase end” is interpolated within a range of four phonemes or less. This makes it possible to generate a fundamental frequency pattern containing a desired number of phonemes.
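The interpolation step of this conventional approach can be illustrated with a minimal sketch. The sketch is not taken from the cited publication; it rests on several assumptions made only for illustration: the pattern is reduced to one fundamental frequency value (in Hz) per phoneme, the interpolation is linear, the "accent phrase end" is taken to be the final phoneme, and the function name stretch_pattern and its parameters are hypothetical.

```python
def stretch_pattern(f0, nucleus, target_len, max_interp=4):
    """Fit a stored per-phoneme F0 pattern to target_len phonemes.

    Assumptions (illustrative only): f0 holds one value in Hz per phoneme,
    nucleus is the index of the accent nucleus, and the final phoneme is
    treated as the accent phrase end.
    """
    # Phonemes up to the first one after the accent nucleus are kept as-is.
    head = list(f0[:nucleus + 2])
    # The phoneme at the accent phrase end is kept as-is.
    tail = list(f0[-1:])
    if nucleus + 2 > len(f0) - 1:
        raise ValueError("pattern is too short after the accent nucleus")

    # Number of phonemes that must be filled in between head and tail.
    n_mid = target_len - len(head) - len(tail)
    if n_mid < 0:
        raise ValueError("target is shorter than the fixed head and tail")
    if n_mid > max_interp:
        raise ValueError("interpolation range exceeds max_interp phonemes")

    # Linear interpolation between the last head value and the tail value.
    start, end = head[-1], tail[0]
    mid = [start + (end - start) * (i + 1) / (n_mid + 1) for i in range(n_mid)]
    return head + mid + tail


stored = [120.0, 150.0, 140.0, 130.0, 100.0]  # made-up values, nucleus at index 1
print(stretch_pattern(stored, nucleus=1, target_len=7))
```

In this sketch a request that would require more than four interpolated phonemes is rejected rather than served, which mirrors the restriction that keeps the synthesized speech natural and shows why a stored pattern can cover only a narrow span of phoneme counts.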
However, if the interpolation range is widened, this fundamental frequency pattern generation apparatus can no longer generate natural synthesized speech.
To generate natural synthesized speech, it is necessary to limit the interpolation range to four phonemes or less, as described above. To do this, the fundamental frequency pattern database needs to store an enormous number of fundamental frequency patterns containing various numbers of phonemes. Hence, the size (capacity) of the fundamental frequency pattern database increases.
As described above, it is difficult for the conventional technique to generate a fundamental frequency pattern which allows stable generation of natural synthesized speech closer to speech uttered by a human.