1. Field of the Invention
The present invention relates to a speech synthesis technique.
2. Description of the Related Art
For train guidance on station platforms, traffic jam information on expressways, and the like, domain-specific synthesis is used, which combines and concatenates pre-recorded speech data (pre-stored word speech data and phrase speech data). Because it is applied only to a specific domain, this scheme can produce synthetic speech with high naturalness, but it cannot synthesize speech corresponding to arbitrary texts.
A concatenative synthesis system, which is a typical rule-based speech synthesis system, generates rule-based synthetic speech by dividing an input text into words, adding pronunciation information to the words, and concatenating speech segments in accordance with the pronunciation information. Although this scheme can synthesize speech corresponding to arbitrary texts, the naturalness of the resulting synthetic speech is not high.
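The three steps named above (word division, pronunciation lookup, segment concatenation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from any disclosed system: whitespace tokenization, a hypothetical `lexicon` mapping words to phoneme labels, a hypothetical `segments` table mapping phoneme labels to raw waveform bytes, and the function name itself. Accent and prosody handling are omitted.

```python
from typing import Dict, List


def rule_based_synthesize(
    text: str,
    lexicon: Dict[str, List[str]],
    segments: Dict[str, bytes],
) -> bytes:
    """Illustrative concatenative synthesis (hypothetical helper).

    lexicon  : word -> list of phoneme labels (assumed representation)
    segments : phoneme label -> stored speech segment as raw bytes (assumed)
    """
    # Step 1: divide the input text into words (assumes whitespace-delimited text).
    words = text.split()

    # Step 2: add pronunciation information to each word.
    phonemes: List[str] = []
    for word in words:
        phonemes.extend(lexicon[word.lower()])

    # Step 3: concatenate the speech segments in pronunciation order.
    return b"".join(segments[p] for p in phonemes)
```

Because any word present in the lexicon can be rendered, such a scheme handles arbitrary text, while the simple joining of short segments suggests why naturalness tends to suffer.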
Japanese Patent Laid-Open No. 2002-221980 discloses a speech synthesis system which generates synthetic speech by combining pre-recorded speech and rule-based synthetic speech. This system comprises a phrase dictionary holding pre-recorded speech and a pronunciation dictionary holding pronunciations and accents. Upon receiving an input text, the system outputs the pre-recorded speech of a word when the word is registered in the phrase dictionary, and, when the word is registered in the pronunciation dictionary, outputs rule-based synthetic speech generated from the pronunciation and accent of the word.
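The per-word selection between the two dictionaries can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the cited publication: the names `synthesize_hybrid`, `phrase_dict`, `pron_dict`, and `rule_synth` are hypothetical, the phrase dictionary is assumed to be consulted first, and a word found in neither dictionary is assumed to raise an error, since the description above does not address those cases.

```python
from typing import Callable, Dict, List, Tuple


def synthesize_hybrid(
    words: List[str],
    phrase_dict: Dict[str, bytes],
    pron_dict: Dict[str, Tuple[str, int]],
    rule_synth: Callable[[str, int], bytes],
) -> List[Tuple[str, bytes]]:
    """Choose pre-recorded or rule-based speech for each word (hypothetical).

    phrase_dict : word -> pre-recorded speech data
    pron_dict   : word -> (pronunciation, accent)
    rule_synth  : stand-in for a rule-based synthesizer (assumed signature)
    """
    output: List[Tuple[str, bytes]] = []
    for word in words:
        if word in phrase_dict:
            # Word is registered in the phrase dictionary: use the recording.
            output.append(("recorded", phrase_dict[word]))
        elif word in pron_dict:
            # Word is registered in the pronunciation dictionary:
            # generate rule-based speech from pronunciation and accent.
            pronunciation, accent = pron_dict[word]
            output.append(("rule", rule_synth(pronunciation, accent)))
        else:
            # Assumption: unregistered words are treated as an error.
            raise KeyError(f"word not registered in either dictionary: {word!r}")
    return output
```

Each entry in the returned list is tagged with its source, which makes the points where recorded and rule-based speech adjoin easy to locate.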
In the speech synthesis disclosed in Japanese Patent Laid-Open No. 2002-221980, voice quality changes greatly near the boundary between pre-recorded speech and rule-based synthetic speech, so intelligibility may deteriorate.