Speech synthesis technologies that convert given text into a synthesized waveform are known. To reproduce the voice quality of a particular user with such a technology, a speech synthesis dictionary needs to be created from recorded speech of that user. In recent years, speech synthesis technologies based on hidden Markov models (HMMs) have been increasingly researched and developed, and the quality of these technologies is improving. Furthermore, technologies for creating a speech synthesis dictionary of a speaker in a second language from speech of the same speaker in a first language have been studied. A typical technique for this is cross-lingual speaker adaptation.
In the related art, however, a large quantity of data needs to be provided to conduct cross-lingual speaker adaptation. A further disadvantage is that high-quality bilingual data are required to improve the quality of the synthetic speech.