Conventionally, text-to-speech synthesis technologies for artificially creating human speech from arbitrary text have been known. In such technologies, voices corresponding to the words or phonemes that constitute the text are synthesized to create speech (referred to as synthesized speech) corresponding to the text. To create synthesized speech of a person, it is necessary to prepare a script (referred to as a recording script) that includes predetermined text, to record the voice of the person reading the text of the recording script aloud, and to collect the sounds corresponding to the respective words or phonemes to create a synthesis dictionary. Recording scripts commonly used in creating a synthesis dictionary include text composed in consideration of the selection of phonemes and intonations. Such recording scripts often contain words that are unfamiliar to the speaker and passages that the speaker finds difficult to pronounce. JP-A 2003-186489 (KOKAI) discloses a recording script creating apparatus for creating such a recording script, and a recording management apparatus for managing recording based on the script.
According to JP-A 2003-186489 (KOKAI), when the speaker finds it difficult to pronounce a certain piece of text in the recording script and the voice recorded for the text is rejected by the recording management apparatus, the voice for the text needs to be recorded again. This can lead to repeated retakes, which increase the recording cost and degrade the quality of the recorded voice. Which text is difficult to pronounce varies greatly from person to person, and it is difficult to prepare a script tailored to the speaker in advance. Under these circumstances, it has been difficult to collect high-quality voices, to collect voices that reflect the selection of phonemes and intonations desired by the creator of the recording script, and to make a high-quality synthesis dictionary.