In recent years, as the sound quality of synthesized speech and its reproduction of a speaker's individuality have improved, speech synthesis technology has come to be used in many fields, including narration in public facilities and on public transportation, and interfaces for entertainment or for interaction with a system. In addition, automatic generation of read-aloud speech for the text of e-books and the like has been attempted.
In general, documents of various kinds contain unique phrases, expressions, and notations, such as new words, unknown words, and proper nouns. It is therefore difficult for speech synthesis technology to automatically and correctly estimate the reading and accents of an arbitrary natural sentence (a sentence including Chinese characters and Japanese characters) and to output them as speech. For this reason, methods of manually correcting the portions where the system cannot automatically determine the reading and accents have frequently been used. Specifically, the correct reading and accent are manually designated at positions where the speech synthesis function produces misreadings or accent errors.
As a technology supporting the speech synthesis function, a technology has been proposed for editing read-aloud speech efficiently and in a short time by presenting to the user, in a prioritized order, the portions to be corrected, the order being determined according to statistics of words appearing in the document, a speech recognition result text of the synthesized speech, or the like. However, during editing, the read-aloud speech often needs to be repeatedly modified or finely adjusted according to review results. Since the range over which a correction to the read-aloud speech influences the entire document cannot be recognized, there is a problem in that backtracking or oversights may occur in the correction work.
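The prioritization of correction candidates mentioned above can be illustrated with a minimal sketch. It assumes the document and the speech recognition result of the synthesized speech are both already available as word lists. The function name `rank_correction_candidates` and the ranking rule (words the recognizer failed to reproduce, most frequent first) are hypothetical choices made for illustration and are not taken from the proposed technology itself.

```python
from collections import Counter
from difflib import SequenceMatcher


def rank_correction_candidates(source_words, recognized_words):
    """Return source words the recognizer did not reproduce, most frequent first.

    Hypothetical ranking rule: a word is a candidate if it falls in a span of
    the source text that differs from the recognition result; candidates are
    ordered by how often they appear in the source document.
    """
    freq = Counter(source_words)
    matcher = SequenceMatcher(a=source_words, b=recognized_words, autojunk=False)

    suspects = set()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" and "delete" mark source spans missing from the recognition result.
        if tag in ("replace", "delete"):
            suspects.update(source_words[i1:i2])

    # Higher document frequency first; ties broken alphabetically for stable output.
    return sorted(suspects, key=lambda w: (-freq[w], w))
```

A sketch of this kind only orders the candidates. It does not show how correcting one candidate affects the rest of the document, which is the limitation described above.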