In recent years, as books have increasingly been digitized (electronic books), electronic books have come to be browsed on PCs, mobile terminals, and dedicated electronic book terminals. Speech synthesis (text-to-speech [TTS]) systems have also been used to read the content text aloud, producing a recitation voice for users to listen to. Because any text can be read aloud in this way, a recitation voice is easily obtained without having to prepare a recording for each content item. However, synthesized voice output may contain misreadings, accent errors, words that are hard to understand by sound alone, or homophones. To have a passage read again, the user must therefore either instruct the system to rewind the continuously playing recitation by a given length of time, or specify a playback start point on a screen user interface (UI).
When re-reading starts from an arbitrary point during the recitation, however, the user must listen carefully to the candidate re-reading positions as they are read aloud in reverse chronological order, while specifying the desired start position. Furthermore, even if the re-reading candidates are narrowed down using prosodic boundaries or a particular type of segment delimiter as cues, the re-read voice output has the same content as the previous reading, except where preregistered synonyms are substituted. The listener therefore hears the same erroneous or obscure content again and still fails to understand the document.