Conventional text editing apparatuses and text editing methods are known which estimate the impression that the expressions (contents) in a text will give readers, and then rewrite any portion that conflicts with the writer's desired impression into a different expression that gives the desired impression (refer to Patent Reference 1, for example).
Text-to-speech apparatuses and text reading methods with text editing functions are also known. When a target text is to be read aloud, they examine the combinations of pronunciation sequences, rewrite any expression whose pronunciation combination is hard for listeners to catch into a different expression that is easier to hear, and then read the text aloud (refer to Patent Reference 2, for example).
In addition, methods for evaluating reading voices are known which evaluate combinations of voice pronunciations in terms of “confusability”. These methods estimate the similarity between two sequences of Katakana characters (a Japanese syllabary) that are to be read aloud in succession and, if the estimation result satisfies certain conditions, determine that reading these sequences in succession will confuse listeners because their pronunciations are similar (refer to Patent Reference 3, for example).
As described below, besides “ease of listening” and “confusability”, there is another problem that should be solved by editing a text on the basis of an evaluation of the voice in which the text is read aloud.
When a reader reads a text aloud, the quality of the reading voice sometimes changes partially because of tensing or relaxing of a phonatory organ that the reader does not intend. Listeners hear such a change in voice quality as a “pressed voice” or a “relaxed voice” of the reader. Voice quality changes such as “pressed voice” and “relaxed voice” are phenomena characteristically observed in voices that carry emotion and expression, and such partial voice quality changes are known to characterize the emotion and expression of a voice and thereby shape the impression it gives (refer to Non-Patent Reference 1, for example). Therefore, when a reader reads a text aloud, listeners sometimes perceive an impression, emotion, expression, and the like from voice quality changes that occur partially in the reading voice, rather than from the expression modes (writing style and wording) and contents of the text. A problem arises when the listeners' impression differs from what the reader intended to convey or expected. For instance, suppose a reader is reading lecture documents aloud calmly and without any emotion, but the reader's voice unintentionally slips into falsetto, producing a voice quality change. This may give listeners the impression that the reader is nervous and upset.
[Patent Reference 1] Japanese Unexamined Patent Application Publication No. 2000-250907 (page 11, FIG. 1)
[Patent Reference 2] Japanese Unexamined Patent Application Publication No. 2000-172289 (page 9, FIG. 1)
[Patent Reference 3] Japanese Patent Publication No. 3587976 (page 10, FIG. 5)
[Non-Patent Reference 1] “Ongen kara mita seishitsu (Voice Quality Associated with Voice Sources)”, Hideki Kasuya and Yang Chang-Sheng, Journal of the Acoustical Society of Japan, Vol. 51, No. 11, 1995, pp. 869-875