In recent years, there has been an increasing demand for providing a passage with attribute information, such as an emotion or a speech style, that cannot be represented by a simple arrangement of characters, so that the passage can be expressed more naturally than by a plain text or a plain synthesized speech. The attribute information is herein defined as information that can be utilized to enhance the expressiveness of a passage when the passage is output; it does not indicate the original meaning of the characters included in the passage. Examples of the attribute information include information indicating an emotion of a talker, such as delight, anger, sorrow, or pleasure, and information indicating a speech style of a talker, such as a recitation style or a DJ (disk jockey) style. As methods of representing a passage using this attribute information, aurally appealing methods using voice, music, or the like and visually appealing methods using a text color, a picture, or light have been conceived. Further, a speech synthesis system that recites a passage with emotion and a cellular phone that displays the content of a received mail with a single icon have been realized.
Patent Document 1 describes an example of the speech synthesis system capable of reciting a text with emotion. This conventional speech synthesis system is formed of a speech mode specifying unit, a speech control parameter storage unit, a speech control level setting unit, and a speech synthesizing unit.
The speech synthesis system in Patent Document 1 recites an entire passage according to attribute information specified by a user. More specifically, when the speech synthesis system in Patent Document 1 receives a specification of an arbitrary speech mode from the user, it reads out a combination of levels, such as a speaking speed and an intonation level, corresponding to the specified speech mode from the speech control parameter storage unit. The combination of levels that has been read out is then collectively set by the speech control level setting unit, and a synthesized voice that represents the corresponding emotion is generated by the speech synthesizing unit.
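The scheme described for Patent Document 1 can be sketched as follows. This is a hypothetical illustration, not the patent's actual implementation: the mode names, the specific control levels, and all function names are assumptions made for the example.

```python
# Hypothetical sketch of the Patent Document 1 scheme: a user-specified
# speech mode selects a stored combination of control levels, which are
# then set collectively before synthesis. All values are illustrative.

# Stand-in for the speech control parameter storage unit:
# speech mode -> combination of control levels.
SPEECH_CONTROL_PARAMETERS = {
    "delight": {"speaking_speed": 1.2, "intonation_level": 1.5, "pitch": 1.1},
    "sorrow":  {"speaking_speed": 0.8, "intonation_level": 0.7, "pitch": 0.9},
    "anger":   {"speaking_speed": 1.1, "intonation_level": 1.8, "pitch": 1.3},
}

def synthesize(text: str, speech_mode: str) -> dict:
    """Read out the level combination for the specified speech mode,
    set it collectively, and return a stand-in for the synthesized voice."""
    levels = SPEECH_CONTROL_PARAMETERS[speech_mode]
    # In a real system these levels would drive a waveform generator;
    # here the combined settings are simply returned with the text.
    return {"text": text, **levels}
```

Note that the entire passage is rendered with one fixed level combination; this is the limitation that the finer-grained approach of Patent Document 2 addresses.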
Patent Document 2 discloses a speech synthesis system in which fine-grained attribute information is provided, thereby allowing smooth expression of an emotional transition. The speech synthesis system in Patent Document 2 is formed of a text analyzing unit, an emotional information providing unit, and an emotional information interpolating unit.
In the speech synthesis system in Patent Document 2, an input passage is divided into segments by the text analyzing unit, and emotional information is provided, by referring to an emotion provision rule, to each segment that includes a character string for which the emotional information is defined. The emotion provision rule defines emotional information for character string expressions that express emotions. In this speech synthesis system, when different emotional information is provided to adjacent segments, interpolating emotional information that smoothly changes the emotional transition between the adjacent segments is provided, allowing expression of a natural emotional change.

Patent Document 1: JP Patent Kokai Publication No. JP-A-05-100692
Patent Document 2: JP Patent Kokai Publication No. JP-P2005-181840A
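The segment-tagging and interpolation scheme described for Patent Document 2 can be sketched as follows. This is a hypothetical illustration under assumed conventions: the rule entries, the intensity representation, and the linear ramp used for interpolation are all assumptions made for the example, not the patent's actual method.

```python
# Hypothetical sketch of the Patent Document 2 scheme: segments whose
# character strings match an emotion provision rule receive emotional
# information, and differing emotions on adjacent segments are bridged
# by interpolated intermediate values. All names and rules are illustrative.

# Stand-in for the emotion provision rule:
# character string expression -> (emotion, intensity).
EMOTION_PROVISION_RULE = {
    "hooray": ("delight", 1.0),
    "alas":   ("sorrow", 1.0),
}

def provide_emotion(segments):
    """Attach (emotion, intensity) to each segment, defaulting to neutral."""
    tagged = []
    for seg in segments:
        emotion = ("neutral", 0.0)
        for expr, info in EMOTION_PROVISION_RULE.items():
            if expr in seg:
                emotion = info
                break
        tagged.append((seg, emotion))
    return tagged

def interpolate(tagged, steps=3):
    """Where adjacent segments carry different emotions, insert
    intermediate intensity values so the transition is gradual."""
    result = [tagged[0]]
    for prev, cur in zip(tagged, tagged[1:]):
        if prev[1][0] != cur[1][0]:
            # Ramp the outgoing emotion down before the new one begins.
            for k in range(1, steps + 1):
                level = prev[1][1] * (1 - k / (steps + 1))
                result.append((None, (prev[1][0], round(level, 2))))
        result.append(cur)
    return result
```

For instance, a "delight" segment followed by a "sorrow" segment would be separated by intermediate entries in which the delight intensity gradually decays, rather than switching abruptly between the two emotions.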