Many TV programs worldwide are subtitled, out of consideration for the hearing impaired or for other reasons. Meanwhile, as the Internet and other media have come into wide use, a variety of text information has become available. However, as devices displaying the text information have been downsized, their screens have become smaller, undesirably making the text information difficult to read. To solve this problem, a device that converts a text string to voice has been devised (refer to patent literature 1, for instance).
FIG. 21 is a block diagram showing a configuration of a conventional readout device. As shown in FIG. 21, a conventional readout device includes tone adjusting unit 2001, voice data storage unit 2002, standard speed data storage unit 2003, replay speed input unit 2004, replay speed ratio calculating unit 2005, control unit 2006, and voice replay unit 2007.
Voice data storage unit 2002 digitally stores voice data. Standard speed data storage unit 2003 stores standard speed data, which represents the replay speed of the voice data by the number of words corresponding to the voice data and the standard replay time. Replay speed input unit 2004 provides information on a change of the replay speed as a number of words per unit time. Replay speed ratio calculating unit 2005 determines a replay speed ratio from the number of words per unit time provided by replay speed input unit 2004 and the number of words at the standard replay speed. Control unit 2006 outputs the voice data, the standard speed data, and the replay speed ratio, read from voice data storage unit 2002, standard speed data storage unit 2003, and replay speed ratio calculating unit 2005 respectively, to tone adjusting unit 2001. Voice replay unit 2007 replays the output from tone adjusting unit 2001. In this way, the readout device allows the replay speed to be set by specifying the number of words per unit time, while holding the tone, which would otherwise change with fluctuations in replay speed, at a constant standard value.
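The calculation performed by replay speed ratio calculating unit 2005 can be illustrated with a minimal sketch. The description above does not give an explicit formula, so the sketch assumes that the ratio is the requested number of words per unit time divided by the number of words per unit time at the standard replay speed, with one minute as the unit time. The names StandardSpeedData, replay_speed_ratio, and replay_time_s are hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class StandardSpeedData:
    """Hypothetical record of what standard speed data storage unit 2003 holds."""
    word_count: int           # number of words corresponding to the voice data
    standard_replay_s: float  # standard replay time, in seconds

    def words_per_minute(self) -> float:
        # Standard replay speed expressed as words per unit time (assumed: one minute).
        return self.word_count / self.standard_replay_s * 60.0


def replay_speed_ratio(requested_wpm: float, standard: StandardSpeedData) -> float:
    """Assumed formula: requested words per minute over standard words per minute."""
    if requested_wpm <= 0:
        raise ValueError("requested_wpm must be positive")
    return requested_wpm / standard.words_per_minute()


def replay_time_s(standard: StandardSpeedData, ratio: float) -> float:
    # Replaying at `ratio` times the standard speed shortens the replay time by that factor.
    return standard.standard_replay_s / ratio


standard = StandardSpeedData(word_count=300, standard_replay_s=120.0)
ratio = replay_speed_ratio(225.0, standard)
print(ratio, replay_time_s(standard, ratio))
```

Under these assumptions, voice data of 300 words with a standard replay time of 120 seconds has a standard speed of 150 words per minute, so a request for 225 words per minute yields a ratio of 1.5 and a replay time of 80 seconds. The sketch also shows why this approach requires the number of words to be known in advance.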
In other words, with a conventional readout device, readout can be completed within a predetermined time by a method such as changing the pronouncing speed, provided that the number of characters of the text string to be read is specified in advance or the readout time is predetermined. However, for subtitle information, where it is unknown when the next text string will arrive and how many characters it will contain, and for descriptions on the Internet, where additions and updates are made by an unspecified large number of people, the number of characters cannot be identified and the time required cannot be predetermined, making it difficult to set the pronouncing speed to an optimum value.
Consider a text string, such as subtitle information, that is displayed or read out in synchronization with video presented to viewers. When the text string is read out too fast, it is undesirably difficult to hear. When the displayed text string changes too fast, some of it cannot be read within its display period. When the readout speed is lower than the speed at which text strings arrive, the readout cannot be kept synchronized with the video.
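The loss of synchronization described above can be illustrated with a small worked example. The following sketch assumes a simplified model that does not appear in the source: each text string arrives at a known time with a known number of characters, and is read out at a fixed number of characters per second, one string after another. The function name readout_lags and the sample values are hypothetical.

```python
def readout_lags(arrivals, chars_per_s):
    """Return, for each text string, the delay between its arrival and the start of its readout.

    arrivals: list of (arrival_time_s, char_count) pairs in arrival order.
    chars_per_s: fixed readout speed, in characters per second (assumed model).
    """
    if chars_per_s <= 0:
        raise ValueError("chars_per_s must be positive")
    lags = []
    free_at = 0.0  # time at which the readout of the previous string finishes
    for arrival_s, char_count in arrivals:
        start = max(arrival_s, free_at)  # cannot start before the previous readout ends
        lags.append(start - arrival_s)
        free_at = start + char_count / chars_per_s
    return lags


# Three 40-character strings arriving every 4 seconds.
subtitles = [(0.0, 40), (4.0, 40), (8.0, 40)]
print(readout_lags(subtitles, chars_per_s=8))   # readout slower than arrival
print(readout_lags(subtitles, chars_per_s=10))  # readout keeps pace with arrival
```

At 8 characters per second, each string takes 5 seconds to read while a new one arrives every 4 seconds, so the delay grows by 1 second per string. At 10 characters per second the readout keeps pace and no delay accumulates, although a higher speed is harder to hear.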
Owing to the needs of the hearing impaired and to improvements in the accuracy of voice recognition, a service has become available in which speech produced by an announcer is automatically converted to text strings and multiplexed as subtitles into a broadcast wave. However, an average viewer reads and comprehends a displayed text string more slowly than the viewer listens to and comprehends the corresponding speech. In practice, therefore, some words need to be replaced with shorter ones and unnecessary words need to be omitted when converting speech to subtitles, which makes complete automation difficult.
[Patent literature 1] Japanese Patent Unexamined Publication No. H11-7295