A conventional voice synthesis program (or voice synthesizer) reads an input text file in which voice attributes are described in a form that the program can process.
In the voice synthesis program "ProTALKER/2" ("ProTALKER" is a trademark of the IBM Corp.), for example, a string called a "text embedded command/voice attribute" is embedded in the text to control the voice attributes used at the time of reading.
Assume that the following text, which contains embedded commands, is input: "Normal reading first. [*S9] Reading speed is increased here. [*P9] Voice pitch is changed to high. [*S0P0] Reading speed becomes slower with lower voice. [*Y0] Robot reading. [*S=P=Y=] Reading is returned to normal. [*F1] This is the phone number information. [*M1] Tell me the phone number of Mr. Kouichi Tanaka."
Upon receipt of this text, a voice synthesis apparatus recognizes "[*" as the head of an embedded command for instructing a voice attribute, and "]" as the termination of the embedded command. Since the beginning of the above text does not designate a voice attribute, it is read using the default settings. Then, the embedded command [*S9] is detected and the reading speed is set to 9. Following this, upon the detection of [*P9], the voice pitch is set to 9, and upon the detection of [*S0P0], the reading speed and the voice pitch are set to 0. Further, upon the detection of [*Y0], the intonation is set to 0, and upon the detection of [*S=P=Y=], the reading speed, the voice pitch and the intonation are reset to normal. Subsequently, upon the detection of [*F1], text is read using a female voice, and upon the detection of [*M1], text is read using a male voice.
Changes for a plurality of attributes can be included in a single embedded command using the format [*<attribute symbol 1><set value 1><attribute symbol 2><set value 2> . . .].
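The detection of embedded commands described above can be sketched in Python as follows. This is only an illustrative sketch, not the ProTALKER/2 implementation: the function names `parse_command` and `tokenize` are hypothetical, and the rule that an attribute symbol is one uppercase letter followed by a one-character set value (a digit, or "=" for a reset to normal) is an assumption inferred from the example text.

```python
import re

# "[*" opens an embedded command and "]" closes it, as described in the text.
COMMAND_RE = re.compile(r"\[\*([^\]]*)\]")

# Assumption: one uppercase letter per attribute symbol, followed by a
# one-character set value (a digit, or "=" meaning "reset to normal").
PAIR_RE = re.compile(r"([A-Z])([0-9=])")


def parse_command(body):
    """Split a command body such as "S0P0" into (symbol, value) pairs."""
    pairs = PAIR_RE.findall(body)
    # Reject bodies that contain characters not covered by the pairs.
    if "".join(symbol + value for symbol, value in pairs) != body:
        raise ValueError(f"malformed embedded command: {body!r}")
    return pairs


def tokenize(text):
    """Yield ("text", str) and ("command", pairs) tokens in reading order."""
    pos = 0
    for match in COMMAND_RE.finditer(text):
        if match.start() > pos:
            yield ("text", text[pos:match.start()])
        yield ("command", parse_command(match.group(1)))
        pos = match.end()
    if pos < len(text):
        yield ("text", text[pos:])
```

Under these assumptions, a combined command such as [*S0P0] yields two pairs, one for the reading speed and one for the voice pitch, while the text between commands is passed through unchanged.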
The contents of the embedded commands for instructing voice attributes are as follows.
* Change in speaking speed
The speed is changed at the point where a command is encountered. Set symbol S; ten levels of set value, 0 (slow) to 9 (fast) (normal speed is 5).
* Change in voice pitch
The pitch is changed at the point where a command is encountered. Set symbol P; ten levels of set value, 0 (low) to 9 (high) (normal pitch is 2).
* Change in voice gain
The gain is changed at the point where a command is encountered. Set symbol G; ten levels of set value, 0 (small) to 9 (great) (normal gain is 9).
* Change in intonation
The intonation is changed at the point where a command is encountered. Set symbol Y; ten levels of set value, 0 (no intonation) to 9 (maximum intonation).
* Male voice
The voice is changed to a male voice at the point where a command is encountered. Set symbol M; set value 1.
* Female voice
The voice is changed to a female voice at the point where a command is encountered. Set symbol F; set value 1.
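The way these commands update the reading state can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual behavior of any product: the names `NORMAL`, `new_state` and `apply_pairs` are hypothetical, and because the list above states no normal value for intonation and no default voice, both are represented here as `None`.

```python
# Normal values taken from the list above. None marks a value that the
# text does not state (normal intonation).
NORMAL = {"S": 5, "P": 2, "G": 9, "Y": None}
LEVEL_SYMBOLS = {"S": "speed", "P": "pitch", "G": "gain", "Y": "intonation"}
VOICE_SYMBOLS = {"M": "male", "F": "female"}


def new_state():
    """Return the reading state in effect before any command is detected."""
    state = {name: NORMAL[symbol] for symbol, name in LEVEL_SYMBOLS.items()}
    state["voice"] = None  # assumption: the default voice is unspecified
    return state


def apply_pairs(state, pairs):
    """Return a copy of state updated by (symbol, value) pairs.

    Each pair is a tuple of strings, e.g. ("S", "9") or ("P", "=").
    """
    updated = dict(state)
    for symbol, value in pairs:
        if symbol in LEVEL_SYMBOLS:
            name = LEVEL_SYMBOLS[symbol]
            if value == "=":
                updated[name] = NORMAL[symbol]  # reset to normal
            elif len(value) == 1 and value in "0123456789":
                updated[name] = int(value)  # ten levels, 0 to 9
            else:
                raise ValueError(f"bad set value {value!r} for {symbol}")
        elif symbol in VOICE_SYMBOLS:
            if value != "1":
                raise ValueError(f"bad set value {value!r} for {symbol}")
            updated["voice"] = VOICE_SYMBOLS[symbol]
        else:
            raise ValueError(f"unknown attribute symbol {symbol!r}")
    return updated
```

For example, applying the pairs for [*S0P0] to a fresh state sets both the speed and the pitch to 0, and applying the pairs for [*S=P=Y=] afterwards restores the speed to 5 and the pitch to 2.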
Conventionally, a technique exists for generating a data file containing such voice attribute information from a text file that includes text attributes (style, font, underlining, etc.).
Japanese Unexamined Patent Publication No. Hei 6-223070, for example, discloses a method for converting text attributes (style, font, underlining, etc.) of an input text file into voice attributes (speed, volume, etc.) by using a text-voice attribute conversion table, and for producing a speech command containing an embedded command for the voice attributes.
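The general idea of a text-voice attribute conversion table can be illustrated with the short Python sketch below. It is not the method of the cited publication, whose details are not given here. The table contents, the run-based input format and the function name `to_speech_text` are all hypothetical, and the output borrows the [*...] embedded command notation described earlier purely for illustration.

```python
# Hypothetical conversion table: text attribute -> (attribute symbol, set value).
# The symbols follow the embedded command notation described earlier; the
# chosen values are illustrative only.
CONVERSION_TABLE = {
    "bold": ("P", 5),       # read bold text at a higher pitch
    "italic": ("Y", 9),     # read italic text with maximum intonation
    "underline": ("S", 3),  # read underlined text more slowly
}


def to_speech_text(runs, table=CONVERSION_TABLE):
    """Convert (text, attributes) runs into text with embedded commands.

    Each run is a tuple of a string and a set of text attribute names.
    Attributes that are missing from the table are ignored.
    """
    out = []
    for text, attributes in runs:
        # Sort the attribute names so that the output is deterministic.
        changes = [table[a] for a in sorted(attributes) if a in table]
        if not changes:
            out.append(text)
            continue
        set_cmd = "".join(f"{symbol}{value}" for symbol, value in changes)
        reset_cmd = "".join(f"{symbol}=" for symbol, _ in changes)
        # Set the voice attributes before the run and reset them after it.
        out.append(f"[*{set_cmd}]{text}[*{reset_cmd}]")
    return "".join(out)
```

With this table, a bold run is wrapped in a pitch command and a matching reset, so that a change in a text attribute is heard as a change in a voice attribute.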
In addition, Japanese Unexamined Patent Publication No. Hei 6-44247 discloses a method that refers to a control signal-voice synthesis signal conversion table to convert a text control signal in an input text file into a voice synthesis control signal having voice attributes.
These techniques enable the reading of a text while changes in the text attributes are reflected as voice attributes. During reading, the text attribute changes, which are generally displayed as font changes or as colors on a screen, can be expressed as voice attribute changes (the changes in the volume, pitch, intonation and speed) by a voice synthesis program (text reading program).
Users such as visually handicapped persons, who cannot use the visual information displayed on a display screen (hereinafter referred to as visually impaired users), have demanded that hypertext programs, such as Web browsers, be made usable by them.
Conventional hypertext programs (viewers for on-line help and Web browsers) only display text data on screen and do not read the text data aloud.
Although the HTML used on the WWW (World Wide Web) of the Internet can handle voice data, such voice data must be prepared in advance, and since voice data takes several forms such as AU, WAV, RA, etc., software and hardware must be prepared for each form. Further, since more data is required for voice than for text, a longer transfer time is required for voice data. At present, however, voice data is not yet widely used, and most HTML data is provided as text. It would therefore be convenient if WWW data could be made available orally.
Another demand is that not only should the information currently displayed on a screen be orally reproduced, but a visually impaired user who so desires should also be able to perform Web surfing easily and freely by using the voice information that is provided.
Japanese Unexamined Patent Publication No. Sho 63-231493 discloses a related method in which a headline code is additionally input at the beginning of each headline of the input sentences, and only the contents of the headlines are synthesized for voice reproduction during a fast forward and a fast reverse.
Japanese Unexamined Patent Publication No. Hei 3-236099 discloses a method whereby analysis results for a plurality of phrases are stored, and an analysis result is output in accordance with a control command that specifies a reading position in a sentence and voice output, so that the reading position can be indicated exactly.
It is therefore one object of the present invention to provide a system for identifying in text a word type that has a specific feature, and for performing voice synthesis while following the control procedures relevant to the word type.
It is another object of the present invention to provide a system that enables a visually impaired user to control hypertext freely and easily.