1. Field of the Invention
The present invention relates to speech-synthesis processing performed in an information-communication device that is connected to a communication line and that supports multimedia communications, that is, the transmission and/or reception of speech data, video data, electronic mail, and so forth.
2. Description of the Related Art
In the past, speech-synthesis devices were usually installed in an apparatus and/or a system for public use, such as a vending machine, an automatic-ticket-examination gate, and so forth. Recently, however, the number of devices having a speech-synthesis function has increased, and it is not uncommon to install the speech-synthesis function in relatively low-priced consumer products including a telephone, a car-navigation system, and so forth. Consequently, efforts are being made to increase the user-interface capability of personal devices.
Incidentally, the above-described personal devices have become increasingly multifunctional. For example, some car-navigation systems have not only a route-guide function, but also an audio function and an internet-browsing function including a network-connection function.
Likewise, telephones and the like have become increasingly multifunctional. Namely, not only the telephone function, but also the network-connection function and/or a scheduler function are installed in the telephones.
Further, each of the functions that make a device such as the telephone multifunctional incorporates a function achieved by using speech-synthesis technology. The speech-synthesis function provided in the device is therefore used for many purposes.
For example, as to the relationship between the composite functions of the telephone and its speech-synthesis function, an incoming-call-read-aloud function, a phone-directory-read-aloud function, and so forth can be achieved as part of the telephone function.
Further, a schedule-notification function can be achieved as part of the scheduler function. For the network-connection function, a home-page-read-aloud function, a mail-read-aloud function, and so forth are provided by using the speech-synthesis function.
Hereinafter, known technologies will be discussed. First, there is a known method of estimating the field of document data stored in a document database, and switching between recognition dictionaries used during character-recognition processing according to the estimated field information. This method is disclosed in Japanese Patent Laid-Open No. 8-63478, for example. According to this method, the contents of a document to be read aloud must be examined in advance.
Further, Japanese Patent Laid-Open No. 2000-187495, for example, discloses a known system that performs the speech-synthesis processing by switching between word dictionaries provided for each speaker, on the basis of input speaker information, when the text data to be read aloud is analyzed.
Further, there has been proposed a method of performing the speech-synthesis processing by switching between dictionaries for each task of a specific function of a device, where the specific function is a game program, and reading aloud a phrase stored in the game program in advance. This method is disclosed in Japanese Patent Laid-Open No. 2001-34282, for example.
The speech-synthesis function of a known device often includes a user-dictionary function. In a language that uses readings in kana, such as Japanese, a single written word may have different readings. For example, a certain word is read as “mitsube” when it refers to a personal name, but is read as “sanbu (three copies)” when it does not refer to the personal name.
When the speech-synthesis function is provided as part of the telephone function, it is preferable that the device reads aloud a message such as “You have a phone call from Mr. Mitsube” upon receiving an incoming phone call, and reads aloud a message such as “I am going to dial Mr. Mitsube” when a user dials Mr. Mitsube.
When the word is registered with a user dictionary of the speech-synthesis function so that the word is read as “mitsube”, the word is appropriately read aloud when the speech-synthesis function is used as part of the telephone function. However, when the device has a home-page-read-aloud function that uses the same speech-synthesis function and a home page shows the sentence “You need three copies of the book”, for example, the device reads aloud the sentence as “You need mitsube of the book”, which makes it difficult for the device to inform the user of the contents of the home page correctly.
In a language that does not use readings in kana, such as English, the word “Elizabeth” is often read as “Beth” and/or “Liz”, a nickname of a person named Elizabeth, when the word refers to a personal name. However, when the word “Elizabeth” is used as the name of a place, a park, or a building, its reading is not changed into that of the nickname.
As in the above-described example, when the word “Elizabeth” is registered with the user dictionary so that the word is read as “Liz” and the telephone function is used, the device reads aloud the message “You have a phone call from Liz” upon receiving an incoming call. However, when a home page shows the phrase “the city of Elizabeth” as a place name, the device reads aloud the phrase as “the city of Liz”, which makes it difficult for the device to inform the user of the contents of the home page correctly.
The above-described example shows the case where a single device includes at least two functions. In one of the functions, abbreviating and/or shortening the pronunciation and/or wording of a predetermined phrase helps the user of the device understand the meaning of the phrase easily. In the other function, however, the same abbreviation and/or shortening does not make the phrase understandable for the user.
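The trouble described above can be illustrated by the following minimal sketch. It is given for illustration only, under the assumption that the user dictionary is a simple table mapping a written word to a registered reading and that both functions pass their text through the same table before synthesis. The names USER_DICTIONARY, apply_user_dictionary, read_incoming_call, and read_home_page are hypothetical and do not come from any of the cited references.

```python
import re

# Hypothetical shared user dictionary: written word -> registered reading.
USER_DICTIONARY = {"Elizabeth": "Liz"}


def apply_user_dictionary(text, dictionary):
    """Replace every whole-word dictionary entry with its registered reading."""
    for written, reading in dictionary.items():
        text = re.sub(r"\b" + re.escape(written) + r"\b", reading, text)
    return text


def read_incoming_call(caller_name):
    """Telephone function: build the text handed to the speech synthesizer."""
    message = "You have a phone call from " + caller_name
    return apply_user_dictionary(message, USER_DICTIONARY)


def read_home_page(page_text):
    """Home-page-read-aloud function: uses the same shared dictionary."""
    return apply_user_dictionary(page_text, USER_DICTIONARY)


call_text = read_incoming_call("Elizabeth")
page_text = read_home_page("the city of Elizabeth")
print(call_text)  # You have a phone call from Liz
print(page_text)  # the city of Liz
```

In this sketch the nickname reading is appropriate for the incoming-call message, but the same entry also rewrites the place name on the home page, because the shared dictionary is applied without regard to the function that is currently in use.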
According to another example, one of the meanings of the English abbreviation “THX” is the name of a theater system used in movie theaters. In that case, the word “THX” is pronounced as the three letters “T”, “H”, and “X” of the alphabet.
On the other hand, an enterprise named “The Houston Exploration” is referred to by the abbreviation “THX” in the stock market or the like. However, the name of the enterprise is pronounced as “The Houston Exploration” in news reports or the like.
Further, the word “THX” used in an ordinary letter and/or mail is an abbreviation of the word “Thanks”, used to save the trouble of writing out the word “thanks”. In that case, the word “THX” is pronounced as “Thanks”.
Thus, the word “THX” has three meanings and three readings, and is used in three different ways according to the situation. The above-described example shows the case where a single predetermined word has a plurality of readings and meanings. If the word “THX” is uniformly read aloud according to the definition registered with the user dictionary, irrespective of the current situation and/or the currently used function, the meaning and/or reading of the word “THX” becomes significantly different from what it should be.
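The effect of such uniform reading can be sketched as follows. This is an illustrative sketch only: the situation labels "theater", "stock_news", and "mail", the table THX_READINGS, and the helper functions are hypothetical names introduced here, and the readings are those given in the example above.

```python
# Hypothetical table: situation -> appropriate reading of "THX",
# following the three usages described in the text.
THX_READINGS = {
    "theater": "T H X",
    "stock_news": "The Houston Exploration",
    "mail": "Thanks",
}


def read_uniformly(registered_reading, situations):
    """Reading spoken in each situation when one dictionary entry is applied everywhere."""
    return {situation: registered_reading for situation in situations}


def misread_situations(spoken, expected):
    """Situations in which the spoken reading differs from the appropriate one."""
    return sorted(s for s in expected if spoken[s] != expected[s])


# Assume the user registered "THX" so that it is read as "Thanks".
spoken = read_uniformly("Thanks", THX_READINGS)
wrong = misread_situations(spoken, THX_READINGS)
print(wrong)  # ['stock_news', 'theater']
```

Whichever of the three readings is registered, the other two situations are misread in this sketch, since a single entry cannot satisfy all three usages at once.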
Thus, in languages all across the world, the pronunciation and/or reading of a single written word often changes according to the situation in which the word is used. The above-described problem will be specifically described below.
That is to say, it is difficult for a device including a composite function to read aloud data correctly. This is particularly so for a device that includes a function of reading aloud data obtained through network browsing, where data on the phrases to be read aloud is not stored in the device, and a function of reading aloud phrase data input by the user, such as phone-directory data, where the phrases fall within an object range so large that it is difficult to store the phrase data in the device in advance. Here, the latter function corresponds to the phone-directory function, for example.
Thus, in a device having a plurality of different functions, including a function of reading aloud phrases that fall within a large object range, a function of reading aloud private information, and a function of reading aloud general information including no private information, the contents of a user dictionary shared in the device uniformly affect the reading of phrases in all of those functions. Therefore, an error may occur in any of the functions, depending on which of the phrases registered with the user dictionary is read aloud.