1. Field of the Invention
The present invention relates to a speech-conversion processing apparatus that converts text data into speech in order to allow, for example, a navigation apparatus to give various types of voice guidance to a user.
2. Description of the Related Art
For example, in order to perform various types of guidance, such as confirmation of voice recognition, confirmation of destination setting, and reading aloud of intersection names, vehicle navigation apparatuses give voice guidance in addition to visual guidance using display screens. In vehicles in particular, the users of such navigation apparatuses are in many cases the drivers, who cannot stare at the display screens while driving, which makes voice guidance essential. Such voice guidance/read-aloud is not limited to navigation apparatuses and is used in a wide variety of fields.
For performing voice guidance as described above, text data that contains character strings indicating contents for voice guidance is created and is divided into words, which are sound elements, and speech data for each word is created with reference to a pre-stored dictionary. Further, the individual words are associated with each other, intonation is added thereto, and resulting data is subjected to various types of necessary processing, and speech (i.e., voice) is generated. In order to perform such various types of processing, speech-conversion processing apparatuses employing TTS (text to speech) technologies have been widely used.
In such a known speech-conversion processing apparatus, a pre-stored general dictionary database, which serves as a TTS dictionary, is used with respect to plain-text data containing input character strings. The dictionary database is created so as to cover as wide a range of fields as possible, based on the premise that the speech-conversion processing apparatus is to be used in a wide range of fields. Yet, when the dictionary database is used for navigation-apparatus speech guidance, in which unique words associated with map data, vehicle driving, traffic guidance, and so on are used, the general-purpose dictionary database cannot serve the purpose and may not be able to perform appropriate read-aloud/voice guidance, thus often falling short of the user's expectations.
That is, for example, for unique words that are used in a navigation apparatus but are not stored in a general dictionary, pronunciation symbols used in a general database are, in some cases, applied to the character strings to be read aloud and are sent to a speech-conversion processing apparatus. In this case, as shown in FIG. 3A, when plain text “San Jose”, which is supposed to be pronounced “san nozei”, is received as a character string (it is to be noted that pronunciation symbols, such as “san nozei”, used herein are based on a modified version of a writing system called “Romaji”, which was originally developed to write Japanese characters by using the Latin alphabet), the known navigation apparatus may pronounce it, for example, “san jyoze” by using a general dictionary and thus may not pronounce it correctly. In such a case, storing pronunciation symbols “san nozei” allows it to be correctly pronounced upon receipt of the plain text. Similarly, for plain text “Torrance, Calif.”, storing pronunciation symbols “tôransu, kyaluforunia” allows it to be correctly pronounced.
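The stored-symbol approach described above can be sketched as a lookup that consults the stored pronunciation symbols first and falls back to the general dictionary otherwise. This is a minimal illustration only: the names `GENERAL_DICT`, `PRONUNCIATION_OVERRIDES`, and `pronunciation_for` are hypothetical, and the general dictionary is reduced to a single entry reproducing the mispronunciation mentioned in the text.

```python
# Hypothetical stand-in for the general dictionary; the single entry
# reproduces the incorrect reading mentioned in the text.
GENERAL_DICT = {"San Jose": "san jyoze"}

# Stored pronunciation symbols for navigation-specific strings
# (the two examples given in the text).
PRONUNCIATION_OVERRIDES = {
    "San Jose": "san nozei",
    "Torrance, Calif.": "tôransu, kyaluforunia",
}


def pronunciation_for(plain_text):
    """Return stored symbols if present, else the general-dictionary reading.

    Text unknown to both tables is returned unchanged (an assumption made
    only to keep the sketch self-contained).
    """
    if plain_text in PRONUNCIATION_OVERRIDES:
        return PRONUNCIATION_OVERRIDES[plain_text]
    return GENERAL_DICT.get(plain_text, plain_text)


print(pronunciation_for("San Jose"))
print(pronunciation_for("Torrance, Calif."))
```

The lookup is keyed on the whole received string, so it only helps for strings that were stored in advance.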
For a vehicle navigation apparatus, since map data are used and the vehicle travels over wide areas, guidance of addresses constituted by collections of place names is essential. However, since place names are often represented by unique abbreviations or pronounced in unique ways, such variations often cannot be handled by a general dictionary that is provided in a speech-conversion processing apparatus by a company manufacturing the navigation apparatus, and thus an additional TTS dictionary may be prepared. Accordingly, place names are assigned additional information and stored such that, for example, “St” represents the abbreviation of “Street” and/or “St” is pronounced “sutorîto”, as shown in FIG. 3B. Similarly, “Ave” is stored so as to be pronounced “avenyu”.
Japanese Unexamined Patent Application Publication No. 9-152893 discloses a technology for speech-conversion processing of place names. In this patent publication, place-name dictionaries are prepared for respective predetermined areas, and the place-name dictionary for an area is selected based on the data of the current position of a navigation apparatus, so as to prevent place-name pronunciations used in other areas from being read aloud.
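The area-based selection described in that publication might be sketched as follows. The publication's actual data structures are not given here, so everything in this example is an assumption: areas are modeled as latitude/longitude bounding boxes, and the names `Area`, `AREAS`, `PLACE_NAME_DICTS`, `select_dictionary`, and `read_place_name`, as well as the placeholder entries, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """Hypothetical predetermined area, modeled as a bounding box."""
    name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat, lon):
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Placeholder areas and dictionaries; the same text has a different
# reading in each area.
AREAS = [
    Area("area_a", 0.0, 10.0, 0.0, 10.0),
    Area("area_b", 10.0, 20.0, 0.0, 10.0),
]
PLACE_NAME_DICTS = {
    "area_a": {"PLACE": "reading-a"},
    "area_b": {"PLACE": "reading-b"},
}


def select_dictionary(lat, lon):
    """Return the place-name dictionary of the first area containing the position."""
    for area in AREAS:
        if area.contains(lat, lon):
            return PLACE_NAME_DICTS[area.name]
    return {}  # no area matched: no area-specific readings


def read_place_name(text, lat, lon):
    """Look up the reading in the current area's dictionary only."""
    return select_dictionary(lat, lon).get(text, text)


print(read_place_name("PLACE", 5.0, 5.0))
print(read_place_name("PLACE", 15.0, 5.0))
```

Because only the dictionary of the current area is consulted, a reading that belongs to another area is never returned for the same text.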
In particular, in many cases, voice guidance performed by navigation apparatuses involves addresses constituted by collections of place names, and place names in addresses in many countries are often pronounced differently even for the same representation, i.e., for the same text. Thus, in addition to the above-noted general dictionary provided in a speech-conversion processing apparatus, a separate pronunciation-symbol dictionary in which pronunciation symbols are stored in association with specific place names may be created, or a TTS dictionary in which the full names of specific abbreviations, or pronunciation symbols therefor, are stored may be used. Yet, even the use of such dictionaries cannot provide satisfactory results in many cases.
That is, pronunciation symbols used for the reading aloud of addresses are supplied from a database vendor, which manufactures a database for the pronunciation symbols, and are stored in the database for use. However, since database vendors handle diverse place names, they may create databases without necessarily confirming place names in the addresses of specific cities and towns or the abbreviations of place names. Therefore, there are cases in which the pronunciation symbols supplied from the database vendors are wrong.
With only a TTS dictionary as described above, conversion rules defined by the TTS dictionary are applied to all words in character strings to be read aloud. Thus, for example, when the place-name character string “100 St Lantana St, Los Angeles, Calif.” is received, or when a navigation apparatus issues the query “Would you like to calculate a route to St Lantana St?” to start guidance-route computation, as shown in FIG. 3C, a conversion rule is defined in many cases so that “St” in the character string “St Lantana St” is pronounced “sutorîto”.
In this case, therefore, “St Lantana St”, which is supposed to be pronounced “sento lantana strît”, is converted into speech “strît lantana strît”. On the other hand, when the conversion rule is defined so that “St” is pronounced “sento”, it is converted into speech “sento lantana sento”. In this manner, “St”, which is widely used for place names, may be pronounced “sento” rather than “strît”. A dictionary as described above cannot distinguish between the pronunciations “sento” and “strît”.
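The limitation can be reproduced with a minimal sketch in which a single conversion rule is applied to every word of the string. The function `apply_global_rule` and the rule tables are hypothetical, and lowercasing words that have no rule is a simplification assumed here only so that the output resembles the pronunciation symbols used in the text.

```python
import re


def apply_global_rule(text, rules):
    """Apply one word-to-symbol rule table to every word in the text."""
    def replace(match):
        word = match.group(0)
        # Words without a rule are simply lowercased (an assumption of
        # this sketch, not a real pronunciation conversion).
        return rules.get(word, word.lower())
    return re.sub(r"[A-Za-z]+", replace, text)


STREET_RULE = {"St": "strît"}  # "St" always read as "Street"
SAINT_RULE = {"St": "sento"}   # "St" always read as "Saint"

print(apply_global_rule("St Lantana St", STREET_RULE))  # strît lantana strît
print(apply_global_rule("St Lantana St", SAINT_RULE))   # sento lantana sento
```

Neither rule table yields the intended “sento lantana strît”, because the rule is keyed only on the word “St” and carries no information about its position or role in the place name.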