This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 11-185859, filed on Jun. 30, 1999.
The present invention relates to a speech recognition support method and an apparatus for supporting a system to retrieve a map in response to a user's input speech.
In a system that retrieves a map by a speech command, such as a car-navigation system, a user often speaks the name of a place or an institution in the area to be retrieved. In the car-navigation system, a plurality of road maps are previously stored in a map-database. The user's utterance is recognized as a place name or an institution name. A map in which the place name or the institution name is included is retrieved from the map-database and presented to the user. In this case, if the point corresponding to the recognized place name or institution name is far from the area on the map, it is highly likely that the user's input speech was erroneously recognized. Accordingly, if the system unconditionally executes an operation for this erroneous recognition result, an erroneous operation often occurs. For example, if a map including the point of the erroneous recognition result is retrieved from the map-database and presented to the user while the user is driving a car, this presentation is useless for the user's driving.
Therefore, in the present method by which the car-navigation system retrieves a map in response to a speech command, the large number of place names described in the map-database are hierarchically arranged from a high level to a low level, such as state, city, town, and street.
The user speaks place names in order from the high level to the low level. In short, as the user speaks the place names from the high level to the low level, the map to be retrieved is limited to a progressively smaller area in the map-database. However, assume that the user has spoken a place name of a high level and the area to be retrieved has been limited to some extent. In the next step, the place names that the user can input by utterance are limited to the place names included in that area. In this case, if the user wishes to input a place name outside the area, the input processing of the system returns to the highest level (for example, state) and the user must input the place name of the highest level (state name) again. This operation is troublesome for the user.
In this way, in the car-navigation system that retrieves a map by the user's input speech command, the large number of place names in the maps are hierarchically arranged, and the retrieval area is limited by the place names that the user utters from the high level to the low level. However, once the retrieval area has been limited to some extent, if the user utters a place name outside the retrieval area in order to retrieve another map, the user must release the limitation of the retrieval area in the system and utter a place name of the highest level again. This operation is very troublesome for the user.
It is an object of the present invention to provide a speech recognition support method and an apparatus that do not limit the retrieval area when the utterance of a place name is input, and that avoid an unnecessary operation caused by erroneous recognition of the utterance.
According to the present invention, there is provided a speech recognition support method applied to a system to retrieve a map in response to a user's input speech, comprising the steps of: assigning a recognition result to the user's input speech; calculating, if the recognition result of the user's input speech represents a point on the map, a distance between the point and a base point on the map; deciding whether the distance is above a threshold; and outputting, if the distance is above the threshold, an inquiry to the user to confirm whether the recognition result is correct.
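The steps of this method can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the names `RecognitionResult` and `support_recognition`, the use of planar coordinates in kilometers, the Euclidean distance, and the returned action labels are all assumptions made for illustration only.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]  # assumed planar map coordinates (x, y) in km


@dataclass
class RecognitionResult:
    text: str                      # recognized word or phrase
    point: Optional[Point] = None  # location on the map, or None if the
                                   # result is not a place or institution


def support_recognition(result: RecognitionResult,
                        base_point: Point,
                        threshold_km: float) -> Tuple[str, str]:
    """Return ("confirm", inquiry) or ("execute", recognized text)."""
    # A result that is not a point on the map skips the distance check.
    if result.point is None:
        return ("execute", result.text)

    # Distance between the recognized point and the base point
    # (Euclidean distance is an assumption of this sketch).
    d = math.hypot(result.point[0] - base_point[0],
                   result.point[1] - base_point[1])

    # A distance above the threshold triggers an inquiry to the user.
    if d > threshold_km:
        return ("confirm", f"Did you say '{result.text}'?")
    return ("execute", result.text)


if __name__ == "__main__":
    far = RecognitionResult("Sapporo Station", (300.0, 400.0))
    print(support_recognition(far, (0.0, 0.0), 50.0))
```

In this sketch a recognition result near the base point is acted on directly, while a distant one is first confirmed with the user, so the retrieval area itself is never restricted.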
Further in accordance with the present invention, there is also provided a speech recognition support method applied to a system to retrieve a map in response to a user's input speech, comprising the steps of: recognizing the user's input speech; obtaining a plurality of recognition candidates as the recognition result; extracting, if the first candidate in the plurality of recognition candidates represents a point on the map, the recognition candidates each representing a point on the map from the plurality of recognition candidates; calculating a score of each of the extracted recognition candidates by adding a function value of the distance between the point of each recognition candidate and a base point on the map to a value of the similarity degree between each recognition candidate and the input speech; deciding whether the distance of the recognition candidate of the highest score is above a threshold; and outputting, if the distance is above the threshold, an inquiry to the user to confirm whether a predetermined number of the recognition candidates of higher score are correct.
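The candidate-scoring method can likewise be sketched in Python. The text above does not fix the form of the distance function, so the linear penalty `distance_term` and its weight are assumptions, as are the names `Candidate` and `rescore_candidates`, the planar coordinates, and the behavior when the first candidate is not a point on the map.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # assumed planar map coordinates (x, y) in km


@dataclass
class Candidate:
    text: str
    similarity: float              # similarity degree to the input speech
    point: Optional[Point] = None  # None if not a point on the map


def distance_term(d: float, weight: float = 0.01) -> float:
    # Assumed function of distance: a linear penalty, so that nearer
    # candidates receive higher scores.
    return -weight * d


def rescore_candidates(candidates: List[Candidate], base_point: Point,
                       threshold_km: float,
                       top_n: int = 2) -> Tuple[str, List[str]]:
    """Return ("confirm", top candidates) or ("execute", [best candidate])."""
    # Assumption: with no candidates, or a first candidate that is not a
    # point on the map, the first candidate is passed through unchanged.
    if not candidates:
        return ("execute", [])
    if candidates[0].point is None:
        return ("execute", [candidates[0].text])

    # Extract the candidates that represent points, and score each one as
    # similarity degree plus the function value of its distance.
    scored = []
    for c in candidates:
        if c.point is None:
            continue
        d = math.hypot(c.point[0] - base_point[0], c.point[1] - base_point[1])
        scored.append((c.similarity + distance_term(d), d, c.text))
    scored.sort(key=lambda item: item[0], reverse=True)

    # Check the distance of the highest-scoring candidate against the threshold.
    _, best_distance, best_text = scored[0]
    if best_distance > threshold_km:
        return ("confirm", [text for _, _, text in scored[:top_n]])
    return ("execute", [best_text])
```

With a penalty of this kind, a nearby candidate with a slightly lower similarity degree can outrank a distant candidate with a higher one, which is the effect the combined score is meant to achieve.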
Further in accordance with the present invention, there is also provided a speech recognition support apparatus for retrieving a map in response to a user's input speech, comprising: a speech recognition unit configured to assign a recognition result to the user's input speech; a distance decision unit configured to calculate a distance between a point of the recognition result and a base point on the map if the recognition result represents a point on the map, and to decide whether the distance is above a threshold; and a response generation unit configured to generate an inquiry to the user to confirm whether the recognition result is correct if the distance is above the threshold.
Further in accordance with the present invention, there is also provided a speech recognition support apparatus for retrieving a map in response to a user's input speech, comprising: a speech recognition unit configured to recognize the user's input speech and to obtain a plurality of recognition candidates as the recognition result; a distance decision unit configured to extract the recognition candidates each representing a point on the map from the plurality of recognition candidates if the first candidate represents a point on the map, to calculate a score of each of the extracted recognition candidates by adding a function value of the distance between the point of each recognition candidate and a base point on the map to a similarity degree between each recognition candidate and the input speech, and to decide whether the distance of the recognition candidate of the highest score is above a threshold; and a response generation unit configured to generate an inquiry to the user to confirm whether a predetermined number of the recognition candidates of higher score are correct if the distance is above the threshold.
Further in accordance with the present invention, there is also provided a computer readable memory containing computer readable instructions in a system to retrieve a map in response to a user's input speech, comprising: instruction means for causing a computer to assign a recognition result to the user's input speech; instruction means for causing a computer to calculate, if the recognition result of the user's input speech represents a point on the map, a distance between the point and a base point on the map; instruction means for causing a computer to decide whether the distance is above a threshold; and instruction means for causing a computer to output, if the distance is above the threshold, an inquiry to the user to confirm whether the recognition result is correct.
Further in accordance with the present invention, there is also provided a computer readable memory containing computer readable instructions in a system to retrieve a map in response to a user's input speech, comprising: instruction means for causing a computer to recognize the user's input speech; instruction means for causing a computer to obtain a plurality of recognition candidates as the recognition result; instruction means for causing a computer to extract, if the first candidate in the plurality of recognition candidates represents a point on the map, the recognition candidates each representing a point on the map from the plurality of recognition candidates; instruction means for causing a computer to calculate a score of each of the extracted recognition candidates by adding a function value of the distance between the point of each recognition candidate and a base point on the map to a value of the similarity degree between each recognition candidate and the input speech; instruction means for causing a computer to decide whether the distance of the recognition candidate of the highest score is above a threshold; and instruction means for causing a computer to output to the user, if the distance is above the threshold, an inquiry to confirm whether a predetermined number of the recognition candidates of higher score are correct.