In recent years, attention has focused on technologies that execute an operation of an apparatus using the voice recognition result of words spoken by a person. Such technology is used in the voice interfaces of mobile phones, car-navigation devices and the like. In a basic conventional method, for example, the apparatus stores in advance a correspondence between expected voice recognition results and operations. When the recognition result of the user's speech matches one of the expected results, the apparatus executes the operation corresponding to that result.
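The basic conventional method amounts to a lookup table from expected recognition results to operations. The sketch below is a minimal illustration under stated assumptions: the phrases "show map" and "go home", the operations they trigger, and the names `COMMAND_TABLE` and `execute_command` are all hypothetical. Exact string matching stands in for "the recognition result is the expected one".

```python
from typing import Callable, Dict, Optional


# Hypothetical operations; a real apparatus would drive its own functions here.
def show_map() -> str:
    return "map displayed"


def go_home() -> str:
    return "route to home set"


# Hypothetical correspondence table stored in advance:
# expected recognition result -> operation to execute.
COMMAND_TABLE: Dict[str, Callable[[], str]] = {
    "show map": show_map,
    "go home": go_home,
}


def execute_command(recognition_result: str) -> Optional[str]:
    """Execute the operation stored for an exactly matching recognition result.

    Returns the operation's result, or None when the speech does not match
    any expected phrase (nothing is executed in that case).
    """
    operation = COMMAND_TABLE.get(recognition_result)
    if operation is None:
        return None
    return operation()
```

For example, `execute_command("show map")` returns `"map displayed"`, while a paraphrase such as `execute_command("display the map")` returns `None`. This rigidity is exactly why the user must remember the phrases the apparatus is waiting for.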
Compared with operating the apparatus manually, this method lets the user execute an operation directly by speaking, so it serves effectively as a shortcut function. On the other hand, to execute an operation the user must speak the words that the apparatus is waiting for, so as the number of functions handled by the apparatus increases, so does the number of words the user must remember. Further, in general, few users operate an apparatus after fully understanding its operation manual. Users who have not understood the manual do not know what to say to cause an operation. In practice, therefore, such users cannot operate the apparatus by voice except with the commands for functions they have already memorized.
To address this problem, a method has been proposed in which the apparatus interactively guides the user toward accomplishing a goal, even if the user does not remember the command for accomplishing it. As one important technology for realizing such a method, Patent Document 1, for example, discloses a technology for properly estimating the user's intent from the user's speech.
The voice processing device in Patent Document 1 has a linguistic dictionary database and a grammar database for each of multiple pieces of intent information, each indicating a respective intent. It also retains information on the commands executed so far as pre-scores. For each piece of intent information, the voice processing device calculates an acoustic score, a language score and a pre-score. Each score indicates the degree to which the voice signal input from the user's speech conforms to that intent information. The device sums these scores into a total score and selects the intent information with the largest total score. Patent Document 1 further discloses that, based on the total score, the voice processing device executes the selected intent information, executes it after confirming with the user, or discards it.
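The score-totalling and decision steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of Patent Document 1. The scores are treated as log-domain values so that they can be summed. The acoustic and language scores are taken as precomputed inputs. The pre-score is a hypothetical add-one smoothed log frequency of previously executed commands. The thresholds `execute_at` and `confirm_at` are arbitrary illustrative values. The names `IntentScores`, `pre_scores` and `decide` are likewise hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class IntentScores:
    acoustic: float  # assumed log-likelihood from an acoustic model
    language: float  # assumed log-probability from the intent's dictionary/grammar


def pre_scores(history: List[str], intents: List[str]) -> Dict[str, float]:
    """Hypothetical pre-score: add-one smoothed log frequency of past commands."""
    counts = Counter(history)
    total = sum(counts[i] for i in intents) + len(intents)
    return {i: math.log((counts[i] + 1) / total) for i in intents}


def decide(
    scores: Dict[str, IntentScores],
    history: List[str],
    execute_at: float = -5.0,   # illustrative threshold (assumption)
    confirm_at: float = -10.0,  # illustrative threshold (assumption)
) -> Tuple[str, str]:
    """Select the intent with the largest total score, then act on that score."""
    pre = pre_scores(history, list(scores))
    totals = {i: s.acoustic + s.language + pre[i] for i, s in scores.items()}
    best = max(totals, key=lambda i: totals[i])
    if totals[best] >= execute_at:
        return best, "execute"   # confident: execute directly
    if totals[best] >= confirm_at:
        return best, "confirm"   # less confident: confirm with the user first
    return best, "reject"        # too unlikely: discard
```

With `history = ["weather", "weather", "clock_time"]`, the intent `"weather"` scored `IntentScores(-2.0, -1.0)` totals about -3.51 and is executed directly. The same intent scored `IntentScores(-5.0, -2.0)` totals about -7.51 and is only executed after confirmation. Summing log-domain scores corresponds to multiplying the underlying probabilities, which is one plausible reading of "totalizing" the scores.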
However, the intents defined in Patent Document 1 are uniquely identifiable ones such as “Tell me the weather” or “Tell me the time”. The document does not mention how to process intents that contain any of a wide variety of facility names, such as the names required for setting a destination in a navigation device.