In recent years, various dialog systems have become available. A dialog system generally recognizes a user utterance by performing language analysis on it, and controls the dialog using the result of the utterance recognition. However, since the wording of a user utterance may change in accordance with the situation, it may be impossible to determine the intention of the utterance by language analysis alone. Hence, the intention of the utterance needs to be estimated in consideration of the situation in which the utterance is made.
As a related art, there is a method that selects from intention candidates, obtained as the recognition results of commands (specific utterances made by the user), based on environmental information (position information, traffic situation, road surface state, and the like) around an information terminal. If the recognition rate of the command recognition results is low, the intention is estimated from the environmental information alone. If the recognition rate of the command recognition results is sufficient, the intention is estimated from both the environmental information and the command recognition results.
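The branching in this related-art method can be illustrated with a minimal sketch. It is not taken from any cited implementation. The names `Candidate`, `estimate_intention`, and `env_scores`, the threshold value of 0.6, the treatment of the "recognition rate" as a per-candidate confidence in the range 0 to 1, and the use of a product to combine the two sources are all assumptions made for illustration. The text states only that a low recognition rate leads to estimation from the environmental information alone, and a sufficient one leads to estimation from both.

```python
from dataclasses import dataclass

# Hypothetical threshold: the text only distinguishes "low" from "sufficient".
RECOGNITION_THRESHOLD = 0.6


@dataclass
class Candidate:
    intention: str
    recognition_score: float  # assumed recognizer confidence in [0, 1]


def estimate_intention(candidates, env_scores, threshold=RECOGNITION_THRESHOLD):
    """Select one intention from the command recognition candidates.

    env_scores maps an intention to its assumed plausibility in [0, 1]
    given the environmental information (position, traffic, road surface).
    """
    if not candidates:
        return None

    best_recognition = max(c.recognition_score for c in candidates)

    if best_recognition < threshold:
        # Low recognition rate: rank by environmental information only.
        def score(c):
            return env_scores.get(c.intention, 0.0)
    else:
        # Sufficient recognition rate: use both sources.
        # Combining them by multiplication is an assumption of this sketch.
        def score(c):
            return env_scores.get(c.intention, 0.0) * c.recognition_score

    return max(candidates, key=score).intention
```

In this sketch, the recognition scores are ignored entirely in the low-rate branch, so two candidates with poor recognition are ranked purely by how plausible each is in the current environment.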