Techniques are known for recognizing a person's spontaneous utterances and operating machines and the like using the recognition results. These techniques are applied to speech-based interfaces in mobile phones and navigation systems. They infer the intention behind the recognition result of the input speech, and can handle a wide variety of user expressions by using intention inference models trained by statistical methods on a wide variety of corpora labeled with the corresponding intentions.
These techniques are effective when an utterance contains a single intention. However, they can hardly infer multiple intentions accurately when a speaker inputs an utterance, such as a complex sentence, that involves multiple intentions. For example, the utterance “tokyo tower mo yoritai ga, saki ni skytree he yotte (Indeed I want to visit Tokyo Tower, but visit Skytree first)” involves two intentions: one is an intention to set the facility Skytree as an intermediate destination, and the other is an intention to set the facility Tokyo Tower as an intermediate destination. The intention inference models mentioned above have difficulty inferring these two intentions.
To address the problem mentioned above, Patent Literature 1, for example, discloses a method of inferring the proper division point of an input text for an utterance that involves multiple intentions, by means of intention inference using division-point probabilities of a complex sentence.
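The general idea of combining division-point probabilities with intention inference can be illustrated by the following minimal sketch. It is not the method of Patent Literature 1, whose details are not given here. The scoring rule (division-point probability multiplied by the intention scores of the two resulting segments), the function names `best_division` and `toy_intent_score`, the intention labels, and all probability values are assumptions introduced only for illustration.

```python
from typing import Callable, List, Tuple

Tokens = List[str]


def best_division(
    tokens: Tokens,
    division_prob: List[float],
    intent_score: Callable[[Tokens], Tuple[str, float]],
) -> Tuple[int, str, str, float]:
    """Pick the single division point with the highest combined score.

    division_prob[i] is the assumed probability that a boundary lies
    after tokens[i]. The combined score is an illustrative assumption:
    division probability x left-segment score x right-segment score.
    """
    if len(division_prob) != len(tokens) - 1:
        raise ValueError("need one division probability per token boundary")
    best = None
    for i, p in enumerate(division_prob):
        left, right = tokens[: i + 1], tokens[i + 1 :]
        left_intent, left_score = intent_score(left)
        right_intent, right_score = intent_score(right)
        total = p * left_score * right_score
        if best is None or total > best[3]:
            best = (i + 1, left_intent, right_intent, total)
    return best


def toy_intent_score(segment: Tokens) -> Tuple[str, float]:
    """Hypothetical keyword-based stand-in for a trained intention model."""
    has_tokyo = "tokyo" in segment
    has_skytree = "skytree" in segment
    if has_skytree and not has_tokyo:
        return ("set_intermediate_destination[Skytree]", 0.9)
    if has_tokyo and not has_skytree:
        return ("set_intermediate_destination[Tokyo Tower]", 0.9)
    return ("unknown", 0.1)


tokens = ["tokyo", "tower", "mo", "yoritai", "ga",
          "saki", "ni", "skytree", "he", "yotte"]
# Assumed boundary probabilities; the peak follows the conjunction "ga".
division_prob = [0.01, 0.05, 0.02, 0.10, 0.80, 0.05, 0.02, 0.01, 0.02]

split, left_intent, right_intent, score = best_division(
    tokens, division_prob, toy_intent_score
)
print(tokens[:split], "->", left_intent)
print(tokens[split:], "->", right_intent)
```

With these assumed values, the sketch divides the example utterance after “ga” and assigns one intention to each segment. An actual system would obtain both the division-point probabilities and the intention scores from statistically trained models.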