1. Field
The following description relates to a system using multimodal information, and more particularly, to an apparatus and method for processing a user's input using multimodal information.
2. Description of the Related Art
A multimodal interface allows a person to communicate with a machine through voice, a keyboard, a pen, and the like. Various methods exist for analyzing a user's intention from multimodal information input through such an interface, including a method of combining multimodal inputs at the signal level and then analyzing the combined result, and a method of analyzing the respective modality inputs and then combining the analyzed results at the meaning level.
The method of combining multimodal inputs at the signal level involves immediately combining, analyzing, and classifying the multimodal input signals, and may be used to process signals that are generated simultaneously, for example, a voice signal and a signal regarding movement of the lips. However, because two or more signals have to be integrated and processed together, the method has the disadvantages that a very large feature space is required and that a very complicated model is needed to capture the relationship between the signals, so that the amount of learning required for the model is high. Also, since the method is not readily extensible, it is not easy to combine it with other modalities or to apply it to other terminals.
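For illustration only, the signal-level combination described above may be sketched as the concatenation of time-aligned per-frame feature vectors. The function names, the feature dimensions, and the quantization into bins are assumptions made for this sketch and are not taken from any particular system; the sketch is intended only to show how quickly the joint space grows when signals are integrated.

```python
from typing import List, Sequence


def fuse_signal_level(
    voice_frames: Sequence[Sequence[float]],
    lip_frames: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate time-aligned per-frame feature vectors into joint vectors."""
    if len(voice_frames) != len(lip_frames):
        # Assumption of this sketch: both signals are sampled on the same frames.
        raise ValueError("signal-level fusion assumes time-aligned frames")
    return [list(v) + list(l) for v, l in zip(voice_frames, lip_frames)]


def joint_cell_count(bins_per_dim: int, voice_dim: int, lip_dim: int) -> int:
    """Count the cells of the joint space when every dimension is quantized."""
    return bins_per_dim ** (voice_dim + lip_dim)


# Hypothetical features: 3 values per voice frame, 2 values per lip frame.
voice = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
lips = [[1.0, 2.0], [3.0, 4.0]]

fused = fuse_signal_level(voice, lips)
voice_only_cells = joint_cell_count(10, 3, 0)
joint_cells = joint_cell_count(10, 3, 2)
```

In this sketch, adding a two-dimensional lip signal to a three-dimensional voice signal multiplies the number of quantized cells by 100, which suggests why a model over the joint space may need considerably more training data.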
The method of combining modality inputs at the meaning level involves analyzing the meaning of each modality input signal and then combining the results of the analysis. The method has advantages in learning and extensibility since independence between the modalities is maintained. However, a user uses multimodal inputs precisely because there are associations between the modalities, and analyzing the meanings of the multimodal inputs individually makes such associations difficult to find.
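For illustration only, the meaning-level combination described above may be sketched as follows. Each modality is analyzed by its own analyzer, and only the resulting intent scores are combined. The analyzers, the intent labels, the score values, and the product rule used for combining are all hypothetical choices made for this sketch.

```python
from typing import Dict

Scores = Dict[str, float]


def analyze_voice(utterance: str) -> Scores:
    """Hypothetical keyword-based voice analyzer returning intent scores."""
    text = utterance.lower()
    if "delete" in text:
        return {"delete": 0.8, "move": 0.2}
    if "move" in text:
        return {"delete": 0.2, "move": 0.8}
    return {"delete": 0.5, "move": 0.5}


def analyze_pen(stroke: str) -> Scores:
    """Hypothetical pen-gesture analyzer returning intent scores."""
    if stroke == "cross_out":
        return {"delete": 0.7, "move": 0.3}
    if stroke == "drag":
        return {"delete": 0.1, "move": 0.9}
    return {"delete": 0.5, "move": 0.5}


def fuse_meaning_level(*results: Scores) -> Scores:
    """Multiply per-modality scores for each intent and renormalize."""
    intents = set().union(*results)
    combined = {intent: 1.0 for intent in intents}
    for result in results:
        for intent in intents:
            combined[intent] *= result.get(intent, 0.0)
    total = sum(combined.values())
    if total == 0:
        raise ValueError("no intent is supported by every modality")
    return {intent: score / total for intent, score in combined.items()}


fused = fuse_meaning_level(analyze_voice("delete this"), analyze_pen("cross_out"))
best = max(fused, key=fused.get)
```

Because each analyzer in this sketch sees only its own modality, a further modality could be added by passing one more score dictionary to `fuse_meaning_level`. For the same reason, neither analyzer can use the other's signal, for example to resolve what "this" refers to in the utterance.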