A multimodal human computer interaction (HCI) system is an exemplary information processing system that handles complex information units. An example can be found in N. Krahnstoever, et al., A real-time framework for natural multimodal interaction with large screen displays, Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces, Oct. 14-16, 2002, pp. 349-354 (hereinafter N. Krahnstoever, et al.).
Fusion of the multiple inputs from different modalities is one of the most important steps in such a multimodal human computer interaction (HCI) system. R. Sharma, et al., Speech-gesture driven multimodal interfaces for crisis management, Proceedings of the IEEE, Vol. 91, Issue 9, September 2003, pp. 1327-1354 (hereinafter R. Sharma, et al.), presents the architecture of a possible fusion strategy, in which a probabilistic evaluation of all possible speech-gesture combinations is suggested as a better estimation of the user's intention than any single input modality. R. Sharma, et al., also discusses semantic analysis of multimodal input through the application of static and dynamic contexts.
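The probabilistic evaluation of speech-gesture combinations discussed above can be illustrated with a minimal sketch. The confidence scores, labels, and compatibility function below are hypothetical illustrations for exposition only, not the actual fusion model of R. Sharma, et al.

```python
from itertools import product

def fuse(speech_hypotheses, gesture_hypotheses, compatibility):
    """Evaluate every speech-gesture pair and return the most probable
    joint interpretation of the user's intention.

    speech_hypotheses / gesture_hypotheses: dicts mapping a recognized
    label to its recognizer confidence (assumed normalized to [0, 1]).
    compatibility: a function scoring how semantically consistent a
    speech label is with a gesture label (a hypothetical stand-in for
    a semantic context model).
    """
    best_pair, best_score = None, 0.0
    for (s, p_s), (g, p_g) in product(speech_hypotheses.items(),
                                      gesture_hypotheses.items()):
        score = p_s * p_g * compatibility(s, g)
        if score > best_score:
            best_pair, best_score = (s, g), score
    return best_pair, best_score

# Hypothetical recognizer outputs for a spoken command with a pointing gesture
speech = {"move": 0.7, "remove": 0.3}
gesture = {"point_at_object": 0.6, "circle_region": 0.4}
compat = lambda s, g: 1.0 if (s, g) == ("move", "point_at_object") else 0.5

print(fuse(speech, gesture, compat))  # the jointly most probable pair
```

Evaluating all combinations jointly lets a weak hypothesis in one modality be rescued or rejected by the other, which is the essence of the fusion strategy described above.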
One important aspect of a multimodal HCI system lies in the fact that the new interaction paradigm enabled by the fusion can ease the information bottleneck. The strength comes partially from its capability to process large amounts of data more efficiently than a conventional interface, as discussed in R. Sharma, et al. For example, a multimodal HCI system could process a descriptive command in one step using a speech recognition input modality, whereas the same command could take multiple steps using a conventional interface, such as a keyboard or a mouse. The popularity of the automatic speech recognition input modality in computer-based machines is also partially due to this fact. As discussed in R. Sharma, et al., the fusion of gesture recognition and speech recognition further enhances the performance of the information processing system.
Especially with the increasing popularity of web applications on the Internet and the growing volume of their data, the usage of a multimodal HCI system can be an efficient solution for dealing with the huge amount of data on the Internet.
In addition to the huge amount of data, users often suffer from the disorganization of data while using an information processing system, such as the multimodal HCI system, with a large database. This leads to the importance of data organization and data mining.
Consequently, one of the questions raised by the usage of the multimodal HCI system, or an information processing system in general, is how to handle the data efficiently and appropriately, especially in a complex application or situation. For example, the need for an intelligent approach for handling such complex situations increases greatly in the following cases:
    when the information processing system is intended for multiple users requiring an efficient method for collaboration among the users,
    when the nature of the goal for the given tasks is complex, or
    when the multiple users interact with the system through multiple human languages in a highly distributed computing environment.
Therefore, it is an objective of the present invention to provide an intelligent information handling approach for an information processing system, such as a multimodal HCI system, which could face a situation of handling the information in a complex application scenario.
It is a further objective of the present invention to provide a novel approach to handle the fusion of multimodal inputs in the case of a multimodal HCI system.
It is a further objective of the present invention to provide a novel and intelligent tool for the data mining of a database, especially an uncharacteristic or heterogeneous one, since object verification in the present invention is also closely related to database construction and data modeling.
In addition to the above mentioned problems for handling the key processes, such as gesture and speech recognition, in a multimodal HCI system, it is found that deploying a commercially working multimodal HCI system in a real-world environment could pose many challenges. Converting a multimodal HCI system from a lab-based prototype stage with pure numerical formulas into a working commercial product requires well-designed engineering solutions for the unforeseen practical problems in the field. This is a challenging task for researchers and engineers. Ad-hoc solutions, such as introducing additional parameters, thresholds, conditions, or error-handling pieces of code, are sometimes used as quick solutions to the challenges in a real-world environment. However, the ad-hoc solutions are often non-scalable and inconsistent from one system to another.
The OVEN can provide a robust, scalable, and organized framework, which efficiently integrates these ad-hoc solutions into a manageable tool. Handling the practical challenges found in the key modalities of a commercially working multimodal HCI system is one of the tasks to which the OVEN can contribute well.
Therefore, it is another objective of the present invention to provide a scientific way of analyzing and organizing the accumulated experiences and knowledge while deploying an information processing system, such as the multimodal human computer interaction (HCI) system, to the real-world environment.
From the architectural point of view for the multimodal HCI system, F. Flippo, et al., A Framework for Rapid Development of Multimodal Interfaces, Proceedings of the 5th International Conference on Multimodal Interfaces (ICMI '03), pp. 109-116, 2003 (hereinafter F. Flippo, et al.), suggested a framework for the rapid development of multimodal interfaces using an application-independent fusion technique, based on the idea that a large part of the code in a multimodal system can be reused. In F. Flippo, et al., the key idea of the application-independent fusion technique is the separation of three tasks in the multimodal system design. The three separated tasks, as defined in their paper, are obtaining data from the modalities, fusing that data to arrive at an unambiguous meaning, and calling application code to take an action based on that meaning. For the fusion, F. Flippo, et al. use a semantic parse tree with time stamps, which maps natural language concepts to application concepts. The ambiguity-resolving agents in this process use contextual information from a context provider, either from an external sensor, such as a gaze tracker, or from more abstract data sources, such as the dialog history.
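The time-stamped ambiguity resolution described above can be sketched minimally as follows. The Event structure, the tolerance window, and the resolve function are illustrative assumptions for exposition only, not the actual implementation of F. Flippo, et al.

```python
from dataclasses import dataclass

@dataclass
class Event:
    modality: str   # e.g. "gaze" from an external sensor
    value: str      # recognized referent (hypothetical label)
    t_start: float  # time stamp in seconds
    t_end: float

def resolve(ambiguous_slot_time, context_events, window=1.5):
    """Resolve an ambiguous reference (e.g. the word "that") by picking
    the context event closest in time within a tolerance window."""
    candidates = [e for e in context_events
                  if abs(e.t_start - ambiguous_slot_time) <= window]
    if not candidates:
        return None
    return min(candidates,
               key=lambda e: abs(e.t_start - ambiguous_slot_time)).value

# "delete that" spoken at t=2.0 s; a gaze tracker saw "chart_3" at t=1.8 s
gaze_history = [Event("gaze", "chart_3", 1.8, 2.1),
                Event("gaze", "menu_bar", 0.2, 0.5)]
print(resolve(2.0, gaze_history))  # prints "chart_3"
```

The same resolver could draw on more abstract context providers, such as a dialog history, simply by supplying a different list of events.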
However, F. Flippo, et al., are clearly foreign to the concepts of augmenting the information units in an information processing system through a verification process and applying a polymorphic processing structure to the verified objects as disclosed in the present invention. As a matter of fact, the object verification approach in the present invention can enhance the performance of an application-independent framework and architecture in a multimodal HCI system, such as that of F. Flippo, et al.
Therefore, it is another objective of the present invention to provide a novel and efficient information handling method to the framework and architecture of such an information processing system, which F. Flippo, et al. lack.