1. Field of the Invention
The present invention relates to a technique for a multimodal user interface.
2. Description of the Related Art
A multimodal user interface is convenient in that it provides the user with a plurality of input sources, such as graphical user interface (GUI) input and speech input, so that the user can input information through whichever input source he/she prefers. The multimodal user interface is particularly convenient when a plurality of input sources are used simultaneously. For example, by saying “Move this to here” while clicking on items corresponding to “this” and “here” on a GUI, the user can freely control the items even if he/she is unfamiliar with technical terms, such as commands. To allow such an operation to take place, inputs from a plurality of input sources need to be integrated.
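The “Move this to here” example can be illustrated with a minimal sketch, offered only as an illustration and not as the method of any cited reference. It assumes that speech recognition has already produced a word sequence and that each click carries a timestamp. The names `DEICTIC_WORDS`, `Click`, and `integrate` are hypothetical and do not appear in the references.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical set of referring expressions; a real system would take these
# from the grammar of its speech recognizer.
DEICTIC_WORDS = {"this", "here", "that", "there"}


@dataclass
class Click:
    item_id: str  # hypothetical identifier of the clicked GUI item
    time: float   # assumed timestamp in seconds


def integrate(utterance: str, clicks: List[Click]) -> List[str]:
    """Bind each referring expression to a click, in order of occurrence."""
    pending = sorted(clicks, key=lambda c: c.time)
    resolved = []
    for word in utterance.lower().split():
        if word in DEICTIC_WORDS and pending:
            # Consume the earliest click that has not been used yet.
            resolved.append(pending.pop(0).item_id)
        else:
            resolved.append(word)
    return resolved


command = integrate(
    "Move this to here",
    [Click("folder_B", 1.4), Click("file_A", 0.3)],
)
print(command)  # ['move', 'file_A', 'to', 'folder_B']
```

This sketch presumes that the utterance is recognized correctly and that the number of clicks matches the number of referring expressions. It makes no attempt to handle a mismatch between the two.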
Examples of methods for integrating inputs from a plurality of input sources include: a method in which information related to the type and speed of a mouse event is used (see Japanese Patent Laid-Open No. 8-286887 and Japanese Patent Laid-Open No. 9-81364 (corresponding U.S. Pat. No. 5,781,179)); a method in which a linguistic analysis is performed on the result of speech recognition (see Japanese Patent No. 2993872); a method using context information (see Japanese Patent No. 3375449); a method in which recognition results whose input times are close to each other are collected and output as a unit of semantic analysis (see Japanese Patent No. 3363283 (corresponding U.S. Pat. No. 5,884,249)); a method in which a delay in the result of the recognition of input data is taken into consideration (see Japanese Patent Laid-Open No. 10-198544); a method in which the intention of the user is detected by statistical learning (see Japanese Patent Laid-Open No. 11-288342 and Japanese Patent Laid-Open No. 2001-100878); a method using grammatical analysis (see Japanese Patent Laid-Open No. 6-282569); a method in which a linguistic analysis is performed to use a semantic structure (see Japanese Patent Laid-Open No. 2000-231427); and a method for integrating a speech input and a pointing input in which pointing inputs from a pointing device, such as a mouse, are registered in a list, the number of referring expressions in speech input data is compared with the number of pointing inputs in the list, and, if the number of pointing inputs is smaller than the number of referring expressions, the number of pointing inputs is adjusted by obtaining the subsequent pointing input (see Japanese Patent Laid-Open No. 7-110734).
In the known examples described above in which the input time of each input or the order of inputs is taken into consideration, complex processing must be performed to analyze a plurality of candidates for an input result. Moreover, although the above-described examples are based on the premise that speech inputs can be accurately recognized, it is difficult, with current speech recognition technologies, to achieve recognition with perfect accuracy. Therefore, solutions to the problem of false recognition are important. However, the known examples described above describe neither a remedy for cases where false recognition occurs nor a method for reducing the false recognition rate.
Japanese Patent Laid-Open No. 7-110734 discloses a technique in which the integration is held pending until the subsequent pointing input is obtained if the number of pointing inputs is smaller than the number of referring expressions in speech input data. Similarly to the examples described above, this technique is based on the premise that the number of referring expressions in speech input data is accurately recognized. Moreover, it describes neither a remedy for false recognition nor a method for reducing the false recognition rate. In the technique disclosed in Japanese Patent Laid-Open No. 7-110734, if the number of pointing inputs is larger than the number of referring expressions in speech input data, error processing is executed and reentry of information is required. Since the reentry of information puts a burden on the user, a technique for preventing such problems needs to be developed.
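The count comparison described above can be sketched as follows. This is a simplified reading of the behavior as summarized in this section, not the implementation disclosed in Japanese Patent Laid-Open No. 7-110734. The names `Outcome` and `compare_counts` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    INTEGRATE = "integrate"            # counts match: integrate the inputs
    WAIT_FOR_POINTING = "wait"         # too few pointing inputs: hold pending
    ERROR_REENTRY = "error"            # too many pointing inputs: ask for reentry


def compare_counts(num_referring: int, num_pointing: int) -> Outcome:
    """Decide how to proceed from the two counts alone (hypothetical helper)."""
    if num_pointing < num_referring:
        return Outcome.WAIT_FOR_POINTING
    if num_pointing > num_referring:
        return Outcome.ERROR_REENTRY
    return Outcome.INTEGRATE


# The user says "Move this to here" and clicks twice.
print(compare_counts(num_referring=2, num_pointing=2))  # Outcome.INTEGRATE

# Assumed failure case: the recognizer detects only one referring expression
# in the same utterance, so two valid clicks lead to error processing.
print(compare_counts(num_referring=1, num_pointing=2))  # Outcome.ERROR_REENTRY
```

The second call shows why the premise of accurate recognition matters: under this reading, a single missed referring expression is enough to force the user to reenter information even though the pointing inputs themselves were correct.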
The present invention has been made in view of the circumstances described above, and is directed to the improvement of accuracy in recognizing instructions indicated by input from at least two input sources.