1. Field of the Invention
The present invention relates to a speech recognition apparatus and, more particularly, to a speech recognition apparatus using a multimodal user interface that is a combination of a graphical user interface (GUI) and a speech user interface (UI).
2. Description of the Related Art
Recent developments in speech recognition technology and improvements in the hardware performance of speech recognition devices are enabling speech input in various computer-controlled devices other than personal computers and workstations, such as car navigation systems, portable phones, and FAX apparatuses.
Speech input generally provides the following merits.
(1) It allows a user to perform input without looking at the screen or using his or her hands.
(2) It allows direct setting of items that are not displayed on the screen.
(3) It allows a user to set a plurality of items by one utterance.
Assume that a user wants to set up a copy machine to print five copies of a document on A4 sheets. A conventional GUI or a UI based on key input makes the user execute a plurality of steps, i.e., input the number of copies by using the ten-key pad, press the paper size button on the screen, and press the “A4” key on the screen.
With speech input, the user can set the paper size simply by uttering “A4”, which eliminates the effort required to display the paper size setting window, as described in merit (2).
The user can also set the paper size and the number of copies at once by uttering, e.g., “A4, five copies” as described in merit (3).
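As an illustration of merit (3), the following is a minimal sketch of how one recognized utterance might be mapped to several setting items. The function name `interpret`, the setting keys, and the small vocabulary are assumptions made for this example only and are not taken from any cited technique.

```python
import re

# Hypothetical vocabulary for spoken numbers (illustration only).
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def interpret(utterance):
    """Map recognized text to a dict of copy settings (assumed grammar)."""
    settings = {}

    # Paper size item: any of a few assumed size names.
    size = re.search(r"\b(A3|A4|B4|B5)\b", utterance, re.IGNORECASE)
    if size:
        settings["paper_size"] = size.group(1).upper()

    # Number-of-copies item: a digit string or number word before "copy/copies".
    copies = re.search(r"\b(\d+|[a-z]+) cop(?:y|ies)\b", utterance, re.IGNORECASE)
    if copies:
        word = copies.group(1).lower()
        count = int(word) if word.isdigit() else NUMBER_WORDS.get(word)
        if count is not None:
            settings["copies"] = count

    return settings


# One utterance sets two items; a shorter utterance sets one.
print(interpret("A4, five copies"))
print(interpret("A4"))
```

In this sketch a single call yields both the paper size and the number of copies, whereas the key-based procedure described above needs a separate step for each.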
Techniques have also been proposed for increasing operation efficiency by using multimodal input that combines a GUI and speech, instead of using GUI input or speech input alone (e.g., Japanese Patent Registration No. 2993872 and Japanese Patent Laid-Open No. 6-282569).
Although speech input has various merits, it also has a demerit of “misrecognition”. For example, even when the user utters “A4”, the speech recognition apparatus may misrecognize it as “A3”.
Even if the user utters “A4, five copies”, it may be misrecognized as “A4 to B5”. In this case, although the user wants to set two items at once, i.e., the paper size and the number of copies, the apparatus misrecognizes the utterance as the setting of a single, different item, “scaling factor”. Such misrecognition of the item itself greatly confuses the user, who must then expend considerable effort to correct the error.
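The two misrecognition examples differ in kind: “A4” heard as “A3” gives a wrong value for the right item, while “A4, five copies” heard as “A4 to B5” changes the item itself. The sketch below makes that distinction explicit. The function `classify_misrecognition` and the dictionary representation of settings are hypothetical and serve only as illustration.

```python
def classify_misrecognition(intended, recognized):
    """Compare intended and recognized settings (dicts of item -> value)."""
    if intended == recognized:
        return "correct"
    # Different sets of setting items: the item itself was misrecognized.
    if set(intended) != set(recognized):
        return "item-level"
    # Same items, but at least one value differs.
    return "value-level"


# "A4" misrecognized as "A3": same item, wrong value.
print(classify_misrecognition({"paper_size": "A4"}, {"paper_size": "A3"}))

# "A4, five copies" misrecognized as "A4 to B5": a different item is set.
print(classify_misrecognition(
    {"paper_size": "A4", "copies": 5},
    {"scaling": ("A4", "B5")},
))
```

An item-level error is the harder one to recover from, because the user must first undo a setting that was never intended and then enter the original items again.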
The conventional GUI operation often uses hierarchical steps, i.e., makes the user select a setting item by a key and then set the detailed value of the setting item. This operation method can avoid misrecognition of the setting item itself, unlike speech input. However, the need to execute the plurality of steps increases the load on the user, as described above.
Even in multimodal input combining a GUI and speech input, a method that applies natural language analysis to speech input in a natural language, as in Japanese Patent Registration No. 2993872, is limited by the relatively low accuracy of natural language analysis.