1. Field of the Invention
The present invention relates to a speech recognition apparatus which recognizes input speech using speech recognition grammar.
2. Description of the Related Art
Speech is a natural interface for human beings. In particular, it is an effective user interface (UI) for users who are not familiar with operating devices, such as children or elderly people, and for the visually disabled. Recently, a data input method that combines a speech UI with a GUI (Graphical User Interface) has been drawing attention, and has been discussed in the W3C Multimodal Interaction Activity (http://www.w3.org/2002/mmi) and in the SALT Forum (http://www.saltforum.org/).
In general, data input by speech uses conventional speech recognition technology. Speech recognition is a process in which the input speech is compared with the recognition target vocabulary described in the speech recognition grammar, and the vocabulary that fits best is output as the recognition result. With this method, however, the recognition performance deteriorates as the scale or the vocabulary size of the speech recognition grammar increases. To prevent this problem, WO02/031643 discusses a technology in which speech recognition is conducted by detecting the input item presently displayed for the user on the GUI and using the speech recognition grammar corresponding to that item. As a result, the size of the recognition target vocabulary used in the speech recognition can be limited, and the deterioration of the speech recognition performance can be prevented.
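The idea of limiting the recognition target vocabulary to the displayed input items can be illustrated with a minimal sketch. It is not taken from WO02/031643: the item names, the vocabulary lists, and the functions `active_vocabulary` and `recognize` are hypothetical, and plain string similarity (Python's `difflib`) stands in for the acoustic comparison that a real recognizer would perform.

```python
from difflib import SequenceMatcher

# Hypothetical grammars: one vocabulary list per GUI input item.
GRAMMARS = {
    "paper_size": ["a4", "a3", "b5", "letter"],
    "copies": ["one copy", "two copies", "three copies"],
    "color_mode": ["color", "black and white"],
}


def active_vocabulary(displayed_items):
    """Collect (item, word) pairs only for items currently displayed."""
    return [(item, word) for item in displayed_items for word in GRAMMARS[item]]


def recognize(utterance, displayed_items):
    """Return the (item, word) pair that best fits the utterance.

    Assumption: text similarity replaces acoustic scoring for illustration.
    Returns None when no input item is displayed.
    """
    candidates = active_vocabulary(displayed_items)
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda c: SequenceMatcher(None, utterance, c[1]).ratio(),
    )
```

In this sketch, calling `recognize("letter", ["paper_size", "copies"])` compares the input against seven candidate words instead of all nine, which mirrors how restricting the grammar to displayed items keeps the search small.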
In a system including a speech UI, the user is often asked to start speaking after pushing a specific key (which is referred to as a Push To Talk key). Such a method is advantageous in that the speech segment can be easily detected, and deterioration of the speech recognition performance can be reduced even in a noisy environment. There exists prior art in which a plurality of Push To Talk keys are provided, and each key has a specific meaning. For example, Japanese Patent Application Laid-Open No. 2003-202890 discusses a technology in which a set of speech recognition grammar to be used is switched in accordance with the Push To Talk key that is manipulated. As a result, the user is able to select the set of speech recognition grammar to be used, as well as give information about the start of speech, by depressing the key.
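The key-dependent switching described above can be sketched as a simple lookup. This is an illustrative assumption rather than the method of Japanese Patent Application Laid-Open No. 2003-202890: the key names, the grammar sets, and the function `on_push_to_talk` are all hypothetical.

```python
# Hypothetical mapping from each Push To Talk key to its grammar set.
GRAMMAR_SETS = {
    "key_navigation": ["map", "route", "home"],
    "key_audio": ["play", "pause", "next track"],
}


def on_push_to_talk(key):
    """Handle a key press: signal the start of speech and select a grammar set.

    Raises KeyError for a key that has no grammar set assigned.
    """
    if key not in GRAMMAR_SETS:
        raise KeyError(f"no grammar set assigned to {key!r}")
    return {"listening": True, "grammar": GRAMMAR_SETS[key]}
```

A single key press thus carries two pieces of information: that speech is about to start, and which set of speech recognition grammar should be used to recognize it.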
As described above, the prior art of WO02/031643 can reduce the recognition vocabulary used in the speech recognition. However, although WO02/031643 allows speech to be input into an input target within the displayed area of the GUI, it does not consider speech input into an input target that is not displayed. For example, users who are accustomed to operating the system may want to input data into an input target that is not being displayed, but it is difficult to respond to such a demand.
As described above, Japanese Patent Application Laid-Open No. 2003-202890 is prior art that uses a plurality of Push To Talk keys. However, these keys do not switch the speech recognition grammar in accordance with a change in the display.